Git Rev News Edition 89 (July 31st, 2022)

Git Rev News: Edition 89 (July 31st, 2022)

Welcome to the 89th edition of Git Rev News, a digest of all things Git. For our goals, the archives, the way we work, and how to contribute or to subscribe, see the Git Rev News page on git.github.io.

This edition covers what happened during the month of July 2022.

Discussions

Support

[BUG?] Major performance issue with some commands on our repo’s master branch

Tassilo Horn sent a long email to the mailing list saying that while writing the email he found a solution to his problem, and that he tells about that solution at the end of the email.

He then explained that he works on a quite big repo where usual commands were all quick, but some plumbing commands invoked by Magit were “extremely slow for no obvious reasons”.

He found that:

git show --no-patch --format="%h %s" "master^{commit}" --

took around 13 seconds, while:

git show --no-patch --format="%h %s" "SHD_ECORO_3_9_7^{commit}" --

took around 23 milliseconds.

So there was almost a factor of 1000 difference when the same command was launched on the master branch or on the last release branch.

He suspected that the difference could have something to do with the fact that there had been a lot of merges recently in master.

The solution he found while writing the email was to comment out the diff.renameLimit = 10000 he had in his ~/.gitconfig file. This reduced the time for the first command above from 13 seconds to 150 milliseconds.

Tassilo wondered though why diff.renameLimit had such an influence, as, when --no-patch is provided, no diff should be generated.

Tao Klerks replied to Tassilo that he used to use git show to get metadata from a big repo as the command allows specifying an arbitrary list of commits in one call which can save process overhead on Windows. He stopped using git show though after it had been reported to be slow. Instead he switched to git log and accepted some process overhead, as it solved his problem.

He found that large commits and merge commits with a number of changes on any side were slow despite using --no-patch. And Tassilo’s email made him suspect that it could be linked to rename detection.

Tassilo replied to Tao that using git log also solved his problem but that someone might want to do something to fix git show when no diff was generated.

Tao replied that he found that adding --diff-merges=off to the git show --no-patch ... command made it perform as fast as git log. So he could now combine the performance of git log with passing an arbitrary list of commits that git show allowed.

Peff, alias Jeff King, also replied to Tassilo that git show might in some cases need underlying diffs even if --no-patch is requested, while git log might need them too, but not when looking at merges. Peff wondered if the code could be changed to detect when underlying diffs aren’t needed, but thought that it could make the code “a bit brittle”.

Kyle Meyer then asked Peff if making --no-patch imply --diff-merges=off was safe, as Tao suggested, and showed benchmarks on the git.git repo where it was nearly 15 times faster with --diff-merges=off.

Peff replied that he wasn’t sure it would be safe and gave an example of a git show command using --diff-filter=D where the output changes when diff merges are suppressed as the commit isn’t showed then. He suggested a mode skipping the diff when it’s not shown but always showing the commit. He said that it would require someone verifying it does the right thing in all cases though.

Junio Hamano, the Git maintainer, chimed in to agree with Peff that making --no-patch imply --diff-merges=off would cause a regression, and mentioned other git show options where the output could be affected.

Junio also elaborated on Peff’s idea of a mode skipping the diff when it’s not shown but always showing the commit. He suggested an explicit git show --log-only option and mentioned a few ways to make it work regarding other options.

Peff and Junio discussed the idea a bit more, but it doesn’t look like something will be implemented soon.

Tassilo and Peff also discussed the diff.renameLimit option a bit. It looks like Tassilo would like to set it to a high value only in some contexts, though the current config might not be sufficient to express that.

Developer Spotlight: Junio C Hamano

Who are you and what do you do?

I am an open source toolsmith, who works at Open Source Program Office (OSPO) at Google. I work as the maintainer of Git.
How has your journey with Git as its maintainer been so far?

I have worked with many contributors since 2005, and seeing so many of them grow into excellent developers as they work on this project was a joy.
How does your work as the maintainer of the Git project look like?

Unlike earlier days, I no longer have to be in the driver’s seat to design or implement a big feature, and luckily there are multiple groups of people who do excellent job dreaming up new ways to use Git and make it a reality. My best days start by getting greeted by a surprisingly good proposal of a new feature or optimization of an existing feature, and the entire day is spent on reviewing them. It happens less often these days, but they still do occasionally.

On my normal days, I scan the mailing list for patches and discussions. My goal is to at least open every one of incoming messages, and read carefully at least half of the patches I pick up to queue on the ‘seen’ branch, which means that I may be queuing the other half without carefully reading, trusting the reviews already done by other members of the community.

I aim to finish picking up new topics, replacing existing topics, and generally interacting with the mailing list, by around 2pm. Then I rebuild the ‘seen’ topic, rewrite the latest draft of “What’s cooking” report (which is the guide to choose which topics will go to the ‘next’ branch), and push out the first integration result of the day by 4pm. After that, I merge the topics that have cooked long enough in ‘next’ to ‘master’, and the topics that have been adequately reviewed to ‘next’. The ‘seen’ and ‘next’ branches are rebuilt, the “What’s cooking” report is rewritten, and the second integration result of the day gets pushed out, before I call it a day.
If there’s anything you would like to say to your past self, what would that be?

Study math and algorithms harder, perhaps. I know them well enough to get around, but it is primarily because there are so many other community members who are good at them that I can rely on, so I do not have to get involved in details that are too deep for me.
What would you name your most important contribution to Git?

There are too many to list in the design and implementation of the internals, but in the end, I think what mattered to the project most was that I was consistently there, available to help guide the project.

While it may not be a particularly “important contribution”, my most favorite creation is git rerere. It was fun to design, work on, and (most importantly) name it.
If you could get a team of expert developers to work full time on something in Git for a full year, what would it be?

The interoperability between SHA-1 and SHA-256 repositories first comes to mind. The ingredients are almost all there, so are rough designs.
If you could remove something from Git without worrying about backwards compatibility, what would it be?

I generally think that the hacks to lie to the object layer, like “grafts” and “replace” mechanisms, and “alternate” object store, were not great ideas. There aren’t all that many that I would remove, but there are too many that I wish we could redo entirely without backward compatibility issues (e.g. path ordering in the index).
What is your favorite Git-related tool/library, outside of Git itself?

My recent favorites are Eric’s “public-inbox”, which powers the https://lore.kernel.org/git mailing list archive, and Konstantin’s “b4” that helps to interact with the lore archive better.
What is your advice for people who want to start Git development? Where and how should they start?

That depends on why they want to do so. Do they not like the way the tool currently works and they want to change it? Are they happy with Git but they feel they do not know enough of it and are curious about how things work? For the former folks, the usual “scratch your itch” advice would work well to a certain extent. For the latter folks, their “itch” can be to try to figure out how one of their favorite subcommand works, and then perhaps clean up or optimize the code paths they studied. Following what our test scripts are doing to your favorite feature may be a great way to learn them, too.

Either way, while identifying and scratching their “itch”, I’d recommend they lurk on the mailing list for at least a few weeks, and starting early is good. Learn by reading others’ patches being discussed, the kind of things the reviewers are interested in, and how the development process works. See who is active and how the discussion goes. These social conventions, how our developers work with others, are just as important as, if not more than, technical details. The “MyFirstContribution” document may also be a good place to start.
How does your mailing list workflow look like?

I try not to be the first to comment on any new topic (unless the topic gains no attention from anybody, yet it looks worth commenting on, which sometimes happens). This is because other members of the community may offer interesting viewpoints that I would not hear (and think of myself) if I spoke first.

My ideal is that a topic gets discussed without me for a few rounds and by the time I feel like taking a look myself, it is already quite ready, thanks to the help from competent reviewers (which happens quite often). Then I can pick it up and queue almost directly to ‘next’ (in practice, I’d first merge it to ‘seen’ to ensure that there is no funny interactions with existing topics in the first integration cycle of the day that happens before 2pm, and then merge it down to ‘next’ in the second integration cycle of the day, or perhaps the day after).

As I cannot keep track of all the things said in the discussion on all topics, I make heavy use of the lore archive and use the draft of “What’s cooking” report to take notes on each topic’s current status. So, the workflow is
1. to notice a new topic, whose merit is either obvious to see or others reviews helped to highlight, and queue it to ‘seen’, while sending comments to them,
2. record the topic in the “What’s cooking” draft,
3. to observe the discussion on the topic, perhaps taking a reroll and replacing the copy I have,
4. send out an updated “What’s cooking”, possibly the topic marked for promotion,
5. go back to (3)
until each topic graduates to ‘master’.
If there’s one tip you would like to share with other Git developers, what would it be?

It is very easy to be too strongly married to your initial solution and become blind to the merits of other approaches suggested in the discussion by others, or accept a possible reframing of the original problem to solve a wider problem.

In the very early days of Git, before Linus passed the project’s maintainership to me, I also had a “competitive” manner in a bad way. There were plenty of problems to be solved, and it felt as if people competed to be the first to offer a solution to each of them. When I had an idea to solve a problem that others were also interested in, sometimes it felt like I had to “beat” them, which meant that I had to send out patches even before they are well reasoned and well explained enough. Luckily, being maintainer means I no longer have to compete with others. Instead I can set the pace of the project. After having lived in such a “competitive” way for a few months and saw its downsides, I learned to give contributors time to think things through and wait for counter-proposals.

So, one “tip” is to take things slowly. Be ready to step back, take a deep breath and take time to rethink the problem, together with those in the discussion. You’d end up working well with the others that way.

Releases

Git 2.37.1 and others
Git for Windows 2.37.1(1)
libgit2 1.5.0, 1.4.4
GitHub Enterprise 3.6.0, 3.5.3, 3.4.6, 3.3.11, 3.2.16, 3.5.2, 3.4.5, 3.3.10, 3.2.15
GitLab 15.2.1, 15.1.4, and 15.0.5 15.2, 15.1.3, 15.1.2, 15.1.1
GitHub Desktop 3.0.5, 3.0.4
Sourcetree 4.1.9

Other News

Various

Contributor’s Summit Registration details have been posted by Taylor Blau. As usual the Summit will happen the day before the main conference day of the Git Merge and very close to it.
GitLive 15.0: Offline merge conflict detection across all branches for any Git repository and more by Agnieszka Stec.
Iterative.AI, authors of the DVC Data Version Control tool (first mentioned in Git Rev News Edition #42), bring Git-backed Machine Learning Model Registry to Iterative Studio.
About GIT Internals: Aman asked how to better understand GIT software internals. Junio (maintainer) suggested starting at the Initial revision of “git”, the information manager from hell (e83c5163, v0.99~954). “With only 1244 lines spread across 11 files, it is a short-read that is completable in a single sitting for those who are reasonably fluent in C. It does not have any frills, but the basic data structures to express the important concepts are already there.”
Matheus Tavares successfully defended his Master’s dissertation about parallelizing Git checkout. Some of his work on this was done during his 2019 Google Summer of Code.
Whatever happened to SHA-256 support in Git?: an article by Jonathan Corbet on LWN.net.
Give Up GitHub: The Time Has Come! by Denver Gingerich and Bradley M. Kuhn of Software Freedom Conservancy, (see also Open source body quits GitHub, urges you to do the same in The Register) calls for FOSS project to migrate away from GitHub.

Light reading

Introduction to Git Ops: Some useful background to the Git - DevOps approach in this sponsored article.
Git workflows: Best practices for GitOps deployments, describing differences between how you manage your code in Git and how you manage your GitOps configuration in Git. By Christian Hernandez, GitOps Advocate, on Red Hat Developer blog.
git rebase --fork-point considered harmful (by me): The reflog lookup heuristics aren’t what you thought, are they? A UX report by Josh Bleecher Snyder.
Git Delete Branch How-To, for Both Local and Remote with pictures, also includes deleting branches on GitHub.
Git - Subtree: A short overview of the common replacement for Git submodule.
Managing Git projects with submodules and subtrees (2020): More choices. This was previously mentioned in Git Rev News Edition #63.
‘Turn off merge fast-forward by default’: An alternative viewpoint (from Git for Windows #3858) by Maciej Radzikowski on the Better Dev blog.
Getting Started with Git Hooks and Husky by Bruno Brito on Tower’s blog.
“Codeberg” series by Flavio Poletti (@polettix), starting with Codeberg and currently ending with Codeberg Pages - Custom domains, describes the Codeberg hosting service, which is powered by Gitea software (a fork of Gogs, which is written in Go).
Write Better Commits, Build Better Projects: Learn strategies to improve and use commits to streamline your development process. By Victoria Dye on GitHub Engineering Blog.
Write Git Commit Messages That Your Colleagues Will Love by Simon Egersand on DEV.to.
Why I love Tig for visualizing my Git workflows by Sumantro Mukherjee (Correspondent, Red Hat) on OpenSource.com.

Easy watching

Git Internals - The BLOB: ‘A shot of code’ looks at the internals of the .git folder to see exactly what goes on under the hood.
Getting Comfortable with GIT, looking to get a deeper understanding of Git, and hopefully feeling a lot more comfortable when performing some of the more scary Git operations… says ‘A shot of code’.
It’s Impossible to Know If You’re a Good Programmer: The impostor syndrome and irrelevant code challenges.

Git tools and sites

Git Signing: All the details for signing commits and tags (for non-repudiation) in GitHub using GPG, Vault, Yubikey, Keybase!
Git-Story: Animate the story of your Git project, by creating video animations (.mp4) of your commit history directly from your Git repo. Note: arrows point in the direction of increasing time, not from commit to its parents.
- See also How to Animate Your Git Commit History with git-story by Jacob Stopak on freeCodeCamp.
GitHub Actions Kotlin DSL (docs) allows authoring GitHub Actions workflows for GitOps in Kotlin.
- See also GitHub Actions DSL: a New Hope in YAML Programming Wasteland by Jean-Michel Fayard on DEV.to.
jc (JSON Convert) is a CLI tool and Python library that converts the output of popular command-line tools and file-types to JSON, YAML, etc… including git log. This allows automated processing with tools such as jq, and simplifying automation scripts.
GitJournal is a mobile first Markdown notes app integrated with Git, for Android and iOS, that can work with any Git hosting provider (via SSH).
data-diff is an open-source command-line tool and Python library by Datafold to efficiently diff rows across two different databases.
- See also references to various blog posts by Tim Sehn on DoltHub Blog about version control, Git, and databases, which appeared in Git Rev News Edition #82.
GitHacker is a multithreaded tool to detect whether a site has the .git/ folder leakage vulnerability, and if so restores the entire Git repository, including data from stash.
- See also Please remove that .git folder from directory browsing on the web server to avoid information leaks. Don’t deploy the .git/ folder or, at least, forbid access. By Julien Maury (jmau111) on DEV.to and on his own blog.
Build your own X is a compilation of well-written, step-by-step guides for re-creating our favorite technologies from scratch, including Build your own Git.

What I cannot create, I do not understand — Richard Feynman.

Credits

This edition of Git Rev News was curated by Christian Couder <christian.couder@gmail.com>, Jakub Narębski <jnareb@gmail.com>, Markus Jansen <mja@jansen-preisler.de> and Kaartic Sivaraam <kaartic.sivaraam@gmail.com> with help from Junio Hamano, Philip Oakley and Bruno Brito.