Welcome to the 89th edition of Git Rev News, a digest of all things Git. For our goals, the archives, the way we work, and how to contribute or to subscribe, see the Git Rev News page on git.github.io.
This edition covers what happened during the month of July 2022.
[BUG?] Major performance issue with some commands on our repo’s master branch
Tassilo Horn sent a long email to the mailing list saying that while writing the email he found a solution to his problem, and that he tells about that solution at the end of the email.
He then explained that he works on a quite big repo where usual commands were all quick, but some plumbing commands invoked by Magit were “extremely slow for no obvious reasons”.
He found that:
git show --no-patch --format="%h %s" "master^{commit}" --
took around 13 seconds, while:
git show --no-patch --format="%h %s" "SHD_ECORO_3_9_7^{commit}" --
took around 23 milliseconds.
So there was almost a factor of 1000 difference when the same command was launched on the master branch or on the last release branch.
He suspected that the difference could have something to do with the fact that there had been a lot of merges recently in master.
The solution he found while writing the email was to comment out the
diff.renameLimit = 10000
he had in his ~/.gitconfig
file. This
reduced the time for the first command above from 13 seconds to 150
milliseconds.
Tassilo wondered though why diff.renameLimit
had such an
influence, as, when --no-patch
is provided, no diff should be
generated.
Tao Klerks replied to Tassilo that he used to use git show
to get
metadata from a big repo as the command allows specifying an
arbitrary list of commits in one call which can save process
overhead on Windows. He stopped using git show
though after it had been
reported to be slow. Instead he switched to git log
and accepted
some process overhead, as it solved his problem.
He found that large commits and merge commits with a
number of changes on any side were slow despite using
--no-patch
. And Tassilo’s email made him suspect that it could be
linked to rename detection.
Tassilo replied to Tao that using git log
also
solved his problem
but that someone might want to do something to fix git show
when
no diff was generated.
Tao replied that he found that adding --diff-merges=off
to the
git show --no-patch ...
command made it perform as fast as
git log
. So he could now combine the performance of git log
with passing an arbitrary list of commits that git show
allowed.
Peff, alias Jeff King, also replied to Tassilo that git show
might
in some cases need underlying diffs even if --no-patch
is
requested, while git log
might need them too, but not when looking
at merges. Peff wondered if the code could be changed to detect when
underlying diffs aren’t needed, but thought that it could make the
code “a bit brittle”.
Kyle Meyer then asked Peff if making --no-patch
imply
--diff-merges=off
was safe, as Tao suggested, and showed benchmarks
on the git.git repo where it was nearly 15 times faster with
--diff-merges=off
.
Peff replied that he wasn’t sure it would be safe and gave an
example of a git show
command using --diff-filter=D
where the output
changes when diff merges are suppressed as the commit isn’t showed
then. He suggested a mode skipping the diff when it’s not shown but
always showing the commit. He said that it would require someone
verifying it does the right thing in all cases though.
Junio Hamano, the Git maintainer, chimed in to agree with Peff that
making --no-patch
imply --diff-merges=off
would cause a
regression, and mentioned other git show
options where the output
could be affected.
Junio also elaborated on Peff’s idea of a mode skipping the diff
when it’s not shown but always showing the commit. He suggested an
explicit git show --log-only
option and mentioned a few ways to
make it work regarding other options.
Peff and Junio discussed the idea a bit more, but it doesn’t look like something will be implemented soon.
Tassilo and Peff also discussed the diff.renameLimit
option a bit.
It looks like Tassilo would like to set it to a high value
only in some contexts, though the current config might not be
sufficient to express that.
Who are you and what do you do?
I am an open source toolsmith, who works at Open Source Program Office (OSPO) at Google. I work as the maintainer of Git.
How has your journey with Git as its maintainer been so far?
I have worked with many contributors since 2005, and seeing so many of them grow into excellent developers as they work on this project was a joy.
How does your work as the maintainer of the Git project look like?
Unlike earlier days, I no longer have to be in the driver’s seat to design or implement a big feature, and luckily there are multiple groups of people who do excellent job dreaming up new ways to use Git and make it a reality. My best days start by getting greeted by a surprisingly good proposal of a new feature or optimization of an existing feature, and the entire day is spent on reviewing them. It happens less often these days, but they still do occasionally.
On my normal days, I scan the mailing list for patches and discussions. My goal is to at least open every one of incoming messages, and read carefully at least half of the patches I pick up to queue on the ‘seen’ branch, which means that I may be queuing the other half without carefully reading, trusting the reviews already done by other members of the community.
I aim to finish picking up new topics, replacing existing topics, and generally interacting with the mailing list, by around 2pm. Then I rebuild the ‘seen’ topic, rewrite the latest draft of “What’s cooking” report (which is the guide to choose which topics will go to the ‘next’ branch), and push out the first integration result of the day by 4pm. After that, I merge the topics that have cooked long enough in ‘next’ to ‘master’, and the topics that have been adequately reviewed to ‘next’. The ‘seen’ and ‘next’ branches are rebuilt, the “What’s cooking” report is rewritten, and the second integration result of the day gets pushed out, before I call it a day.
If there’s anything you would like to say to your past self, what would that be?
Study math and algorithms harder, perhaps. I know them well enough to get around, but it is primarily because there are so many other community members who are good at them that I can rely on, so I do not have to get involved in details that are too deep for me.
What would you name your most important contribution to Git?
There are too many to list in the design and implementation of the internals, but in the end, I think what mattered to the project most was that I was consistently there, available to help guide the project.
While it may not be a particularly “important contribution”, my most favorite
creation is git rerere
. It was fun to design, work on, and (most
importantly) name it.
If you could get a team of expert developers to work full time on something in Git for a full year, what would it be?
The interoperability between SHA-1 and SHA-256 repositories first comes to mind. The ingredients are almost all there, so are rough designs.
If you could remove something from Git without worrying about backwards compatibility, what would it be?
I generally think that the hacks to lie to the object layer, like “grafts” and “replace” mechanisms, and “alternate” object store, were not great ideas. There aren’t all that many that I would remove, but there are too many that I wish we could redo entirely without backward compatibility issues (e.g. path ordering in the index).
What is your favorite Git-related tool/library, outside of Git itself?
My recent favorites are Eric’s “public-inbox”, which powers the https://lore.kernel.org/git mailing list archive, and Konstantin’s “b4” that helps to interact with the lore archive better.
What is your advice for people who want to start Git development? Where and how should they start?
That depends on why they want to do so. Do they not like the way the tool currently works and they want to change it? Are they happy with Git but they feel they do not know enough of it and are curious about how things work? For the former folks, the usual “scratch your itch” advice would work well to a certain extent. For the latter folks, their “itch” can be to try to figure out how one of their favorite subcommand works, and then perhaps clean up or optimize the code paths they studied. Following what our test scripts are doing to your favorite feature may be a great way to learn them, too.
Either way, while identifying and scratching their “itch”, I’d recommend they lurk on the mailing list for at least a few weeks, and starting early is good. Learn by reading others’ patches being discussed, the kind of things the reviewers are interested in, and how the development process works. See who is active and how the discussion goes. These social conventions, how our developers work with others, are just as important as, if not more than, technical details. The “MyFirstContribution” document may also be a good place to start.
How does your mailing list workflow look like?
I try not to be the first to comment on any new topic (unless the topic gains no attention from anybody, yet it looks worth commenting on, which sometimes happens). This is because other members of the community may offer interesting viewpoints that I would not hear (and think of myself) if I spoke first.
My ideal is that a topic gets discussed without me for a few rounds and by the time I feel like taking a look myself, it is already quite ready, thanks to the help from competent reviewers (which happens quite often). Then I can pick it up and queue almost directly to ‘next’ (in practice, I’d first merge it to ‘seen’ to ensure that there is no funny interactions with existing topics in the first integration cycle of the day that happens before 2pm, and then merge it down to ‘next’ in the second integration cycle of the day, or perhaps the day after).
As I cannot keep track of all the things said in the discussion on all topics, I make heavy use of the lore archive and use the draft of “What’s cooking” report to take notes on each topic’s current status. So, the workflow is
to notice a new topic, whose merit is either obvious to see or others reviews helped to highlight, and queue it to ‘seen’, while sending comments to them,
record the topic in the “What’s cooking” draft,
to observe the discussion on the topic, perhaps taking a reroll and replacing the copy I have,
send out an updated “What’s cooking”, possibly the topic marked for promotion,
go back to (3)
until each topic graduates to ‘master’.
If there’s one tip you would like to share with other Git developers, what would it be?
It is very easy to be too strongly married to your initial solution and become blind to the merits of other approaches suggested in the discussion by others, or accept a possible reframing of the original problem to solve a wider problem.
In the very early days of Git, before Linus passed the project’s maintainership to me, I also had a “competitive” manner in a bad way. There were plenty of problems to be solved, and it felt as if people competed to be the first to offer a solution to each of them. When I had an idea to solve a problem that others were also interested in, sometimes it felt like I had to “beat” them, which meant that I had to send out patches even before they are well reasoned and well explained enough. Luckily, being maintainer means I no longer have to compete with others. Instead I can set the pace of the project. After having lived in such a “competitive” way for a few months and saw its downsides, I learned to give contributors time to think things through and wait for counter-proposals.
So, one “tip” is to take things slowly. Be ready to step back, take a deep breath and take time to rethink the problem, together with those in the discussion. You’d end up working well with the others that way.
Various
Light reading
git rebase --fork-point
considered harmful (by me): The reflog lookup heuristics aren’t what you thought, are they? A UX report by Josh Bleecher Snyder.Easy watching
Git tools and sites
git log
. This allows automated processing with tools
such as jq
, and simplifying automation scripts..git/
folder leakage vulnerability, and if so
restores the entire Git repository, including data from stash.
.git
folder
from directory browsing on the web server to avoid information leaks.
Don’t deploy the .git/
folder or, at least, forbid access.
By Julien Maury (jmau111) on DEV.to and on his
own blog.Build your own X is a compilation of well-written, step-by-step guides for re-creating our favorite technologies from scratch, including Build your own Git.
What I cannot create, I do not understand — Richard Feynman.
This edition of Git Rev News was curated by Christian Couder <christian.couder@gmail.com>, Jakub Narębski <jnareb@gmail.com>, Markus Jansen <mja@jansen-preisler.de> and Kaartic Sivaraam <kaartic.sivaraam@gmail.com> with help from Junio Hamano, Philip Oakley and Bruno Brito.