Welcome to the 20th edition of Git Rev News, a digest of all things Git. For our goals, the archives, the way we work, and how to contribute or to subscribe, see the Git Rev News page on git.github.io.
This edition covers what happened during the month of September 2016.
Linus Torvalds asked for an increase in the default number of
characters in SHA-1 abbreviations. The default_abbrev = 7 was
reasonable in the early days of Git, but a project of the size of the
Linux kernel needs git config --global core.abbrev 12. While Git
will extend the seven hex digits until the object name is unique,
that only reflects the current situation in the repository. It gets
annoying when a commit message has a short Git ID that is no longer
unique a few months later, when one needs to go back and try to figure
out what went wrong in that commit.
Jeff King, alias Peff, answered the “it gets annoying” part in the [PATCH 0/10] helping people resolve ambiguous sha1s patch series (merged in 66c22ba6). This patch series taught Git to help in the situation where only an ambiguous shortened identifier is available, by listing the SHA-1s of the objects it found, along with a few bits of information that may help the user decide which one they meant.
$ git rev-parse b2e1
error: short SHA1 b2e1 is ambiguous
hint: The candidates are:
hint:   b2e1196 tag v2.8.0-rc1
hint:   b2e11d1 tree
hint:   b2e1632 commit 2007-11-14 - Merge branch 'bs/maint-commit-options'
hint:   b2e1759 blob
hint:   b2e18954 blob
hint:   b2e1895c blob
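The lookup behind such a hint list can be sketched in a few lines of Python. This is only an illustration, not Git's implementation: the OBJECTS dictionary is a toy stand-in for the object database (its keys are the shortened IDs from the output above, used as if they were full IDs), and find_candidates and resolve are hypothetical helper names.

```python
from typing import Dict, List

# Toy object store (an assumption for illustration): keys are the shortened
# IDs from the example output, values are a short description of each object.
OBJECTS: Dict[str, str] = {
    "b2e1196": "tag v2.8.0-rc1",
    "b2e11d1": "tree",
    "b2e1632": "commit 2007-11-14 - Merge branch 'bs/maint-commit-options'",
    "b2e1759": "blob",
    "b2e18954": "blob",
    "b2e1895c": "blob",
}


def find_candidates(objects: Dict[str, str], prefix: str) -> List[str]:
    """Return, in sorted order, every object ID that starts with `prefix`."""
    prefix = prefix.lower()
    return sorted(oid for oid in objects if oid.startswith(prefix))


def resolve(objects: Dict[str, str], prefix: str) -> str:
    """Resolve `prefix` to one object ID, or raise with a list of candidates."""
    candidates = find_candidates(objects, prefix)
    if not candidates:
        raise KeyError(f"unknown object name {prefix}")
    if len(candidates) > 1:
        # Ambiguous: report every candidate with its description as a hint.
        hints = "\n".join(f"hint:   {oid} {objects[oid]}" for oid in candidates)
        raise LookupError(
            f"short SHA1 {prefix} is ambiguous\n"
            f"hint: The candidates are:\n{hints}"
        )
    return candidates[0]
```

With this toy store, resolve(OBJECTS, "b2e11d") returns the single matching ID, while resolve(OBJECTS, "b2e1") raises a LookupError whose message lists all six candidates.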
The problem of Git providing SHA-1 abbreviations that would soon become ambiguous was solved in a different way than Linus proposed. Instead of increasing the default abbreviation length for all projects, which would make abbreviations longer and more unwieldy even for small projects that don’t need it, Peff proposed to base the default abbreviation length dynamically on the number of objects in the repository. Linus sent a rough implementation of this idea, which after a few iterations (and cleanups of related code) got merged into ‘next’ as bb188d00f7.
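One plausible way to derive an abbreviation length from the object count is a birthday-paradox estimate: with roughly 2^n objects, collisions become likely once prefixes carry fewer than about 2n bits, and each hex digit carries 4 bits. The Python sketch below illustrates that reasoning only; it is not claimed to be the code that was merged, and the function name and the minimum of 7 (taken from the old default mentioned above) are assumptions for the example.

```python
def estimate_abbrev_length(object_count: int, minimum: int = 7) -> int:
    """Estimate how many hex digits keep abbreviations unlikely to collide.

    With about 2**bits objects, a collision is expected once the prefix
    space is about 2**(2 * bits). A hex digit encodes 4 bits, so we need
    2 * bits / 4 = bits / 2 digits, rounded up.
    """
    # Number of bits needed to represent the object count (at least 1).
    bits = max(object_count, 1).bit_length()
    hex_digits = (bits + 1) // 2  # bits / 2, rounded up
    # Small repositories keep the traditional minimum length.
    return max(hex_digits, minimum)
```

Under these assumptions a repository with a thousand objects stays at 7 digits, while one with five million objects gets 12, which happens to match the core.abbrev value suggested for the Linux kernel.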
Johannes Schindelin, alias Dscho, is the maintainer of Git for
Windows. He works for Microsoft and, on top of his maintainer
role, he has been working since February this year, whenever time
allowed, to speed up interactive rebase (git rebase -i).
Since its creation in 2005, the git rebase command has been
implemented using shell scripts that call other git commands.
The interactive rebase that Dscho implemented in 2007 calls different
commands than the regular, non-interactive rebase. The regular rebase
uses git format-patch to create a patch series from some commits,
and then git am to apply this patch series on top of a different
commit, while the interactive rebase calls git cherry-pick
repeatedly for the same purpose.
Neither of these approaches has been very efficient though, and the
main reason is that repeatedly calling a git command has a
significant overhead. Even the regular git rebase would do that, as
git am had been implemented by launching git apply on each of the
patches.
The overhead is especially big on Windows where creating a new process is quite slow, but even on other OSes it requires setting up everything from scratch, then reading the index from disk, and then, after performing some changes, writing the index back to the disk.
In the case of the regular rebase, a patch series has been merged
recently to the ‘master’ branch that makes git am call git apply’s
internal functions without spawning the latter as a separate process.
So the regular rebase will be significantly faster, especially on
Windows and for big repositories, in the next Git feature release.
Dscho’s work achieves the same kind of results for the interactive rebase. The work, which has been distilled into a series of patch series sent to the mailing list, greatly improves and then uses a mechanism called the sequencer.
From its beginning in 2008 as a GSoC (Google Summer of Code) project, the sequencer had been envisioned as a low-level patch-application engine written in C that would “take the ‘todo’ file format used by git-rebase -i and extend it to also support applying patches split out of mbox files”, so that “frontends like git-am, git-rebase, etc. can then setup the ‘todo’ script and pass it to git-sequencer, which does the actual patch application, editing, etc.”
Of course this was far too ambitious for a GSoC project, so the work that Stephan Beyer, the GSoC student at that time, did to implement it was not merged. A lot of great related work by Stephan had been merged though, and the sequencer idea as well as Stephan’s code were still considered valuable, so in 2011 another GSoC project was attempted to further the idea and Stephan’s code. This time the goal was to first use the sequencer to improve cherry-picking and reverting many commits, and Ramkumar Ramachandra, alias Ram, succeeded. The sequencer code got merged, and it became possible to “continue”, “abort” or “skip” when cherry-picking or reverting many commits.
Despite this success, Dscho has had to improve a lot of things to make it possible to reuse the sequencer in the interactive rebase. For example, he had to create a git-rebase--helper in C that ported a lot of the functionality from the git-rebase--interactive.sh shell script.
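To make the ‘todo’ script that the sequencer consumes a bit more concrete, here is a small Python sketch that parses the familiar git rebase -i instruction lines (a command, a commit ID and a subject). It is a simplified illustration and not the sequencer's C code: the TodoItem type and parse_todo function are hypothetical names, the exec command is deliberately left out because its arguments have a different shape, and the commit IDs in the usage note are made up.

```python
from typing import List, NamedTuple

# Short forms accepted in the todo list, mapped to their long command names.
COMMANDS = {
    "p": "pick",
    "r": "reword",
    "e": "edit",
    "s": "squash",
    "f": "fixup",
    "d": "drop",
}


class TodoItem(NamedTuple):
    command: str
    commit: str
    subject: str


def parse_todo(text: str) -> List[TodoItem]:
    """Parse todo lines of the form '<command> <commit> [<subject>]'."""
    items: List[TodoItem] = []
    for raw in text.splitlines():
        line = raw.strip()
        # Blank lines and '#' comment lines carry no instruction.
        if not line or line.startswith("#"):
            continue
        parts = line.split(None, 2)
        if len(parts) < 2:
            raise ValueError(f"malformed todo line: {raw!r}")
        # Expand an abbreviation; long names pass through unchanged.
        command = COMMANDS.get(parts[0], parts[0])
        if command not in COMMANDS.values():
            raise ValueError(f"unknown command: {parts[0]!r}")
        subject = parts[2] if len(parts) > 2 else ""
        items.append(TodoItem(command, parts[1], subject))
    return items
```

For example, parse_todo("pick 1a2b3c4 Add feature\nf 5d6e7f8 fix typo") yields a pick item followed by a fixup item, which a sequencer-like engine could then execute one by one, stopping and resuming as needed.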
As Dscho explains in an answer to a question by Jakub Narębski, who asked about the status of the patch series, 10 of his patch series had already been accepted, 5 were in flight and 1 had not yet been submitted at the beginning of September.
These patch series will speed up the interactive rebase, but are not enough to fully replace the shell implementation of rebase with one in C. According to Dscho, such a result is “far, far, far in the future”:
…my hope is that the rebase--helper work is only an initial step, opening the door for other contributors to tackle independent parts of making git-rebase a builtin
Though the patch series have been reviewed by a large number of experienced Git developers like Junio Hamano, Johannes Sixt, Torsten Bögershausen, Jeff King, Jakub Narębski, Dennis Kaarsemaker, Eric Sunshine, Kevin Daudt and Stefan Beller, they are not fully merged into Git yet. But Dscho already “integrated the whole shebang into Git for Windows 2.10.0 and 2.10.1” that were released recently, and “it has been running without complaints (and some quite positive feedback)”.
About the performance improvements, Dscho wrote:
The end game of this patch series is a git-rebase--helper that makes rebase -i 5x faster on Windows (according to t/perf/p3404). Travis says that even MacOS X and Linux benefit (4x and 3x, respectively).
Such performance improvements as well as the code consolidations around the sequencer are of course very nice. It is interesting and satisfying to see that they are the result of building on top of previous work over the years by GSoC students, mentors and reviewers.
Dscho wrote about making interactive rebase much faster in a recent blog post (linked to in the previous Git Rev News), repeating and extending information from his answer mentioned in the above article. Among other things, he explained how he can be sure that the code is ready:
The answer: I verified it. Inspired by GitHub’s blog post on their Scientist library, I taught my personal Git version to cross-validate each and every interactive rebase that I performed since the middle of May. That is, each and every interactive rebase I ran was first performed using the original shell script, then using the git rebase--helper, and then the results were confirmed to be identical (modulo time stamps).
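The cross-validation idea described in the quote can be sketched generically in Python: run the trusted implementation, run the new one on the same input, and record any difference while still returning the trusted result. This is a minimal sketch assuming side-effect-free functions; it is neither Dscho's setup (which compared the results of real rebases) nor the Scientist library's API, and cross_validate is a hypothetical name.

```python
from typing import Any, Callable, List, Tuple


def cross_validate(
    control: Callable[..., Any],
    candidate: Callable[..., Any],
    mismatches: List[Tuple[tuple, Any, Any]],
    *args: Any,
) -> Any:
    """Run both implementations on `args` and record any disagreement.

    The result of `control` (the trusted implementation) is always the
    one returned, so a broken `candidate` never affects the caller.
    """
    expected = control(*args)
    try:
        actual = candidate(*args)
    except Exception as exc:  # a crash in the candidate counts as a mismatch
        mismatches.append((args, expected, exc))
        return expected
    if actual != expected:
        mismatches.append((args, expected, actual))
    return expected
```

Each entry appended to mismatches then points at an input worth turning into a regression test, much like the three regressions mentioned in the blog post.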
And further:
Full disclosure: the cross-validation did find three regressions that were not caught by the regression test suite (which I have subsequently adjusted to test for those issues, of course). So it was worth the effort.
The regressions that were found are described in the follow-up on the Git mailing list. It is interesting to see a Scientist-style approach used to ensure the quality of a Git code refactoring.
Kyle J. McKay had been wanting a compact one-line output format that
includes dates, times and initials, and is compatible with --graph.
=== 2015-09-17 ===
* ee6ad5f4 12:16 jch (tag: v2.5.3) Git 2.5.3
=== 2015-09-09 ===
* b9d66899 14:22 js am --skip/--abort: merge HEAD/ORIG_HEAD tree into index
| === 2015-09-04 ===
| * 27ea6f85 10:46 jch (tag: v2.5.2) Git 2.5.2
* 74b67638 10:36 jch (tag: v2.4.9) Git 2.4.9
..........
* ecad27cf 10:32 jch (tag: v2.3.9) Git 2.3.9
To have all this, Kyle proposed a git-log-times script for contrib/.
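The core of the layout shown above, a date header printed once followed by one compact line per commit, can be sketched as follows. This Python sketch is an illustration only and is unrelated to the actual script's code: format_compact is a hypothetical name, the commit tuples are assumed to be pre-extracted, and the --graph drawing is left out entirely.

```python
from typing import Iterable, List, Tuple

# (date, time, abbreviated id, author initials, decorated subject)
Commit = Tuple[str, str, str, str, str]


def format_compact(commits: Iterable[Commit]) -> List[str]:
    """Emit a '=== date ===' header whenever the date changes, then one
    compact line per commit."""
    lines: List[str] = []
    current_date = None
    for date, time, short_id, initials, subject in commits:
        if date != current_date:
            lines.append(f"=== {date} ===")
            current_date = date
        lines.append(f"* {short_id} {time} {initials} {subject}")
    return lines
```

Feeding it the first two commits of the example produces the two date headers, each followed by its commit line; commits sharing a date are grouped under a single header.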
Jeff King was surprised to see this as a separate script, and proposed a
patch series adding support for features like a --commit-header option
for git log, making it possible to come close to what git-log-times
provided.
Junio Hamano reminded everyone that the contrib/ area is not the place
for random Git-related things.
Unlike the earlier days of Git, if a custom command that uses Git is very useful, it can live its own life and flourish within the much larger Git userbase we have these days.
The proposed script was therefore published as the git-log-compact project.
Rich Felker complained that compiling Git with musl libc
no longer works out of the box (that is, without setting the NO_REGEX
build configuration variable) after commit 2f895225.
The proposed workaround unfortunately didn’t work on Windows, as pointed
out by Jeff King and Johannes Schindelin.
There was a bit of a digression about which platforms are Git’s main ones, and whether Git code should be able to rely on POSIX features. Jakub Narębski pointed out that CodingGuidelines specifically states:
Most importantly, we never say “It’s in POSIX; we’ll happily ignore your needs should your system not conform to it.” We live in the real world.
However, we often say “Let’s stay away from that construct, it’s not even in POSIX”.
In spite of the above two rules, we sometimes say “Although this is not in POSIX, it (is so convenient | makes the code much more readable | has other good characteristics) and practically all the platforms we care about support it, so let’s use it”.
The commit in question, which makes Git require a regex engine with
REG_STARTEND support while providing a fallback implementation
(turned on with NO_REGEX), matches the third point in the list above.
This extension to regexec(), introduced by the NetBSD project, is
present in all major regex implementations… though not in musl.
There was yet another proposed fix for the problem, namely adding
padding so that the end of an mmap-ed file doesn’t fall on a page
boundary if the regex implementation doesn’t support REG_STARTEND.
On one hand, the workaround relied on undocumented (but sane)
assumptions about operating system behavior; on the other hand, it was
faster than the workaround in the original patch, that is, copying the
contents to a NUL-terminated buffer. Nevertheless, any workaround would
mean additional code that needs to be maintained, and it was not accepted.
Also, it turned out that the configure script detects whether the regex
engine supports REG_STARTEND and sets NO_REGEX if necessary; this was
just badly described, and the description has since been corrected.
However, Git doesn’t yet set NO_REGEX automatically based on information
from uname.
Andrew Johnson asked on the mailing list:
While reading Pro Git 2nd Ed. I came across these three methods:
$ git help <verb>
$ git <verb> --help
$ man git-<verb>
I tested all three to confirm they were equivalent.
What was the motivation behind the complication, if any? I presume most developers would not provide multiple commands that do the same thing for absolutely no reason, so I led myself to ask this question.
Fredrik Gustafsson was the first to answer. He first said that the three commands are not actually equivalent on Windows as:
$ man git-<verb>
does not work and
$ git help <verb>
opens a web browser instead of a man page.
Philip Oakley then answered that the three different methods were added at different times for different reasons. The man method was added first because “historically git was a set of shell scripts named git-*, so each stood alone”.
The --help option was the result of “the modern git <cmd> approach, with
every command normally having -h and --help options for short form
usage and long form man pages”. Meanwhile “a git help <cmd> command
was created” which “allowed selection of display type, so that on
Unix/Linux man was the norm, while an --html (or --web) option is
available for those who like the pretty browser view”.
Your own Christian Couder chimed in saying that git help makes it
possible to teach people one command that will do something sensible
on every system, and that it also “provides more configurability and
more features like its -a and -g options”.
Jakub Narębski added that there are also help pages that are about
“concepts (gitcli, gitrevisions, githooks, gitrepository-layout,
gitglossary), or about files (gitignore, gitattributes, to some extent
githooks)” and they are “only accessible with git help <concept> or,
on OS with installed ‘man’, also man <gitconcept>”.
Philip replied to the above saying that “git revisions --help does
work”, but Junio Hamano clarified things by saying that this was a bug
that had been recently fixed.
It would indeed seem wrong to have git <concept> --help working, as
concepts are not the same things as commands.
Anyway, this shows that it is not so simple to design a good help system, especially one that is both full-featured on different platforms and simple-looking to users.
I’m Dennis Kaarsemaker, I do scalability and security things for Booking.com, part of which includes hacking on our git infrastructure together with Ævar Arnfjörð Bjarmason. I also maintain perl5.git.perl.org and do a lot of user support.
Spending a lot of time in #git and #github on freenode solving people’s git problems. Occasionally this leads to bug reports or even patches, but mostly I’m trying to make users understand git and make them smile.
Besides user support, I do read the mailing list and try to review patches or pick up smaller bugs as time permits. Time however is scarce with a fearless 14-month-old girl crawling around the house trying to get into trouble :)
If I had a team of developers, their core focus would be scalability for very big repositories. Things like a protocol that is efficient with hundreds of thousands of refs and can be load-balanced properly, or more efficient storage for refs, external files and other data. Or a peer to peer continuous sync protocol for the object store.
Oh, if only I could remove submodules. They’re almost universally used for the wrong reason, are easy to get confused about and use wrong, and they complicate many parts of git.
Definitely GitHub. I appreciate that Git is made for distributed version control, and regularly use it in that way; but the social benefits of having a single place to discover, maintain and collaborate on projects that GitHub offers really help in getting the most out of my open source experience. I even made a command line API client for GitHub, GitLab and BitBucket :)
This edition of Git Rev News was curated by Christian Couder <christian.couder@gmail.com> and Thomas Ferris Nicolaisen <tfnico@gmail.com>, with help from Jakub Narębski, Dennis Kaarsemaker, Johannes Schindelin, Lars Schneider and Jeff King.