Welcome to the 25th edition of Git Rev News, a digest of all things Git. For our goals, the archives, the way we work, and how to contribute or to subscribe, see the Git Rev News page on git.github.io.
This edition covers what happened during the month of February 2017.
On February 23rd it was publicly announced that a collision had been found against SHA-1, the cryptographic hash function that Git uses to identify Git objects (blobs, trees, commits, annotated tags).
Details about the collision, how it was performed, as well as algorithms and code to detect such a collision attack were published simultaneously.
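As a minimal sketch of how Git derives an object name from SHA-1, the snippet below hashes a `<type> <size>` header, a NUL byte, and the raw content. The function name `git_object_id` is a hypothetical helper chosen for illustration; it is not part of Git.

```python
import hashlib


def git_object_id(obj_type: str, payload: bytes) -> str:
    """Return the SHA-1 name Git would give an object of this type and content.

    Hypothetical helper for illustration only; it is not part of Git itself.
    """
    # Git hashes a header "<type> <size>" plus a NUL byte, then the raw content.
    header = f"{obj_type} {len(payload)}".encode("ascii") + b"\x00"
    return hashlib.sha1(header + payload).hexdigest()


if __name__ == "__main__":
    # Prints the object name of a blob containing "hello world\n".
    print(git_object_id("blob", b"hello world\n"))
```

The result for a blob can be compared with what `git hash-object` prints for a file with the same content.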
This prompted numerous news articles about Git and SHA-1 in many places, for example on LWN.net, as well as many discussions on the mailing list.
Patch series have also been circulating. Moreover, plans to move Git away from SHA-1 have been shared and discussed.
Linus Torvalds, for example, sent a Typesafer git hash patch as a first step toward fixing implicit dependencies on SHA-1. This one-big-patch approach, though, is not consistent with the way Brian Carlson has long been working on the same issue. Junio Hamano has not commented on this patch yet. Hence, for the time being, it is far from certain that this topic will move much faster.
Some work on integrating the code to detect a collision attack into a new SHA-1 implementation in Git was started by Jeff King, adding a USE_SHA1DC knob to the Makefile, and then picked up by Linus.
The original code was written by Marc Stevens, working for CWI, and Dan Shumow, working for Microsoft. Interestingly, both Marc and Dan joined the discussion. Dan agreed to work on adaptations and performance improvements for Git, and on upstreaming this work into the original code base.
Junio participated in the discussions, too, and it looks as if the resulting patch series could be merged for the next Git release; currently the ‘jk/sha1dc’ topic is in the ‘pu’ branch.
One of the plans to move Git away from SHA-1 was contributed by Jonathan Nieder, Stefan Beller, Jonathan Tan and Brandon Williams, who all work on the same team at Google. The latest version of this plan is available in a Google document where it can be commented on. It has also been discussed on the mailing list.
Another plan was posted by Ian Jackson; it also generated some discussion.
It’s interesting to note that Git is not the only version control system to be affected by the issue, as a few related posts show.
Johan Hovold noticed that git send-email in Git v2.10.2 no longer accepts patches whose commit message contains lines like:
Cc: <stable@vger.kernel.org> # 4.4
Apparently it parses the above as “stable@vger.kernel.org#4.4” and then aborts.
Researching the problem, Johan found a mailing list thread which resulted in some “fixes” that seem to be the root cause of the problem.
He claimed the format of the line that triggers the problem “has been documented at least since 2009” in the Linux kernel and “has been supported by git since 2012”. It is used to tag commits that should be backported into the “stable” Linux kernel versions.
Johan then asked for a way for Git to revert to the old behavior.
Junio wondered if installing the Mail::Address Perl module would make git send-email work by avoiding the “non-parsing-but-paste-address-looking-things-together code” that Git uses when Mail::Address is not installed. Johan replied that it doesn’t work.
Matthieu Moy, who worked on the patch that is responsible for the problem, remarked that “a proper fix is far from obvious”, because we want our own parser to work the same way as Mail::Address does, and we don’t want to regress for people who want to get back two email addresses from lines like:
Cc: <foo@example.com> # , <boz@example.com>
as this has been working since September 2015.
In another email, Matthieu suggested always using our own parser, as “we now have something essentially as good as Mail::Address”, and changing that parser to discard anything after “>” in the email address. Matthieu’s email also contained a patch implementing the latter.
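The idea of discarding anything after “>” can be sketched as follows. This is only an illustration under stated assumptions: git send-email is written in Perl, and the function name `cc_address` and the regular expression are hypothetical, not taken from Matthieu’s patch. The sketch also returns a single address per line and leaves out any handling of several addresses on one line.

```python
import re
from typing import Optional

# Hypothetical pattern for a "Cc:" line at the bottom of a commit message.
_CC_TRAILER = re.compile(r"^cc:\s*(.*)$", re.IGNORECASE)


def cc_address(line: str) -> Optional[str]:
    """Return the address part of a "Cc:" line, dropping text after ">".

    Illustrative helper only; this is not git send-email's actual code.
    """
    match = _CC_TRAILER.match(line.strip())
    if match is None:
        return None  # not a Cc: line at all
    value = match.group(1)
    end = value.find(">")
    if end != -1:
        # Keep everything up to and including the first ">", so that a
        # trailing comment such as "# 4.4" is discarded.
        value = value[: end + 1]
    return value.strip() or None


if __name__ == "__main__":
    print(cc_address("Cc: <stable@vger.kernel.org> # 4.4"))
```

With this approach, the kernel-style version comment never reaches the address handling, so it cannot be pasted onto the address.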
Johan agreed with Matthieu’s plan, tested the patch and found that it worked. Unfortunately, he found another breakage that occurs with the --suppress-cc=self option when more than one email address is allowed in each line.
It looked as if the discussion was going to continue for some time, but Linus replied to Matthieu stating that Cc: lines in commit messages are not like Cc: lines in email headers. Consequently, we should not accept more than one email address in them. He concluded as follows:
So this notion that the bottom of the commit message is some email header crap is WRONG.
Stop it. It caused bugs. It’s wrong. Don’t do it.
Finally, after Junio had discussed possible breakages with Matthieu’s patch, Matthieu agreed that it was safer to just revert to not accepting multiple email addresses in the Cc: lines. Junio then accepted a patch submitted by Johan implementing this proposal. This patch has since been merged to the “next” branch, so it is very likely to appear in the next Git release. Let’s just hope that no one will complain about it.
Various
This year Git was again accepted as one of the organizations in GSoC 2017. Students have started to work on microprojects.
CWI Institute Amsterdam and Google presented a practical technique for generating SHA-1 collisions.
Light reading
O’Reilly Radar: How to use pull requests to improve your code reviews by Brent Beer and Peter Bell;
advertising that you can find more in their Introducing GitHub book (also on Safari Books Online)
The Myers diff algorithm: part 1, part 2, part 3 by James Coglan;
this was part of his ongoing work on a book explaining the internals of Git through implementation: Building Git
Learn Version Control with Git: A step-by-step course for the complete beginner ebook by Tobias Günther (Git Tower GUI): free online book and video course (partially free)
It appears that WikiLeaks has captured the super secret Git Tips & Tricks from the CIA!
In today’s world of GitHub and other competing services, it’s easy to overlook how simple it is to set up (unlimited) private repos on any network-connected computer, as explained by Alex Kras
How to have a proper Git client on Android, by Pedro Veloso
Adding a SHA1 collision vulnerability test hoses WebKit’s source repository
Git tools and sites
This edition of Git Rev News was curated by Christian Couder <christian.couder@gmail.com>, Thomas Ferris Nicolaisen <tfnico@gmail.com>, Jakub Narębski <jnareb@gmail.com> and Markus Jansen <mja@jansen-preisler.de> with help from Lars Schneider, Luca Milanesio and Junio Hamano.