Welcome to the second edition of Git Rev News, a digest of all things Git, written collaboratively on GitHub by volunteers.
Our goal is to aggregate and communicate some of the activities on the Git mailing list in a format that the wider tech community can follow and understand. In addition, we’ll link to some of the interesting Git-related articles, tools and projects we come across.
This special edition covers Git’s 10th year of existence, as well as the Git Merge conference held on April 8th & 9th in Paris, France. Git developers and users alike came together to celebrate the anniversary, and to discuss the current challenges of using and scaling Git.
You can contribute to the upcoming edition by sending pull requests or opening issues.
At Git Merge 2015, Junio Hamano started off the Contributor Summit with a presentation titled “10 years of fun with Git”, saying that he wanted to take the opportunity of the 10th anniversary to thank the contributors.
He showed what the initial revision of Git, created on the 7th of April 2005 by Linus, looked like, and compared it to a recent revision. Although the first implementation was only about 0.2% of the current size, the initial code was already functional.
The interesting question that followed was “Who made today’s Git?” and Junio went through several Git queries, each offering a different answer.
As an example, to get a commit count sorted by author, excluding merge commits, one can use:
git shortlog --no-merges -n -s v2.4.0-rc0
With the results of each query, Junio explained how they can be interpreted, mentioned the caveats that might apply, and took time to thank the people who appeared in them.
Towards the end of the presentation he also mentioned the people who don’t appear in the results: bug reporters, feature wishers, reviewers and mentors, alternative implementors and porters, trainers and evangelists. He assigned to this very newsletter the huge task of talking about, and thanking, them all ;-)
At Git Merge 2015, Rick Olson, a developer working for GitHub, gave a presentation about Git Large File Storage (Git LFS), a new Git extension for managing big files.
On the Git Merge web site, the name of the presentation was “Building a Git Extension with First Principles”, probably because GitHub didn’t want to announce Git LFS ahead of the conference. In fact, it was first announced on The GitHub Blog the day before Rick’s presentation.
Rick started off by explaining the reasons why such an extension was needed, namely that Git “starts to suck with large binary file objects”. For example, it takes longer and longer to clone a repo as it accumulates more and more of such objects.
He then explained that GitHub did some user experience research with a diverse team of users having this problem. They also experimented with existing solutions like git media and git annex.
Rick then detailed the solution that was implemented using the Go language, and how it can be used. For example:
$ git lfs init
$ git lfs track "*.zip"
$ git add otherfile.zip
$ git commit -m "add otherfile.zip"
$ git push origin
Uploading somefile.zip
...
Remote configuration, the server side, the Git LFS API and authentication were also covered. Finally, Rick talked about some ideas for improvements.
It’s interesting and encouraging to see that the community has recently taken an interest in tackling some of Git’s scaling issues. At Git Merge 2015, John Garcia from Atlassian also presented some research and a prototype tool to handle large binary files.
The tool hasn’t been released yet but showed interesting features like progressive history retention, file locking, abstracted support for “dumb” storage back ends (like sshfs, samba, NFS, Amazon S3 …) and chunking for resumable downloads.
Jeff King, alias Peff, posted a patch series to address speed regressions when accessing the packed-refs file. This led to discussions about ways to speed up reading lines in a file.
The packed-refs file was created a long time ago to speed up dealing with refs. The original way to store refs, which is called the “loose ref” format, uses one file per ref to store its content and is still used by Git for newly created refs. But when the number of refs increases, it becomes much faster to have as much information as possible in a single file. That’s the purpose of the packed-refs file.
Peff discovered that one of his own commits, which switched from fgets() to strbuf_getwholeline() to read the packed-refs file, was in part responsible for a big slowdown.
strbuf_getwholeline() is part of the Git strbuf API that is used for a lot of string-related functions. And strbuf_getwholeline() used the getc() function to get each character one by one until the end of each line, like this:
while ((ch = getc(fp)) != EOF) {
...
if (ch == '\n')
break;
}
But it appears that this isn’t very efficient. It is also problematic to use fgets() inside strbuf_getwholeline(), as strbuf_getwholeline() is used in some parts of the Git codebase to read lines that can contain the NUL character, and fgets() does not report how many bytes it read, so anything after the NUL would be lost to the caller. (And yeah, working around that is not easy either.)
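The NUL problem with fgets() can be illustrated with a small sketch. The helper name fgets_visible_length is hypothetical and is not part of Git; it simply measures a line the way a typical fgets() caller would, with strlen(), which stops at the first NUL byte.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper, for illustration only (not Git code).
 * Reads one line with fgets() and returns the length a caller can
 * observe through strlen(). fgets() itself returns no byte count,
 * so an embedded NUL hides the rest of the line from the caller.
 */
size_t fgets_visible_length(FILE *fp, char *buf, int size)
{
    assert(fp && buf && size > 0);
    if (!fgets(buf, size, fp))
        return 0;           /* EOF or read error */
    return strlen(buf);     /* stops at the first NUL byte */
}
```

For a line whose bytes are `a`, NUL, `b`, newline, this helper reports a length of 1 even though four bytes were consumed from the stream.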
So Peff came up with the following explanation and solution to the problem:
strbuf_getwholeline calls getc in a tight loop. On modern libc implementations, the stdio code locks the handle for every operation, which means we are paying a significant overhead. We can get around this by locking the handle for the whole loop and using the unlocked variant.
His patch basically does:
+ flockfile(fp);
+ while ((ch = getc_unlocked(fp)) != EOF) {
...
if (ch == '\n')
break;
}
+ funlockfile(fp);
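To make the locking idea concrete, here is a minimal, self-contained sketch of a line reader that takes the stream lock once and then uses getc_unlocked(). It is not Git's actual strbuf code: the function name read_line_unlocked and its buffer-growing strategy are assumptions made for illustration. Only flockfile(), getc_unlocked() and funlockfile() are standard POSIX calls.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/*
 * Hypothetical helper, for illustration only (not Git's strbuf API).
 * Reads one line, including the trailing '\n' if present, into a
 * growable heap buffer. Returns the number of bytes read, or -1 at
 * EOF with nothing read, or on allocation failure.
 */
long read_line_unlocked(FILE *fp, char **buf, size_t *cap)
{
    size_t len = 0;
    int ch;

    assert(fp && buf && cap);

    flockfile(fp);                      /* lock the stream once */
    while ((ch = getc_unlocked(fp)) != EOF) {
        if (len + 1 >= *cap) {          /* keep room for byte + NUL */
            size_t new_cap = *cap ? *cap * 2 : 64;
            char *tmp = realloc(*buf, new_cap);
            if (!tmp) {
                funlockfile(fp);
                return -1;
            }
            *buf = tmp;
            *cap = new_cap;
        }
        (*buf)[len++] = (char)ch;
        if (ch == '\n')
            break;
    }
    funlockfile(fp);                    /* release the lock */

    if (len == 0)
        return -1;
    (*buf)[len] = '\0';
    return (long)len;
}
```

Because the function returns the byte count instead of relying on a terminating NUL, a caller can still handle lines that contain NUL characters.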
Duy Nguyen suggested instead avoiding any FILE* interface and either mmapping the entire file, or reading (with buffering) from a file descriptor, as Git already does to read the index-pack file. But Peff said that it would be very inefficient too, and that there is no good NUL-safe function to read from a file descriptor.
Junio wondered if it would be worth it to have callers that need to handle NUL characters pass a flag, so that the default implementation would still be fast.
Eventually, Rasmus Villemoes suggested using getdelim() when POSIX 2008 is supported, and so far this looks like a workable solution.
Anyway, it is interesting to see that on the Git mailing list, as well as at the Git Merge conference, a lot of great developers and companies are working on making Git fast for big repositories.
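As a rough illustration of why getdelim() fits here: it reads up to and including a delimiter, grows the buffer itself, and returns the number of bytes read, so embedded NUL characters are not a problem. The sketch below is not taken from the Git patches; the helper name count_lines and its byte-counting output parameter are assumptions chosen only to show the call pattern.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>

/*
 * Hypothetical helper, for illustration only (not Git code).
 * Counts the lines in a stream with getdelim() and stores the total
 * number of bytes read in *total_bytes. A final line without a
 * trailing newline still counts as a line.
 */
size_t count_lines(FILE *fp, size_t *total_bytes)
{
    char *line = NULL;      /* getdelim() allocates and grows this */
    size_t cap = 0;
    size_t lines = 0;
    ssize_t len;

    assert(fp && total_bytes);

    *total_bytes = 0;
    while ((len = getdelim(&line, &cap, '\n', fp)) != -1) {
        lines++;
        *total_bytes += (size_t)len;    /* len includes any NUL bytes */
    }
    free(line);
    return lines;
}
```

The returned length, rather than a terminating NUL, tells the caller where each line ends, which is the property strbuf_getwholeline() needs.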
Paraphrasing Junio, please test it thoroughly so that we can ship a successful v2.4 final at the end of the month without any regressions with respect to v2.3.
More about Git 2.x for Windows release candidates here.
During the really exciting Git Merge conference, the Git for Windows developers had the opportunity to meet and we managed to whip out a really early beta of the upcoming Git for Windows 2.x series. Please keep in mind that we not only changed our entire development environment, but that this change also affects end user installations quite a bit […]
Brendan Forster put together this Beta Testers Guide.
The following fixes have been backported to this maintenance release. […] All users of the library are encouraged to update.
Finally, files (or blobs) can now be searched using the new GitHub-inspired file finder (press ‘f’ to launch it).
GitLab 7.9.3 CE, EE and GitLab CI 7.9.3, April 8th.
Kallithea 0.2, April 10th
Kallithea 0.2 has been released. Kallithea is a GPLv3 source code management software for web-based hosting of Mercurial and Git repositories.
This edition of Git Rev News was curated by Christian Couder <christian.couder@gmail.com>, Thomas Ferris Nicolaisen <tfnico@gmail.com> and Nicola Paolucci <npaolucci@atlassian.com> with help from Junio Hamano, Emma Jane Hogbin Westby, Andrew Ardill, Rick Olson, Johannes Schindelin and Jeff King.