Git Rev News: Edition 18 (August 17th, 2016)

Welcome to the 18th edition of Git Rev News, a digest of all things Git. For our goals, the archives, the way we work, and how to contribute or to subscribe, see the Git Rev News page on git.github.io.

This edition covers what happened during the month of July 2016.

Discussions

Reviews

Lars Schneider recently sent version 5 of his “Git filter protocol” patch series. The goal of this series is to avoid launching a new clean/smudge filter process for each file that should be filtered.

Only one filter process per Git command should be launched, and this process should communicate with the Git command using Lars’ new filter protocol.

This would make Git faster when a large number of files have to be filtered and when the startup time of a filter process is not insignificant.
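The difference between the two approaches can be illustrated with a toy sketch. This is not Lars’ protocol: the upper-casing "filter", the one-line-per-file exchange and the function names are all assumptions made for the example. It only shows one process being spawned per file versus a single process being reused for every file.

```python
import subprocess
import sys

# Toy stand-in for a clean/smudge filter: it upper-cases its input.
ONE_SHOT = "import sys; sys.stdout.write(sys.stdin.read().upper())"

# Long-running variant (hypothetical exchange, NOT the real protocol):
# one request line in, one response line out, until stdin is closed.
LONG_RUNNING = (
    "import sys\n"
    "for line in iter(sys.stdin.readline, ''):\n"
    "    sys.stdout.write(line.upper())\n"
    "    sys.stdout.flush()\n"
)


def filter_one_process_per_file(contents):
    """Spawn a fresh filter process for every item (the existing behaviour)."""
    results = []
    for text in contents:
        done = subprocess.run(
            [sys.executable, "-c", ONE_SHOT],
            input=text, capture_output=True, text=True, check=True,
        )
        results.append(done.stdout)
    return results


def filter_single_process(contents):
    """Start the filter once and reuse it for every item.

    Assumes each item is a single line without embedded newlines.
    """
    proc = subprocess.Popen(
        [sys.executable, "-c", LONG_RUNNING],
        stdin=subprocess.PIPE, stdout=subprocess.PIPE, text=True,
    )
    results = []
    for text in contents:
        proc.stdin.write(text + "\n")
        proc.stdin.flush()
        results.append(proc.stdout.readline().rstrip("\n"))
    proc.stdin.close()  # end of input tells the filter to exit
    proc.wait()
    return results
```

Both functions return the same filtered contents; the second one pays the process startup cost once instead of once per file, which is where the expected saving comes from.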

Lars especially wants to speed up Git-LFS, which uses a clean/smudge filter to send large files to, and fetch them from, the special Git-LFS storage. He therefore also wrote a pull request that implements a filter process for Git-LFS using his new filter protocol.

On this pull request, Lars reports the following results when switching branches on OSX with 12,000 Git LFS files:

Default Git and Git LFS:                      6m2.979s + 0m1.310s = 364s
Git and Git LFS with filter protocol support: 0m2.528s + 0m2.280s = 5s

He says that with his filter protocol the operation is almost 70 times faster in this particular use case and that he expects “even more dramatic results on Windows”, as launching a new process is usually slower on Windows.
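As a quick check of the order of magnitude, the ratio can be recomputed from the timings quoted above. The helper name `to_seconds` is made up for this sketch, and treating the two figures on each line as parts to be summed simply follows the "=" totals shown. The ratio comes out at roughly 73 with the rounded totals (364s and 5s) and slightly higher with the unrounded timings.

```python
import re


def to_seconds(stamp):
    """Convert a `time`-style duration such as '6m2.979s' to seconds."""
    match = re.fullmatch(r"(\d+)m(\d+(?:\.\d+)?)s", stamp)
    if match is None:
        raise ValueError(f"unrecognised duration: {stamp!r}")
    minutes, seconds = match.groups()
    return int(minutes) * 60 + float(seconds)


# Timings copied from the results quoted above.
default_total = to_seconds("6m2.979s") + to_seconds("0m1.310s")
protocol_total = to_seconds("0m2.528s") + to_seconds("0m2.280s")

speedup = default_total / protocol_total
print(f"{default_total:.3f}s / {protocol_total:.3f}s = {speedup:.1f}x")
```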

When he started working on this, Lars first sent emails to the mailing list asking for information about the filter driver code and for an explanation of why the clean filter is executed 12 times for 3 files.

The discussion following his first email involved Junio Hamano, Jeff King (alias Peff), Torsten Bögershausen and Jakub Narębski, and led to explanations and then to interesting design discussions.

The discussion following Lars’ second email prompted Peff to send a patch to fix some useless clean filter invocations.

Following those discussions Lars sent the following versions of his patch series:

These series were reviewed by, or otherwise involved, a large number of Git developers, including Ramsay Jones, Remi Galan Alfonso, Eric Wong, Duy Nguyen, Johannes Sixt, Stefan Beller, Junio, Peff, Torsten and Jakub.

One especially interesting sub thread was started by Jakub with a long email about “Designing the filter process protocol”.

Hopefully all this work will eventually be merged and result in great improvements for some important Git use cases.

Support

Duy asked on the mailing list:

Before I start doing anything silly because I don’t know it can already be done without waving my C wand like a mad man…

I often do this: find a commit of interest, the commit itself is not enough so I need a full patch series to figure out what’s going, so I fire up “git log --graph --oneline” and manually search that commit and trace back to the merge point, then I can “git log --patch”. Is there an automatic way to accomplish that? Something like “git branch --contains” (or “git merge --contains”)?

PS. Sometimes I wish we could optionally save cover letter in the merge commit. Sometimes the “big plan” is hard to see by reading individual commit messages.

Saving the cover letter of a patch series (patch 0 in the series, which is not a real patch and so is not applied) is, by the way, a separate topic that resurfaced on the list recently. It was also discussed following Josh Triplett’s announcement of his new git-series tool.

To the main question about finding the topic branch containing a commit, Stefan Beller suggested using Michael Haggerty’s git-when-merged.

Duy was happy with this tool, but would have liked an option to show all the commits in a topic branch, for example something that would do a git log from the merge base to the merge point. He also asked Michael if he had any plan to port it to C and integrate it into Git.

Michael replied the next day that he had made a pull request for the new option, which has since been merged, but that he had no plan to port the tool to C and integrate it into Git.

Junio also suggested a way to get a more human-readable result, for example by running git show on the merge commit.
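Duy’s wish for "a git log from the merge base to the merge point" can be sketched as follows. This is only an illustration, not how git-when-merged implements its option. It assumes the merge commit has already been found (for example with git-when-merged) and that the topic was merged with an ordinary two-parent merge, so that the topic’s commits are those reachable from the second parent but not from the first. The function names are made up for the example.

```python
import subprocess


def topic_range(merge_commit):
    """Revision range for the commits brought in by a two-parent merge.

    `M^1` is the mainline parent and `M^2` the tip of the merged topic,
    so `M^1..M^2` selects the topic's own commits.
    """
    return f"{merge_commit}^1..{merge_commit}^2"


def topic_log_command(merge_commit, patch=True):
    """Build the `git log` invocation that lists the topic, oldest first."""
    cmd = ["git", "log", "--reverse"]
    if patch:
        cmd.append("--patch")
    cmd.append(topic_range(merge_commit))
    return cmd


def show_topic(merge_commit):
    """Run the command in the current repository (requires git on PATH)."""
    subprocess.run(topic_log_command(merge_commit), check=True)
```

Calling `show_topic` with the ID of the merge commit, from inside the repository, would then print the whole series with its patches.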

Developer Spotlight: Lars Schneider

I am a software engineer living in Berlin, Germany. Currently, I am the technical lead for a team that helps 4000+ Autodesk engineers adopt Git as their main source control system. This is a challenging but also very interesting task as I am constantly confronted with all kinds of problems that Git users run into. Fortunately, Autodesk allows me to spend part of my time addressing these problems and contributing possible solutions back to the community.

Autodesk has lots and lots of Perforce repositories with 20+ years of history. We are gradually moving them to Git and during this process I try to constantly improve the “git-p4” migration tool. I managed to get quite a number of patches upstream already but there are still more in my local queue :-)

However, I am most proud of an indirect contribution to Git. I helped the Git community set up Git CI builds for OSX and Linux on Travis CI. This makes it really easy to ensure that new patches build without errors and cause no test failures on all major platforms and compilers. Casual contributors can create Pull Requests containing their patches against https://github.com/git/git and 20 minutes later they know whether their patches pass all checks. This way a contributor can ensure that no precious reviewer time is wasted on broken patches.

I am working on an improved Git filter protocol. Git filters are a great mechanism to transparently modify repository content on commit and checkout. Amongst other things, they are used to adjust platform-specific line endings, to clean up tab/whitespace issues, to encrypt content, and to handle large files outside of the Git repository (e.g. via git-annex or Git LFS).

The problem with the current protocol is that a filter process is invoked for each individual file. If you have a large number of files, then you start an equally large number of processes. This gets slow quickly and therefore I am working on a patch series that reuses a single filter process for all files in the lifetime of a Git process.

First, we would improve the Travis CI setup and add Windows to the platforms that are constantly tested. Afterwards we would join forces with David Turner, Ben Peart, and Duy Nguyen and improve the Git performance for repositories that contain a very large number of files.

I would not remove anything, but I would try to improve the UX. Unfortunately, modifying the UX of Git core commands is incredibly hard as these commands are also used in a lot of scripts that we cannot break.

A couple of years ago I wrote ShowInGitHub, a plugin that helps you to jump from a specific line of code in Xcode to the same line on GitHub. It was my favorite Git tool during my iOS developer days.

More recently I got really excited about Git LFS. Granted, Git LFS breaks one of Git’s core features as LFS repositories are by default not distributed anymore. However, Git LFS is a pragmatic solution to a huge problem that many Git users face if they need to store large media assets or integration test data along with their source code.

Releases

Other News

Various

Light reading

Git tools and sites

Credits

This edition of Git Rev News was curated by Christian Couder <christian.couder@gmail.com> and Thomas Ferris Nicolaisen <tfnico@gmail.com>, with help from Lars Schneider, Johannes Schindelin, Roberto Tyley, Jakub Narębski and Josh Triplett.