Git Rev News: Edition 13 (March 16th, 2016)

Welcome to the 13th edition of Git Rev News, a digest of all things Git. For our goals, the archives, the way we work, and how to contribute or to subscribe, see the Git Rev News page on git.github.io.

This edition covers what happened during the month of February 2016.

Discussions

General

It is a widely known problem that git clone is not resumable. If the connection goes down during a clone, the clone has to be restarted from scratch.

A workaround that is often suggested and used is to first create a bundle using git bundle, rsync that bundle, clone from the rsync’ed bundle, and finally fetch what is missing from the remote Git repository. Some tools, like gitolite or the “repo” tool used by AOSP, and some websites, like kernel.org, have even made this easier to do.
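The workaround can be sketched as the following shell script. It is a minimal local demonstration under stated assumptions: the names `server.git`, `repo.bundle` and `client` are hypothetical, and a plain `cp` stands in for the rsync step so the script runs without a network. Over a real connection, something like `rsync --partial` is what makes the transfer resumable, since it keeps a partially downloaded file around for the next attempt.

```shell
#!/bin/sh
set -eu
# Hypothetical demo layout: everything lives in a throwaway directory.
work=$(mktemp -d)
cd "$work"

# Stand-in for the remote repository (normally hosted on a server).
git init -q server.git
git -C server.git -c user.name=Demo -c user.email=demo@example.com \
    commit -q --allow-empty -m "initial commit"

# 1. Server side: pack the whole history into a single bundle file.
git -C server.git bundle create ../repo.bundle --all

# 2. Transfer the bundle. This plain copy stands in for something like
#    "rsync --partial server:repo.bundle ." which can resume if interrupted.
cp repo.bundle downloaded.bundle

# 3. Clone from the local bundle; no network round trips are involved.
git clone -q downloaded.bundle client

# 4. Point origin at the real repository and fetch whatever the bundle lacks.
git -C client remote set-url origin "$work/server.git"
git -C client fetch -q origin
```

The final fetch only has to transfer objects created after the bundle was made, so the part of the process that cannot be resumed stays small.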

In 2011 there was also a patch series to improve Git’s internal support for this kind of clone workflow.

For some time, some Git developers thought of this as just a manpower problem: a few months of dedicated work by anyone could probably fix it. It was even proposed as a Google Summer of Code (GSoC) project.

Over time, though, the community realized that it was not so easy, because some very careful design was needed, and it was removed from the list of possible GSoC projects.

So it was very exciting to see a number of new proposals pop up on the list during the last few months.

It started on February 5 with a “Resumable clone revisited, proof of concept” patch series by Duy Nguyen, in which he wrote:

I was reminded by LWN about this. Annoyed in fact because it’s called a bug while it looks more like an elephant.

and pointed to an LWN.net article reporting on Sarah Sharp’s talk at the SCALE 14x conference: “she noted that Git still does not support interrupting and resuming download operations, which is an important bug to fix.”

Then on February 10 Shawn Pearce sent an ‘RFC: Resumable clone based on hybrid “smart” and “dumb” HTTP’ proposal that he had discussed internally with colleagues at Google, where he works.

This was followed on March 2 by an email called “Resumable git clone?” from Josh Triplett, a well-known Linux kernel developer, who asked:

In a discussion elsewhere, Al Viro suggested taking the partial pack received so far, repairing any truncation, indexing the objects it contains, and then re-running clone and not having to fetch those objects. This may also require extending receive-pack’s protocol for determining objects the recipient already has, as the partial pack may not have a consistent set of reachable objects.

Before starting down the path of developing patches for this, does the approach seem potentially reasonable?

Al Viro, whom Josh mentions, is another well-known Linux kernel developer, and it’s interesting to see Linux kernel developers taking part in Git development again. It reminds some old-timers of the “good old days”.

All these proposals have been discussed by many regular Git developers and reviewers, including Stefan Beller, Junio Hamano, Johannes Schindelin, Jonathan Nieder, Eric Sunshine, Jeff King and Elia Pinto.

Regarding Shawn’s proposal, Blake Burkhart reminded the community that it could introduce security issues if implemented carelessly. Other people, like Bhavik Bavishi and Konstantin Ryabitsev, also took part in the discussion following Josh’s email.

From the latest discussions about Josh’s email, it appeared that Git developers favored Shawn’s proposal over the others, though it could also benefit from implementing parts of Al’s and Josh’s proposal. So the plan seemed to be to work on Shawn’s proposal soon, and later to implement some optimizations from Al and Josh on top of it.

Then on March 5 Kevin Wern sent an email called “Resumable clone”, in which he said he had begun looking at the relevant code to start working on it, and asked:

Is someone working on this currently? Are there any things I should know moving forward? Is there a certain way I should break down/organize the feature when writing patches?

Duy answered that “Resumable clone is happening” and pointed to some preparatory work that Junio Hamano was doing. Junio, by the way, answered with a very long email containing “a rough and still slushy outline” of what remains to be done, which was then discussed and explained further.

It is not clear whether Shawn’s proposal and Josh’s email were inspired by Sarah Sharp’s remark and LWN.net’s report about it, but either way it looks like this old and annoying problem will hopefully be fixed in the not-too-distant future.

Reviews

For some time Lars Schneider has been sending versions of a short patch series to make it possible to see where a config option comes from (v1, v2, v3, v4, v5, v6):

$ git config --list --show-origin
file:/Users/john/.gitconfig user.email=john@doe.com
file:/Users/john/.gitconfig alias.co=checkout
file:.git/config    remote.origin.url=https://repos/myrepo.git

Lars started this patch series with an RFC, whereupon Jeff King pointed him to a previous discussion about the same idea. Jeff also posted his initial implementation, on which Lars’ v1 was based.

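The new feature can be useful because config options can be set in different locations, and sometimes it is hard to find where a specific config was defined. Usually a config is defined in one of the following files: $(prefix)/etc/gitconfig (system-wide), $XDG_CONFIG_HOME/git/config and ~/.gitconfig (per user), and $GIT_DIR/config (per repository).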

The exact paths of the above files depend on how git was compiled and on the values of at least the GIT_CONFIG, GIT_CONFIG_NOSYSTEM, GIT_DIR and XDG_CONFIG_HOME environment variables.

In addition, config values can be set on the command line (git -c <key>=<value> config ...), from another file (git config -f <file> ...), from standard input (git config < ...), or even from a blob (git config --blob=a9d9f9). A config file can also include another config file by using the “include.path = <path>” directive.

Although the implementation itself was straightforward, many details around naming required thoughtful discussion among Sebastian Schuberth, Jeff King, Ramsay Jones, Mike Rappazzo and Junio Hamano. Eventually the list agreed on the option name ‘--show-origin’ and the prefixes ‘file’, ‘command line’, ‘standard input’ and ‘blob’ for the different config types.
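To see these prefixes in practice, here is a small sketch that sets values through several of the sources listed above and asks git config where each one came from. It assumes a Git that includes this series (2.8 or later) and runs in a throwaway repository; the `demo.*` keys and the `extra.inc` file name are made up for illustration.

```shell
#!/bin/sh
set -eu
repo=$(mktemp -d)
cd "$repo"
git init -q .

# A value in the repository's own config file ($GIT_DIR/config).
git config demo.fromrepo yes

# A separate file pulled in through an include.path directive.
printf '[demo]\n\tfromincluded = yes\n' >extra.inc
git config include.path "$repo/extra.inc"

# A value given only on the command line with -c.
git -c demo.fromcmdline=yes config --list --show-origin >origins.txt

# A value read from a blob: store a small config file as an object first.
blob=$(printf '[demo]\n\tfromblob = yes\n' | git hash-object -w --stdin)
git config --blob="$blob" --list --show-origin >blob-origins.txt

cat origins.txt blob-origins.txt
```

Each output line starts with the origin type and, where applicable, the file path or blob id, followed by a tab and the key=value pair, just like the example output shown earlier. System and global entries, if any exist on the machine, appear as well.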

The git config command also has a number of different modes (--get, --list, …), and the discussion also covered which of them should support ‘--show-origin’.

Many details in the code and tests were also discussed by Eric Sunshine, Johannes Schindelin, Johannes Sixt, Jeff, Ramsay and Junio.

One nice side effect of this patch series is that, when there is a config error, Git can now tell more precisely which type of config the error originates from (e.g. standard input or a file).
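The improved error message can be observed by feeding a deliberately broken config through standard input. This sketch assumes a Git that accepts `--file -` to read config from stdin; the exact wording of the message may differ between versions, so the script forces the C locale and a check only needs to look for the source being named.

```shell
#!/bin/sh
set -u
# Force untranslated messages so the wording is predictable.
export LC_ALL=C

# "[broken" is an unterminated section header, so parsing fails on line 1.
# "--file -" tells git config to read the config from standard input.
status=0
msg=$(printf '[broken\n' | git config --file - --list 2>&1) || status=$?

echo "exit status: $status"
echo "$msg"
```

With a Git that includes this series, the message should name “standard input” as the origin rather than only reporting a bad line.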

Jeff King noted in a review that Git’s coding style doesn’t allow declarations after statements (“declaration-after-statement”). Lars Schneider then posted a patch to check for this warning in the Travis CI build. Reviewing that patch, Jeff suggested codifying the knowledge about these warnings in an optional Makefile knob called “DEVELOPER”. Lars combined the warnings that Junio and Jeff care about and posted a revised version of the patch.

Git developers with a reasonably modern compiler can now compile Git with DEVELOPER=1 make, or set the flag once for all make runs with echo DEVELOPER=1 >>config.mak, to ensure their patches are free of all the compiler warnings the Git project cares about.
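A small convenience sketch for the config.mak approach: appending with `>>` adds a duplicate line every time it is run, so the hypothetical helper below (`enable_developer_build` is not part of Git) only appends `DEVELOPER=1` when the line is not already there. Git’s Makefile includes config.mak if it exists, so the setting then applies to every plain make run.

```shell
#!/bin/sh
set -eu

# Hypothetical helper: enable Git's DEVELOPER build knob exactly once.
# Takes an optional path; defaults to config.mak in the current directory.
enable_developer_build() {
    mak=${1:-config.mak}
    # grep -x matches whole lines only; -q keeps it silent.
    # A missing file makes grep fail, which also triggers the append.
    if ! grep -qx 'DEVELOPER=1' "$mak" 2>/dev/null; then
        echo 'DEVELOPER=1' >>"$mak"
    fi
}
```

Run `enable_developer_build` from the top of a Git source checkout; calling it again leaves a single `DEVELOPER=1` line, while `DEVELOPER=1 make` remains the way to enable the knob for one build only.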

Developer Spotlight: Sebastian Schuberth

I’m a passionate software engineer with a wide range of interests and a focus on quality, but I’m particularly interested in cross-platform development. As a techie I’m always trying to look over the rim of my tea cup to learn something new. For a while now I have been getting more and more interested in taking build automation to the extreme and in helping other developers get the most out of Git, their other tools, and CI. Also, I consider myself sort of a Git Evangelist, promoting the use of Git and teaching it to people wherever I can.

My contributions so far have mostly revolved around running Git on Windows, which is why only small portions of my work are visible upstream. I guess my most important contribution so far is the Git for Windows installer, which I started about 7 years ago. It gave a face to Git on Windows and lowered the hurdle for Windows developers to give Git a try.

Recently, I’ve not been contributing much to Git, neither to upstream nor to the Windows port. This is mostly due to time constraints, the programming languages I (currently) like to work with, and also personal dissensions. Instead, I focus on other tools in the Git ecosystem, like JGit and Gerrit.

Wow, that’s a very tempting idea :-) There are many small nuisances that deserve to be addressed, but if I were to name a single big topic, I’d say Git should be rewritten as a library, probably by just using libgit2 and making the CLI a thin wrapper around it. As a side effect, this would mean implementing all of Git in C and no longer using any Shell / Perl / Python scripts, which would improve both performance and portability.

There’s not one big thing that comes to my mind, but I believe a general clean-up of legacy code, deprecated command line options and wording in the docs could help :-)

If I think about which Git-related tool has added the most value to the code I work on in terms of quality, that would certainly be Gerrit. Its UI is not the nicest, but gitk / git gui users will hardly notice ;-) And even in the age of GitHub, Gerrit is vastly superior in the information it can display, like diffs between different iterations of patches.

Releases

Last month we saw some maintenance releases of Git, along with some release candidates of the upcoming 2.8:

And then there was a significant, albeit humbly versioned, new libgit2 release, which dominoed through its wrapper projects:

Other releases:

Other News

Various

Light reading

Git tools and sites

Credits

This edition of Git Rev News was curated by Christian Couder <christian.couder@gmail.com>, Thomas Ferris Nicolaisen <tfnico@gmail.com> and Nicola Paolucci <npaolucci@atlassian.com>, with help from Lars Schneider, Sebastian Schuberth, Junio Hamano and Josh Triplett.