Git Rev News: Edition 19 (September 14th, 2016)

Welcome to the 19th edition of Git Rev News, a digest of all things Git. For our goals, the archives, the way we work, and how to contribute or to subscribe, see the Git Rev News page on git.github.io.

This edition covers what happened during the month of August 2016.

Discussions

General

There were a significant number of Git-related presentations at LinuxCon North America 2016 in Toronto, held from August 22 to August 24, and they happened to be recorded.

One of the most attended was Josh Triplett’s presentation (slides, video) about git-series. Josh had already announced git-series on the mailing list, which had generated some discussion about how the different efforts to store more data and metadata related to patch series using Git itself could collaborate.

In his talk Josh started by explaining the problems with the current way of handling a patch series.

One problem is that when you get feedback, you have to rework your patch series, so you create another version of your patch series. But then what happens to the previous version of the series?

You have to keep it, because people may tell you that they preferred what your previous version did, and because some people are actually interested in the real history of your code.

You could use the reflog to keep it, but it is ephemeral by default and it is not easy to push or pull. You could also dig an email out of your sent-mail folder or a mailing list archive.

So a fundamental problem is that Git tracks only curated history, but we need more than that: we need the history of history.

git submodule could be used to track that, but people generally have a bad experience with git submodule. It’s also possible to manage patches outside Git. Tools such as quilt can be used for this purpose, but then you lose the power of working with Git.

Another possibility is to use branches with version names like feature-v1, feature-v2 and so on. But soon you could have names like feature-v8-rebased-on-3-4-alice-fix and then “everybody who worked in a corporate environment would feel right at home”.

In any case, such solutions don’t solve the problem of managing the cover letter and the base commit, which is the commit you started your patch series from.

They also don’t solve the problem of collaboration. One rule of collaboration is to never rewrite published history, but then how do you collaborate on history that needs rewriting?

Emailing patches back and forth is not a good solution for some kinds of work, like back-porting a feature, preparing a distribution package, or rebasing stacks of patches sitting on top of upstream code.

‘git-series’ has been developed to fix all those problems. It tracks the history of a patch series, and also tracks its cover letter and its base.

Then Josh gave a demo.

To create a series called “feature” based on v4.7, you would run for example:

git series start feature
-> HEAD is now detached at fa8410b Linux 4.8-rc3
git checkout v4.7
-> HEAD is now at 523d939... Linux 4.7
git series base v4.7
-> Set patch series base to 523d939 Linux 4.7
vim README
git commit -a -m 'Change A'
vim README
git commit -a -m 'Change B'
git series status
-> On series feature
->
-> Initial series commit
-> Changes not staged for commit:
->   (use "git series add <file>..." to update what will be committed)
->
->         Added:         base
->         Added:         series
->
-> no changes added to commit (use "git series add" or "git series commit -a")
git series add series
git series add base
git series commit
-> [feature 5eca363] Initial version of feature
git series cover
-> Updated cover letter
git series commit -a -m "Add cover letter"

The following commands were also part of Josh’s demo:

Then Josh went back to the presentation to talk about how git-series works.

The internals are described in INTERNALS.md in the git-series repo.

After reviewing the Git objects (blobs, trees, commits, tags) and refs, Josh noted that trees can refer to commits; such an entry in a tree is called a “gitlink”. Gitlinks are already used by git submodule. git-series uses them to track the series and the base.
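To make the gitlink idea concrete, here is a minimal sketch that builds a tree containing one gitlink entry in a throwaway repository. The entry name `series` mirrors what the talk describes, but the identity values and the rest of the setup are assumptions for illustration, not the layout git-series actually writes.

```shell
set -e
# Throwaway repository; the identity values are placeholders for the demo.
repo=$(mktemp -d)
cd "$repo"
git init -q
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# An ordinary commit that the tree entry will point at.
echo hello > README
git add README
git commit -q -m 'Change A'
tip=$(git rev-parse HEAD)

# Build a tree with a single entry named "series".
# Mode 160000 marks the entry as a commit (a gitlink), not a blob or a tree.
tree=$(printf '160000 commit %s\tseries\n' "$tip" | git mktree)

# Listing the tree shows the entry with mode 160000 and type "commit".
entry=$(git ls-tree "$tree")
echo "$entry"
```

The same mode is what `git ls-tree` shows for a submodule path in an ordinary repository.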

One of the requirements for git-series was that every object it references has to be reachable by Git. Otherwise the object might get deleted, and you can only push and pull objects that are reachable.

The way git-series is implemented is that a series is like a branch prefixed with ‘git-series’, for example:

refs/heads/git-series/feature

This branch points to a commit, called series-v2 for example, which itself has commit series-v1 as its first parent.

The tree pointed to by each of these commits has the following entries:

The problem with this is that Git by default doesn’t follow gitlinks for reachability or push/pulls.

To fix that, an extra parent commit is added to the series-v1 and series-v2 commits for reachability. git-series ignores that parent when traversing the commits.
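The effect of the extra parent can be checked with plain Git plumbing. The sketch below is an illustration under assumptions (an empty tree for the patch commit, a single `series` entry, and a hypothetical `reachable` helper) rather than a reproduction of the commits git-series creates. It compares a series commit that refers to a patch commit only through a gitlink with one that also lists it as a parent.

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# A stand-in for the tip of the patch series, built on an empty tree.
empty_tree=$(git mktree </dev/null)
patch_tip=$(git commit-tree -m 'Change A' "$empty_tree")

# A tree whose only entry is a gitlink to that commit.
series_tree=$(printf '160000 commit %s\tseries\n' "$patch_tip" | git mktree)

# A series commit that refers to the patch only through the gitlink...
gitlink_only=$(git commit-tree -m 'series-v1' "$series_tree")
# ...and one that also lists the patch commit as a parent.
with_parent=$(git commit-tree -p "$patch_tip" -m 'series-v1' "$series_tree")

# Hypothetical helper: is object $1 among the objects Git walks from commit $2?
reachable() {
    if git rev-list --objects "$2" | grep -q "^$1"; then echo yes; else echo no; fi
}

via_gitlink=$(reachable "$patch_tip" "$gitlink_only")
via_parent=$(reachable "$patch_tip" "$with_parent")
echo "reachable via gitlink only: $via_gitlink"
echo "reachable via extra parent: $via_parent"
```

Only the second commit keeps the patch commit alive, which is why the additional parent is needed.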

Josh then gave more “minor details” about how it works.

Your current branch is referred to by HEAD and the current series is referred to by refs/SHEAD, in which ‘refs/’ is needed for reachability.

The working and staged states are respectively in:

which both point to temporary commits. This is needed for reachability of a not yet committed cover letter.

Then Josh talked about his experience designing and developing git-series.

He found on multiple occasions that avoiding the need for big error messages was a good strategy. Often a long and complex error message suggested he might have a design flaw, so he redesigned to make the error impossible.

One example of that is what happens when we detach from a series or check out a new series with uncommitted changes. At first he had designed git-series to use only one staged and working version for a repository, so in this case he would have needed an error message to explain that you could lose some data, and perhaps something like git series checkout --force to check out anyway.

Then he realized that if each series had its own working and staged version there would be no need for such an error message and for a force option.

Another example is what happens when you have created a new series and made some change to it, but have not yet committed anything, and you want to detach from it or check out a new series.

Git has the notion of an “unborn branch”: when you create a repo, HEAD points to “master”, but “master” doesn’t point to anything yet. This means many special cases.
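The unborn state is easy to observe in a freshly created repository. This is a minimal sketch; the default branch name depends on the Git version and configuration, so the script prints it rather than assuming “master”.

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q

# HEAD is a symbolic ref that already names a branch...
head_target=$(git symbolic-ref HEAD)
echo "HEAD -> $head_target"

# ...but that branch does not resolve to any commit until the first commit is made.
if git rev-parse --verify --quiet HEAD >/dev/null; then
    born=yes
else
    born=no
fi
echo "branch has a commit: $born"
```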

Instead of having to write error messages when we detach from a series or when we check out another one, as soon as you start a series the working and staged versions are created and a message says: “new no commit yet”. So unlike Git, you can create new series with nothing on them yet.

Josh then explained that git series rebase was interesting to implement because libgit2, which was used to implement git-series, has no support for rebase.

Git saves state when it fails in the middle of a rebase and you have to use git rebase --continue to continue when the problem has been fixed.

So a temporary measure Josh used, while working on implementing rebase in libgit2, was to write out all the necessary state that Git would save if it failed, and then exec git rebase --continue. This way Git resumes a rebase that it never started.

The last thing Josh talked about was the tools he used to build git-series. Josh used Rust and libgit2 with its Rust bindings. He highly recommends libgit2 and Rust. He said libgit2 was essential and is really effective for playing with a repository.

git-series was the project he used to learn Rust. As Rust is still a very young language, he had to submit patches to the libgit2 Rust bindings and to a few other Rust libraries to make them do what he needed. But it was a really fun experience, especially because he didn’t have to deal with memory management.

Next year LinuxCon will be renamed “Open Source Summit”, and in North America it will take place in Los Angeles, September 11-13. Perhaps the name change hints that it could become an even more relevant place for Git-related presentations.

There was only one student, Pranit Bauva, mentored by Lars Schneider and Christian Couder, who participated this year in the Google Summer of Code. Matthieu Moy and Jeff King were the GSoC administrators for the Git project.

Pranit has been working on an “Incremental rewrite of git-bisect”, whose goal was to port parts of the git bisect command from shell to C code.

Pranit wrote a report about his work and uploaded the last version of the patches he wrote before the end of the GSoC.

Pranit passed the GSoC final evaluation.

This year we will see the return of the Git User’s Survey (the last one was in 2012). The goal of the survey is mainly to help understand who is using Git, how and why.

The results will be published to the Git wiki on the GitSurvey2016 page and discussed on the git mailing list.

The survey will be open from 12 September to 20 October 2016.

Besides gathering information on how people use Git, what their pain points are, and what they want from Git (which might be different from what Git developers think), the survey also has an educational purpose and includes a bit of advertisement (a brief note about this was in #17). For example, a question about the tools or features people use also teaches respondents what is available (and what they might not have known about).

The questions were prepared with the help of Eric Wong and Andrew Ardill; the [RFC] Proposed questions for “Git User’s Survey 2016” thread, with its sub-threads, was posted to solicit feedback.

The survey was first announced on the Git mailing list in the [ANNOUNCE] Git User’s Survey 2016 thread. On request to the survey administrator, Jakub Narębski (for example in response to the announcement email), one can get a separate channel in the survey, with a separate survey URL, so that responses from a particular site or organization can be split out. Some companies have already got their customized survey URLs.

It should be noted that the results of the survey need to be taken with a bit of caution. As Johannes Schindelin reminded us, many professional developers who use Git would be too busy to take the survey. Hopefully it will help that you can fill it in bit by bit (from the same computer), instead of having to fill it in all at once. Taking 30 minutes or more in one sitting may be a problem; taking 3 minutes ten times may not be.

There is also an alternate version of the survey (alternate channel), which does not require cookies or JavaScript and is fully anonymous, but it doesn’t allow one to go back to a response and edit it: https://tinyurl.com/GitSurvey2016-anon

A bit of history: the first “GIT user survey” was announced and created in 2006; its results can be found on the GitSurvey2006 page on the Git Wiki. So this year’s survey marks the 10th anniversary of the first one. Since 2008 the Git User’s Survey has been hosted on Survs.com (then in beta), thanks to the generosity of the site, which provides a recurring premium annual plan.

From the beginning in March 2015, Nicola Paolucci helped set up the infrastructure for Git Rev News, especially MailChimp to send Git Rev News as an email to people who subscribed to receive it this way, and a number of miscellaneous improvements.

Unfortunately, as he has been lacking bandwidth to work on Git Rev News recently, he asked to be removed as an editor. So we are now only two, while there is more and more to do to have a relevant, vibrant and entertaining newsletter for the Git community.

That’s why we are thanking Nicola for his help, and looking for other people to join our small Git Rev News editor team.

There is no need to even know Git well for that. It’s possible to participate by just proofreading articles for example. Please contact us (see our emails at the bottom) if you are interested.

Reviews

One of the improvements in the just-released Git v2.10 is an optimization of the patch id mechanism, implemented by Kevin Willford with help from Johannes Schindelin, alias Dscho; Kevin and Dscho are colleagues working for Microsoft.

The patch id mechanism is used for example by git rebase to avoid trying to rebase commits that have already been integrated. This is done by computing a fingerprint of each commit, called a “patch id”, and comparing the patch ids of the commits on the two sides of the rebase.
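Patch ids can be inspected by hand with `git patch-id`. The following sketch, run in a throwaway repository with placeholder file names and identity values, commits a change on a topic branch, cherry-picks it onto a diverged branch, and shows that the two commits differ while their patch ids match.

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

echo base > file.txt
git add file.txt
git commit -q -m 'Base'
start=$(git symbolic-ref --short HEAD)

# A change committed on a topic branch...
git checkout -q -b topic
echo feature > feature.txt
git add feature.txt
git commit -q -m 'Add feature'
topic_commit=$(git rev-parse HEAD)

# ...and the same change cherry-picked on top of a diverged main line.
git checkout -q "$start"
echo other > other.txt
git add other.txt
git commit -q -m 'Unrelated change'
git cherry-pick "$topic_commit" >/dev/null
picked_commit=$(git rev-parse HEAD)

# git patch-id prints "<patch id> <commit id>"; keep only the first field.
id_topic=$(git show "$topic_commit" | git patch-id | cut -d' ' -f1)
id_picked=$(git show "$picked_commit" | git patch-id | cut -d' ' -f1)
echo "topic:  $topic_commit patch id $id_topic"
echo "picked: $picked_commit patch id $id_picked"
```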

Kevin started by sending a patch called “Use path comparison for patch ids before the file content”.

The idea behind his patch is that, to compare commits, it should be simpler to first look at which files are changed by the commits, before looking at their content. If the files changed by two commits are different, there is no need to look at what is changed exactly to tell that the commits are different.

So instead of computing a patch id made from the content of a commit, it is more efficient to compute a patch id based on the files that are changed and only if necessary compute another patch id based on the content that is changed. This makes git rebase 1% to 6% faster.
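The two-step comparison can be imitated at the command line. This is only a sketch of the idea and not how Git implements it internally: the helper functions `changed_paths`, `content_id` and `same_patch` are hypothetical names introduced here, and the cheap step simply compares the lists of changed paths before falling back to the content-based `git patch-id`.

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

echo base > a.txt
git add a.txt
git commit -q -m 'Base'

echo one >> a.txt
git commit -q -a -m 'Touch a.txt'
first=$(git rev-parse HEAD)

echo two > b.txt
git add b.txt
git commit -q -m 'Add b.txt'
second=$(git rev-parse HEAD)

# Cheap step: the list of paths a commit changes relative to its parent.
changed_paths() { git diff-tree --no-commit-id --name-only -r "$1"; }
# Expensive step: the content-based patch id.
content_id() { git show "$1" | git patch-id | cut -d' ' -f1; }

# Hypothetical helper: compare paths first, and look at content only on a tie.
same_patch() {
    if [ "$(changed_paths "$1")" != "$(changed_paths "$2")" ]; then
        echo "different (paths)"
    elif [ "$(content_id "$1")" = "$(content_id "$2")" ]; then
        echo "same"
    else
        echo "different (content)"
    fi
}

result=$(same_patch "$first" "$second")
echo "$result"
```

Here the two commits touch different files, so the content-based id is never computed for them.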

Junio Hamano agreed that it was a good idea and suggested some improvements. Dscho also commented and suggested other improvements, including splitting the patch.

A few weeks later Kevin sent a version 2 in the form of a 4-patch series.

There were some discussions around a few related issues but in the end the patch series got merged.

Developer Spotlight: Brian M. Carlson

I’m a software developer living in Houston, Texas, US. I’ve long been involved in the FLOSS community, mostly with one-off patches, documentation improvements, and other maintenance-related work. I’m a polyglot programmer, and I contribute to Asciidoctor, among other projects. I also enjoy biking and writing in my spare time.

Professionally, I’m a developer with cPanel. There I do maintenance work on our product and development tools, manage alternating releases, advise on security design, and of course maintain the copy of Git we use internally and ship to customers. I also perform some training for developers, QA, and documentation folk on how to use Git and how to use it more effectively.

One of the things I’ve really enjoyed doing is making Git work better over HTTP with Kerberos. I use Kerberos for my internal network infrastructure at home, and Git has gone from not working at all to being very robust and reliable using Kerberos authentication. There are still some usability issues we can improve on, though.

Most of my time is spent on converting the code to use a structure for object identifiers (hashes) instead of one-off arrays everywhere. This improves the quality and maintainability of the code, and it also makes it easier to switch to a different hash than SHA-1. Depending on the hash that’s chosen, we could also potentially improve the performance of Git as a result of that work as well.

One thing I know is a pain point for a lot of people is very large repositories, whether that’s in terms of number or size of files, total repository size, or number of references. I’ve personally had to troubleshoot some of those problems, and I’d love to get some additional improvements in that area to make Git perform better and use less space.

I think I’d remove the dumb HTTP and FTP support. It tends to perform really badly, requires per-repository server-side configuration to get right, and complicates the smart HTTP protocol code.

To be honest, it’s zsh’s built-in vcs_info functionality. It automatically detects what branch I’m on, if I’m in the middle of an operation like a rebase or cherry-pick, and whether changes are staged or not, as well as providing pretty good completion. It works with a wide variety of Git versions and of course has the scope of features you’d expect from zsh.

Releases

Other News

Git tips and tricks

Light reading

Git tools and sites

Credits

This edition of Git Rev News was curated by Christian Couder <christian.couder@gmail.com> and Thomas Ferris Nicolaisen <tfnico@gmail.com>, with help from Brian M. Carlson, Jakub Narębski, Lars Schneider and Josh Triplett.