Git Rev News: Edition 17 (July 20th, 2016)

Welcome to the 17th edition of Git Rev News, a digest of all things Git. For our goals, the archives, the way we work, and how to contribute or to subscribe, see the Git Rev News page on git.github.io.

This edition covers what happened during the month of June 2016.

Discussions

Reviews

Last January David Greene, who maintains git-subtree.sh, sent a patch series to remove the --annotate option from git subtree, followed by a version 2 of this series.

This came after earlier work, in 2012 and in 2013, to add an --unannotate option.

The reason why adding --unannotate has not been pursued is that it is “difficult to define due to the numerous ways one might want to specify how to edit commit messages”. And --annotate is now considered not well suited to rewriting commit messages compared to other existing tools like filter-branch, rebase -i and commit --amend that can be used afterwards.
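Commit messages written by a subtree import can indeed be adjusted after the fact with those tools; a minimal sketch using filter-branch, where the branch name subtree-import and the "(library) " prefix are only placeholders:

    # Prefix the subject line of every commit that is on subtree-import but
    # not on master, instead of relying on git subtree's --annotate option:
    git filter-branch --msg-filter 'sed -e "1s/^/(library) /"' master..subtree-import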

Junio replied that the above is usually not a good enough reason to remove a feature unless it can be shown that nobody is using it.

David then explained that he doesn’t know how much --annotate is used, but that he is willing to first deprecate it and then after some time remove it.

He also explained that he is working on a few other things “that will involve slight semantic changes” and that he plans to move git subtree out of the contrib subdirectory of the Git source tree, where it currently lives, and into the main area alongside all the non-contrib Git commands.

This change could eventually shift some of the maintenance burden of git subtree from David to the Git developers at large, and ultimately to Junio, but David said that the changes he was planning would reduce that burden.

Junio agreed to the removal of --annotate and in another mail detailed the historical purpose of the contrib area and what is expected from code in this area:

The contrib/ area was created back when Git was still young and we felt that it would be beneficial for building the community if contributions to non-core part were also included, encouraging developers whose strength are not necessarily in the core part to participate in various design-level discussions to grow the community faster. Back then, we felt that an obscure standalone project outside Git that would help the Git-life of users have a much better chance of surviving (and eventually be polished) if we had them bundled, even if the code quality and stability were sub-par.

Those young days are long gone. A standalone tool that aims to help users’ Git-life would not just survive but flourish with much more certainty, as long as the tool is good. We have enough Git users to rely on words-of-mouth these days to ensure their success.

That is why I am very hesitant to add new things to contrib/ these days. It is very welcome thought that you are working on improving subtree, and eventually moving it out of contrib/. From the point of view of the project, either moving up (to be part of the git core proper) or moving out (to become an independent project) is far more preferable than the status quo so far that was staying in contrib/ (without seeing much changes and slowly but steadily bitrotting).

If the aspiration is to move up to exit, then the quality and stability expectation is basically the same as stuff in core, and we need to strive to keep it stable and high quality.

Recently David replied to the above:

This is the strategy I was planning to pursue. After extensive experience with git-subtree and some local enhancements I have in real-world work, I am convinced it is a great complementary tool to git-submodule. It seems odd to me to have one in core and one not.

And David also detailed some of the work he plans to do on git subtree.
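For readers who have not used it, git subtree merges another project’s history into a subdirectory of the current repository; a minimal sketch, with a placeholder URL and prefix:

    # Import a project under vendor/lib, squashing its history into one commit:
    git subtree add --prefix=vendor/lib --squash https://example.com/lib.git master

    # Later, pull upstream updates into the same subdirectory:
    git subtree pull --prefix=vendor/lib --squash https://example.com/lib.git master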

Support

Ovatta Bianca asked:

I read in every comparison of git vs. other version control systems that git does not record differences but takes “snapshots”. I was wondering what a “snapshot” is. Does git store at every commit the entire files that have been modified, even if only a few bytes were changed out of a large file?

Philip Oakley answered:

A snapshot is like a tar or zip of all your tracked files. This means it is easier (compared to lots of diffs) to determine the current content.

Keeping all the snapshots as separate loose items, when the majority of their content is unchanged would be very inefficient, so git then uses, at the right time, an efficient (and obviously lossless) compression technique to ‘zip’ all the snapshots together so that the final repository is ‘packed’. The overall effect is a very efficient storage scheme.

and pointed to some documentation on Git’s web site for “some good explanations”.
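The snapshot nature of a commit is easy to see in any repository; a quick illustration:

    # Every tracked file is listed with its blob id: the commit describes the
    # complete tree, not a set of changes against the previous commit.
    git ls-tree -r HEAD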

Jeff King alias Peff explained how the relationship between Git objects already makes things efficient:

Each commit is a snapshot in that it points to the sha1 of the root tree, which points to the sha1 of other trees and blobs. And following that chain gives you the whole state of the tree, without having to care about other commits.

And if the next commit changes only a few files, the sha1 for all the other files will remain unchanged, and git does not need to store them again. So already, before any explicit compression has happened, we get de-duplication of identical content from commit to commit, at the file and tree level.

And then when a file does change, we store the whole new version, then delta compress it during “git gc”, etc, as you describe.
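The de-duplication Peff describes can be observed directly; a small illustration, where README is just an example path:

    # If README did not change between the last two commits, both names resolve
    # to the same blob id, so the content is stored only once.
    git rev-parse HEAD:README HEAD~1:README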

And Jakub Narębski detailed the “loose” and the “packed” format.
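The difference between the two formats can be inspected in any repository; a quick sketch:

    # Count objects stored as individual loose files and those inside packfiles:
    git count-objects -v

    # Repack: loose objects are delta-compressed together into a packfile.
    git gc
    git count-objects -v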

Developer Spotlight: Jakub Narębski

I’m an occasional contributor to Git and an unofficial gitweb maintainer; a physicist who turned to computer science. One of the first programs that I wrote was a computer simulation. Currently I am working at the Nicolaus Copernicus University in Toruń, where, among other things, I teach Git to students as a part of their coursework.

I created, announced and analysed the annual Git User’s Surveys from 2007 to 2012 (all except the first one). You can find their results on the Git Wiki. This year I plan on restarting the survey.

I am also the author of the Mastering Git book published by Packt.

The goal of the “Mastering Git” book is to help readers get expert-level proficiency with modern Git. I wanted to pass on my knowledge about advanced Git usage and improve readers’ understanding of Git’s behavior. The idea was to show useful features (for example the git stash command) together with an explanation of how they work, so that Git users attain a deeper understanding and can create their own solutions to their problems (for example extracting file changes from the stash) instead of having to rely on ready-made recipes.
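As a small taste of that approach: a stash is made of ordinary commits, so ordinary commands can extract a single file’s changes from it; a sketch, where path/to/file is a placeholder:

    # Show the stashed version of one file:
    git show stash@{0}:path/to/file

    # Or restore only that file from the most recent stash into the work tree:
    git checkout stash@{0} -- path/to/file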

This book would not have been created without Packt. They found me thanks to my contributions on StackOverflow and asked me to author a book on Git targeting advanced usage, to follow their Git: Version Control for Everyone. They were very helpful; this was my first work of such size.

I have followed Git development from the very beginning of its creation, on the then-existing and now defunct KernelTrap and Kernel Traffic sites (and the only issue of Git Traffic ever published). From there I moved on to following the Git mailing list. Git was so much easier to use and understand than the CVS (and RCS) I was using for version control at the time: easy branching, switching branches, merging, checking out older versions (remember sticky tags and dates?), atomic commits, peer-to-peer workflows,…

My very first contribution to Git was an update to gitweb’s README in 2006… and that’s how, a bit later, I came to be an unofficial gitweb maintainer ;-)

Certainly my biggest contribution in terms of lines of code, number of commits and number of patch reviews was my refactoring of gitweb, making it easier to develop and maintain while still providing a simple install path, and providing it with documentation (gitweb(1), gitweb.conf(5)). A contribution that was important to me was adding a configure script for automatic build configuration.

In terms of impact on Git’s user-friendliness and usability, my most significant contributions were probably the (few) improvements to the documentation.

Recently, I’ve not been contributing much to Git. I have now returned to the Git mailing list after a long hiatus. I am currently working on constructing and then running the Git User’s Survey 2016. The survey is preliminarily planned for September. So if you want to point it in a specific direction, give it a certain focus, or include a particular question, now is the time to speak up. I think it is an important avenue for hearing the voice of Git users, to help make Git better for all the various use cases. It also serves as a nice way to advertise Git’s capabilities…

I keep reviewing gitweb patches. My TODO list for gitweb is quite long; I hope to shorten it some.

Further improving performance on big repositories would be nice (a large number of files, large binary files, long history, a large number of branches, tags, replacements and notes). I’d like Git to go through and borrow ideas from other Git implementations and from other version control systems. The addition of compressed bitmap indices, first to JGit and then to core Git, to speed up cloning shows that there might be good ideas in computer science papers on how to speed up reachability and lowest common ancestor (also known as merge base in Git) queries.

One hard problem in Git that would probably need such a team of experts for a full year is resumable clone / resumable fetch. It is something that people want to have, but it turns out to be really hard to implement reasonably. It can be worked around by using git bundles, which will hopefully be automated and standardized; but that is still a workaround, not a solution.
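The bundle workaround he mentions can already be done by hand; a rough sketch with placeholder URLs:

    # On the server side, pack the whole history (all refs plus HEAD) into one file:
    git bundle create repo.bundle --all

    # A bundle is a plain file, so an interrupted download can be resumed with
    # any ordinary tool, for example curl:
    curl -C - -O https://example.com/repo.bundle

    # Clone from the bundle, then point origin at the real repository and fetch
    # whatever is newer than the bundle:
    git clone repo.bundle myrepo
    cd myrepo
    git remote set-url origin https://example.com/repo.git
    git fetch origin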

I would make remote-tracking refs fully qualified, that is, use for example refs/remotes/<remote>/heads/<branch>. This would make it easier to fetch remote-local tags, and to fetch replacements, notes, stashes, etc.
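Such a layout can already be approximated with custom refspecs, which gives an idea of what the fully qualified scheme would look like; an illustrative configuration, not a current default:

    [remote "origin"]
        url = https://example.com/repo.git
        # Today's default maps refs/heads/* to refs/remotes/origin/*;
        # a fully qualified layout would keep each namespace instead:
        fetch = +refs/heads/*:refs/remotes/origin/heads/*
        fetch = +refs/tags/*:refs/remotes/origin/tags/*
        fetch = +refs/notes/*:refs/remotes/origin/notes/*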

I would also redo and redesign the user interface of Git commands. The bottom-up, “worse is better” approach creates superior features, but it all too often results in an inconsistent and inferior UI. It would be good to have consistent rules for using commands, subcommands and options. Currently it is a bit of a historical mess; some features use command options, some subcommands. Some commands are narrow in scope, some have many different (and weakly related) modes of operation.

My favorite features are (1) the explicit staging area, which allows disentangling the changes that will go into the next commit from the state of the working directory, and which allows splitting commits with an interactive rebase, and (2) reflogs, which have saved me from my mistakes in handling Git many, many times.
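A common recipe that combines both features is splitting an earlier commit with an interactive rebase; the commit messages below are placeholders:

    git rebase -i HEAD~3          # mark the commit to split with "edit"
    # When the rebase stops at that commit:
    git reset HEAD^               # keep its changes, but take them out of the index
    git add -p                    # stage only the first logical piece
    git commit -m "first part"
    git add -A                    # stage the rest
    git commit -m "second part"
    git rebase --continue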

Some time ago I was intensively using a patch management tool named StGit (Stacked Git). Nowadays I use interactive rebase for nearly the same purpose, that is, cleaning up a series of changes before sending the new version upstream for review. It is a bit more cumbersome to use, but interactive rebase is a built-in feature.

I work mostly with the core git tools (with Git itself), sometimes using IDE integration with Git, or a graphical commit tool for easier interactive add. As an administrator, I love Gitolite: it allows easy creation of repositories and easy management of repository access, without the need to bother sysadmins.

One tool that looks interesting, and which I would like to try out but haven’t had the occasion to use, is git-imerge.

Releases

Other News

Various

Light reading

Git tools and sites

Credits

This edition of Git Rev News was curated by Christian Couder <christian.couder@gmail.com>, Thomas Ferris Nicolaisen <tfnico@gmail.com> and Nicola Paolucci <npaolucci@atlassian.com>, with help from Jakub Narębski and Johannes Schindelin.