Welcome to the 48th edition of Git Rev News, a digest of all things Git. For our goals, the archives, the way we work, and how to contribute or to subscribe, see the Git Rev News page on git.github.io.
This edition covers what happened during the month of January 2019. It also covers the Git Contributor Summit and the Git Merge conference that took place on January 31st and February 1st.
Git Merge 2019 — General Sessions
The Git Merge 2019 conference took place in Brussels, Belgium on January 31st (workshops and contributor summit) and February 1st (main conference day).
This year a big theme was handling large Git repositories, from both a technical and an organizational point of view.
Ivan Frade and Minh Thai, in “Tales in scalability: how Google has seen users break Git”, talked about solving problems with Android (many repos, huge binary assets, many commits) and the Chromium monorepo (many unique committers). Some of the problems were caused by the legacy practice of trying to keep a Subversion-like monotonic version number; attempts to provide one ran into trouble and caused much of the churn. Another problem was a change in Gerrit, which now stores patch history in the Git repository, resulting in a “forest of tiny bushes” commit graph; the solution here was moving to protocol v2. There was also talk about making the negotiation phase during fetch faster at the cost of a somewhat bigger data transfer, e.g. by skipping commits using Fibonacci number gaps.
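To give a feel for the Fibonacci-gap idea, here is a minimal Python sketch. It is not Git's negotiation code: the function name and the model of history as a simple linear list (index 0 being the newest local commit) are assumptions made for illustration only. It lists which positions a client might advertise if the gap between consecutive advertised commits grew like the Fibonacci sequence.

```python
def fibonacci_skip_positions(history_length):
    """Return positions (0 = newest commit) picked with Fibonacci-sized gaps.

    Hypothetical helper: real negotiation walks a commit graph, not a list.
    """
    positions = []
    index = 0
    prev_gap, gap = 1, 1
    while index < history_length:
        positions.append(index)
        index += gap
        # Grow the gap along the Fibonacci sequence: 1, 2, 3, 5, 8, ...
        prev_gap, gap = gap, prev_gap + gap
    return positions


# With 100 local commits only 9 of them are advertised.
print(fibonacci_skip_positions(100))
# prints [0, 1, 3, 6, 11, 19, 32, 53, 87]
```

Fewer advertised commits means fewer round trips, but the common commit found this way may be older than the true one, which is where the somewhat bigger data transfer comes from.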
Johan Abildskov, a consultant at Praqma, in “The what, how and why of scaling repositories” talked about how to choose between monorepos and many-repos (and how to split the codebase into repositories). The major idea was to not ignore the real problems (like having to create multiple commits to handle a single bug), and to base decisions on data:
Our conclusions are not better than our data
For this reason the git-metrics tool was created, which is a set of utility scripts to scrape data from Git repositories to help teams improve.
Brandon Williams from Facebook gave a lightning talk “Git protocols: still tinkering after all these years?” focusing on the introduction of protocol v2 to reduce communication overhead (especially important for repositories with a large number of branches and tags) and increase extensibility, and on the troubles with adding it while maintaining all-important backwards compatibility.
Terry Parker from Google gave a lightning talk “Native Git support for large objects” explaining how Git’s new partial clone feature (where only a subset of objects, selected by an initial filter, e.g. --filter=blob:limit=1m, is downloaded on clone; the rest are fetched on demand, as needed) and the new proposal to use content distribution networks (CDN) can help with handling repositories with large files.
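As a rough illustration of what a filter like blob:limit=1m selects, the Python sketch below splits a set of blobs into those downloaded at clone time and those left to be fetched on demand. The helper names and the example file names are hypothetical; the only behavior taken from Git's documentation is that blob:limit=<n>[kmg] omits blobs of at least n bytes, with k, m and g meaning KiB, MiB and GiB.

```python
UNITS = {"k": 1024, "m": 1024 ** 2, "g": 1024 ** 3}


def parse_blob_limit(spec):
    """Turn the size part of blob:limit=<n>[kmg] (e.g. "1m") into bytes."""
    spec = spec.strip().lower()
    if spec and spec[-1] in UNITS:
        return int(spec[:-1]) * UNITS[spec[-1]]
    return int(spec)


def split_by_filter(blob_sizes, limit_spec):
    """Split {name: size} into (fetched at clone, fetched on demand)."""
    limit = parse_blob_limit(limit_spec)
    eager = {name: size for name, size in blob_sizes.items() if size < limit}
    lazy = {name: size for name, size in blob_sizes.items() if size >= limit}
    return eager, lazy


# Hypothetical repository contents, sizes in bytes.
blobs = {"README.md": 2_000, "main.c": 40_000, "intro.mp4": 250_000_000}
eager, lazy = split_by_filter(blobs, "1m")
print(sorted(eager), sorted(lazy))
```

In this model only the two small files are transferred by the initial clone; the video would be requested from the server the first time it is needed, for example on checkout.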
John Briggs from Microsoft, in “Technical contributions towards scaling for Windows”, talked about both technical improvements in Git, like the serialized commit graph (with generation numbers), the multipack index (*.midx), and the “sparse” object walk during push that is being worked on (see the “Reviews” section), and improvements in VFS for Git (formerly called GVFS), like prefetching in the background and git status serialization. He also announced that VFS for Git will be ported to other platforms: macOS and Linux (to handle MS Office, which is itself a cross-platform project).
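The generation numbers stored in the commit graph can be pictured with a short Python sketch. It is an illustration, not Git's implementation: the function names and the toy history are assumptions. It uses the usual definition, where a commit without parents has generation 1 and any other commit has one more than the largest generation among its parents.

```python
def generation_numbers(parents):
    """Compute generation numbers for {commit: [parent commits]}.

    Assumes every parent also appears as a key of the dictionary.
    """
    gen = {}
    for start in parents:
        stack = [start]
        while stack:
            commit = stack[-1]
            if commit in gen:
                stack.pop()
                continue
            missing = [p for p in parents[commit] if p not in gen]
            if missing:
                # Parents must be numbered before the commit itself.
                stack.extend(missing)
            else:
                gen[commit] = 1 + max((gen[p] for p in parents[commit]), default=0)
                stack.pop()
    return gen


def cannot_be_ancestor(gen, a, b):
    """True when the numbers alone prove that a is not an ancestor of b."""
    return a != b and gen[a] >= gen[b]


# Toy history: A is the root, D merges B and C, E sits on top of D.
history = {"A": [], "B": ["A"], "C": ["A"], "D": ["B", "C"], "E": ["D"]}
gen = generation_numbers(history)
print(gen["E"], cannot_be_ancestor(gen, "D", "B"))
# prints 4 True
```

Since an ancestor always has a strictly smaller generation number than its descendants, a walk looking for a given commit can stop early as soon as it only sees commits whose number is too small, which is the kind of shortcut that makes history queries cheaper on very large repositories.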
John Austin, game studio technical lead from A Stranger Gravity and Funomena, in “Git for games: current problems and solutions” talked about a major problem with using Git in game development workflows, namely many large binary files, for which file conflicts mean lost work (a minor change, like adding a voiceover or changing equalizer settings, results in large changes to files). File locking is one possibility, but it doesn’t play nicely with Git, as it is inherently centralized. He introduced a new tool, Git Global Graph (a work in progress), which can be used to check at commit time whether a commit would create a divergent version of a file. The idea is that there should be only a single path through the commit graph with changes to binary files.
Javier Fontan from source{d} gave a lightning talk “Gitbase, SQL interface to Git repositories” about the gitbase tool, which provides a read-only SQL interface to Git repositories (with Abstract Syntax Tree support).
Brian M. Carlson, Git Ecosystem Engineer at GitHub in “Bridging the gap: transitioning Git to SHA-256” talked about ongoing work to transition from SHA-1, which is considered weak, to SHA-256, which is more secure: the transition plan, where we are with it, and how to provide interoperability between versions of Git using different hash algorithms.
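To make the difference between the two hash algorithms concrete, here is a small Python sketch that computes the object name of a blob: the hash of a "blob <size>" header, a NUL byte and the file content. The function name is hypothetical, and the sketch assumes that the SHA-256 variant hashes the same header layout as the SHA-1 one; it says nothing about how the transition maps object names between the two algorithms.

```python
import hashlib


def git_blob_id(content, algorithm="sha1"):
    """Hash blob content the way `git hash-object` names a blob."""
    header = b"blob %d\0" % len(content)
    return hashlib.new(algorithm, header + content).hexdigest()


empty_sha1 = git_blob_id(b"")
empty_sha256 = git_blob_id(b"", "sha256")
print(empty_sha1)         # prints e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
print(len(empty_sha256))  # prints 64
```

The same content gets a 40-character name under SHA-1 and a 64-character name under SHA-256, which is one reason why interoperability between repositories using different algorithms needs explicit support.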
Belén Barros Pena, PhD student and interaction designer, gave a talk “The art of patience: why you should bother teaching Git to designers”, where she also described how to do it and provide good retention, namely:
Veronica Hanus in “Version control for visual learners” talked about how to enter visual representations of recently-changed elements into version control in the form of screenshot diffing.
Last November Derrick Stolee, who prefers to be called just Stolee, sent a patch series to the mailing list to speed up git push operations by implementing and using a new “sparse” tree walk algorithm.
Stefan Beller wondered how users can know about this new algorithm and if it should be turned on by default for users. Stolee replied that indeed “we should actually make the config setting true by default, and recommend that servers opt-out”.
Junio Hamano, the Git maintainer, disagreed saying that we should wait until “enough users complain that they have to turn it on” before we turn it on by default.
Stolee later sent a version 2 of the patch series improving the tests, then a version 3 improving the documentation, and a version 4 with a few code and commit message improvements.
Junio and Stolee discussed how the mark_trees_uninteresting_sparse() function is implemented in the first patch, and how a variable is named in this function.
They also discussed the purpose of patches 2 and 3 and agreed that they should be merged and what the related tests should do.
Additionally, Junio suggested a number of small code improvements in the last patch. In particular, he suggested getting rid of a global variable that was unused. Ramsay Jones, who regularly uses the sparse tool and his own static-check.pl script on the Git code base to find errors, had also found this unused variable separately.
Ævar Arnfjörð Bjarmason chimed in to ask for a clarification about which step the patch speeds up, and if a progress bar should be added while the user is waiting during this step, and how this step should be named on the command line interface. It seems though that some preliminary work would be needed to untangle the steps during which a progress bar is already displayed.
Stolee eventually sent a version 5 of the patch series on January 16th which has since been merged and is in the recently released Git v2.21.0.
Various
GSoC 2019: Git’s application submitted and got accepted as one of 207 open source projects; ideas for project proposals published
The Git Contributor Summit 2019 happened on January 30th in Brussels. Elijah Newren took some notes. A video stream of the event was broadcast and recorded, but is not yet available for download.
Light reading
gitgeist: a git-based social network proof of concept by Karim Yaghmour (mentioned on LWN.net).
France enters the Matrix [LWN.net] by Tom Yates covers Matthew Hodgson’s talk about Matrix at FOSDEM 2019; Matrix is an open standard and lightweight protocol for real-time communication, allowing the creation of decentralized, federated instant messaging systems with end-to-end encryption; the video of the whole talk is available.
[…] the “first-class citizen” in Matrix is not the message, but the conversation history of the room. That history is stored in a big data structure that is replicated across a number of participants; in that respect, said Hodgson, Matrix is more like Git than XMPP, SIP, IRC, or many other traditional communication protocols.
./hacker-tools lectures: Version Control, focusing on Git (article with an embedded 53-minute video).
An open source parser for GitHub Actions on GitHub Engineering Blog (GitHub Actions were covered in Git Rev News #44).
Snowpatch: continuous-integration testing for the kernel [LWN.net] by Jonathan Corbet. Snowpatch (mentioned in Git Rev News Edition #40) is built on top of patchwork (mentioned in Git Rev News Edition #20).
Git tools and sites
git-history is a web-based tool (for Node.js) to quickly browse the history of a file from any GitHub repository (GitLab and Bitbucket support is also planned); unfortunately the demo service at https://githistory.xyz/ was down at the time of publishing this edition, causing the Chrome and Firefox extensions, which add an Open in Git History button to GitHub, not to work either.
gitgeist-poc by Francois-Denis Gonthier is a Proof-of-Concept implementation of gitgeist: a git-based social network proof of concept idea.
Git Gud is a pretty bare-bones visual web-based Git simulator, meant to help understand Git better, which got announced by its author Nic Hartley in Git Gud at git; it is quite similar to the Learn Git Branching service (covered in Git Rev News Edition #30).
GitLens — Git supercharged extension supercharges the Git capabilities built into Visual Studio Code.
github-spray is yet another tool to draw on your GitHub contribution graph; there is also GitHub Spray Generator service.
gitbase from source{d} is a tool (in alpha) providing an SQL interface to Git repositories, written in Go. It is part of source{d} Engine, and implements the MySQL wire protocol. It uses go-git for accessing Git repositories, go-mysql-server for the SQL engine implementation, enry for programming language detection for files, and bblfsh for source code parsing into AST (Abstract Syntax Tree). There is also a web client for it.
Gitana: a SQL-based Project Activity Inspector, written in Python (GitHub repository), was mentioned in Git Rev News Edition #7. Nowadays it imports and digests the data of Git repositories, issue trackers (including Bugzilla and GitHub), Q&A web-sites (including forums and StackOverflow) and instant messaging services to a relational database in order to ease browsing and querying activities with standard SQL syntax and tools.
This edition of Git Rev News was curated by Christian Couder <christian.couder@gmail.com>, Jakub Narębski <jnareb@gmail.com>, Markus Jansen <mja@jansen-preisler.de> and Gabriel Alcaras <gabriel.alcaras@telecom-paristech.fr> with help from David Pursehouse and Luca Milanesio.