Git Rev News Edition 48 (February 27th, 2019)

Welcome to the 48th edition of Git Rev News, a digest of all things Git. For our goals, the archives, the way we work, and how to contribute or to subscribe, see the Git Rev News page on git.github.io.

This edition covers what happened during the month of January 2019. It also covers the Git Contributor Summit and the Git Merge conference that took place on January 31st and February 1st.

Discussions

General

Git Merge 2019 — General Sessions

The Git Merge 2019 conference took place in Brussels, Belgium on January 31st (workshops and contributor summit) and February 1st (main conference day).
- This year a big theme was handling large Git repositories, from both a technical and an organizational point of view
  - Ivan Frade and Minh Thai in “Tales in scalability: how Google has seen users break Git” talked about solving problems with Android (many repos, huge binary assets, many commits) and the Chromium monorepo (many unique committers). Some of the problems were caused by the legacy practice of trying to keep a Subversion-like monotonic version number; attempts to provide it ran into trouble and caused a lot of churn. Another problem was a change in Gerrit, which now stores patch history in the Git repository, resulting in a “forest of tiny bushes” commit graph; the solution here was moving to protocol v2. They also talked about making the negotiation phase during fetch faster at the cost of a somewhat bigger data transfer, e.g. by skipping commits using Fibonacci number gaps.
  - Johan Abildskov, a consultant at Praqma, in “The what, how and why of scaling repositories” talked about how to choose between monorepos and many-repos (and how to split the codebase into repositories). The major idea was to not ignore the real problems (like having to create multiple commits to handle a single bug), and to base the decision on data:
    
    Our conclusions are not better than our data
    
    For this reason the git-metrics tool was created, which is a set of utility scripts that scrape data from Git repositories to help teams improve.
  - Brandon Williams from Facebook gave a lightning talk “Git protocols: still tinkering after all these years?” focusing on the introduction of protocol v2 to reduce communication overhead (especially important for repositories with a large number of branches and tags) and increase extensibility, and on the difficulty of adding it while maintaining all-important backwards compatibility.
  - Terry Parker from Google gave a lightning talk “Native Git support for large objects” explaining how Git’s new partial clone feature (where only a subset of objects, selected by an initial filter, e.g. --filter=blob:limit=1m, is downloaded on clone; the rest are fetched on demand, as needed) and the new proposal to use content distribution networks (CDN) can help with handling repositories with large files.
  - John Briggs from Microsoft in “Technical contributions towards scaling for Windows” talked about technical improvements in Git, such as the serialized commit graph (with generation numbers), the multi-pack index (*.midx), and the “sparse” object walk during push that is being worked on (see the “Reviews” section), as well as improvements in VFS for Git (formerly called GVFS), such as background prefetching and git status serialization. He also announced that VFS for Git will be ported to other platforms, macOS and Linux (to handle MS Office, which is itself a cross-platform project).
- John Austin, game studio technical lead from A Stranger Gravity and Funomena, in “Git for games: current problems and solutions” talked about a major problem with using Git in game development workflows, namely many large binary files, for which a file conflict means lost work (a minor change, like adding a voiceover or changing equalizer settings, results in large changes to files). File locking is one possibility, but it doesn’t play nicely with Git because it is inherently centralized. He introduced a new tool, Git Global Graph (a work in progress), which can be used to check at commit time whether the commit would create a divergent version of a file. The idea is that there should be only a single path through the commit graph with changes to binary files.
- Javier Fontan from source{d} gave a lightning talk “Gitbase, SQL interface to Git repositories” about the gitbase tool, which provides a read-only SQL interface to Git repositories (with Abstract Syntax Tree support).
- Brian M. Carlson, Git Ecosystem Engineer at GitHub in “Bridging the gap: transitioning Git to SHA-256” talked about ongoing work to transition from SHA-1, which is considered weak, to SHA-256, which is more secure: the transition plan, where we are with it, and how to provide interoperability between versions of Git using different hash algorithms.
- Belén Barros Pena, PhD student and interaction designer, gave a talk “The art of patience: why you should bother teaching Git to designers”, where she also described how to do it and achieve good retention, namely:
  1. Show things on a need-to-know basis
  2. Avoid the Git jargon
  3. Don’t bother too much with the concepts; they will be grasped through practice
  4. Do things with, never for, your designer
  5. Designer should take notes and keep cheat sheet
  6. Teach command-line Git
- Veronica Hanus in “Version control for visual learners” talked about how to enter visual representations of recently-changed elements into version control in the form of screenshot diffing.
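
The Fibonacci-gap idea mentioned in the Google scalability talk above can be pictured with a toy model. The sketch below is an illustration only, not Git’s actual negotiation code: it assumes a purely linear history, and `fibonacci_skip_positions` is a hypothetical helper that lists which commits (counted back from the newest one) a client would mention if the gap between them grew like the Fibonacci sequence.

```python
def fibonacci_skip_positions(history_length):
    """Return 0-based positions (0 = newest commit) to advertise.

    Toy model: the history is assumed to be linear, and the gap between
    two advertised commits grows like the Fibonacci sequence (1, 2, 3, 5, ...).
    """
    positions = []
    pos, gap, next_gap = 0, 1, 2
    while pos < history_length:
        positions.append(pos)
        pos += gap
        # Advance to the next Fibonacci gap.
        gap, next_gap = next_gap, gap + next_gap
    return positions


if __name__ == "__main__":
    # With 100 commits only 9 of them are mentioned.
    print(fibonacci_skip_positions(100))
```

Fewer commits are mentioned during negotiation, which is where the speed-up would come from; the price is that the common ancestor found this way may be older than the true one, so somewhat more data may be transferred.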

Reviews

Add a new “sparse” tree walk algorithm

Last November Derrick Stolee, who prefers to be called just Stolee, sent a patch series to the mailing list to speed up git push operations by implementing and using a new “sparse” tree walk algorithm.

Stefan Beller wondered how users can know about this new algorithm and if it should be turned on by default for users. Stolee replied that indeed “we should actually make the config setting true by default, and recommend that servers opt-out”.

Junio Hamano, the Git maintainer, disagreed, saying that we should wait until “enough users complain that they have to turn it on” before we turn it on by default.

Stolee later sent a version 2 of the patch series improving the tests, then a version 3 improving the documentation, and a version 4 with a few code and commit message improvements.

Junio and Stolee discussed how the mark_trees_uninteresting_sparse() function is implemented in the first patch, and how a variable is named in this function.

They also discussed the purpose of patches 2 and 3 and agreed that they should be merged and what the related tests should do.

Additionally, Junio suggested a number of small code improvements in the last patch. In particular, he suggested getting rid of a global variable that was unused. Ramsay Jones, who regularly uses the sparse tool and his own static-check.pl script on the Git code base to find errors, had also independently found this unused variable.

Ævar Arnfjörð Bjarmason chimed in to ask for a clarification about which step the patch speeds up, and if a progress bar should be added while the user is waiting during this step, and how this step should be named on the command line interface. It seems though that some preliminary work would be needed to untangle the steps during which a progress bar is already displayed.

Stolee eventually sent a version 5 of the patch series on January 16th which has since been merged and is in the recently released Git v2.21.0.
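
To give a rough feel for what a “sparse” tree walk does, here is a toy model in Python. It is a sketch under stated assumptions, not Git’s C implementation: the object store is a plain dictionary, the object names are made-up strings, and `mark_uninteresting_sparse` is a hypothetical stand-in for the idea behind mark_trees_uninteresting_sparse(). The walk only descends into a path when the trees found there include both an interesting one (something the push may have to send) and an uninteresting one (something the other side is assumed to have already).

```python
from collections import defaultdict


def mark_uninteresting_sparse(store, trees, uninteresting):
    """Toy sparse walk.

    store:         tree name -> {entry name: (kind, object name)},
                   where kind is "tree" or "blob" (hypothetical layout)
    trees:         set of tree names that all sit at the same path
    uninteresting: set of object names, extended in place
    """
    has_interesting = any(t not in uninteresting for t in trees)
    has_uninteresting = any(t in uninteresting for t in trees)
    # Only walk deeper when both kinds of trees meet at this path.
    if not (has_interesting and has_uninteresting):
        return

    by_path = defaultdict(set)
    for tree in trees:
        for name, (kind, oid) in store[tree].items():
            if tree in uninteresting:
                # Children of an uninteresting tree are uninteresting too.
                uninteresting.add(oid)
            if kind == "tree":
                by_path[name].add(oid)

    for subtrees in by_path.values():
        mark_uninteresting_sparse(store, subtrees, uninteresting)


# Made-up example: only src/main.c differs between the two root trees.
store = {
    "root_old": {"docs": ("tree", "docs1"), "src": ("tree", "src_old"),
                 "README": ("blob", "readme1")},
    "root_new": {"docs": ("tree", "docs1"), "src": ("tree", "src_new"),
                 "README": ("blob", "readme1")},
    "docs1": {"guide.txt": ("blob", "guide1")},
    "src_old": {"main.c": ("blob", "main_old"), "util.c": ("blob", "util1")},
    "src_new": {"main.c": ("blob", "main_new"), "util.c": ("blob", "util1")},
}
uninteresting = {"root_old"}  # what the other side is assumed to have
mark_uninteresting_sparse(store, {"root_old", "root_new"}, uninteresting)
print(sorted(uninteresting))
```

In this example the docs tree is identical on both sides, so it is never opened and guide1 is never visited, while util1 is marked uninteresting and main_new is left as the only blob that still needs to be sent.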

Releases

Git 2.21.0, 2.21.0-rc2, 2.21.0-rc1, 2.21.0-rc0
Git for Windows 2.21.0(1)
libgit2 0.28.1, 0.28.0, 0.27.8
libgit2sharp 0.26
GitHub Enterprise 2.16.3, 2.15.8, 2.14.15, 2.13.21, 2.16.2, 2.15.7, 2.14.14, 2.13.20, 2.16.1, 2.15.6, 2.14.13, 2.13.19
GitLab 11.8, 11.7.5, 11.7.4, 11.7.3
Bitbucket Server 6.0
Gerrit Code Review 2.15.11, 2.16.5, 2.15.10, 2.16.4, 2.15.9
GitKraken 4.2.2, 4.2.1, 4.2.0, 4.1.1, 4.1.0, 4.0.6, 4.0.5, 4.0.4, 4.0.3, 4.0.2, 4.0.1, 4.0.0
GitHub Desktop 1.6.2, 1.6.1
Sourcetree 3.1

Other News

Various

GSoC 2019: Git’s application was submitted and accepted as one of 207 open source projects; ideas for project proposals have been published.
The Git Contributor Summit 2019 happened on January 31st in Brussels. Elijah Newren took some notes. A video stream of the event was broadcast and recorded, but is not yet available for download.
The Git Merge Conference 2019 happened on February 1st in Brussels. Videos of the presentations are not yet available. The GitHub team expects them to be available before the end of this month.
- A short Mission report: Git Merge 2019 was posted on the GitHub blog.
The GerritHub.io multi-site plugin is going public and has been proposed to be hosted on gerrit-review.googlesource.com. That is going to be the first globally available Open Source implementation for having Gerrit Code Review masters replicated and synchronized over multiple sites.

Light reading

gitgeist: a git-based social network proof of concept by Karim Yaghmour (mentioned on LWN.net).
France enters the Matrix [LWN.net] by Tom Yates covers Matthew Hodgson’s talk about Matrix at FOSDEM 2019; Matrix is an open standard and lightweight protocol for real-time communication that makes it possible to build a decentralized, federated instant messaging system with end-to-end encryption; the video of the whole talk is available.

[…] the “first-class citizen” in Matrix is not the message, but the conversation history of the room. That history is stored in a big data structure that is replicated across a number of participants; in that respect, said Hodgson, Matrix is more like Git than XMPP, SIP, IRC, or many other traditional communication protocols.
./hacker-tools lectures: Version Control, focusing on Git (article with an embedded 53-minute video).
An open source parser for GitHub Actions on the GitHub Engineering Blog (GitHub Actions were covered in Git Rev News #44).
Snowpatch: continuous-integration testing for the kernel [LWN.net] by Jonathan Corbet. Snowpatch (mentioned in Git Rev News Edition #40) is built on top of patchwork (mentioned in Git Rev News Edition #20).

Git tools and sites

git-history is a web-based tool (for Node.js) to quickly browse the history of a file from any GitHub repository (GitLab and Bitbucket support is also planned). Unfortunately, the demo service at https://githistory.xyz/ was down at the time of publishing this edition, so the Chrome and Firefox extensions, which add an Open in Git History button to GitHub, did not work either.
gitgeist-poc by Francois-Denis Gonthier is a proof-of-concept implementation of the idea described in gitgeist: a git-based social network proof of concept.
Git Gud is a fairly bare-bones visual web-based Git simulator, meant to help understand Git better, announced by its author Nic Hartley in Git Gud at git; it is quite similar to the Learn Git Branching service (covered in Git Rev News Edition #30).
GitLens — Git supercharged extension supercharges the Git capabilities built into Visual Studio Code.
github-spray is yet another tool to draw on your GitHub contribution graph; there is also a GitHub Spray Generator service.
gitbase from source{d} is a tool (in alpha) providing a SQL interface to Git repositories, written in Go. It is part of source{d} Engine, and implements the MySQL wire protocol. It uses go-git for accessing Git repositories, go-mysql-server for the SQL engine implementation, enry for programming language detection for files, and bblfsh for source code parsing into AST (Abstract Syntax Tree). There is also a web client for it.
Gitana: a SQL-based Project Activity Inspector, written in Python (GitHub repository), was mentioned in Git Rev News Edition #7. Nowadays it imports and digests the data of Git repositories, issue trackers (including Bugzilla and GitHub), Q&A websites (including forums and StackOverflow) and instant messaging services into a relational database in order to ease browsing and querying activities with standard SQL syntax and tools.

Credits

This edition of Git Rev News was curated by Christian Couder <christian.couder@gmail.com>, Jakub Narębski <jnareb@gmail.com>, Markus Jansen <mja@jansen-preisler.de> and Gabriel Alcaras <gabriel.alcaras@telecom-paristech.fr> with help from David Pursehouse and Luca Milanesio.