Git Rev News Edition 39 (May 16th, 2018)

Git Rev News: Edition 39 (May 16th, 2018)

Welcome to the 39th edition of Git Rev News, a digest of all things Git. For our goals, the archives, the way we work, and how to contribute or to subscribe, see the Git Rev News page on git.github.io.

This edition covers what happened during the month of April 2018.

Discussions

General

GSoC students and mentors in 2018

Stefan Beller announced to the mailing list that once again this year the Git project will participate in Google Summer or Code! “We’ll have 3 students and 3 mentors, which is more than in recent years.”
- Pratik Karki, who will work on converting git rebase to C and will be mentored by Stefan Beller and Christian Couder, started blogging about his project:
  - [GSoC] Info: new blog series of my work on Git GSoC ‘18 - announcement on the mailing list
  - (recur code) - posts with “git” category - the blog in question
- Paul Ungureanu, who will work on converting git stash to C) and will be mentored by Johannes Schindelin, alias Dscho, is also starting a blog about it:
  - [GSoC] A blog about ‘git stash’ project - announcement on the mailing list
  - Paul’s Blog - Google Summer of Code 2018 - the blog in question
- Alban Gruin, who will work on converting interactive rebase to C and will be mentored by Stefan Beller and Christian Couder, is like the other students starting a blog about it:
  - [GSoC] Yet another blog series about the GSoC - announcement on the mailing list
  - Mon coin sur l’Internet - posts with “GSoC 2018” category - the blog in question, starting with GSoC with git, week 1 : community bonding

Support

Optimizing writes to unchanged files during merges?

Linus Torvalds emailed the mailing list that after merging XFS code into the Linux kernel he was surprised “pretty much everything” got recompiled and the test build took 22 minutes instead of about a minute.

When debugging this, Linus found that a core header file had been changed by the patch series he had merged, but the changes were already upstream and got merged away. Nevertheless the core header file got written out with the same contents it had before the merge, and as ‘make’ only looks at modification time, everything got rebuilt.

As he was still busy with the merge window, Linus hoped that someone could look at making the merging logic optimize things and not even write out the end result when a file doesn’t change.

Junio Hamano soon posted a fix for this issue but it appeared to Linus that it was not a complete fix, so Linus started debugging further by himself and updated the mailing list by regularly posting what he had found. He even started one of his emails with “[ Talking to myself ]” and the next one with “[ Still talking to myself. Very soothing. ]”

Elijah Newren eventually replied to Linus that he already had a fix for that in one of his patch series that had been merged into the master branch, but unfortunately the fix caused some “impressively awful fireworks”, so the fix had been reverted from master.

Fortunately Elijah wrote that he had already reworked his fix and added many test cases, so that he would be able to resend his patch series containing a fully working fix in a few days.

Linus replied by sending an alternative patch relying on “stupid brute force” to fix his issue. Stefan Beller reviewed the patch and Linus replied back to Stefan discussing other improvements and different approaches.

Elijah also replied to Linus’ alternative patch by discussing different approaches. And Junio agreed with the direction Elijah was taking, though he had not as much time as he would have liked to think this through at that time. Junio discussed Linus’ alternative patch anyway with Linus, and noted that it could cause problems in the case of local dirty changes.

Then Lars Schneider chimed in by suggesting to add a cache to speed up builds. Ævar Arnfjörð Bjarmason then replied to Lars and they discussed this idea but concluded that it wouldn’t work.

Jacob Keller and Junio also discussed Lars’ idea. They suggested alternative ideas or tools to address the underlying problem. Junio especially mentioned ccache which had been also suggested by Stefan Haller.

Phillip Wood replied to Lars by sending a Perl script he has been using to save and restore mtimes to avoid rebuilds.

Elijah resent his patch series a few days later, and after a few minor fixes, the patch series was merged to “next” on May 8. The commit message of the final patch of the series in particular documents the “long and blemished history” of the can-working-tree-updates-be-skipped check and how it has been fixed.

Developer Spotlight: Johannes Schindelin (alias Dscho)

Who are you and what do you do?

That is a broad question ;-)

Professionally, I got a diploma (not a measly MSc) in mathematics (specialty: number theory), graduated as a geneticist, dabbled with psychology as a post-doc, then got heavily involved in scientific image processing and light sheet microscopy. Nowadays, I work proudly as software developer at Microsoft.

From Git’s perspective, I am the maintainer of Git for Windows, the “friendly fork” of Git whose purpose in life is to bring Git to the platform with the largest market share among professional software developers. As maintainer, my goals are 1) to improve the Git user experience, primarily on Windows, 2) to make the contribution process more inclusive and friendly, and 3) to collaborate as effectively with the Git project as I can muster.
What would you name your most important contribution to Git?

That is really hard to answer, because it depends on your perspective which of my contributions you might consider the most important.

From the Git project’s point of view, it is probably that I started porting Git to Windows, and that I started packaging end-user facing installers after Johannes Sixt finished the initial port. Windows is the OS most professional software developers use, after all, and at the same time it is the OS least well supported by Git.

From the perspective of power users, I guess the interactive rebase is what most would deem my contribution with the highest impact.

Speaking for myself, I would deem my tenacity my most important contribution, i.e. that I keep improving Git (both the software as well as the project) and that I continue to care about the user experience, the project and the code.
What are you doing on the Git project these days, and why?

I am working on teaching the interactive rebase a mode where it recreates branch structure by rebasing merge commits, too, rather than dropping them. Kind of a git rebase -p Done Right.

Why? Because I need it to maintain Git for Windows (and GVFS Git and Git for Windows’ fork of the MSYS2/Cygwin runtime, and of BusyBox-w32). Simply rebasing a linear branch of ~500 patches is simply not good enough for a big project like Git for Windows.
If you could get a team of expert developers to work full time on something in Git for a full year, what would it be?

Technical debt. We have several metric tons of that. I get that a mostly volunteer-driven project such as Git has a lot of technical debt: who wants to work on technical debt, really?

One blatant example of our technical debt is the absence of a consistent API. We have something we call libgit.a, but even that is pretty inconsistent, and organically grown, and it is specifically intended only for use by Git’s own commands (which is a shame, because it forces every application using Git through the essentially ASCII-based command-line stdin/stdout/stderr).

Another example is that so many central operations are still implemented as Unix shell scripts (with the corresponding lack of performance, safe error handling and portability). Some scripts are Perl scripts, which is better from the performance and safe error handling perspective, but it increases the number of languages you have to speak to contribute to Git, and it still is not particularly portable.

We have a test suite where debugging a regression may mean that you have to run 98 test cases before the failing one every single time in the edit/compile/debug cycle, because the 99th test case may depend on a side effect of at least one of the preceding test cases. Git’s test suite is so not 21st century best practices.

We spawn many, many processes (e.g. pack-refs, reflog, repack, pack-objects, prune, worktree and rerere in git gc, or remote-https, receive-pack, index-pack, unpack-objects in git fetch); it is sometimes really challenging to identify which process is the culprit of segmentation faults, file locking issues, or even “BUG:” messages. Sometimes even Unix shell scripts are involved, so you may very well end up having to go old-school by adding debug statements (because modern techniques such as single-stepping are not an option).

A lot of error code paths end in calls to die(). That might have seemed convenient to the developer who used that, but every piece of useful code will sooner or later be reused, and then such a sloppy “let’s just exit() and not bother with releasing memory or closing file handles” mentality really hurts. Of course, C’s lack of a finally construct makes proper error handling quite a bit bothersome.

We use Asciidoc for our documentation. Worse: even after the rest of the world settled safely on Markdown for pretty much everything new, we decided that it would be a splendid idea to convert some ASCII document to Asciidoc. This hinders fruitful exchanges with all kinds of user documentation, say, in GitHub wikis.

Git assumes that filesystems are case-sensitive. That is true for Linux. It is incorrect for Windows and macOS. And then we use the filesystem as a backend e.g. for loose refs.

The Git index file was designed as a flat file with variable-size items, intended to be read sequentially. The index’ purpose, however, is more like a filesystem, where ideally random-access, concurrent reads and writes should be possible, but the flat file nature prevents that. When your idea of a large project looks like linux.git, that may seem a reasonable design. Even going to the size of gcc.git puts a dent into that impression, though. Most commercial software projects have larger repositories. Sometimes by a large margin.

There is a lot of technical debt in Git.
If you could remove something from Git without worrying about backwards compatibility, what would it be?

Submodules.

Their premise is that they can be treated essentially as if they were files, which is a laughable notion after even a cursory consideration. Files can be untracked, ignored, tracked (clean, modified, deleted). Submodules can have combinations of those. Like, a submodule can be up to date and have untracked files. Oh, and try to detect renames on submodules (including the case where a submodule was modified). I’ll be waiting.
What is your favorite Git-related tool/library, outside of Git itself?

Git garden shears (i.e. the Unix shell script that will hopefully be mostly replaced by git rebase --interactive --rebase-merges before long). I am biased, of course, as I wrote it myself. It is a major time saver for me, though.

I dibble-dabbled with many a Git-related tool from time to time, but at the end of the day I often end up enhancing Git proper to my needs, or use Git aliases or shell scripts (yes, I use shell scripts myself… Unix shell scripting has its uses… although I find myself writing and using node.js more and more, as it makes it a lot easier to use object-oriented abstraction and exception-based error handling, not to mention that it is waaaaaaaaay faster than shell script interpreters). I do try to automate as much of my daily work as possible, and many Git-related tools or libraries simply are not all that automatable.

Releases

Bitbucket Server 5.10
GitLab 10.7.3, Security Release 10.7.2, 10.6.5, and 10.5.8, 10.7.1, and 10.7.0
GitHub Enterprise 2.13.2, 2.12.10, 2.11.16, and 2.10.22
GitKraken 3.6.0

Other News

Various

GitHub is hiring a Linux Kernel Engineer to work on supporting GVFS (an extension of Git to support Git at Enterprise Scale).
The Next Git meetup in London is the 17th of May at 7:00 PM and will be hosted by Skills Matter at CodeNode. There will be talks about Git auto-magic and Gerrit DevOps Analytics.
The Gerrit Code Review Hackathon 2018 has ended, leaving many innovative ideas to the project:
- Improvements on the Mobile experience and a new dark theme
- Ability to dynamically reload its configuration
- Analytics, how to leverage and chart your Git and Code Review data
- Extensible submit rules without having to be a Prolog expert
- Gitiles command line, to invoke your code browser from your prompt

Light reading

Edward Thomson (co-maintainer of libgit2) talks about the branching model of the team developing Visual Studio Team Services.
Five Steps Toward a Clean Commit History by Michael Osborn.
“Git” it together: Some tips on commit etiquette and best practices for junior developers by Jeremy Gunter.
Telling stories with your Git history by Seb Jacobs.
The Most Common Problem In Software Development And How To Fix It presentation by Tom Stuart at RubyConf 2017, especially the History hygiene. part.
Best practices for securely storing API keys by Bruno Pedro examines different tools and approaches, including:
- git-remote-gcrypt to encrypt whole repository using remote helpers mechanism,
- git-secret, which is a Bash script to encrypt files using public GPG keys of all trusted users; git secret was mentioned in Git Rev News Edition 15, and has since acquired webpage and logo,
- git-crypt, which is a similar tool to git-secret, but it is a binary executable (it is written in C++), and instead of requiring secret files to be ignored and untracked it uses filter and diff gitattributes,
- BlackBox by team behind StackOverflow, which is a tool to safely store secrets in Git/Mercurial/Subversion. It also supports the encryption of small strings and not only entire files when working with Puppet (via Puppet’s Hiera).
5 Git Fundamentals by Jorge Yau (2016)
Version control for fun and profit: the tool you didn’t know you needed. From personal workflows to open collaboration, a Jupyter notebook

Git tools and sites

Mo Repos, Mo Problems? How We Make Changes Across Many Git Repositories explains how Clever uses a CLI tool called microplane they developed to make changes across many repos.
Some mutt(1) patches and scripts by Taylor Blau, posted on Git mailing list.
Gitwin - Git Server for Windows, a packaging of Git, OpenSSH, Nginx and many other related tools to make it a ready-to-use solution as a secure Git repository on Windows.
git-vanity-sha will try to tweak the committer timestamp to produce vanity hex prefix for commit SHA; it is similar in function to git-sham which does it and more by appending different random series of three emojis, and which was covered in Git Rev News Edition 4.
git-shame finds out to blame for stale remote branches.
Tugboat is a service allowing you to generate preview of your working website for every pull request, tag or branch and share it (and see visual regressions). Works with GitHub, Bitbucket, and Gitlab.
git-driven-refactoring – sample code for “Git Driven Refactoring” presentation by Ashley Ellis Pierce at RubyConf 2017, GitHub Universe 2017 and Git Merge 2018.
Tools to prevent from accidentally storing secrets in repositories:
- git-secrets is a tool to prevent from committing secrets and credentials into Git repositories, was mentioned in Git Rev News Edition 25.
- git-all-secrets is a tool to capture all the Git secrets in multiple repositories by leveraging multiple open source Git searching tools (truffleHog, repo-supervisor), was mentioned in Git Rev News Edition 28.
- GitLeaks is a tool to “check Git repos for secrets and keys”. was mentioned in Git Rev News Edition 36.
- repo-security-scanner by UKHomeOffice is a command-line tool that finds secrets accidentally committed to a Git repo, e.g. passwords, private keys.
Tools to encrypt repositories, files in epositories, or fragments of files (for example to safely store secrets):
- git-remote-gcrypt is a Git remote helper to push and pull from repositories encrypted with GnuPG (a shell script), described in Best practices for securely storing API keys.
- git-secret is a bash tool to store your private data inside a Git repository (encrypting files), described in Best practices for securely storing API keys; was mentioned in Git Rev News Edition 15.
- git-crypt enables transparent encryption and decryption of files in a Git repository (written in C++), described in Best practices for securely storing API keys.
- BlackBox is a tool to safely store secrets in Git/Mercurial/Subversion, encrypting files (and small strings when using Puppet), described in Best practices for securely storing API keys.
- transcrypt is a tool to transparently encrypt files within a Git repository (with OpenSSL’s symmetric cipher routines).

Credits

This edition of Git Rev News was curated by Christian Couder <christian.couder@gmail.com>, Jakub Narębski <jnareb@gmail.com>, Markus Jansen <mja@jansen-preisler.de> and Gabriel Alcaras <gabriel.alcaras@telecom-paristech.fr> with help from Johannes Schindelin, Elijah Newren and Luca Milanesio.