Git Rev News Edition 70 (December 26th, 2020)

Git Rev News: Edition 70 (December 26th, 2020)

Welcome to the 70th edition of Git Rev News, a digest of all things Git. For our goals, the archives, the way we work, and how to contribute or to subscribe, see the Git Rev News page on git.github.io.

This edition covers what happened during the month of November 2020.

Discussions

General

Three Outreachy interns have been accepted to work on Git this winter:
- Sangeeta from India will be mentored by Kaartic Sivaraam and Christian Couder to work on the ‘Accelerate rename detection and the “range-diff” command in Git’ project. Sangeeta started blogging about her internship.
- Joey Salazar from Costa Rica will be mentored by Emily Shaffer and Jonathan Nieder to work on the ‘Add Git protocol support to Wireshark’ project. Joey also started blogging.
- Charvi Mendiratta from Faridabad, Haryana, India will be mentored by Phillip Wood and Christian Couder to work on on the ‘Improve dropping and rewording commits in Git interactive rebase’ project. Charvi also started blogging.

Reviews

[PATCH 00/10] Advertise trace2 SID in protocol capabilities

Last October Josh Steadmon sent a patch series to the mailing list. The goal of the patch series was to allow a Git client and a Git server, that are communicating to perform a Git operation like a push, a fetch or a clone, to share each other’s trace2 session ID (SID).

Trace2 is a relatively new tracing mechanism that was developed primarily by Jeff Hostetler to improve on the previous trace mechanism.

These tracing mechanisms let users print debug, performance, and telemetry logs to a file or a file descriptor, so that they can better understand what’s going on inside Git processes.

Josh’s patch series allows a Git client to record the server’s trace2 session ID in its logs, and vice versa, by advertising the session ID in a new “trace2-sid” protocol capability.

Josh asked 2 questions in the cover letter of his patch series though. He first asked if the code in the trace2/ directory was supposed to contain only implementation details hidden from the rest of Git and accessible only through the trace2.h and trace2.c files. The reason for this question is that Josh’s code needed to access the trace2 session ID which was previously managed only in trace2/tr2_sid.h and trace2/tr2_sid.c.

Josh’s second question was if it was OK to add a trace2.announceSID configuration option for the feature his patch series implemented. The reason is that some Git processes on servers, like git upload-pack, have previously been prevented to read some potentially malicious config options from local repositories for security reasons.

Jeff Hostetler was very happy with Josh’s patch series saying “Very nice! This should be very helpful when matching up client and server commands.”

He also replied that he indeed intended the trace2/ directory to be opaque, so “that just trace2.h contains the official API”. And he suggested adding a new trace2_session_id() function to trace2.h and trace2.c, that would just call the existing tr2_sid_get() function from trace2/tr2_sid.h and trace2/tr2_sid.c.

Jeff also pointed to the fact that the session ID of a process was built up based on the session IDs of its parent processes. For example a Git process spawned from another Git process will have a session ID of the form <sid1>/<sid2>, where <sid1> is the session ID of its parent. And if it spawns another Git process, the session ID of this new process will be of the form <sid1>/<sid2>/<sid3>.

Jeff also mentioned that if the GIT_TRACE2_PARENT_SID environment variable, which is used to communicate the session ID of the parent process, already contains something, for example ‘hello’, when the initial Git process is launched, then the session IDs will accumulate after this existing content, like hello/<sid1>/<sid2>/<sid3>.

Jeff wondered if clients and servers should share only the last session ID component, for example <sid3>, instead of the full session ID.

While Jeff couldn’t answer Josh’s second question about possible security issues with using a new trace2.announceSID configuration option, Junio Hamano, the Git maintainer, replied that it was probably OK, given the fact that Git processes, like git upload-pack, already take into account at least some boolean config options, like uploadpack.allowrefinwant.

Josh thanked Jeff for his review, and said that in the version 2 of his patch series he had implemented the new trace2_session_id() function in trace2.h and trace2.c Jeff had suggested, and that it was probably OK for clients and servers to share their full session ID rather than the last component.

Junio asked Josh to document this design decision to share the full session ID. Josh replied to Junio that he did that in the version 3 of his patch series.

In reply to version 2 of the patch series though, Junio had also requested that session IDs, how they look like, and what special characters they can contain, should be better documented to help third parties writing their own implementation of the protocol.

This spawned a small discussion thread where Jeff, Ævar Arnfjörð Bjarmason, Junio and Josh eventually agreed on limiting the content of the GIT_TRACE2_PARENT_SID environment variable and session ID to printable, non-whitespace characters that fit into a Git protocol line.

Another discussion following version 2 between Josh, Junio and Johannes Schindelin, alias Dscho, was about Junio’s suggestion to separate the concept of “session” from the trace2 mechanism. This led to the decision to use just “session ID”, instead of “trace2 session ID”, in the documentation, and to call the new configuration option transport.advertiseSID instead of trace2.announceSID.

Other smaller discussions over details of the implementation and the documentation followed version 2, but version 3 got merged into the next and then the master branch. So this new feature will be released in soon upcoming Git v2.30.

Developer Spotlight: Felipe Contreras

Who are you and what do you do?

I’m a software engineer who has worked in all areas of software development. These days they use the term “full stack developer” for a developer that works on all areas of web development, which I have done, but I’ve also worked on Linux kernel development, middleware, UI; you name it.

Right now I’m a freelancer offering services to local companies doing mostly web development, but not quite.

It’s hard to pigeonhole me because I not only do software development, I also write a blog (which is not only about software), I’ve moderated online communities, and I read a lot from intellectuals regarding the ongoing culture war, and so on.

I guess you could say I’m a jack of all trades (but I’ve actually mastered a few).
What would you name your most important contribution to Git?

That’s very hard to tell because I’ve worked all over the place, so most of my changes are minor improvements. But I guess the one with the biggest impact to users would have to be git-remote-hg; a bidirectional bridge between Git and Mercurial.

Ironically I’ve never had to use Mercurial, nor worked on a project that uses Mercurial. However, I did work on projects that used Subversion and CVS back in 2005, when Git started, and I found it useful that there existed tools to use Git while working with other version control systems.

I used to contribute to Pidgin (the MSN protocol parts), which used Monotone, and I refused to work on such a horrendous VCS, so I started to work on scripts to convert Monotone repositories to Git, and I contributed my patches back through Bugzilla (like all other contributors).

This gave me insight into the inner workings of Git, and eventually when the Pidgin project decided to move to Mercurial (an obvious mistake in my opinion), I started surveying the tools to convert from Git to Mercurial, and I found lots of areas of improvement.

Regarding these tools I would be remiss if I didn’t give attribution to Rocco Rutte, who created the first fast-export script, which I used as inspiration for git-remote-hg, but unfortunately died of cancer in 2009. Without his work I might not have started this particular journey.

Using Rocco Rutte’s program, I had the idea to take that approach, but hide it inside Git’s remote-helper infrastructure, which was surprisingly easy. Everything else that came after was fine-tuning, adding features, and improving Git’s infrastructure to make such features possible.

In response to some pushback that I received from more established Git developers – who claimed that some of these changes were specific to git-remote-hg – I decided to create git-remote-bzr as a proof of concept to interact with Bazaar (again, I never personally had to use Bazaar), but it turned out there was a huge demand for such a tool, so I kept working on it.

Some GNU Emacs developers loved git-remote-bzr, and it probably helped in the eventual move to Git, even though Richard Stallman initially pushed back hard against it.

So, even though I never really used git-remote-hg or git-remote-bzr, I kept working at them because clearly other people did. I understand all too well the frustration of working on a VCS that is foreign and suboptimal, especially when you know Git has everything you need; it’s like being stranded.
What are you doing on the Git project these days, and why?

A lot of things. I stopped working on the Git project for many years, but right now I have motivation to work on it again, and there’s literally dozens of features I’m working on. Unfortunately my patches have a tendency to not be accepted, so many of these will not end up helping the end users, but I’m thinking of ways to make these available outside of Git.

The main one is improvements to git pull. Initially there was a complaint from a Red Hat employee about an annoying warning added recently, which prompted me to look back at work I did in 2013 which solves all this, but was never merged. Back then the git pull code was written in shell script, now it’s in C, so I had to rewrite all this functionality.

It’s a lot of work because there are many different workflows, configurations, and options that affect the way git pull works. I think the bulk of the code is mostly done, but there’s a few options I would like to explore that I haven’t mentioned yet in the mailing list, since the current patch series is controversial as it is.

Part of the work is reading back old mail threads which go back to 2008. A lot of problems and suggestions have been mentioned throughout the years, and my patch series tries to compile all of those, in addition to the comments from 2020.

The story of these changes is interesting enough that I have been writing a blog post about it, which is going to be enormous, but a lot more work is needed to finish it properly. Hopefully it will be ready for the next edition of Git Rev News.

In another irony; I don’t even use git pull (I use git fetch + git merge / git rebase).
If you could get a team of expert developers to work full time on something in Git for a full year, what would it be?

I would split Git into a library and a command line interface.

Clearly there’s a need for a stand-alone library, since there is libgit2, but Git doesn’t use libgit2, so one has to always catch up with the other.

Recently Ævar Arnfjörð Bjarmason mentioned a thought about somehow splitting the porcelain and plumbing of Git; the part that is for typical end users, and the part that is for advanced users or scripts. Splitting the command line from the library would allow us to more easily see what part belongs where. Plus, I would split the command line into two; git and git-tool. The git command should be for typical end users, and the documentation about those commands should not include any implementation details, or plumbing.

I believe having a clean command line, which hides implementation details and plumbing, would be of great benefit to the average Git user, and in addition would help developers visualize what changes are more likely to affect the end user, plus where the focus of improving documentation should be.

Oh, and one person on the team should be not a developer, but a copy editor, and his job would be to rewrite all the documentation. It’s probably incontrovertible that Git’s documentation can be improved a lot.
If you could remove something from Git without worrying about backwards compatibility, what would it be?

I don’t think I would remove anything from Git.

Plus, I don’t believe any good developer should stop worrying about backwards compatibility, ever. I’m of the opinion that there’s always a way to implement changes that are incompatible with previous versions, but there’s a series of steps. First you add the new functionality, then you add a deprecation period, then you make the new functionality the default, but always allowing the user to access the old functionality.

It’s a lot more work, and takes a lot more time, which is why many bad projects don’t do it, but I think you always need to worry about backwards compatibility, and it’s a good thing Git developers do worry about that.

OK, maybe merge.defaultToUpstream, nobody uses that.
What is your favorite Git-related tool/library, outside of Git itself?

I don’t really use anything outside of Git.

I find Git vanilla to be mostly good enough to do everything I need; and when I don’t, I try to introduce that directly into Git itself.

I created a fork of Git called git-fc with all the features I didn’t manage to land into Git upstream, but I have not updated it in some years (it’s in my endless to-do list), and even though I miss those features a lot, I can manage.

I also have a bunch of projects that add other functionality, like git-related, git-reintegrate, and git-send-series, but I could live fine without those.

I think the only tool I would find very hard to live without is git-smartlist. Since I use gitk a lot to visualize commit history, very often I just want to see the history from the master branch to the current branch I’m at, and while usually you can do that with master..@, that’s not always the case and git smartlist helps a lot in telling gitk exactly what I want to see.
Outside of code, is there anything you would like to change about the Git project?

I think there’s a disconnection between users and developers. Recently I’ve been talking about the curse of knowledge; the better you know something, the less you remember about how hard it was to learn. It’s very typical for experts to underestimate how hard it is to understand something, because they’ve had many years of experience with it.

One example is rebasing. Basically all Git developers are very familiar with rebasing, so they can’t imagine what it must be like for a user to not know how to rebase, or worse; to not know what a rebase is. But feedback from people whose job is to train users tells us the vast majority of new users have no idea what a rebase is.

Of course the Git developers care about the users, but many times we have to imagine hypothetical users and their needs, and it’s not rare that these don’t match the needs of real users.

That’s why Git Users’ Surveys are so important. Unfortunately they haven’t been made in many years, and to be honest I don’t see much point in them if the developers are not going to trust the results and use them to guide the project.

In all the users’ surveys the number one and two areas of improvement without fail are: user-interface, and documentation, and I believe those are the two areas that are neglected the most.

I would take this feedback seriously, and as a project make a real effort to try to improve in these areas.

Releases

Git 2.30.0-rc2, 2.30.0-rc1, 2.30.0-rc0
Git for Windows 2.30.0-rc2(1), 2.30.0-rc1(1), 2.30.0-rc0(1), 2.29.2(3)
Gerrit Code Review 3.3.0
GitHub Enterprise 2.22.6, 2.21.14, 2.20.23, 2.22.5, 2.21.13, 2.20.22
GitLab 13.7.1, 13.7, 13.6.3 13.6.2
GitKraken 7.4.1
GitHub Desktop 2.6.1

Other News

Various

Felipe Contreras shared his sharness/test Vim syntax file to the list. It enables syntax highlighting for the body of test_success, test_failure etc.

Light reading

Use the Git History to Identify Pain Points in Any Project “basic idea - files that change often (with some exceptions) tend to be the ones where most issues occur” - let’s go find them with this useful command. Includes a useful follow on reading about your code crimes.
Commits are snapshots, not diffs by Derrick Stolee on GitHub Blog.
Get up to speed with partial clone and shallow clone by Derrick Stolee on GitHub Blog.
Optimize your monorepo experience - GitHub Universe 2020: video of presentation by Derrick Stolee, Staff Software Engineer, GitHub.
The Philosophy of Scalar, a part of Scalar docs; the tool itself, intended to provide settings and extensions for Git to help manage large Git repositories, was introduced in Git Rev News #60 and further mentioned in Git Rev News #61.
How to Make Your Code Reviewer Fall in Love with You by Michael Lynch; a good counterpart to The Gentle Art Of Patch Review by Sage Sharp from 2014.
8 Git aliases that make me more efficient by Ricardo Gerardi on Opensource.com. Use aliases to create shortcuts for your most-used or complex Git commands.
10 Git Anti Patterns You Should be Aware of, slide deck presented by Lemi Orhan Ergin at ITAKE UnConf 2018; though one should take it and especially the proposed solutions with a critical eye.
Regular Expression Matching with a Trigram Index, or How Google Code Search Worked by Russ Cox (2012).

Git tools and sites

Radicle intends to be a peer-to-peer stack for building software together, based on Git and Radicle Link peer-to-peer protocol.
Compare with ForgeFed (formerly GitPub), a federation protocol for software forges, mentioned in previous Git Rev News.
git-smartlist by Felipe Contreras; a tool to help create typical revisions (e.g. master..@) by generic name, so that you don’t have to.

Credits

This edition of Git Rev News was curated by Christian Couder <christian.couder@gmail.com>, Jakub Narębski <jnareb@gmail.com>, Markus Jansen <mja@jansen-preisler.de> and Kaartic Sivaraam <kaartic.sivaraam@gmail.com> with help from Felipe Contreras and Philip Oakley.