Git Rev News: Edition 70 (December 26th, 2020)
Welcome to the 70th edition of Git Rev News, a digest of all things Git. For our goals, the archives, the way we work, and how to contribute or to subscribe, see the Git Rev News page on git.github.io.
This edition covers what happened during the month of November 2020.
Three Outreachy interns have been accepted to work on Git this winter:
Sangeeta from India will be mentored by Kaartic Sivaraam and Christian Couder to work on the ‘Accelerate rename detection and the “range-diff” command in Git’ project. Sangeeta started blogging about her internship.
Joey Salazar from Costa Rica will be mentored by Emily Shaffer and Jonathan Nieder to work on the ‘Add Git protocol support to Wireshark’ project. Joey also started blogging.
Charvi Mendiratta from Faridabad, Haryana, India will be mentored by Phillip Wood and Christian Couder to work on the ‘Improve dropping and rewording commits in Git interactive rebase’ project. Charvi also started blogging.
Last October Josh Steadmon sent a patch series to the mailing list. The goal of the patch series was to allow a Git client and a Git server, that are communicating to perform a Git operation like a push, a fetch or a clone, to share each other’s trace2 session ID (SID).
The trace2 tracing mechanism lets users print debug, performance, and telemetry logs to a file or a file descriptor, so that they can better understand what’s going on inside Git processes.
Josh’s patch series allows a Git client to record the server’s trace2 session ID in its logs, and vice versa, by advertising the session ID in a new “trace2-sid” protocol capability.
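As a rough illustration of how a client might pick the peer's session ID out of a capability advertisement, here is a minimal sketch. It assumes capabilities arrive as a space-separated list where valued capabilities use a `name=value` form; the advertisement string, the session ID value and the `parse_capabilities` helper are all made up for this example and are not taken from the patch series.

```python
def parse_capabilities(advertisement: str) -> dict:
    """Split a space-separated capability list into a {name: value} dict.

    Capabilities without "=" map to None.
    """
    caps = {}
    for token in advertisement.split():
        name, sep, value = token.partition("=")
        caps[name] = value if sep else None
    return caps


# Hypothetical advertisement; the session ID value is invented.
server_caps = parse_capabilities(
    "side-band-64k agent=git/2.30.0 trace2-sid=example-server-sid"
)
peer_sid = server_caps.get("trace2-sid")
if peer_sid is not None:
    # A client would record this next to its own session ID in its logs.
    print(f"server session ID: {peer_sid}")
```

The same lookup would work in the other direction, with the server recording the ID advertised by the client.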
Josh asked two questions in the cover letter of his patch series, though. He first asked whether the code in the trace2/ directory was supposed to contain only implementation details hidden from the rest of Git and accessible only through the trace2.c files. The reason for this question is that Josh’s code needed to access the trace2 session ID, which was previously managed only inside that directory.
Josh’s second question was whether it was OK to add a trace2.announceSID configuration option for the feature his patch series implemented. The reason is that some Git processes on servers, like git upload-pack, have previously been prevented from reading some potentially malicious config options from local repositories for security reasons.
Jeff Hostetler was very happy with Josh’s patch series saying “Very nice! This should be very helpful when matching up client and server commands.”
He also replied that he indeed intended the trace2/ directory to be opaque, so “that just trace2.h contains the official API”. And he suggested adding a new function in trace2.c that would just call the existing one.
Jeff also pointed out that the session ID of a process is built up from the session IDs of its parent processes. For example, a Git process spawned from another Git process will have a session ID that extends <sid1>, where <sid1> is the session ID of its parent. And if it spawns another Git process, the session ID of this new process will in turn extend its own session ID.
Jeff also mentioned that if the GIT_TRACE2_PARENT_SID environment variable, which is used to communicate the session ID of the parent process, already contains something, for example ‘hello’, when the initial Git process is launched, then the session IDs will accumulate after this existing content.
Jeff wondered if clients and servers should share only the last session ID component, for example <sid3>, instead of the full session ID.
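The way session IDs accumulate from parent to child, and the difference between the full ID and its last component, can be sketched as follows. This is only an illustration: the `/` separator is an assumption of this sketch, and `build_session_id` and `last_component` are hypothetical helpers, not functions from Git.

```python
from typing import Optional

SEPARATOR = "/"  # assumption: nested session IDs are joined with a slash


def build_session_id(parent_sid: Optional[str], own_component: str) -> str:
    """Return the full session ID of a process given what it inherited."""
    if not parent_sid:
        return own_component
    return parent_sid + SEPARATOR + own_component


def last_component(session_id: str) -> str:
    """Return only the component that identifies the process itself."""
    return session_id.rsplit(SEPARATOR, 1)[-1]


# Three nested processes, each inheriting the ID of its parent.
sid = None
for component in ("<sid1>", "<sid2>", "<sid3>"):
    sid = build_session_id(sid, component)

print(sid)                  # <sid1>/<sid2>/<sid3>
print(last_component(sid))  # <sid3>

# Pre-existing content in the inherited value simply stays in front.
prefixed = build_session_id("hello", "<sid1>")
print(prefixed)             # hello/<sid1>
```

Sharing the full ID keeps the whole chain of parents visible to the peer, while sharing only the last component keeps the advertised value short.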
While Jeff couldn’t answer Josh’s second question about possible security issues with using a new trace2.announceSID configuration option, Junio Hamano, the Git maintainer, replied that it was probably OK, given that Git processes like git upload-pack already take into account at least some boolean config options.
Josh thanked Jeff for his review, and said that in version 2 of his patch series he had implemented the new function in trace2.c that Jeff had suggested, and that it was probably OK for clients and servers to share their full session ID rather than the last component.
Junio asked Josh to document this design decision to share the full session ID. Josh replied to Junio that he did that in the version 3 of his patch series.
In reply to version 2 of the patch series though, Junio had also requested that session IDs, what they look like, and what special characters they can contain, should be better documented to help third parties writing their own implementation of the protocol.
This spawned a small discussion thread where Jeff, Ævar Arnfjörð Bjarmason, Junio and Josh eventually agreed on limiting the content of the GIT_TRACE2_PARENT_SID environment variable and session ID to printable, non-whitespace characters that fit into a Git protocol line.
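For third parties writing their own implementation, the agreed restriction could be checked along these lines. This is a sketch under two assumptions: "printable, non-whitespace" is read as the ASCII range `!` to `~`, and `MAX_LEN` is an arbitrary placeholder for whatever room a protocol line leaves for the session ID. The `is_valid_session_id` name is hypothetical.

```python
import string

# Printable ASCII without whitespace: "!" (0x21) through "~" (0x7e).
ALLOWED = set(string.ascii_letters + string.digits + string.punctuation)

# Placeholder budget (assumption); the real bound depends on how much of a
# protocol line is left once the other capabilities are accounted for.
MAX_LEN = 1000


def is_valid_session_id(sid: str, max_len: int = MAX_LEN) -> bool:
    """Check that a session ID is non-empty, short enough, and made only
    of printable, non-whitespace ASCII characters."""
    return 0 < len(sid) <= max_len and all(ch in ALLOWED for ch in sid)


print(is_valid_session_id("hello/abc-123"))  # True
print(is_valid_session_id("has space"))      # False
```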
Another discussion following version 2 between Josh, Junio and Johannes Schindelin, alias Dscho, was about Junio’s suggestion to separate the concept of “session” from the trace2 mechanism. This led to the decision to use just “session ID”, instead of “trace2 session ID”, in the documentation, and to rename the new configuration option.
Other smaller discussions over details of the implementation and the documentation followed version 2, but version 3 got merged into the next and then the master branch. So this new feature will be released in the upcoming Git v2.30.
Developer Spotlight: Felipe Contreras
Who are you and what do you do?
I’m a software engineer who has worked in all areas of software development. These days they use the term “full stack developer” for a developer that works on all areas of web development, which I have done, but I’ve also worked on Linux kernel development, middleware, UI; you name it.
Right now I’m a freelancer offering services to local companies doing mostly web development, but not quite.
It’s hard to pigeonhole me because I not only do software development, I also write a blog (which is not only about software), I’ve moderated online communities, and I read a lot from intellectuals regarding the ongoing culture war, and so on.
I guess you could say I’m a jack of all trades (but I’ve actually mastered a few).
What would you name your most important contribution to Git?
That’s very hard to tell because I’ve worked all over the place, so most of my changes are minor improvements. But I guess the one with the biggest impact to users would have to be git-remote-hg; a bidirectional bridge between Git and Mercurial.
Ironically I’ve never had to use Mercurial, nor worked on a project that uses Mercurial. However, I did work on projects that used Subversion and CVS back in 2005, when Git started, and I found it useful that there existed tools to use Git while working with other version control systems.
I used to contribute to Pidgin (the MSN protocol parts), which used Monotone, and I refused to work on such a horrendous VCS, so I started to work on scripts to convert Monotone repositories to Git, and I contributed my patches back through Bugzilla (like all other contributors).
This gave me insight into the inner workings of Git, and eventually when the Pidgin project decided to move to Mercurial (an obvious mistake in my opinion), I started surveying the tools to convert from Git to Mercurial, and I found lots of areas of improvement.
Regarding these tools I would be remiss if I didn’t give attribution to Rocco Rutte, who created the first fast-export script, which I used as inspiration for git-remote-hg, but unfortunately died of cancer in 2009. Without his work I might not have started this particular journey.
Using Rocco Rutte’s program, I had the idea to take that approach, but hide it inside Git’s remote-helper infrastructure, which was surprisingly easy. Everything else that came after was fine-tuning, adding features, and improving Git’s infrastructure to make such features possible.
In response to some pushback that I received from more established Git developers – who claimed that some of these changes were specific to git-remote-hg – I decided to create git-remote-bzr as a proof of concept to interact with Bazaar (again, I never personally had to use Bazaar), but it turned out there was a huge demand for such a tool, so I kept working on it.
Some GNU Emacs developers loved git-remote-bzr, and it probably helped in the eventual move to Git, even though Richard Stallman initially pushed back hard against it.
So, even though I never really used git-remote-hg or git-remote-bzr, I kept working on them because clearly other people did. I understand all too well the frustration of working on a VCS that is foreign and suboptimal, especially when you know Git has everything you need; it’s like being stranded.
What are you doing on the Git project these days, and why?
A lot of things. I stopped working on the Git project for many years, but right now I have motivation to work on it again, and there’s literally dozens of features I’m working on. Unfortunately my patches have a tendency to not be accepted, so many of these will not end up helping the end users, but I’m thinking of ways to make these available outside of Git.
The main one is improvements to git pull. Initially there was a complaint from a Red Hat employee about an annoying warning added recently, which prompted me to look back at work I did in 2013 which solves all this, but was never merged. Back then the git pull code was written in shell script, now it’s in C, so I had to rewrite all this functionality.
It’s a lot of work because there are many different workflows, configurations, and options that affect the way git pull works. I think the bulk of the code is mostly done, but there’s a few options I would like to explore that I haven’t mentioned yet in the mailing list, since the current patch series is controversial as it is.
Part of the work is reading back old mail threads which go back to 2008. A lot of problems and suggestions have been mentioned throughout the years, and my patch series tries to compile all of those, in addition to the comments from 2020.
The story of these changes is interesting enough that I have been writing a blog post about it, which is going to be enormous, but a lot more work is needed to finish it properly. Hopefully it will be ready for the next edition of Git Rev News.
In another irony; I don’t even use git pull.
If you could get a team of expert developers to work full time on something in Git for a full year, what would it be?
I would split Git into a library and a command line interface.
Clearly there’s a need for a stand-alone library, since there is libgit2, but Git doesn’t use libgit2, so one has to always catch up with the other.
Recently Ævar Arnfjörð Bjarmason mentioned a thought about somehow splitting the porcelain and plumbing of Git; the part that is for typical end users, and the part that is for advanced users or scripts. Splitting the command line from the library would allow us to more easily see what part belongs where. Plus, I would split the command line into two; the git command should be for typical end users, and the documentation about those commands should not include any implementation details, or plumbing.
I believe having a clean command line, which hides implementation details and plumbing, would be of great benefit to the average Git user, and in addition would help developers visualize what changes are more likely to affect the end user, plus where the focus of improving documentation should be.
Oh, and one person on the team should be not a developer, but a copy editor, and his job would be to rewrite all the documentation. It’s probably incontrovertible that Git’s documentation can be improved a lot.
If you could remove something from Git without worrying about backwards compatibility, what would it be?
I don’t think I would remove anything from Git.
Plus, I don’t believe any good developer should stop worrying about backwards compatibility, ever. I’m of the opinion that there’s always a way to implement changes that are incompatible with previous versions, but there’s a series of steps. First you add the new functionality, then you add a deprecation period, then you make the new functionality the default, but always allowing the user to access the old functionality.
It’s a lot more work, and takes a lot more time, which is why many bad projects don’t do it, but I think you always need to worry about backwards compatibility, and it’s a good thing Git developers do worry about that.
merge.defaultToUpstream, nobody uses that.
What is your favorite Git-related tool/library, outside of Git itself?
I don’t really use anything outside of Git.
I find Git vanilla to be mostly good enough to do everything I need; and when I don’t, I try to introduce that directly into Git itself.
I created a fork of Git called git-fc with all the features I didn’t manage to land into Git upstream, but I have not updated it in some years (it’s in my endless to-do list), and even though I miss those features a lot, I can manage.
I think the only tool I would find very hard to live without is git-smartlist. Since I use gitk a lot to visualize commit history, very often I just want to see the history from the master branch to the current branch I’m at, and while usually you can do that with master..@, that’s not always the case and git smartlist helps a lot in telling gitk exactly what I want to see.
Outside of code, is there anything you would like to change about the Git project?
I think there’s a disconnection between users and developers. Recently I’ve been talking about the curse of knowledge; the better you know something, the less you remember about how hard it was to learn. It’s very typical for experts to underestimate how hard it is to understand something, because they’ve had many years of experience with it.
One example is rebasing. Basically all Git developers are very familiar with rebasing, so they can’t imagine what it must be like for a user to not know how to rebase, or worse; to not know what a rebase is. But feedback from people whose job is to train users tells us the vast majority of new users have no idea what a rebase is.
Of course the Git developers care about the users, but many times we have to imagine hypothetical users and their needs, and it’s not rare that these don’t match the needs of real users.
That’s why Git Users’ Surveys are so important. Unfortunately they haven’t been made in many years, and to be honest I don’t see much point in them if the developers are not going to trust the results and use them to guide the project.
In all the users’ surveys the number one and two areas of improvement without fail are: user-interface, and documentation, and I believe those are the two areas that are neglected the most.
I would take this feedback seriously, and as a project make a real effort to try to improve in these areas.
- Git 2.30.0-rc2, 2.30.0-rc1, 2.30.0-rc0
- Git for Windows 2.30.0-rc2(1), 2.30.0-rc1(1), 2.30.0-rc0(1), 2.29.2(3)
- Gerrit Code Review 3.3.0
- GitHub Enterprise 2.22.6, 2.21.14, 2.20.23, 2.22.5, 2.21.13, 2.20.22
- GitLab 13.7.1, 13.7, 13.6.3, 13.6.2
- GitKraken 7.4.1
- GitHub Desktop 2.6.1
- Felipe Contreras shared his sharness/test Vim syntax file to the list. It enables syntax highlighting for the body of test_success, test_failure etc.
- Use the Git History to Identify Pain Points in Any Project: the “basic idea - files that change often (with some exceptions) tend to be the ones where most issues occur”, so let’s go find them with this useful command. Includes useful follow-on reading about your code crimes.
- Commits are snapshots, not diffs by Derrick Stolee on GitHub Blog.
- Get up to speed with partial clone and shallow clone by Derrick Stolee on GitHub Blog.
- Optimize your monorepo experience - GitHub Universe 2020: video of presentation by Derrick Stolee, Staff Software Engineer, GitHub.
- The Philosophy of Scalar, a part of Scalar docs; the tool itself, intended to provide settings and extensions for Git to help manage large Git repositories, was introduced in Git Rev News #60 and further mentioned in Git Rev News #61.
- How to Make Your Code Reviewer Fall in Love with You by Michael Lynch; a good counterpart to The Gentle Art Of Patch Review by Sage Sharp from 2014.
- 8 Git aliases that make me more efficient by Ricardo Gerardi on Opensource.com. Use aliases to create shortcuts for your most-used or complex Git commands.
- 10 Git Anti Patterns You Should be Aware of, slide deck presented by Lemi Orhan Ergin at ITAKE UnConf 2018; though one should take it and especially the proposed solutions with a critical eye.
- Regular Expression Matching with a Trigram Index, or How Google Code Search Worked by Russ Cox (2012).
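The “files that change often” idea from the list above can be approximated with a few lines of scripting. The sketch below is not the command from the linked article; it assumes the input is the output of something like `git log --format= --name-only` (one path per line, blank lines between commits), and both the sample data and the `count_file_changes` helper are made up for illustration.

```python
from collections import Counter


def count_file_changes(log_output: str) -> Counter:
    """Count how often each path appears in name-only log output."""
    return Counter(
        line.strip() for line in log_output.splitlines() if line.strip()
    )


# Invented sample of what the log output might look like.
sample = """\
src/parser.c
src/main.c

src/parser.c
docs/README

src/parser.c
"""

changes = count_file_changes(sample)
top_path, top_count = changes.most_common(1)[0]
print(top_path, top_count)  # src/parser.c 3
```

Paths that change for mechanical reasons, such as generated files or changelogs, would probably need to be filtered out before reading much into the ranking.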
Git tools and sites
- Radicle intends to be a peer-to-peer stack for building software together,
based on Git and Radicle Link peer-to-peer protocol.
Compare with ForgeFed (formerly GitPub), a federation protocol for software forges, mentioned in previous Git Rev News.
- git-smartlist by Felipe Contreras;
a tool to help create typical revisions (e.g. master..@) by generic name, so that you don’t have to.
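For readers less familiar with the range notation, `master..@` selects the commits reachable from `@` (a shorthand for HEAD) that are not reachable from master. The toy model below illustrates that set difference on a hand-written history; it is not how git-smartlist or Git itself is implemented, and the commit names and helper functions are invented for the example.

```python
def ancestors(graph, start):
    """Return `start` plus every commit reachable from it via parent links."""
    seen, stack = set(), [start]
    while stack:
        commit = stack.pop()
        if commit not in seen:
            seen.add(commit)
            stack.extend(graph.get(commit, ()))
    return seen


def rev_range(graph, exclude, include):
    """Commits reachable from `include` but not from `exclude`."""
    return ancestors(graph, include) - ancestors(graph, exclude)


# Toy history mapping each commit to its parents.
# master points at C; the current branch (HEAD) points at E and forked at B.
history = {"A": [], "B": ["A"], "C": ["B"], "D": ["B"], "E": ["D"]}

on_branch = rev_range(history, exclude="C", include="E")
print(sorted(on_branch))  # ['D', 'E']
```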
This edition of Git Rev News was curated by Christian Couder <firstname.lastname@example.org>, Jakub Narębski <email@example.com>, Markus Jansen <firstname.lastname@example.org> and Kaartic Sivaraam <email@example.com> with help from Felipe Contreras and Philip Oakley.