Git Rev News Edition 85 (March 31st, 2022)

Git Rev News: Edition 85 (March 31st, 2022)

Welcome to the 85th edition of Git Rev News, a digest of all things Git. For our goals, the archives, the way we work, and how to contribute or to subscribe, see the Git Rev News page on git.github.io.

This edition covers what happened during the month of February 2022.

Discussions

Reviews

[PATCH] http API: fix dangling pointer issue noted by GCC 12.0

Ævar Arnfjörð Bjarmason sent a patch to the mailing list to fix an issue that appears when compiling Git with the gcc 12.0 development branch.

The issue is that this gcc version emits a warning on the last line of the following code:
```
      int finished = 0;

      slot->finished = &finished;
```
where slot is an instance of struct active_request_slot which has an int *finished member.

The warning is “storing the address of local variable ‘finished’ in ‘*slot.finished’ [-Werror=dangling-pointer=]”.

Ævar first thought about making the struct active_request_slot member an int finished member, so it wouldn’t be a pointer anymore, and then about removing the int *finished member entirely and using the int in_use member instead. His patch implemented the latter.

Taylor Blau reviewed Ævar’s patch but wasn’t sure about one of the places where the int *finished member was removed.

Junio Hamano also reviewed the patch and wondered if the new gcc version was correct in flagging the code in the first place, as it appeared that slot->finished wasn’t used outside the function where it’s assigned the address of a local variable.

Junio also wondered if it was correct to remove the int *finished member entirely and use the int in_use member instead, as that looked like reversing the patch that introduced int *finished without fully explaining why it was correct to do so.

To address the issue Junio then proposed implementing the following:
```
      if (slot->finished == &finished)
            slot->finished = NULL;
```
so that hopefully gcc would understand that the slot->finished field isn’t using a local address outside the current function.

Ævar replied to Junio, that assigning slot->finished to NULL at the end of the function would indeed quiet gcc. So he sent a version 2 of his patch that only added slot->finished = NULL; at the end of the function.

Junio wasn’t sure if the change was completely correct as he said he no longer trusted his own reading of the http code, but he agreed that it should be merged if gcc wasn’t fixed soon.

Taylor took another look at the patch and found other functions where slot->finished is assigned, so he wondered if a better fix would be to change the int *finished member into a tri-state “UNKNOWN/TRUE/FALSE enum”.

Ævar then sent a version 3 of his patch where he only improved the commit message compared to the previous version. In the new commit message he explains why the other places where slot->finished is assigned are already correct and don’t need any change.

Taylor then agreed with Ævar’s explanations, but Junio wasn’t sure about them, so he changed the commit message a bit, explaining how we could make sure that the change to clear slot->finished is correct by adding the following code before it:
```
        if (slot->finished)
               BUG("the unconditional clearing at the end is wrong");
```
For now the patch has still not been merged into the next branch as Junio hopes that gcc 12.0 will be fixed before it’s actually released. But in case it isn’t, at least we are ready for it.

Developer Spotlight: Victoria Dye

Who are you and what do you do?

Hi! My name is Victoria Dye, and I work as a software developer at GitHub. I spend a lot of my time contributing to Git & some of its downstream forks (Git for Windows and microsoft/git), and occasionally Git Credential Manager (although it’s been a while). I really enjoy solving tough performance problems in software, so I’m hoping to do more of that in Git in the not-so-distant future!
What would you name your most important contribution to Git?

I haven’t been contributing to Git for all that long, so it’s hard to judge the importance of anything I’ve done. That said, I’m pretty proud of the work I’ve done around sparse index and sparse-checkout. It’s been exciting to contribute and see those features mature so rapidly over the past couple months, and I hope it continues to gain traction with users looking for better performance in Git!
What are you doing on the Git project these days, and why?

My focus pretty much since I started working on Git has been on integrating commands with sparse index. Most recently that’s been git stash, which not only requires its own integration, but also the integration of all the commands it shells out to (update-index, read-tree, etc.). The goal of these initial sparse index integrations (including those by Derrick Stolee & Lessley Dennington) is to cover the most popular commands as reported by users at $DAYJOB (git add, git reset, etc.), so the theoretical performance gains of the sparse index can turn into a real improvement to user experience.

Beyond my own work, I’m looking forward to mentoring a GSoC contributor in more sparse index integration! It’s an admittedly complicated topic, but — based on my own experience – working on it can give new contributors an incredible depth of knowledge in Git that’s hard to get elsewhere. Plus, I love mentoring developers and creating a positive environment for them, so I’m excited to have the chance to do that here!
If you could get a team of expert developers to work full time on something in Git for a full year, what would it be?

It would be great to see the shell script-based test suite rewritten into a unit/integration/end-to-end testing stack. The tests we have are almost exclusively end-to-end (excepting hackery with test-tool or trace2 logs exposing internal functions to test them), so we often have incomplete coverage of some of the trickier code paths (e.g., unpack_trees). And, especially for developers that like test-driven development, it can be a lot easier working with code that has a robust unit testing suite.
If you could remove something from Git without worrying about backwards compatibility, what would it be?

In general, I don’t think there’s a feature that I’d want to completely remove. Git is a fantastic tool in that it lets developers structure their work in whatever way works best for them; something I personally don’t have a great experience with (looking at you, submodules) might just happen to be someone else’s favorite tool.

However, I would absolutely remove an implementation pattern in Git itself: shell script builtins and Git commands shelling out to other Git commands. They’re prone to bugs, difficult to troubleshoot, and can almost always be implemented more efficiently by commonizing the code they’re trying to get out of another command.
What is your favorite Git-related tool/library, outside of Git itself?

I really, really like my current setup of VSCode + GitLens. I used to have a bit of a “text editor is for editing text, Git command line is for source control” attitude and avoided using any integrations between the two, but recently I gave in and installed GitLens. I tend to heavily use the commit history when investigating bugs & learning a new area of the code, so the inline ‘blame’ and commit-message-on-hover in particular make that process about as fast it could be.
What does your mailing list workflow look like?

For submissions, I use GitGitGadget. It really simplifies the process of submitting to Git (especially re-rolls) and, if anything, it prevents me from making silly mistakes with git send-email.

For everything else, I have two email inbox folders: things addressed to me directly (submission feedback, patches I’ve been CC’d on) and “everything else”. For direct feedback/CCs, I try to respond within a couple days if I think I can contribute something valuable to the discussion. The “everything else” folder is set to “automatically mark as read” to avoid a distracting barrage of “unread”s in my inbox, so I typically scan it a couple times a day to stay up-to-date and check whether there’s anything I can help with or review.
Do you think contributing to Git has a high entry barrier? If yes, do you have any suggestions to improve the experience for new contributors?

In my experience, not really (at least, not so high that it excludes would-be contributors). But I think it might seem that way because the mailing list can be super intimidating when you’re looking to get started.

The mailing list is constantly active with complex patches and lots of experienced developers iterating on complicated feedback, so it can be easy to feel as though you can’t contribute much to it or that your code won’t meet the project’s standards. There have been efforts in the past to guide new contributors and help them technically learn the project, but it ultimately comes down to the first interactions someone has with the rest of the mailing that that determines whether or not they stick with it.

In terms of suggestions for improvement, I think it’s important for reviewers to maintain a supportive tone in even highly constructive reviews. A lot can get lost in text communication, so sometimes it’s worth explicitly calling out your appreciation for a contribution. This is especially true for new contributors, but is just as applicable to any reviews.

Another (more concrete) suggestion would be to make it easier for new contributors to find appropriate “starter” features/bugs of different sizes. Other open source projects have issue trackers with a “good first project” tag (and, as a new contributor on other projects, I’ve definitely appreciated them). Although there have been discussions about creating public issue trackers outside the mailing list, to my knowledge nothing like that exists yet for Git, let alone a subset for new contributors.
What is your advice for people who want to start Git development? Where and how should they start?

If you’re not coming in with a clear project or bugfix in mind, it can be really difficult to figure out where to start. If you’re in that position, something that has helped me in the past has been to just start testing things until they break. Read up on a command you find interesting, look for scenarios (e.g. options, sequences) that aren’t well-tested in our existing tests, and start experimenting! Worst case, you can improve test coverage and help us avoid issues in the future; a lot of times, though, you’ll find a bug or an opportunity to add a new option. In any event, the experience will teach you a ton about Git and make you more comfortable with contributing.

And once you have a working change, don’t be afraid to submit it out of fear that it’s not “good enough”. The goal we all share is to make Git as good as it can possibly be, so your contribution – no matter how small or trivial you think it is – is a valuable part of what makes this project succeed.
If there’s one tip you would like to share with other Git developers, what would it be?

Keep learning and trying new things! The Git development community has a vast range of experience across its members – from people who worked on it since the beginning to people making their first open source contribution ever. But it’s also an easy place to get stuck in “your niche” (if you’re lucky, you get stuck with a couple of other people so you can review each others’ work).

Of course, none of us have infinite time to spend on Git. But every once in a while, take a minute to read a contribution from someone you don’t know on a topic you’re not familiar with. Or investigate a bug report on a command you’ve never worked on! Or share your idea for a process we could try (like the Git Review Club)! Not only will you learn something new and improve yourself as a developer, but the collaboration you foster will help all of us improve Git in the long term.

Releases

libgit2 1.4.2
GitLab 14.9.1, 14.9, 14.8.4, 14.8.3, 14.7.6, 14.7.5, 14.6.6
Gerrit Code Review 3.5.1, 3.4.4, 3.3.10,
GitHub Enterprise 3.4.0, 3.3.5, 3.2.10, 3.1.18
GitKraken 8.3.3, 8.3.2, 8.3.1
GitHub Desktop 2.9.12, 2.9.11
Tower for Windows 3.1 (What’s New in Tower 3.1 for Windows blog post)
Tower for Mac 8.1

Other News

Various

Removed unencrypted Git protocol and certain SSH keys in GitHub (removed all support for DSA keys, removed legacy SSH algorithms HMAC-SHA-1 and CBC ciphers, etc.); the motivation behind these changes can be found in the Improving Git protocol security on GitHub article on GitHub Blog.
SSH and Git, meet 1Password: you can now, among other things, enable the SSH agent in 1Password.
Hugging Face uses Gitaly to store Machine Learning related repos.
GitLab is hiring to work on Gitaly.

Light reading

How to Really Use Git: 10 Rules to Make Git More Useful by Bruno Brito on HackerNoon.
Git and GitHub for Marketing Teams by Bruno Brito on Tower’s blog.
The advantages of large, long-running pull requests by David Heinemeier Hanson on hey.com.
How Do I Resolve Merge Conflicts? (without using graphical tools) by Rizèl Scarlett for GitHub, on DEV.to.
How to Rename a Local or Remote Branch in Git by Kolade Chris on freeCodeCamp.
Automatically Format your code on Git Commit using Husky, ESLint, Prettier in 9 minutes by Tapajyoti Bose on DEV.to and Medium.
How Do Git Branches Work? by Anthony Heddings on Cloud Savvy IT.
Teamwork & Git: Dr Max Wilson explains how teams collaborate using Git in a video on the Computerphile channel on YouTube.
- Links to previous videos in this series can be found in Git Rev News Edition #83 and #84.
Speeding Up Commit Graph Walks in Dolt with Materialized Ancestor Closures: Dolt is a version controlled SQL database, first mentioned in Git Rev News #62; you can find more information about versioning and databases in edition #82.

Git tools and sites

Kart stores geospatial and tabular data in Git, providing version control at the row and cell level. It provides repository working copies as GIS databases and files, allowing one to edit them directly in common GIS software without plugins. Support for GeoPackage, PostGIS, Microsoft SQL Server and MySQL formats.
split-patch project consists of the split-patch.py tool, which is git add -p on steroids for patches (*.patch files), allowing to choose in which bucket to put each hunk, and the git-split-commit command (which can be used from interactive rebase) which uses split-patch.py to split the HEAD commit into multiple commits.
DagsHub is a web platform for data version control and collaboration for data scientists and machine learning engineers. The free tier provides 10 GB of DAGsHub Storage, and allows up to 3 collaborators in private projects.
See also the following articles:
- DagsHub: a GitHub Supplement for Data Scientists and ML Engineers by Khuyen Tran in Towards Data Science, a Medium blog.
- DagsHub → Github for Data Science by Shubham Saboo on Towards AI blog.
fds (Fast Data Science), is a CLI for Data Scientists to version control data and code at once, by conveniently wrapping Git and DVC.
- DVC (Data Version Control) was first mentioned in Git Rev News Edition #42.
Jujutsu (jj) is an experimental Git-compatible DVCS. It combines features from Git (data model, speed), Mercurial (anonymous branching, simple CLI free from “the index”, revsets, powerful history-rewriting), and Pijul/Darcs (first-class conflicts), with features not found in either of them.

Credits

This edition of Git Rev News was curated by Christian Couder <christian.couder@gmail.com>, Jakub Narębski <jnareb@gmail.com>, Markus Jansen <mja@jansen-preisler.de> and Kaartic Sivaraam <kaartic.sivaraam@gmail.com> with help from Victoria Dye and Bruno Brito.