Git Rev News Edition 105 (November 30th, 2023)

Welcome to the 105th edition of Git Rev News, a digest of all things Git. For our goals, the archives, the way we work, and how to contribute or to subscribe, see the Git Rev News page on git.github.io.

This edition covers what happened during the months of October 2023 and November 2023.

Discussions

General

Git participates in Outreachy’s December 2023 to March 2024 round

Achu Luma will work on the “Move existing tests to a unit testing framework” project. They will be mentored by Christian Couder.

Congratulations to Luma for being selected!

Thanks to GitLab for sponsoring this Outreachy internship! Thanks also to the other contributors who applied and worked on micro-projects, but couldn’t be selected! We hope to continue to see you in the community!

Reviews

[PATCH v2 0/2] Prevent re-reading 4 GiB files on every status

In May 2022 Jason Hatton sent an email to the mailing list about the fact that any file whose size is an exact multiple of 4GiB makes Git extremely slow in the repository that contains it.

He said that he had already opened an issue about this on the Git for Windows issue tracker, where Jason, Philip Oakley, brian m. carlson and Johannes Schindelin, alias Dscho, had discussed it.

Git uses a uint32_t type, a 32-bit unsigned integer, to store the file size in the index. This rolls over when the value is 2 to the power 32 or greater, that is, for file sizes of 4GiB or more, and only the remainder modulo 4GiB is kept. When the size is exactly 4GiB or a multiple of it, like 8GiB, the rollover makes it zero.

A zero file size in the index has a special meaning for Git, though. It tells Git that the file needs to be hashed again. Hashing a file is supposed to update its size in the index to a non-zero value, but for a file whose size is a multiple of 4GiB the rollover happens again and the stored size is still zero. So the file is hashed again and again by many different Git commands, making Git very slow.
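To make the rollover concrete, here is a minimal C sketch, assuming only that the size reaches the index through a plain conversion to uint32_t. The helper names `index_size_field()`, `needs_rehash()` and `demo_wraparound()` are hypothetical and are not Git's code; `needs_rehash()` simply models the "zero means hash again" rule described above. Converting to an unsigned type keeps the value modulo 2^32, so every exact multiple of 4GiB is stored as 0.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* One GiB as a 64-bit constant, so multiples of it do not overflow. */
#define GIB (UINT64_C(1) << 30)

/*
 * Hypothetical helper (not Git's code): store a 64-bit file size in a
 * 32-bit index field. Conversion to an unsigned type is well defined in
 * C and keeps the value modulo 2^32, i.e. only the low 32 bits survive.
 */
static uint32_t index_size_field(uint64_t st_size)
{
	return (uint32_t)st_size;
}

/* Hypothetical check modelling the rule above: a stored size of 0
 * means "hash this file again". */
static bool needs_rehash(uint32_t stored_size)
{
	return stored_size == 0;
}

/* Spells out the rollover for a few sizes around 4GiB. */
static void demo_wraparound(void)
{
	assert(index_size_field(4 * GIB - 1) == UINT32_MAX); /* still fits */
	assert(index_size_field(4 * GIB) == 0);              /* rolls over to 0 */
	assert(index_size_field(4 * GIB + 1) == 1);          /* rolls over to 1 */
	assert(index_size_field(8 * GIB) == 0);              /* rolls over to 0 */

	/* Rehashing cannot help: the stored size is 0 again every time. */
	assert(needs_rehash(index_size_field(8 * GIB)));
}
```

Calling `demo_wraparound()` from a `main` function runs these checks; it aborts on the first failing assertion unless compiled with `NDEBUG`.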

Jason proposed, as a solution to this problem, to detect when the rollover would happen, and in that case set the size to 1 instead of zero.

Junio C Hamano, the Git maintainer, replied to Jason, confirming the issue and explaining it in more detail. Jason and Junio then discussed the issue a bit further, while Jason tested his suggested fix locally and offered to send a real patch to fix the issue.

René Scharfe then chimed in, asking whether a value other than 1 would be better and would avoid other possible issues. Philip Oakley replied to René, suggesting using 0x80000000 instead of 1 when the rollover is detected. This would make it easier to detect “almost all incremental and decremental changes in file sizes”, as the file size in the index helps Git detect file changes.

Jason and Philip discussed the issue a bit more and agreed that using 0x80000000 only for exact multiples of 4GiB would likely be the best solution.
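The two candidate values can be compared with a similar sketch. Both helpers below are hypothetical and not necessarily the code that was eventually merged: each keeps the low 32 bits as before and only changes what a non-empty file with a truncated size of 0 gets. `size_with_one()` stores 1, as in Jason's first proposal, while `size_with_high_bit()` stores 0x80000000, as Philip suggested. The assertions illustrate Philip's argument: with 1, a 4GiB file and a file one byte larger store the same value, whereas with 0x80000000 the nearest colliding sizes are 2GiB away.

```c
#include <assert.h>
#include <stdint.h>

#define GIB (UINT64_C(1) << 30)

/* Hypothetical helper: Jason's first idea, store 1 instead of 0. */
static uint32_t size_with_one(uint64_t st_size)
{
	uint32_t sd_size = (uint32_t)st_size; /* low 32 bits only */

	/* Non-empty file whose size is an exact multiple of 4GiB. */
	if (sd_size == 0 && st_size != 0)
		return 1;
	return sd_size;
}

/* Hypothetical helper: Philip's suggestion, store 0x80000000 instead of 0. */
static uint32_t size_with_high_bit(uint64_t st_size)
{
	uint32_t sd_size = (uint32_t)st_size; /* low 32 bits only */

	if (sd_size == 0 && st_size != 0)
		return UINT32_C(0x80000000);
	return sd_size;
}

/* Compares how well each placeholder keeps nearby sizes apart. */
static void demo_placeholders(void)
{
	/* Neither variant stores 0 for a non-empty 4GiB file any more. */
	assert(size_with_one(4 * GIB) != 0);
	assert(size_with_high_bit(4 * GIB) != 0);

	/* Empty files still store 0 in both variants. */
	assert(size_with_one(0) == 0 && size_with_high_bit(0) == 0);

	/* With 1, growing a 4GiB file by one byte is invisible in the size... */
	assert(size_with_one(4 * GIB) == size_with_one(4 * GIB + 1));

	/* ...with 0x80000000, one-byte changes in either direction show up. */
	assert(size_with_high_bit(4 * GIB) != size_with_high_bit(4 * GIB + 1));
	assert(size_with_high_bit(4 * GIB) != size_with_high_bit(4 * GIB - 1));

	/* Its nearest collisions are 2GiB away in either direction. */
	assert(size_with_high_bit(4 * GIB) == size_with_high_bit(6 * GIB));
	assert(size_with_high_bit(4 * GIB) == size_with_high_bit(2 * GIB));
}
```

Either value stops the endless rehashing; the choice only affects how easily a real size change can hide behind the placeholder, which matters because the index size is one of the ways Git notices that a file changed.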

Philip and Carlo Marcelo Arenas Belón also tried to help Jason properly submit a patch to the mailing list.

Jason then sent a patch to the mailing list with the changes and explanation that had been discussed. Torsten Bögershausen, Philip and Junio reviewed it and suggested some improvements. Junio in particular asked for tests to be added.

After some discussion to clarify what should be improved, Jason sent another version of his patch.

It then looked like Jason had found an issue with the patch caused by using 0x80000000 instead of 1. René and Philip discussed it with Jason, but there was no clear conclusion; it wasn’t even clear whether there was an issue at all. In any case, work on this stalled for more than a year.

Fortunately a few weeks ago, brian m. carlson sent a new version of Jason’s patch along with another patch adding tests.

These patches were reviewed by Eric Sunshine, Jeff King, alias Peff, Junio and Jason. After some discussions it appeared that the patches were good enough for Junio, so he decided to apply a small change and then merge them. This issue is therefore fixed in Git 2.43.0 released on November 20th.

Developer Spotlight: Alexander Shopov

Who are you and what do you do?

I am Alexander Shopov - a backend engineer in the Amsterdam office of Uber working on money-related systems. I am a long-time translator of FOSS software into Bulgarian - I am coordinating translations of GNOME, the Translation Project and many GNU modules. Bulgarian is an Eastern South Slavic language written in the Cyrillic alphabet.

What would you name your most important contribution to Git?

I made and now maintain the Bulgarian translation of the text interface of Git, Gitk, and Git Gui.

What is the typical workflow of a contributor engaged in Git translation?

There are 19 translations of the text interface of Git, and only 13 of them are more than 80% complete, so I am not sure about “typical”. It is a fairly standard workflow for a FOSS project.

Generally one needs to do the following:
1. Read the translator-targeted README.md in the po directory
2. Sync pace with the calendar of Git releases
3. Use the l10n coordinator repository maintained by Jiang Xin, who makes sure translations get integrated upstream.

Currently the translation is a bit above 5500 messages, which is about 40k words, 250k characters, or about 150 pages of text. It can be intimidating for a new translator. But you can definitely make it: be patient and translate some messages every release, merge, publish and repeat. Even better, though harder, is getting more than one person translating.

Do you contribute to Git in ways other than providing translation? If so, could you elaborate on them?

Sadly not that much. On rare occasions I improve messages and mark strings for translation. Perhaps that will remain the way I contribute unless I find a mentor and something that I find particularly interesting and important for me. So if anyone is willing to mentor me, especially in making large repos faster - ping me. I can be a competent tester at least.

If you could get a team of expert developers to work full time on something in Git for a full year, what would it be?

Due to its enormous success, Git is being used on humongous code bases with a crazy number of files, directories, commits and branches. Working with repos larger than 10GB can be a bit slow. Improving the experience would be a great thing.

If you could remove something from Git without worrying about backwards compatibility, what would it be?

Backwards compatibility is massively important and I am thankful developers and users are all invested in this.

If we treat this as a hypothetical question, there are 3 things to Git:
- the command-line interface
- the wire protocol
- the storage format

The command-line interface is gradually being improved. The wire protocol is also a place where there are workarounds for versioning. The storage format, however, is another (quite conservative and public) API. I would remove the old versions and try to design it targeting projects that are 10-100 times larger than the Linux kernel first. In for a penny, in for a pound. If we break things, let us break them so hard that bards will sing songs about us!

What is your favorite Git-related tool/library, outside of Git itself?

I mainly use command-line Git plus gitk and git-gui. I do like using the Meld diff tool when I work on translations.

Do you happen to have any memorable experience w.r.t. contributing to the Git project? If yes, could you share it with us?

Initially getting to 100% translated messages was a challenge. I decided that I should translate Git around December 2013. There were around 2200 messages at that time, and it took me about 3 releases of Git to reach 100%. Getting to 100% was immensely hard, rewarding and memorable. Afterwards, keeping the translation at 100% was much easier.

Is there something you feel could be done to ease the life of translators?

The terminology glossary of Git is much larger than it was 7 years ago, and we (the translators) should actually update git://repo.or.cz/git-gui.git::po/glossary and merge it into Git.

What is your advice for people who want to start Git development? Where and how should they start?

I don’t know, to be honest. If I knew, I might have started already.

If there’s one tip you would like to share with other Git developers, what would it be?

That would be the tip of master two years in the future. On a more serious note - perhaps more tools for migration out of the still existing proprietary version control systems would be helpful.

Other News

Various

Highlights from Git 2.43 by Taylor Blau on GitHub Blog. Those include new git repack tricks (including adjusting sparse clone filters), nicer looking reverts of reverts with git revert, fixed interaction between --subject-prefix and --rfc in git format-patch, custom log format options that simulate the decorations, etc.
Gitea Cloud: A brand new platform for managed Gitea Instances, designed for enterprise organizations to set up and run their own Gitea instances more easily and efficiently.
Announcing DoltgreSQL by Daylon Wilkins on DoltHub Blog.
- Git-for-Data, Version-Controlled Database Dolt Gets PostgreSQL-Flavor by Sergio De Simone on InfoQ.
- Dolt, a version-controlled database, was first mentioned in Git Rev News Edition #62.
An Interesting CMS With Version Control is Now Open-Source! by Sourav Rudra on It’s FOSS News (it’s TinaCMS).
Introducing the Space Git Subtree by Ilia Afanasiev on The Space Blog (where Space is JetBrains’ code collaboration platform).
Developers can’t seem to stop exposing credentials in publicly accessible code by Dan Goodin on Ars Technica, and
Uncovering thousands of unique secrets in PyPI packages by Tom Forbes on GitGuardian Blog.
14 years of JGit/EGit Code Reviews migrated to GerritHub by Luca Milanesio on GerritForge Blog.

Light reading

How I (kind of) killed Mercurial at Mozilla by Mike Hommey (author of git-cinnabar, Git remote helper to interact with Mercurial repositories).
Julia Evans continues the series of articles about Git (started in Git Rev News #103 with In a git repository, where do your files live? and Git Rev News #104 with Some miscellaneous git facts); currently there are the following additional posts: Confusing git terminology, git rebase: what can go wrong?, How git cherry-pick and revert use 3-way merge, and git branches: intuition & reality.
- Julia Evans (@b0rk@jvns.ca) asked on Mastodon about a read-only FUSE filesystem for a Git repository, where every commit is a folder containing all the files in that commit, so this series may continue (so far this has led to her very experimental git-commit-folders tool, and to Jordan Rose making GitMounter public).
- See also Pain in the dots by Matthew Brett (part of Notes and tutorials on git), about the confusing difference in how two-dot and three-dot notations behave in git log and git diff, as an addition to the .. and ... section of Julia Evans’ article about confusing Git terminology.
How I teach Git by Thomas Broyer on his blog (also on DEV.to). Inspired by Julia Evans’ (renewed) interest in Git and her questions on social networks.
Stacked Diffs (and why you should know about them) by Gergely Orosz in The Pragmatic Engineer blog. Another article about Stacked Diffs can be found in Git Rev News Edition #44.
- Compare and contrast Ship / Show / Ask: A modern branching strategy mentioned in Edition #79.
Why I Prefer Trunk-Based Development by Koen van Gilst on his blog.
- See also Patterns for Managing Source Code Branches in Git Rev News Edition #73.
A somewhat controversial article: Dependencies Belong in Version Control (even if it’s not practical today due to Git’s limitations) by Forrest Smith on his blog.
Managing My Resume with Git: A Version Control Approach by Bui Dang Binh (dunkbing) on DEV.to.
See the History of a Method with git log -L by Caleb Hearth on his blog; the post also lists a few of his other articles about Git:
- Stash only what git commit wouldn’t commit.
- Ignore refactoring commits in git blame.
- Use your SSH key to sign commits.
  - See also, for example, Signing Git Commits with SSH Keys from Git Rev News Edition #83.
Why Git blame sucks for understanding WTF code (and what to use instead) by Tekin Süleyman (2020); the author recommends “pickaxe” search with git log -S and git log -G, or searching commit messages with git log --grep.
How to Resolve Merge Conflicts Using the Merge Editor Feature on VS Code by Ayu Adiati on Ayu’s Notes On Blog (also on DEV.to, as part of larger Open-Source Series’ Articles).
The Ultimate “git nah” Alias to throw away current changes, untracked files and rebase state, by Paul Redmond on Laravel News.
Understanding Git: The history and internals by Kenneth DuMez on the Graphite Blog (more about history and internals than about understanding Git). See also:
- GitHistory page in the archives of Git SCM Wiki,
- The Git Parable, by Tom Preston-Werner (2009) - the ideas behind the architecture of Git; covered in Git Rev News #30,
- Will Hay Jr.’s The Architecture and History of Git: A Distributed Version Control System, mentioned in Git Rev News #46,
- The History of Git: The Road to Domination in Software Version Control referenced in Git Rev News #60.
Tracking SQLite Database Changes in Git with an appropriate textconv gitattribute, by Garrit Franke on Garrit’s Notes.
GitHub’s all-in bet on AI may overlook Git by Matt Asay on InfoWorld.
🙏 Please Add .gitattributes To Your Git Repository by Carl Saunders on DEV.to (2020).
- A .gitattributes file can be used to improve language detection on GitHub, which uses the Linguist library.

Easy watching and listening

Git Training playlist of 45 short YouTube videos by Joost De Cock provides Git training materials for people who would like to understand how Git works rather than try to memorize all of its commands without knowing what they do.
The Real Python Podcast: Episode 179: Improving Your Git Developer Experience in Python

Git tools and sites

gitattributes.io is a service to generate .gitattributes files, similar to gitignore.io.
githistory.xyz is a service that allows you to quickly browse the history of files in any Git repo (from GitHub, GitLab, Bitbucket). Also available as Chrome, Firefox, and Visual Studio extensions, and as the git-file-history command-line tool (in Node.js). Mentioned in passing in Git Rev News Edition #48.
Josh Branchaud (jbranchaud) collected a list of Today I Learned (TIL) tips about Git.
lei is a command-line tool for importing and searching email, regardless of whether it is from a personal mailbox or a public-inbox instance, like public-inbox.org or lore.kernel.org.
Warning: lei is still in its early stages and may destroy mail.
- See also lore+lei: part 1, getting started article by Konstantin Ryabitsev (2021).
git-fame: Pretty-print Git repository collaborators sorted by contributions (includes computing code survival). Written in Python.
git-fame-rb is a command-line tool that helps you summarize and pretty-print collaborators, based on the number of contributions. The statistics are mostly based on the output of git blame (counting surviving lines). Written in Ruby.
GQL (Git Query Language) [repo] is a SQL-like language to perform queries on .git files, with support for many SQL features such as grouping, ordering and aggregation functions.
You can find more in How I Created a SQL-like Language to Run Queries on Local Git Repositories article by Amr Hesham on freeCodeCamp.
See also the following tools:
- Gitana: SQL-based Project Activity Inspector (repo archived in 2022), first mentioned in Git Rev News Edition #7.
- gitbase: SQL interface to Git repositories, written in Go; (last release from 2019, homepage is not working), first mentioned in Git Rev News Edition #48.
- git-history is a tool for analyzing Git history using SQLite (last release in 2021), first mentioned in Git Rev News Edition #82.
- MergeStat enables SQL queries for data in Git repositories (and related sources, such as the GitHub API). There is also the mergestat-lite command line tool, which runs SQL queries against local Git repositories. First mentioned in Git Rev News Edition #82. Actively developed, mergestat-lite is written in Go.
GibleFS is a toy project that maps a Git repository to a virtual filesystem, which can then be used to access the repository at any given commit. Written in Rust; it does not seem to be actively developed.
The Git/fs binary in Git9 (Git client for Plan 9 non-POSIX filesystem) serves repository history as a file system.
gitfs is a FUSE file system that fully integrates with Git. You can mount a remote repository’s branch locally, and any subsequent changes made to the files will be automatically committed to the remote. Written in Python, last release in 2019.
- Note: that is not the only project named gitfs or git-fs.
SlothFS is a FUSE filesystem that provides light-weight, lazily downloaded, read-only checkouts of manifest-based Git projects. It is intended for use with Android. Written in Go, repository archived in 2022.
GitMounter is a toy FUSE browser for Git repos based on Suffusion. Requires FUSE, libgit2, pkg-config, and Swift to be installed. Written in Swift.

Releases

Git 2.43.0, 2.43.0-rc2, 2.43.0-rc1, 2.43.0-rc0, 2.42.1
Git for Windows 2.43.0(1), 2.43.0-rc2(1), 2.43.0-rc1(1), 2.43.0-rc0(1)
GitLab 16.6, 16.5.2, 16.5.1, 16.4.2, 16.3.6
Gerrit Code Review 3.6.8, 3.7.6, 3.8.3, 3.9.0-rc6
GitHub Enterprise 3.11.0
GitKraken 9.10.0
GitHub Desktop 3.3.5
Tower for Windows 5.2
Tower for Mac 10.2

Credits

This edition of Git Rev News was curated by Christian Couder <christian.couder@gmail.com>, Jakub Narębski <jnareb@gmail.com>, Markus Jansen <mja@jansen-preisler.de> and Kaartic Sivaraam <kaartic.sivaraam@gmail.com> with help from Alexander Shopov, Luca Milanesio, Bruno Brito, and Štěpán Němec.