Git Rev News: Edition 105 (November 30th, 2023)

Welcome to the 105th edition of Git Rev News, a digest of all things Git. For our goals, the archives, the way we work, and how to contribute or to subscribe, see the Git Rev News page on git.github.io.

This edition covers what happened during the months of October 2023 and November 2023.

Discussions

General

  • Git participates in Outreachy’s December 2023 to March 2024 round

    Achu Luma will work on the “Move existing tests to a unit testing framework” project. They will be mentored by Christian Couder.

    Congratulations to Luma for being selected!

    Thanks to GitLab for sponsoring this Outreachy internship! Thanks also to the other contributors who applied and worked on micro-projects, but couldn’t be selected! We hope to continue to see you in the community!

Reviews

  • [PATCH v2 0/2] Prevent re-reading 4 GiB files on every status

    In May 2022 Jason Hatton sent an email to the mailing list about the fact that any file of a size that is an exact multiple of 4GiB makes Git extremely slow on the repository.

    He said that he had already opened an issue about this on the Git for Windows issue tracker, where Jason, Philip Oakley, brian m. carlson and Johannes Schindelin, alias Dscho, had discussed the problem.

    Git uses a uint32_t type, a 32 bit long unsigned integer, for storing the file size in the index. This rolls over once the value reaches 2 to the power 32, so with file sizes of 4GiB or more. When the size is exactly 4GiB or a multiple of it, like 8GiB, the rollover makes it zero.

    A zero file size in the index has a special meaning for Git, though. It tells Git that the file needs to be hashed again. Hashing a file is supposed to reset its file size in the index to a non-zero value, but with a 4GiB file size the rollover happens and the file size is still zero. So the hashing will be performed again and again by many different Git commands, making Git very slow.
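    The rollover described above can be modeled in a few lines. This is only an illustrative sketch, not Git's code: the names `stored_size` and `looks_unhashed` are hypothetical, and the 32-bit field is imitated by masking a Python integer.

```python
UINT32_MASK = 0xFFFFFFFF  # a 32-bit unsigned field keeps only the low 32 bits
GIB = 1 << 30             # one GiB in bytes


def stored_size(actual_size: int) -> int:
    """Model truncating a file size into a 32-bit unsigned index field."""
    return actual_size & UINT32_MASK


def looks_unhashed(actual_size: int) -> bool:
    """A stored size of zero is the value that asks for the file to be hashed again."""
    return stored_size(actual_size) == 0


if __name__ == "__main__":
    for size in (3 * GIB, 4 * GIB, 4 * GIB + 1, 8 * GIB):
        print(size, stored_size(size), looks_unhashed(size))
```

    In this model, files of exactly 4GiB and 8GiB both end up with a stored size of zero, while a file of 4GiB plus one byte is stored as 1, which is why only exact multiples of 4GiB trigger the repeated hashing.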

    Jason proposed, as a solution to this problem, to detect when the rollover would happen, and in that case set the size to 1 instead of zero.

    Junio C Hamano, the Git maintainer, replied to Jason confirming the issue and explaining it in a bit more detail. Jason and Junio then discussed the issue further, while Jason tested his suggested fix locally and proposed to send a real patch to fix the issue.

    René Scharfe then chimed in, asking if a value other than one would be better and would avoid other possible issues. Philip Oakley replied to René suggesting using 0x80000000 instead of 1 when the rollover is detected. This would make it easier to detect “almost all incremental and decremental changes in file sizes”, as the file size in the index helps detect file changes.

    Jason and Philip discussed the issue a bit more and agreed that using 0x80000000 only for exact multiples of 4GiB would likely be the best solution.
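    The approach agreed on here can be sketched as follows. This is a simplified model under stated assumptions, not the patch itself: `munged_size` and `SENTINEL` are hypothetical names, and the 32-bit field is again imitated by masking.

```python
UINT32_MASK = 0xFFFFFFFF  # low 32 bits, as kept by a 32-bit unsigned field
GIB = 1 << 30             # one GiB in bytes
SENTINEL = 0x80000000     # replacement value discussed in the thread


def munged_size(actual_size: int) -> int:
    """Return the value to store for a file size in a 32-bit field.

    A non-empty file whose size truncates to zero (an exact multiple
    of 4GiB) gets SENTINEL instead, so it is not mistaken for an entry
    that still needs hashing. Every other size is truncated as before.
    """
    truncated = actual_size & UINT32_MASK
    if truncated == 0 and actual_size != 0:
        return SENTINEL
    return truncated
```

    One way to read the preference for 0x80000000 over 1: in this model, if 1 were used, a 4GiB file that grows by a single byte would store 1 both before and after the change, so the size comparison would miss it. With 0x80000000, the stored value goes from 0x80000000 to 1 and the change is visible. The trade-off is that a file of exactly 2GiB stores the same value as a file of exactly 4GiB.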

    Philip and Carlo Marcelo Arenas Belón also tried to help Jason properly submit a patch to the mailing list.

    Jason then sent a patch to the mailing list with the changes and explanation that had been discussed. Torsten Bögershausen, Philip and Junio reviewed it, and suggested some improvements. Junio especially requested some tests to be added.

    After some discussions with Jason to clarify what should be improved, Jason sent another version of his patch.

    It looked like Jason found an issue with the patch due to using 0x80000000 instead of 1. René and Philip discussed it with Jason, but there was no clear conclusion. It wasn’t even clear if there was an issue at all. But anyway the work on this stopped for more than one year.

    Fortunately a few weeks ago, brian m. carlson sent a new version of Jason’s patch along with another patch adding tests.

    These patches were reviewed by Eric Sunshine, Jeff King, alias Peff, Junio and Jason. After some discussions it appeared that the patches were good enough for Junio, so he decided to apply a small change and then merge them. This issue is therefore fixed in Git 2.43.0 released on November 20th.

Developer Spotlight: Alexander Shopov

  • Who are you and what do you do?

    I am Alexander Shopov - a backend engineer in the Amsterdam office of Uber working on money-related systems. I am a long-time translator of FOSS software to Bulgarian - I am coordinating translations of GNOME, Translation Project and many GNU modules. Bulgarian is an Eastern South Slavic language written in the Cyrillic alphabet.

  • What would you name your most important contribution to Git?

    I made and now maintain the Bulgarian translation of the text interface of Git, Gitk, and Git Gui.

  • What is the typical workflow of a contributor engaged in Git translation?

    There are 19 translations of the text interface of Git, and only 13 of them are above 80%, so I am not sure about “typical”. It is a fairly standard workflow for a FOSS project.

    Generally one needs to do the following:

    1. Read the translator-targeted README.md in the po directory
    2. Keep pace with the calendar of Git releases
    3. Use the l10n coordinator repository maintained by Jiang Xin who makes sure translations get integrated upstream.

    Currently the translation is a bit above 5500 messages, which is about 40k words, 250k characters, or about 150 pages of text. It can be intimidating for a new translator. But you can definitely make it: be patient and translate some messages every release, merge, publish and repeat. Even better, though harder, is getting more than one person translating.

  • Do you contribute to Git in ways other than providing translation? If so, could you elaborate about them?

    Sadly not that much. On rare occasions I improve messages and mark strings for translations. Perhaps that will be the way I contribute unless I find a mentor and something that I find particularly interesting and important for me. So if anyone is willing to mentor me, especially in making large repos faster - ping me. I can be a competent tester at least.

  • If you could get a team of expert developers to work full time on something in Git for a full year, what would it be?

    Due to its enormous success, Git is being used on humongous code bases with a crazy number of files, directories, commits and branches. Working with repos larger than 10GB can be a bit slow. Improving the experience would be a great thing.

  • If you could remove something from Git without worrying about backwards compatibility, what would it be?

    Backwards compatibility is massively important and I am thankful developers and users are all invested in this.

    If we treat this as a hypothetical question, there are 3 things to Git:

    • the command-line interface
    • the wire protocol
    • the storage format

    The command-line interface is gradually being improved. The wire protocol is also a place where there are workarounds for versioning. The storage format however is another (quite conservative and public) API. I would remove the old versions and try to design it targeting projects that are 10-100 times larger than the Linux kernel first. In for a penny, in for a pound. If we break things, let us break them so hard that bards will sing songs about us!

  • What is your favorite Git-related tool/library, outside of Git itself?

    I mainly use command line git plus gitk and git-gui. I do like using the meld diff tool when I work on translations.

  • Do you happen to have any memorable experience w.r.t. contributing to the Git project? If yes, could you share it with us?

    The initial getting to 100% translated messages was a challenge. I decided that I should translate Git around December 2013. That was around 2200 messages at that time and it took me about 3 releases of Git to reach 100%. Getting to 100% was immensely hard, rewarding and memorable. Afterwards keeping the translation at 100% was much easier.

  • Is there something you feel could be done to ease the life of translators?

    The terminology glossary of Git is much larger than 7 years ago, and we (the translators) should actually update git://repo.or.cz/git-gui.git::po/glossary and merge it in Git.

  • What is your advice for people who want to start Git development? Where and how should they start?

    I don’t know, to be honest. If I knew, I might have started already.

  • If there’s one tip you would like to share with other Git developers, what would it be?

    That would be the tip of master two years in the future. On a more serious note - perhaps more tools for migration out of the still existing proprietary version control systems would be helpful.

Other News

Various

Light reading

Easy watching and listening

Git tools and sites

  • gitattributes.io is a service to generate .gitattributes files, similar to gitignore.io.
  • githistory.xyz is a service that lets you quickly browse the history of files in any Git repo (from GitHub, GitLab, Bitbucket). Also available as Chrome, Firefox, and Visual Studio extensions, and as the git-file-history command line tool (in Node.js). Mentioned in passing in Git Rev News Edition #48.
  • Josh Branchaud (jbranchaud) collected a list of Today I Learned (TIL) tips about Git.
  • lei is a command-line tool for importing and searching email, regardless of whether it is from a personal mailbox or a public-inbox instance, like public-inbox.org or lore.kernel.org.
    Warning: lei is still in its early stages and may destroy mail.
  • git-fame: Pretty-print Git repository collaborators sorted by contributions (includes computing code survival). Written in Python.
  • git-fame-rb is a command-line tool that helps you summarize and pretty-print collaborators, based on the number of contributions. The statistics are mostly based on the output of git blame (counting surviving lines). Written in Ruby.
  • GQL (Git Query Language) [repo] is a SQL-like language to perform queries on .git files, with support for many SQL features such as grouping, ordering and aggregation functions.
    You can find more in How I Created a SQL-like Language to Run Queries on Local Git Repositories article by Amr Hesham on freeCodeCamp.
    See also the following tools:
  • GibleFS is a toy project that maps a Git repository to a virtual filesystem, which then can be used to access the repository at any given commit. Written in Rust, does not seem to be actively developed.
  • The Git/fs binary in Git9 (Git client for Plan 9 non-POSIX filesystem) serves repository history as a file system.
  • gitfs is a FUSE file system that fully integrates with Git. You can mount a remote repository’s branch locally, and any subsequent changes made to the files will be automatically committed to the remote. Written in Python, last release in 2019.
    • Note: that is not the only project named gitfs or git-fs.
  • SlothFS is a FUSE filesystem that provides light-weight, lazily downloaded, read-only checkouts of manifest-based Git projects. It is intended for use with Android. Written in Go, repository archived in 2022.
  • GitMounter is a toy FUSE browser for Git repos based on Suffusion. Requires FUSE, libgit2, pkg-config, and Swift installed. Written in Swift.

Releases

Credits

This edition of Git Rev News was curated by Christian Couder <christian.couder@gmail.com>, Jakub Narębski <jnareb@gmail.com>, Markus Jansen <mja@jansen-preisler.de> and Kaartic Sivaraam <kaartic.sivaraam@gmail.com> with help from Alexander Shopov, Luca Milanesio, Bruno Brito, and Štěpán Němec.