Git Rev News Edition 107 (January 31st, 2024)

Welcome to the 107th edition of Git Rev News, a digest of all things Git. For our goals, the archives, the way we work, and how to contribute or to subscribe, see the Git Rev News page on git.github.io.

This edition covers what happened during the months of December 2023 and January 2024.

Discussions

Support

Git Rename Detection Bug

Jeremy Pridmore reported an issue to the Git mailing list. He used git bugreport, so his message looks like a filled-out form with questions and answers.

He was trying to cherry-pick changes from one repo (A) to another (B), where both A and B came from the same original TFS server but had different sets of changes. He was disappointed, though, because some files that had been moved in repo A were matched up by the rename detection mechanism to files other than the ones he expected in repo B, and he wondered if the reason for this was the new ‘ort’ merge strategy described in a blog post by Elijah Newren.

While not obvious at first, Jeremy’s primary problem specifically centered around cases where there were multiple files with 100% identical content. For example, originally there could have been an orig/foo.txt file, while one of the descendant repos does not have that file anymore but instead has two files, dir2/foo.txt and dir3/foo.txt, both with contents identical to the original orig/foo.txt. So, Git has to figure out which one of dir2/foo.txt and dir3/foo.txt is the result of renaming orig/foo.txt.

Elijah replied to Jeremy, explaining extensively how rename detection works in Git. Elijah pointed out that Jeremy’s problem, as described, did not involve directory rename detection (despite looking somewhat like a directory rename detection problem). Also, since Jeremy pointed out that the “misdetected” renames had contents identical to what they were paired with, only exact renames were involved. Because of these two factors, Elijah said that the new ‘ort’ merge strategy, which he implemented, and which replaced the old ‘recursive’ strategy, should use the same rename detection rules as that old strategy for Jeremy’s problem. Elijah suggested adding the -s recursive option to the cherry-pick command to verify this, by checking whether it worked differently with the old ‘recursive’ strategy. (Note that git cherry-pick reads -s as --signoff, so with that command the strategy is selected with the long form, --strategy=recursive.)

Elijah also pointed out that for exact renames in a setup like this, other than Git giving a preference to files with the same basename, if there are multiple choices with identical content then it will just pick one essentially at random.
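
The behavior described above can be illustrated with a minimal Python sketch. It is not Git's implementation: the function names (`blob_id`, `pair_exact_renames`) and the "take the first candidate" tie-break are assumptions made for illustration. It only shows why two added files with identical content leave the choice between them essentially arbitrary unless the basenames differ.

```python
import hashlib
import os
from collections import defaultdict


def blob_id(content: bytes) -> str:
    """Hash file content; identical content yields an identical id."""
    return hashlib.sha1(content).hexdigest()


def pair_exact_renames(deleted: dict, added: dict) -> dict:
    """Pair each deleted path with an added path that has identical content.

    Hypothetical helper: prefers an added path with the same basename and
    otherwise takes whichever candidate happens to come first.
    """
    by_hash = defaultdict(list)
    for path, content in added.items():
        by_hash[blob_id(content)].append(path)

    pairs = {}
    for old_path, content in deleted.items():
        candidates = by_hash.get(blob_id(content), [])
        if not candidates:
            continue
        old_base = os.path.basename(old_path)
        same_base = [p for p in candidates if os.path.basename(p) == old_base]
        chosen = (same_base or candidates)[0]
        candidates.remove(chosen)  # each added path is paired at most once
        pairs[old_path] = chosen
    return pairs


# Two identical candidates with the same basename: nothing in the content
# or the basename distinguishes dir2/foo.txt from dir3/foo.txt.
ambiguous = pair_exact_renames(
    {"orig/foo.txt": b"hello\n"},
    {"dir2/foo.txt": b"hello\n", "dir3/foo.txt": b"hello\n"},
)

# Here the basename preference settles the tie.
by_basename = pair_exact_renames(
    {"a/foo.txt": b"x\n"},
    {"b/bar.txt": b"x\n", "c/foo.txt": b"x\n"},
)
```

In the first call either pairing would be equally justified by the data, which is the situation Jeremy ran into.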

Jeremy replied to Elijah saying that this sounded like what he was observing. He gave some more examples, showing that when there are multiple 100% matches, Git didn’t always pair up the files the way he wanted. Jeremy suggested that filename similarity (beyond just basename matching) be added as a secondary criterion to content similarity for rename detection, since it would help in his case.

Elijah replied that he had tried a few filename similarity ideas, and had added a “same basename” criterion for inexact renames in the ort merge strategy along these lines. However, he said the other filename similarity measurements he tried didn’t work out so well. He mentioned that they risk being repository-specific (helping with merges in some repositories but actually hurting in others). He also mentioned a rather counter-intuitive result: filename comparisons could rival the cost of content comparisons, which means such measurements could adversely affect performance and possibly even throw a monkey wrench into several of the existing performance optimizations in the current merge algorithm.

The thread also involved additional explanations about various aspects of rename detection. This included details about how renames are just a hint for developers: they are not recorded, but are instead computed from scratch in response to user commands. It also included details about what statuses like “added by both” mean (namely that both sides added the same filename but with different contents), why you never see “deleted by both” as a conflict status (there is no conflict; the file can just be deleted), and other minor points.

Elijah also brought up a slightly more common case that mirrors the problems Jeremy saw, where users could be surprised by the per-file content similarity matching that Git does. This more general case arises from having multiple copies of a versioned library. For example, you may have a “base” version with a directory named “library-x-1.7/”, and a “stable” version has many changes in that directory, while a “development” branch has removed that directory but has added both a “library-x-1.8/” and a “library-x-1.9/” directory which both have changes compared to “library-x-1.7/”. In such a case, if you are trying to cherry-pick a commit involving several files modified under “library-x-1.7/”, where do the changes get applied? Some users might expect the changes in that commit to get applied to “library-x-1.8/”, while others might expect them to get applied to “library-x-1.9/”. In practice, though, it would not be uncommon for Git to apply the changes from some of the files in the commit to “library-x-1.8/” and changes from other files in the commit to “library-x-1.9/”. Elijah explained why this happens and suggested a hack for users dealing with this particular kind of case to work around rename detection.
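
A small Python sketch can make the per-file nature of this matching concrete. It is only an illustration under stated assumptions: `best_match` is a hypothetical helper, the file names and contents are made up, and the line-based `difflib` ratio stands in for Git's own similarity measure. Each old file is matched on its own, with no notion that files from the same directory should stay together.

```python
from difflib import SequenceMatcher


def best_match(source: str, candidates: dict) -> str:
    """Return the candidate path whose lines are most similar to source."""
    src_lines = source.splitlines()
    return max(
        candidates,
        key=lambda path: SequenceMatcher(
            None, src_lines, candidates[path].splitlines()
        ).ratio(),
    )


# Files removed on the "development" side (made-up contents).
old = {
    "library-x-1.7/parse.c": "parse header\nparse body\nparse footer\n",
    "library-x-1.7/render.c": "render text\nrender image\nrender table\n",
}
# Files added on the "development" side: 1.8 stayed close to the old
# parse.c, while 1.9 stayed close to the old render.c.
new = {
    "library-x-1.8/parse.c": "parse header\nparse body\nparse footer\nparse trailer\n",
    "library-x-1.9/parse.c": "tokenize\nbuild tree\nvalidate\n",
    "library-x-1.8/render.c": "draw everything\n",
    "library-x-1.9/render.c": "render text\nrender image\nrender table\nrender chart\n",
}

matches = {path: best_match(content, new) for path, content in old.items()}
# parse.c is paired with library-x-1.8/, render.c with library-x-1.9/
```

A commit touching both old files would therefore have its changes split between the two new directories, which is the surprise described above.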

Philip Oakley then chimed in to suggest using “BLOBSAME” for exact renames in the same way as “TREESAME” is used in git log for history simplification. Elijah replied to Philip that he thinks that ‘exact rename’ already works. Junio C Hamano, the Git maintainer, then pointed out that “TREESAME” is a property of commits, not trees, and suggested using words other than “BLOBSAME” and “TREESAME” in the context of rename detection.

Philip and Elijah discussed terminology at more length, agreeing that good terminology can sometimes help people coming from an “old centralised VCS” make the mind shift to understand Git’s model, but didn’t find anything that would help in this case.

Finally, Philip requested more information about how Git computes file content similarity (for inexact rename detection), referencing Elijah’s mention of “spanhash representation”. Elijah explained the internal data structure in detail, and supported his earlier claim that “comparison of filenames can rival the cost of file content similarity computations”.
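
To give a rough feel for this kind of representation, here is a simplified Python sketch. It is not Git's code and does not reproduce Elijah's explanation: the function names, the 64-byte span cap, the use of Python's built-in `hash`, and the scoring formula are all assumptions of the sketch. The idea shown is that each file is reduced to a table mapping a hash of each chunk of content to the number of bytes it covers, so two files can be compared by looking at shared table entries instead of diffing them.

```python
from collections import Counter


def span_counts(data: bytes, max_span: int = 64) -> Counter:
    """Map hash-of-span -> total bytes, splitting at newlines or max_span bytes."""
    counts = Counter()
    start = 0
    for i, byte in enumerate(data):
        if byte == 0x0A or i - start + 1 == max_span:
            span = data[start:i + 1]
            counts[hash(span)] += len(span)
            start = i + 1
    if start < len(data):  # trailing span without a final newline
        span = data[start:]
        counts[hash(span)] += len(span)
    return counts


def similarity(src: bytes, dst: bytes) -> float:
    """Share of bytes found in both files, relative to the larger file."""
    if not src and not dst:
        return 1.0
    src_counts, dst_counts = span_counts(src), span_counts(dst)
    shared = sum(min(n, dst_counts[h]) for h, n in src_counts.items())
    return shared / max(len(src), len(dst))


half = similarity(b"a\nb\nc\nd\n", b"a\nb\nx\ny\n")  # two of four lines shared
```

Once the per-file tables exist, scoring a pair of files is a pass over one table with lookups in the other. That gives some intuition for how a character-by-character comparison of two long path names could cost a comparable amount of work.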

Other News

Various

The contributions GitLab’s Git team made to the Git 2.43 release by John Cai on GitLab Blog.
- See also Highlights from Git 2.43 by Taylor Blau on GitHub Blog, covering different changes, included in Git Rev News Edition #105.
GitHub has Copilot, GitLab has Duo Code Suggestions; now Bitbucket has integration with Tabnine: Accelerate your development process with Tabnine AI and Bitbucket.

Light reading

I Taught GIT to High School Students: My Experience as Linux Day Mentor by Coluzzi Andrea on his blog (and also on DEV.to).
How Framer Manages Their Codebase with Tower by Bruno Brito on Tower’s blog.
Julia Evans continues her series of articles about Git with Do we think of git commits as diffs, snapshots, and/or histories? and Inside .git (the latter in both a comic and a text version).
Minimal contents of a .git folder by Manuel Strehl on A Peculiar Zoo of Thoughts blog.
Git Config Settings I Always Recommend by Brandon Pugh on DEV.to (and also on his blog); note, though, that whether to set pull.rebase to true depends on whether the project prefers merges or rebases.
Git Lesson: How to Use .gitignore and .gitkeep? by Rita {FlyNerd} Lyczywek on DEV.to (translated from original article in Polish).
Git Prom! My Favorite Git Alias (to fetch the latest upstream HEAD and rebase your current branch on top of it) by Matt Butcher on DEV.to.
Integrating DVC and Git LFS via libgit2 filters by Peter Rowlands on DVC AI Blog. DVC (Data Version Control) was first mentioned in Git Rev News Edition #42, with links to different articles about it in #42, #63, #64, #72, and #100.
Version Control for Machine Learning by Nikitha Narendra on DagsHub Blog. The DAGsHub service was first mentioned in Git Rev News Edition #72; further articles about this web platform for data version control linked in Edition #85, #96, and #97.
RFC: Bridging GitHub workflows with b4 by Konstantin Ryabitsev on Linux kernel tools mailing list via lore.kernel.org.
Jujutsu: a new, Git-compatible version control system by Daroc Alden on LWN.net (free link). Jujutsu was first mentioned in Git Rev News Edition #85; there was also a Jujutsu: A Git-Compatible VCS talk by Martin von Zweigbergk at Git Merge 2022, mentioned in passing in Git Rev News Edition #91.
Praise, Criticism, and Dialogue (in open source code review process) by Robert Haas (PostgreSQL contributor) on his Blogspot blog.
Being friendly: friendly forks 101 and Being friendly: Strategies for friendly fork management by Lessley Dennington on GitHub Blog (2022).

Git tools and sites

Git-RDM is intended to be a Research Data Management (RDM) plugin for the Git version control system. It interfaces Git with data hosting services to manage the curation of version-controlled files using persistent, citable repositories. Access to hosting services is managed with the PyRDM library, which supports Figshare, Zenodo, and (in a limited fashion) DSpace-based services using the SWORD protocol version 2. Written in Python, last released in 2016.
- See also the “Git-RDM: A research data management plugin for the Git version control system” article in The Journal of Open Source Software (2016).
GitVision is a web tool designed to visualize Git repositories in virtual, augmented, and 3D reality. Developed with Vue 3 in Vite by Kacper Konecki (GaspardIV). There is a live demo of GitVision at gitvis.web.app, including quite a few tiny, small, medium and large example repositories; you can also visualize your own repository by uploading data prepared using the GitVision script (or you can use the tool locally).
- It provides a type of 3D visualization different from the much better known Gource visualization tool for source control repositories. There the repository is displayed as a tree where the root of the repository is the center, directories are branches and files are leaves. Contributors to the source code appear and disappear as they contribute to specific files and directories.
- It has a different purpose than the Git History.xyz web app, which lets you quickly browse the history of files in any git repo, mentioned in Git Rev News Edition #48 and #105.
- See also the VR-Git: Git Repository Visualization and Immersion in Virtual Reality (PDF) paper by Roy Oberhauser (2022).
The Visualize Git web app illustrates what’s going on under the hood when you use common Git operations. You’ll see what exactly is happening to your commit graph. Powered by D3. Sources on GitHub as git-school/visualizing-git. This app is quite similar to the free playground mode of Visualizing Git Concepts with D3, first mentioned in Git Rev News #69. Compare with:
- Learn Git Branching, mentioned first in Git Rev News Edition #30.
- Git Gud, a visual web-based Git simulator, meant to help understand Git better, announced by its author Nic Hartley in Git Gud at git. First mentioned in Git Rev News Edition #48.
- Git Gud, a command line game designed to help you learn how to use the Git version control system. Written in Python by Ben Thayer. First mentioned in Git Rev News Edition #72.
- Oh My Git!, an open source game about learning Git, written using the Godot game engine (source). There was a lightning talk about this game at FOSDEM 2021: Building a Git learning game: A playful approach to version control. First mentioned in Git Rev News Edition #72.
- Git-Sim, a tool (written in Python) to visually simulate Git operations in your own repos with a single terminal command. Described in Git-Sim: Visually Simulate Git Operations In Your Own Repos (mentioned in Git Rev News Edition #95) and Git-Sim 3 Month Dev Update: Community Response, New Features, & The Future (mentioned in Edition #98).
List of git mistakes people have listed on Mastodon, gathered by Julia Evans (@b0rk@jvns.ca).

Releases

GitHub Enterprise 3.11.3, 3.10.5, 3.9.8, 3.8.13
GitLab 16.8.1, 16.7.4, 16.6.6, 16.5.8, 16.8, 16.7.3, 16.7.2, 16.6.4, 16.5.6
Bitbucket Server 8.17
GitKraken 9.11.1
GitHub Desktop 3.3.8, 3.3.7
Tower for Mac 10.3
Tower for Windows 5.5

Credits

This edition of Git Rev News was curated by Christian Couder <christian.couder@gmail.com>, Jakub Narębski <jnareb@gmail.com>, Markus Jansen <mja@jansen-preisler.de> and Kaartic Sivaraam <kaartic.sivaraam@gmail.com> with help from Elijah Newren, Bruno Brito, Brandon Pugh and Štěpán Němec.