Git Rev News: Edition 136 (June 30th, 2026)

Welcome to the 136th edition of Git Rev News, a digest of all things Git. For our goals, the archives, the way we work, and how to contribute or to subscribe, see the Git Rev News page on git.github.io.

This edition covers what happened during the months of May and June 2026.

Discussions

Reviews

  • [PATCH 0/3] Batch prefetching

    Elijah Newren sent a 3 patch series to improve the performance of a couple of commands in partial clones. The work was spurred by a real-world report where git cherry jobs were each doing hundreds of single-blob fetches, at a cost of around 3 seconds each, so that batching those downloads should dramatically speed up such jobs. As Elijah put it, he “decided to fix up git grep similarly while at it”. The series also corrected a small documentation typo he had noticed in patch-ids.h (a missing trailing parenthesis in a comment), as a preparatory fixup.

    For readers unfamiliar with the trade-off, partial clones let users avoid downloading blobs upfront, at the expense of needing to download them later as they run other commands. That trade-off can sometimes be more painful than expected: when the needed blobs are discovered one at a time as they are accessed, each one triggers a separate network round-trip. Some commands like checkout, diff, and merge already mitigate this by doing batch prefetches of the blobs they will need, which dramatically reduces the cost of on-demand loading. The aim of this series was to extend that ability to two more commands, git cherry and git grep.

    The interesting part for git cherry is how to figure out, without fetching anything yet, which blobs will eventually be needed. As Elijah explained, git cherry works in two phases: it first computes header-only patch IDs (based on file paths and modes), and only falls back to full content-based IDs when the header-only IDs collide. Those full IDs are what requires reading blob content, and the comparison is driven by a hashmap whose comparison function, patch_id_neq(), is exactly what triggers the on-demand fetches. To enumerate the colliding blobs ahead of time, the patch temporarily swaps the hashmap’s comparison function for a trivial always_match() function, walks the entries that would collide to collect their blob OIDs into an oidset, restores the original comparison function, and then fetches everything in a single batch via promisor_remote_get_direct(). A helper, collect_diff_blob_oids(), lists the blob OIDs touched by a commit’s diff. It leaves out files that are explicitly marked as binary in the userdiff configuration, because for those files the patch_id just hashes the OID with oid_to_hex() instead of reading the blob, so there is no point downloading them.

    While git cherry relies on hashmap comparisons, the git grep patch takes an analogous but simpler approach: it adds a preliminary walk over the tree (similar to grep_tree()) that collects the blobs of interest and prefetches them in one go.

    Junio Hamano, the Git maintainer, took a first look and immediately spotted something that did not belong: the series added a 210-line investigations/cherry-prefetch-design-spec.md file to the project. He pointed out that, as a document describing how git cherry works, it was “vastly lacking”, that much of its content was the sort of material that would normally go into a commit message, and that he was “not sure how others would benefit from being able to read it” once the series landed. Elijah’s reply was short and to the point: “Ugh, no, sorry.” That stray file had been committed by mistake.

    Elijah quickly sent version 2, whose only change compared to v1 was to remove that stray file, noting it was “So embarrassing that I didn’t catch that before submitting.”

    Phillip Wood reviewed v2 and made an interesting connection: git rebase without --reapply-cherry-picks suffers from the same problem, since it does the equivalent of git log --cherry-pick. He asked whether prefetch_cherry_blobs() could be shared with the cherry-pick detection in revision.c. Elijah agreed the connection was correct, explaining that git rebase (without --reapply-cherry-picks) and git log --cherry-pick both go through cherry_pick_list() in revision.c, which has the same shape as the loop in cmd_cherry() and triggers fetches from the same patch_id_neq() callback. He even sketched what sharing the code would look like.

    However, he preferred to leave that out of the current series, expressing reservations about expanding partial-clone support further into this area: git cherry, git log --cherry-pick, and the default cherry-pick detection in git rebase all exist to answer “has this patch already landed upstream?”, a question that, in repositories large enough to need partial clones, he felt “is rarely worth the cost of computing patch-ids across arbitrary amounts of history.” His honest guidance for users on a large repository would be to pass --reapply-cherry-picks (with rebase) and skip the detection entirely, or to narrow the range under consideration. He noted that the omission of a --no-reapply-cherry-picks option in git replay had been a deliberate choice rather than an oversight. He had only implemented the git cherry fix because of a specific customer whose tooling had already baked in the operation, and prefetching at least made the worst case tolerable. He added that he would happily review a patch from anyone wanting to carry the shared code forward.

    Phillip continued the exchange with several good questions, asking whether patch IDs are computed for every upstream commit or just the ones modifying the same paths, and remarking that it “is a shame that we don’t have a config setting for --reapply-cherry-picks as it is easy to forget to pass that option” (a setting made awkward because the apply backend does not support that option). He was also “a bit surprised customers aren’t complaining about tools that use git rebase being slow.”

    Elijah replied that determining which upstream commits modify the same paths still requires walking the upstream commits and doing a tree-diff for each, and that in the biggest repositories “even a merge-base operation can start to feel expensive.” On the surprise about rebase, he answered “Are you sure they aren’t complaining?”, explaining that the merging parts of a rebase already do batch prefetching, but the cherry-pick-detection part does not. He also noted that the customer in question was using git replay rather than git rebase, probably because early versions of git replay lacked the drop-commits-that-become-empty logic that Phillip later added (he thanked Phillip again for that), and that the prefetch patch lets things stay fast even if they keep their git cherry calls.

    Derrick Stolee then reviewed v2, reading both the git cherry and git grep patches together. He worried that collect_diff_blob_oids() being “hidden in builtin/log.c may not be the right long-term home”, anticipating more and more cases where Git would want to prefetch blobs, and wondered whether the logic could take advantage of, or live alongside, the existing diff_queued_diff_prefetch() within diffcore_std() in diff.c. He framed the git cherry patch as caring about a diff and the git grep patch as caring about a “scan prep”, suggesting git archive as a closer analog for the latter than checkout. He was careful to add that he did not mean to complicate the series and was “most interested in having this logic be more reusable in the future without needing to move code across files.”

    Junio, seeing that Stolee’s two review messages had gone unanswered for a while, asked whether he should keep the patches in his tree “hoping that responses may come some day”, and said he would mark the topic as expecting review responses in the draft “What’s cooking” report for the time being. Elijah apologized for the delay, explaining he had been pulled into firefighting and remediation duties after a number of incidents at work, and suggested marking the series as expecting a re-roll since Stolee had asked for an additional test.

    Elijah then answered Stolee’s reusability question in detail. He read the patch differently: collect_diff_blob_oids() already leans on the diff library at the per-commit level (diff_tree_oid() plus diffcore_std()), and the real value of the series lives above the diff library, in the accumulation across many commits.

    Concretely, the motivating case was a patch touching a few files where upstream had tens of thousands of commits in the relevant range, several hundred of which modified the same set of files: a per-diff prefetch like diff.c uses would turn that into hundreds of small fetches, “what this series gives you is one fetch.” He pointed out two further git cherry-specific filters that he felt did not belong in the diff library: most commits are skipped before patch-ID is even computed (so prefetching for them would be wasted), and content for binary files is skipped because patch-ID uses oid_to_hex() for them. To check Stolee’s idea concretely, he reviewed all of the existing promisor_remote_get_direct() call sites and concluded that none of them shared the “diff two trees and harvest OIDs” shape, so there was no natural shared layer above the promisor_remote_get_direct() primitive itself. He agreed git archive would be the closest analog if it ever grew prefetch logic, and proposed factoring out a tree-walk helper only when a second caller actually wanted one.

    For the git grep patch, Stolee asked for a test that exercises a pathspec filter, with files like matches.txt, nomatch.txt, and matches.md, so that git grep -c "needle" HEAD -- *.txt would download only the matching subset. This turned out to be more valuable than a simple test improvement: Elijah replied “Yes, absolutely”, and discovered that while he was handling pathspecs correctly, he was unconditionally requesting whatever objects matched the pathspecs even when those blobs were already present locally. He promised to send a fix along with the updated test.

    That fix arrived in version 3, which made three changes compared to v2:

    • the final patch’s test case was updated, as Stolee had suggested, to exercise a pathspec,

    • the last two patches were modified to avoid re-downloading blobs already present locally (checking with odb_read_object_info_extended() and OBJECT_INFO_FOR_PREFETCH on the git cherry side, and odb_has_object() on the git grep side), with the tests adjusted to verify it, and

    • a new first patch was inserted documenting the filtering contract of promisor_remote_get_direct().

    That documentation patch explains that the function does not filter out OIDs already present locally on its happy path, so callers are responsible for filtering and deduplicating themselves. Elijah candidly noted in the commit message that he “missed this originally and wrote two problematic callers”. He also mentioned that he had not pursued Stolee’s code-sharing suggestion, since it appeared to be based on a misunderstanding that the git cherry patch was about a diff.

    Stolee reviewed v3 and declared it “good to go”, graciously adding that Elijah’s detailed responses in the v2 thread “helped me understand that my thought was misguided” and gave him “extra confidence” in the approach. Junio agreed the series was “in a good shape” and marked the topic for the next branch. Elijah thanked Stolee one more time, noting that the comments on the git grep patch in particular “led me to what would have been a rather annoying bug”, so calling out the test improvement had been time well spent.

    In the end, the series was merged into the master branch and is part of the recent v2.55.0 release. A concrete customer pain point led to extending Git’s existing batch-prefetching habit to two more commands, git cherry and git grep, as well as a bug fix and improved documentation. The thread also clarified the boundaries of partial-clone friendliness for cherry-pick detection, leaving the door open for sharing the new code with git rebase and git log --cherry-pick, should someone wish to carry that work forward.

Other News

Events

Various

Light reading

Scientific papers

Easy watching

Git tools and sites

  • Worktrunk is a CLI for Git worktree management, designed for running AI agents in parallel. Written in Rust, dual-licensed under MIT and Apache-2.0 license.
  • nt (short for navigate tree) is a tiny zsh command for hopping around worktrees: it spins one up — or jumps to it if it already exists — cds you in, and gets out of your way. Written as a Zsh script, under MIT license.
  • treehouse is a CLI tool that helps you isolate your development environments when using Git worktrees. It assigns a stable number for each worktree, so you can use this number to derive per-worktree local configuration like ports, database names, etc., or anything you want isolated per worktree. Written in Go, under MIT license.
  • rift is an experimental alternative to Git worktrees using copy on write via reflinks or snapshots. Written in Rust, no license provided (yet).
  • gitprofile is a tool to help manage multiple Git identities — work, personal, open-source — so the right name, email, and SSH key are always used without thinking about it. It uses Git’s built-in includeIf directive. Written in Go, under MIT license.
  • hk is a Git hook manager and project linting tool with an emphasis on performance. Provides fast, powerful, and flexible hook management for modern development workflows. Written in Rust, under MIT license.
  • GH Desktop Plus is a relatively up-to-date fork of GitHub Desktop with additional features and improvements, including: searching commits by title, message, tag, or hash; support for multiple GitHub, Bitbucket & GitLab accounts; Bitbucket and GitLab integration; and more. Note that it is a community-maintained project, not an official GitHub product. It is written in TypeScript as an Electron app, under MIT license.
  • yadiff, yet another diff viewer, is a local web application built on pierrecomputer’s trees and diffs. Written in TypeScript for Node.js, under MIT license.
    Inspired by DiffsHub, which was mentioned in Git Rev News #135.
  • git-pile is a set of scripts for using a stacked-diff workflow with Git & GitHub. There are a lot of different trade-offs for how this can work; git-pile chooses to be mostly not-magical at the cost of being best at handling multiple commits that don’t conflict with each other instead of chains of pull requests affecting the same code. Written in shell and Python, under MIT license.
  • spr (Super Pull Requests) for using a stacked-diff workflow with GitHub. Written in Rust, under MIT license.
  • sem (Semantic version control) is a command line tool that adds semantic understanding of Git changes. Instead of lines changed, sem tells you what entities changed: functions, methods, classes. Provides six subcommands: diff, blame, impact, log, entities, and context. Also works outside Git for arbitrary file comparison. It parses code with tree-sitter. Helpful for working with AI agents. Written in Rust, under MIT and Apache 2.0 licenses.
    • Part of the Ataraxy Labs stack — agent-native infrastructure for software development. See also: weave (entity-level Git merge driver) · inspect (semantic code review) · opensessions (tmux sidebar for coding agents).
  • git-courer is an MCP (Model Context Protocol) server that gives AI agents a full, safe interface to Git — not just commits, but the whole surface: status, diff, branch, stash, history, and sync. Includes 13 MCP tools, with structured JSON in, and structured JSON out. Every mutation backs itself up automatically. With local Ollama — zero tokens for Git operations. Written in Go, under MIT license.
  • repo-slopscore is a CLI + web app which gives a “slop score” for any public Git repository resolvable via https://. It goes through the entire commit history of a repository (upper limit is 5000 commits currently) and detects visible signs of AI/LLM tool usage in the commit history and the source tree. Aggressive caching is used to ensure that a repo that has been analyzed before does not need to get fully cloned again. Written in Rust, under Mozilla Public License 2.0. Used by https://slopscan.ava.pet/.
  • Grit is a “from-scratch”, library-based, memory-safe, idiomatic Rust reimplementation of Git (created with help of AI agents) that passes over 99% of the entire Git test suite. The grit-lib library is licensed under the MIT License, while the grit-git binary crate is licensed under GPLv2 (like Git).
  • Flow Simulator by Mainline is a web app where you can watch the simulation on how the code flows from idea to production under three branching strategies: GitHub flow, Git flow, and trunk-based. You can switch modes to compare.
  • Commit Crimes is a joke web app, where you can paste any GitHub handle; the app will then pull their permanent record, book the user for crimes against version control (e.g. unprotected pushes straight to ‘main’), and hand down the sentence.

  • jj_tui is a TUI for the Jujutsu version control system, with focus on performance, interactivity, and being intuive. Written in OCaml, under MIT license.
  • Irmin is an OCaml library for building mergeable, branchable distributed data stores; a distributed database built on the same principles as Git. Under ISC license.

Releases

Credits

This edition of Git Rev News was curated by Christian Couder <christian.couder@gmail.com>, Jakub Narębski <jnareb@gmail.com>, Markus Jansen <mja@jansen-preisler.de> and Kaartic Sivaraam <kaartic.sivaraam@gmail.com> with help from Toon Claes, Štěpán Němec and Paulo Gomes.