Welcome to the 65th edition of Git Rev News, a digest of all things Git. For our goals, the archives, the way we work, and how to contribute or to subscribe, see the Git Rev News page on git.github.io.
This edition covers what happened during the month of June 2020.
The history of master in Git (written by Andrew Ardill)
Amidst all the discussion around changing the default branch from master to something else, many people have asked why master was chosen in the first place.
As master has a few different meanings in English, just which meaning was intended?
Konstantin Ryabitsev was the first to discuss the meaning of master, saying
Git doesn’t use “master-slave” terminology – the “master” comes from the concept of having a “master” from which copies (branches) are made
This post from the GNOME mailing list was then linked by Simon Pieters with the claim that
Git’s master is in fact a reference to master/slave
That post points out that the first use of master was in a CVS helper script, links that to BitKeeper (the version control system used to manage Linux development when Linus Torvalds first wrote Git), and claims BitKeeper used the “master and slave” meaning of master.
Many people considered master to mean a “master copy”, so this connection to slavery was very surprising.
Andrew Ardill investigated the BitKeeper source code and came to the conclusion that “the overwhelming majority of [the usages of master in BitKeeper] are of the “Master Copy” variant”, or as Michal Suchánek said “even in BitKeeper the use of master/slave is the exception rather than the norm.”
Off-list discussions were ongoing too, and Petr Baudis wrote on Twitter about naming the master branch in Git:
I picked the names “master” (and “origin”) in the early Git tooling back in 2005.
(this probably means you shouldn’t give much weight to my name preferences :) )
I have wished many times I would have named them “main” (and “upstream”) instead.
Glad it’s happening @natfriedman
When asked which meaning of master was intended, Petr replied
“master” as in e.g. “master recording”. Perhaps you could say the original, but viewed from the production process perspective.
A clueless Central European youngster whose command of English was mostly illusory came up with the term, which is why it isn’t very obvious…
In a follow-up to that original GNOME mailing list post, Bastien Nocera retracted their claims from the original post, saying
I emailed Linus Torvalds recently… and he told me that it was unlikely that the “Git master” branch name was influenced by BitKeeper, and that “master” was “fairly standard naming” for this sort of thing and “more likely to be influenced by the CVS master repository”
Going on, Bastien discusses Petr Baudis’ tweets and then concludes “it doesn’t matter where the name comes from… The fact that it has bad connotations, or inspires dread for individuals and whole communities, is reason enough to change it.”
This is something that Brian M. Carlson had also pointed out on the Git mailing list, saying
“master”, even though from a different origin, brings the idea of human bondage and suffering to mind for a non-trivial number of people, which of course was not the intention and is undesirable. I suspect if we were making the decision today, we’d pick another name, since that’s not what we want people to think of when they use Git.
Brian goes on to lay out the changes required in Git to rename master as the default, suggesting that there is a decent amount of work and that due to compatibility concerns “we’d probably want to make it in a [Git version] 3.0”.
Around the web, the discussion about renaming master continues.
The incorrect claims around the history of master persist, even in our own Git Rev News: Edition 64, but seem to be quickly corrected where possible, such as in GitLab’s discussion on the topic.
More commit-graph/Bloom filter improvements
Derrick Stolee, who prefers to be called Stolee, sent a patch series to the mailing list, based on a previous experimental patch series sent a few weeks earlier by Gábor Szeder.
When he sent his patch series, Gábor said that his work was a proof of concept he had started more than a year ago but had not had time to finish. He was motivated to send it as-is, apart from changes to the commit messages, after recently taking a look at the current changed-path Bloom filter implementation. That implementation was developed over a long period, mainly by Garima Singh, and was merged at the beginning of May. He saw that it had some of the same issues that he had stumbled upon, and that it missed some optimization opportunities.
Gábor listed a lot of very interesting benefits from his work, but also a lot of drawbacks that would prevent it from being merged as is. Many of the benefits are linked to a new format used to store the changed-path Bloom filter. This new format was justified by an impressive commit message.
Stolee, Taylor Blau, Johannes Schindelin and Junio Hamano, when reviewing Gábor’s work, were disappointed that Gábor was not trying to contribute to the current implementation. It appeared though that a number of Gábor’s 34 patches and ideas could be applied on top of the current implementation.
That’s what Stolee did, first sending 10 patches from Gábor’s series at the beginning of June. This patch series required a bit of work, but Stolee left out what would have been more difficult to apply to the current code. René Scharfe, Stolee, Gábor and Junio commented a bit on it, but didn’t find anything that would require a new version of this patch series. So it is now “cooking” in the ‘next’ branch.
Stolee’s next patch series called “More commit-graph/Bloom filter improvements” was about adding a few extra improvements, several of which are rooted in Gábor’s original series. Even though Gábor’s patches did not apply or cherry-pick at all, Stolee still credited Gábor as the author of 4 patches out of 8.
Anyway, this new series contained 2 changes that improve the false-positive rate, which increases performance, and one change that improves usability. René and Taylor suggested improvements and bug fixes. Taylor even sent a patch.
Stolee then sent a version 2 of the series, taking into account the feedback and adding the patch from Taylor to the series. René, Gábor, Junio and Stolee discussed a few more points.
That led to Stolee sending a version 3, in which Gábor found a bug that Stolee subsequently fixed.
So Stolee sent a version 4 which is now cooking in the ‘next’ branch, along with the first series that has 10 patches from Gábor.
In the meantime, though, Gábor commented on this first series, saying that it has a number of issues. Hopefully these issues will be addressed soon, and these 2 patch series will be merged in the near future.
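To make the idea of a changed-path Bloom filter more concrete, here is a minimal Python sketch. It is not Git's implementation: Git stores one filter per commit in the commit-graph file and hashes paths with murmur3, while this sketch swaps in SHA-256 from hashlib with double hashing. The class and helper names, and the defaults of 10 bits per entry and 7 hash functions, are assumptions chosen for illustration. A negative answer means a path definitely did not change in that commit, so a path-limited history walk can skip diffing it; a positive answer may be a false positive, which is why lowering the false-positive rate improves performance.

```python
import hashlib
import math
from typing import Iterator, List


class ChangedPathBloomFilter:
    """Hypothetical per-commit filter over the paths a commit changed.

    Illustrative only: Git hashes with murmur3; this sketch uses SHA-256.
    """

    def __init__(self, num_bits: int, num_hashes: int = 7) -> None:
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, path: str) -> Iterator[int]:
        # Double hashing: derive k bit positions from two 32-bit hash values.
        digest = hashlib.sha256(path.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[0:4], "little")
        h2 = int.from_bytes(digest[4:8], "little")
        for i in range(self.num_hashes):
            yield (h1 + i * h2) % self.num_bits

    def add(self, path: str) -> None:
        for pos in self._positions(path):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, path: str) -> bool:
        # False means "definitely not changed"; True means "maybe changed".
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(path))


def build_filter(changed_paths: List[str], bits_per_entry: int = 10,
                 num_hashes: int = 7) -> ChangedPathBloomFilter:
    # Size the filter in proportion to the number of changed paths.
    num_bits = max(8, len(changed_paths) * bits_per_entry)
    bloom = ChangedPathBloomFilter(num_bits, num_hashes)
    for path in changed_paths:
        bloom.add(path)
    return bloom


def expected_false_positive_rate(bits_per_entry: int, num_hashes: int) -> float:
    # Standard estimate (1 - e^(-k*n/m))^k, where m/n = bits_per_entry.
    return (1 - math.exp(-num_hashes / bits_per_entry)) ** num_hashes
```

For example, build_filter(["bloom.c", "commit-graph.c"]).might_contain("bloom.c") always returns True, while a query for an unrelated path usually returns False. With these assumed parameters, expected_false_positive_rate(10, 7) is roughly 0.008.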
Developer Spotlight: Jonathan Tan
Who are you and what do you do?
I’m a Software Engineer at Google who works on Git. I also contribute to JGit (a Java implementation of Git) as one of its committers.
What would you name your most important contribution to Git?
I would say “partial clone” - the ability to clone a repository, but not necessarily have all of that repository’s objects (accumulated throughout its history) in your clone. Quite a few articles have been written about it, but in summary, it improves Git performance especially for large repositories.
What are you doing on the Git project these days, and why?
The thing that immediately comes to mind is “partial clone”. The fundamentals are there, but some Git commands still operate under the assumption that objects are only a disk read away (instead of a network fetch - in a partial clone, if an object is needed but missing, it is automatically fetched). I’m improving those commands to be more cognizant of this fact - typically, this means batching the fetch of all the objects it will need once it realizes that it does not have some of them, instead of “I need this object, so go fetch it; OK let me process it; oops I need another one, so go fetch that”.
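The difference between fetching missing objects one at a time and batching them can be sketched in Python. This is a toy model, not Git's code: FakePromisorRemote, process_one_at_a_time and process_batched are hypothetical names, object IDs are plain strings, and each call to fetch() stands in for one network round trip to the remote that lazily supplies missing objects in a partial clone.

```python
from typing import Dict, Iterable, List


class FakePromisorRemote:
    """Hypothetical stand-in for the remote that lazily serves missing objects."""

    def __init__(self, objects: Dict[str, bytes]) -> None:
        self.objects = objects
        self.round_trips = 0

    def fetch(self, oids: Iterable[str]) -> Dict[str, bytes]:
        # Each call models one network round trip, however many objects it requests.
        self.round_trips += 1
        return {oid: self.objects[oid] for oid in oids}


def process_one_at_a_time(needed: List[str], local: Dict[str, bytes],
                          remote: FakePromisorRemote) -> List[bytes]:
    # "I need this object, so go fetch it; oops I need another one, so go fetch that."
    result = []
    for oid in needed:
        if oid not in local:
            local.update(remote.fetch([oid]))
        result.append(local[oid])
    return result


def process_batched(needed: List[str], local: Dict[str, bytes],
                    remote: FakePromisorRemote) -> List[bytes]:
    # First work out everything that is missing (deduplicated, order kept),
    # then fetch it all in a single request.
    missing = list(dict.fromkeys(oid for oid in needed if oid not in local))
    if missing:
        local.update(remote.fetch(missing))
    return [local[oid] for oid in needed]
```

With ten needed objects of which only one is already present locally, the first function makes nine round trips and the batched one makes a single round trip, while both return the same objects.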
If you could get a team of expert developers to work full time on something in Git for a full year, what would it be?
Along the lines of “partial clone” and large repositories, I would like them to investigate the feasibility of having Git servers be able to serve results of computations (thus, not just objects). One case is git blame - if a Git client could ask a Git server to send the results of such a command, it could offload most of the computation to the server, only needing to build upon the server’s results with the locally-created objects that the server does not know about. This is especially useful with partial clone, because the client does not even have most of the objects needed and would have to fetch them otherwise.
If you could remove something from Git without worrying about backwards compatibility, what would it be?
One small thing that I can think of: remove the ability of git reset to update the working tree and the objects staged in the index. The git restore command, relatively recently introduced, does this with more beginner-friendly parameter names (--worktree and --staged, respectively, instead of the --hard, --mixed, and --soft of git reset). This change would make it easier, for example, to read scripts written by other people - I would no longer need to think so much about what that reset in the script would do.
Events
Various
Junio Hamano, the Git maintainer, has renamed the pu branch of git.git to seen. This has been done to use a more meaningful name and make room for topics from those contributors whose two-letter name abbreviation needs to be ‘pu’. This was announced in “What’s cooking in git.git (Jun 2020, #04; Mon, 22)”.
The Git Project Leadership Committee has been briefly interviewed via email by Elizabeth Landau for an article in Wired about current changes to Git’s default name for the initial branch.
Highlights from Git 2.28 by Taylor Blau on GitHub Blog, mentioning among others init.defaultBranch, changed-path Bloom filters, the git bugreport command and git log’s new --show-pulls option.
The Tower Git client for Windows and MacOS now supports CMD+Z for Git (a universal undo).
Exciting new updates to the Git experience in Microsoft Visual Studio 2019.
GitHub Archive Program: the journey of the world’s open source code to the Arctic by Julia Metcalf on GitHub Blog. The GitHub Archive Program along with the GitHub Arctic Code Vault were introduced at GitHub Universe 2019, and mentioned in Git Rev News #57 (November 20th, 2019).
Updating the Git protocol for SHA-256 [LWN.net] by John Coggeshall.
Light reading
broot and meld to diff before commit by Denys Séguret (author of broot, which is a tool to navigate file trees).
Git tools and sites
diff replacement for all circumstances, the goal of icdiff is to be a tool you can reach for to get a better picture of what changed when it’s not immediately obvious from diff. Docs include examples on how to integrate it with Git, Mercurial and Subversion.
bash and zsh) that make it easier to use Git. It integrates with your shell to give you numbered file shortcuts, a repository index with tab completion, and a community-driven collection of useful SCM functions.
SCM Breeze lives on GitHub at https://github.com/ndbroadbent/scm_breeze
This edition of Git Rev News was curated by Christian Couder <christian.couder@gmail.com>, Jakub Narębski <jnareb@gmail.com>, Markus Jansen <mja@jansen-preisler.de> and Kaartic Sivaraam <kaartic.sivaraam@gmail.com> with help from Andrew Ardill, Jonathan Tan, Brooke Kuhlmann, Eric Sunshine, Carlo Marcelo Arenas Belón and Gábor Szeder.