Welcome to the 99th edition of Git Rev News, a digest of all things Git. For our goals, the archives, the way we work, and how to contribute or to subscribe, see the Git Rev News page on git.github.io.
This edition covers what happened during the months of April 2023 and May 2023.
To help us improve Git Rev News, please participate in our first Reader Survey. It’s up only until our next edition, so for about one month.
Weird behavior of git log --before
or git log --date-order
Thomas Bock reported an issue in
a LibreOffice repository
where some commits from around 2010 were treated by git log
as if
they had been created before 1980.
Commands like git log --before="1980-01-01"
or git log
--date-order
both show or list some commits with an author date and
a commit date from around 2010 as if they were from before 1980.
Thomas looked at the timestamps of the author and committer dates in these commits, but they didn’t appear to be broken, so he suspected a Git bug.
Peff, alias Jeff King, thanked Thomas “for providing a clear example and reproduction recipe” and pointed out that the commits that appeared to be from before 1980 were “malformed, but only slightly”. It appeared that their “author” and “committer” headers contained something like:
Firstname Lastname<firstname.lastname <Firstname Lastname<firstname.lastname@example.com>> 1297247749 +0100
instead of simply:
Firstname Lastname <firstname.lastname@example.com> 1297247749 +0100
that is, with an extra weird set of angle brackets.
Peff also found that there were two different code paths for commit
parsing and they behaved differently when there was an extra set of
angle brackets. One, which was used to fill in the fields of a
struct commit
, only parsed the “parents”, “tree”, and “committer
timestamp” fields. For that last field, it was using the
parse_commit_date()
function which stopped at the first ‘>’ and then
tried to parse the rest of the line as a timestamp, which failed and
returned a 0 timestamp if there was a second ‘>’.
The other code path, used when the commit was displayed, called the
split_ident_line()
function to parse the “author” and “committer”
headers, but this function was trying to find the last ‘>’ in these
headers instead of the first one, which yielded the correct timestamp
when there were two or more ‘>’.
Peff then suggested a patch to make parse_commit_date()
behave like
split_ident_line()
and find the last ‘>’ instead of the first
one. He also discussed other possible ways to fix the issue,
including doing nothing as the commits were indeed malformed.
Kristoffer Haugsbakk replied to Peff saying he was using a tool
called git repair
to try to fix
the original repo. But Peff said he wasn’t sure git repair
would be
able to fix it. He mentioned that
git filter-repo
or other
tools would be able to fix it, but would require the commit history
to be rewritten, which might not be “worth it for a minor problem
like this”.
Kristoffer replied that he gave up with git repair
as it didn’t
seem to finish, but was actually more interested in seeing if the
weird git log
behavior went away to convince others it wasn’t a
bug, rather than fixing the repo.
Peff suggested carrying on with git-filter-repo’s
--commit-callback
option, or alternatively piping git
fast-export
through sed
, and then back to git fast-import
, as he
was almost certain git log
would properly work if the repo was
fixed.
A few weeks later Kristoffer sent the URL of a repaired repo. He
said he couldn’t use git filter-repo
, but “git commit-tree
+
git replace
+ git filter-repo --force
worked”.
In the meantime, Junio Hamano, the Git maintainer, replied to Peff’s initial findings wondering which commit parsing function was used to populate the commit-graph files where commit data is cached, as it wouldn’t be good to record broken timestamps there.
Peff replied to Junio saying the commit-graph files are written from the parsed “struct commit” objects which is good as we want those cache files to always match the code that is used when they are not available. If Peff’s patch was applied to fix the parsing though, that would mean that existing commit-graph files would need to be manually removed, so that the fixed parsing could be used instead of broken values stored in those files.
Peff also discussed modifying the commit-graph code so that when a 0 timestamp was recorded for a commit, this commit would be parsed again, but thought it might not be worth the effort. Derrick Stolee discussed this idea too, but agreed with Peff saying “this seems like quite a big hammer for a small case”.
Thomas then thanked everyone for “clarifying this mystery” as the explanations given “already helped a lot”. He said that it would be very useful to fix the parsing of the broken commits, but, if that was considered to be too small a problem, he would like some kind of error handling to be introduced for commits with 0 timestamps instead of them being listed in the wrong time period.
Peff then sent
a first version of a small patch series
to properly fix the parsing of the broken commits and to fix another
parsing bug he found in the same parse_commit_date()
function.
Junio reviewed Peff’s patches and made a few suggestions, mostly about code comments. Peff took them into account and sent a version 2 of his patch series which behaved in the same way as the previous one, but had improved code comments.
Phillip Wood then wondered if it would be better to not use
strtoumax
(3) to parse timestamps as this standard C library function
is using the standard isspace
(3) while we are using our own version
of isspace
(3) which is different. Possible issues with strtoumax(3)
could also be related to different characters being considered
digits than in our code. This kind of issues come from the fact that
strtoumax
(3), like many other standard C library functions, is taking
the current
locale
into account.
After some discussions between Peff, Phillip and Junio, Peff sent
a version 3 of his patch series
with small changes. Especially the new version makes sure Git rejects
timestamps that start with a character that we don’t consider a
whitespace or a digit or the ‘-‘ character before using strtoumax
(3)
as this was considered enough to avoid issues related to this
function.
Phillip, Junio and Peff discussed this version a little bit more but found it good, so it was merged and these changes will be in Git v2.41.0 which will be released soon.
Various
Light reading
Easy watching and listening
Git tools and sites
This edition of Git Rev News was curated by Christian Couder <christian.couder@gmail.com>, Jakub Narębski <jnareb@gmail.com>, Markus Jansen <mja@jansen-preisler.de> and Kaartic Sivaraam <kaartic.sivaraam@gmail.com> with help from Bruno Brito.