Git Rev News: Edition 99 (May 31st, 2023)

Welcome to the 99th edition of Git Rev News, a digest of all things Git. For our goals, the archives, the way we work, and how to contribute or to subscribe, see the Git Rev News page on git.github.io.

This edition covers what happened during the months of April 2023 and May 2023.

To help us improve Git Rev News, please participate in our first Reader Survey. It’s up only until our next edition, so for about one month.

Discussions

Support

  • Weird behavior of git log --before or git log --date-order

    Thomas Bock reported an issue in a LibreOffice repository where some commits from around 2010 were treated by git log as if they had been created before 1980.

    Commands like git log --before="1980-01-01" or git log --date-order both show or list some commits with an author date and a commit date from around 2010 as if they were from before 1980.

    Thomas looked at the timestamps of the author and committer dates in these commits, but they didn’t appear to be broken, so he suspected a Git bug.

    Peff, alias Jeff King, thanked Thomas “for providing a clear example and reproduction recipe” and pointed out that the commits that appeared to be from before 1980 were “malformed, but only slightly”. It appeared that their “author” and “committer” headers contained something like:

    Firstname Lastname<firstname.lastname <Firstname Lastname<firstname.lastname@example.com>> 1297247749 +0100

    instead of simply:

    Firstname Lastname <firstname.lastname@example.com> 1297247749 +0100

    that is, with an extra weird set of angle brackets.

    Peff also found that there were two different code paths for commit parsing and they behaved differently when there was an extra set of angle brackets. One, which was used to fill in the fields of a struct commit, only parsed the “parents”, “tree”, and “committer timestamp” fields. For that last field, it was using the parse_commit_date() function which stopped at the first ‘>’ and then tried to parse the rest of the line as a timestamp, which failed and returned a 0 timestamp if there was a second ‘>’.

    The other code path, used when the commit was displayed, called the split_ident_line() function to parse the “author” and “committer” headers, but this function was trying to find the last ‘>’ in these headers instead of the first one, which yielded the correct timestamp when there were two or more ‘>’.

    Peff then suggested a patch to make parse_commit_date() behave like split_ident_line() and find the last ‘>’ instead of the first one. He also discussed other possible ways to fix the issue, including doing nothing as the commits were indeed malformed.

    Kristoffer Haugsbakk replied to Peff saying he was using a tool called git repair to try to fix the original repo. But Peff said he wasn’t sure git repair would be able to fix it. He mentioned that git filter-repo or other tools would be able to fix it, but would require the commit history to be rewritten, which might not be “worth it for a minor problem like this”.

    Kristoffer replied that he gave up with git repair as it didn’t seem to finish, but was actually more interested in seeing if the weird git log behavior went away to convince others it wasn’t a bug, rather than fixing the repo.

    Peff suggested carrying on with git-filter-repo’s --commit-callback option, or alternatively piping git fast-export through sed, and then back to git fast-import, as he was almost certain git log would properly work if the repo was fixed.

    A few weeks later Kristoffer sent the URL of a repaired repo. He said he couldn’t use git filter-repo, but “git commit-tree + git replace + git filter-repo --force worked”.

    In the meantime, Junio Hamano, the Git maintainer, replied to Peff’s initial findings wondering which commit parsing function was used to populate the commit-graph files where commit data is cached, as it wouldn’t be good to record broken timestamps there.

    Peff replied to Junio saying the commit-graph files are written from the parsed “struct commit” objects which is good as we want those cache files to always match the code that is used when they are not available. If Peff’s patch was applied to fix the parsing though, that would mean that existing commit-graph files would need to be manually removed, so that the fixed parsing could be used instead of broken values stored in those files.

    Peff also discussed modifying the commit-graph code so that when a 0 timestamp was recorded for a commit, this commit would be parsed again, but thought it might not be worth the effort. Derrick Stolee discussed this idea too, but agreed with Peff saying “this seems like quite a big hammer for a small case”.

    Thomas then thanked everyone for “clarifying this mystery” as the explanations given “already helped a lot”. He said that it would be very useful to fix the parsing of the broken commits, but, if that was considered to be too small a problem, he would like some kind of error handling to be introduced for commits with 0 timestamps instead of them being listed in the wrong time period.

    Peff then sent a first version of a small patch series to properly fix the parsing of the broken commits and to fix another parsing bug he found in the same parse_commit_date() function.

    Junio reviewed Peff’s patches and made a few suggestions, mostly about code comments. Peff took them into account and sent a version 2 of his patch series which behaved in the same way as the previous one, but had improved code comments.

    Phillip Wood then wondered if it would be better to not use strtoumax(3) to parse timestamps as this standard C library function is using the standard isspace(3) while we are using our own version of isspace(3) which is different. Possible issues with strtoumax(3) could also be related to different characters being considered digits than in our code. This kind of issues come from the fact that strtoumax(3), like many other standard C library functions, is taking the current locale into account.

    After some discussions between Peff, Phillip and Junio, Peff sent a version 3 of his patch series with small changes. Especially the new version makes sure Git rejects timestamps that start with a character that we don’t consider a whitespace or a digit or the ‘-‘ character before using strtoumax(3) as this was considered enough to avoid issues related to this function.

    Phillip, Junio and Peff discussed this version a little bit more but found it good, so it was merged and these changes will be in Git v2.41.0 which will be released soon.

Releases

Other News

Various

Light reading

Easy watching and listening

Git tools and sites

  • Bytebase - database schema change and version control (the GitLab for Database DevOps): web-based collaboration workspace to help DBAs and developers manage the database development lifecycle.
  • Quilt Data is a self-organizing data hub, consisting of a Python API, web catalog, and backend to manage data sets in AWS S3. The backend service is based on open-source Quilt Python package (documentation. The development of jupyterlab-quilt extension seems to be stalled, though.
  • GitOps Principles v0.1.0 published by OpenGitOps.
  • GIT Web Terminal (Git in your browser) was created using isomorphic-git (a pure JavaScript implementation of git for node and browsers). Source code on GitHub: jcubic/git.
  • schemadiff is a declarative, programmatic library in Vitess that can produce a diff in SQL format of two entities: tables, views, or full blown database schemas.
    • Compare sqldiff.exe, which is a command-line utility program (Windows binary) that displays content differences between two SQLite databases. Mentioned in Git Rev News Edition #87.

Credits

This edition of Git Rev News was curated by Christian Couder <christian.couder@gmail.com>, Jakub Narębski <jnareb@gmail.com>, Markus Jansen <mja@jansen-preisler.de> and Kaartic Sivaraam <kaartic.sivaraam@gmail.com> with help from Bruno Brito.