Git Rev News: Edition 99 (May 31st, 2023)
Welcome to the 99th edition of Git Rev News, a digest of all things Git. For our goals, the archives, the way we work, and how to contribute or to subscribe, see the Git Rev News page on git.github.io.
This edition covers what happened during the months of April 2023 and May 2023.
To help us improve Git Rev News, please participate in our first Reader Survey. It’s up only until our next edition, so for about one month.
Weird behavior of git log --before or git log --date-order
Thomas Bock reported an issue in a LibreOffice repository where some commits from around 2010 were treated by git log as if they had been created before 1980.
Both git log --before="1980-01-01" and git log --date-order showed some commits whose author and committer dates were from around 2010 as if they dated from before 1980.
Thomas looked at the timestamps of the author and committer dates in these commits, but they didn’t appear to be broken, so he suspected a Git bug.
Peff, alias Jeff King, thanked Thomas “for providing a clear example and reproduction recipe” and pointed out that the commits that appeared to be from before 1980 were “malformed, but only slightly”. It appeared that their “author” and “committer” headers contained something like:
Firstname Lastname <Firstname Lastname<firstname.lastname@example.org>> 1297247749 +0100
instead of simply:
Firstname Lastname <firstname.lastname@example.org> 1297247749 +0100
that is, with an extra weird set of angle brackets.
Peff also found that there were two different code paths for commit parsing, and that they behaved differently when there was an extra set of angle brackets. One, which was used to fill in the fields of a struct commit, only parsed the “parents”, “tree”, and “committer timestamp” fields. For that last field, it used the parse_commit_date() function, which stopped at the first ‘>’ and then tried to parse the rest of the line as a timestamp. When there was a second ‘>’, that parsing failed and a 0 timestamp was returned.
The other code path, used when the commit was displayed, called the split_ident_line() function to parse the “author” and “committer” headers. This function looked for the last ‘>’ in these headers instead of the first one, which yielded the correct timestamp even when there were two or more ‘>’.
Peff then suggested a patch to make parse_commit_date() behave like split_ident_line() and find the last ‘>’ instead of the first one. He also discussed other possible ways to fix the issue, including doing nothing, as the commits were indeed malformed.
Kristoffer Haugsbakk replied to Peff saying he was using a tool called git repair to try to fix the original repo. But Peff said he wasn’t sure git repair would be able to fix it. He mentioned that git filter-repo or other tools would be able to fix it, but that this would require rewriting the commit history, which might not be “worth it for a minor problem like this”.
Kristoffer replied that he had given up on git repair as it didn’t seem to finish. He was actually more interested in seeing whether the weird git log behavior went away, to convince others it wasn’t a bug, than in fixing the repo.
Peff suggested carrying on with git-filter-repo’s --commit-callback option, or alternatively piping the exported history through sed and then back into git fast-import, as he was almost certain git log would work properly once the repo was fixed.
A few weeks later Kristoffer sent the URL of a repaired repo. He said he couldn’t use git filter-repo, but “git filter-repo --force worked”.
In the meantime, Junio Hamano, the Git maintainer, replied to Peff’s initial findings. He wondered which commit parsing function was used to populate the commit-graph files, where commit data is cached, as it wouldn’t be good to record broken timestamps there.
Peff replied that the commit-graph files are written from the parsed “struct commit” objects. This is good, as we want those cache files to always match what the code produces when they are not available. If Peff’s patch were applied to fix the parsing, though, existing commit-graph files would need to be removed manually. Otherwise, the broken values stored in those files would still be used instead of the fixed parsing.
Peff also discussed modifying the commit-graph code so that a commit recorded with a 0 timestamp would be parsed again, but thought it might not be worth the effort. Derrick Stolee discussed this idea too, but agreed with Peff, saying “this seems like quite a big hammer for a small case”.
Thomas then thanked everyone for “clarifying this mystery” as the explanations given “already helped a lot”. He said that it would be very useful to fix the parsing of the broken commits, but, if that was considered to be too small a problem, he would like some kind of error handling to be introduced for commits with 0 timestamps instead of them being listed in the wrong time period.
Peff then sent a first version of a small patch series to properly fix the parsing of the broken commits and to fix another parsing bug he found in the same code path.
Junio reviewed Peff’s patches and made a few suggestions, mostly about code comments. Peff took them into account and sent a version 2 of his patch series. It behaved the same way as the first one but had improved code comments.
Phillip Wood then wondered whether it would be better not to use strtoumax(3) to parse timestamps. This standard C library function uses the standard isspace(3), while we use our own, different version of isspace(3). Other possible issues with strtoumax(3) could come from it considering different characters to be digits than our code does. Such issues arise because strtoumax(3), like many other standard C library functions, takes the current locale into account.
After some discussion between Peff, Phillip and Junio, Peff sent a version 3 of his patch series with small changes. In particular, the new version makes sure Git rejects timestamps that start with a character other than whitespace (as we define it), a digit, or the ‘-’ character before calling strtoumax(3). This was considered enough to avoid issues related to this function.
Phillip, Junio and Peff discussed this version a bit more but found it good, so it was merged. These changes will be in Git v2.41.0, which will be released soon.
Releases
- Git 2.41.0-rc2, 2.41.0-rc1, 2.41.0-rc0
- Git for Windows 2.41.0-rc2(1), 2.41.0-rc1(1), 2.41.0-rc0(1)
- Bitbucket Server 8.10
- Gerrit Code Review 3.5.6, 3.6.5, 3.7.3, 3.8.0
- GitHub Enterprise 3.8.3, 3.7.10, 3.6.13, 3.5.17
- GitLab 15.11.6, 16.0.1, 16.0, 15.11.5, 15.11.4; 15.11.3, 15.10.7, and 15.9.8; 15.11.2, 15.10.6, and 15.9.7; 15.11.1, 15.10.5, and 15.9.6
- GitKraken 9.4.0
- GitHub Desktop 3.2.3
- Sourcetree 4.2.3
- Tower for Mac 9.3, 9.4 (9.4 blog post)
- git-credential-oauth 0.7.0
- GitHub code search is generally available by Colin Merkel on GitHub Blog.
- See also A brief history of code search at GitHub in Git Rev News Edition #82, and The technology behind GitHub’s new code search in edition #96.
- GitHub’s New Code Search is Bad for Finding Code by Alex Ivanovs on Stackdiary (complaining about the lack of sorting by newest).
- Modeling Git Internals in Alloy, Part 3: Operations on Blobs and Trees by Brian Hicks on bytes.zone continues the series of articles from the previous edition.
- Why I prefer trunk-based development by Trisha Gee.
- This article references Perceived Barriers to Trunk Based Development by Dave Farley on his weblog (2018).
- You can find more about this workflow on Trunk Based Development site, first mentioned in Git Rev News Edition #24.
- Martin Fowler describes advantages and disadvantages of trunk-based development versus feature branches in Patterns for Managing Source Code Branches (biased towards better support for Continuous Integration), mentioned first in Git Rev News Edition #63.
- For the other side of this discussion, see for example Working with Feature Branches by Bruno Brito on Tower’s blog, mentioned in Git Rev News Edition #88.
- 5 Version-Control Systems that Game Developers Should Know About by Sharone Zitzman on The New Stack. Those 5 VCSs are: Git, Perforce, Plastic SCM (now Unity Version Control), SVN, and Diversion (cloud SCM, in beta).
- 9 best GitHub [Android App] alternatives in 2023 by Charnita Fance on Android Police.
- Git Merge – The Definitive Guide by Omer Rosenbaum on freeCodeCamp.
- Undo a [published] commit in Git by Angelos Chalaris on 30 Seconds of Code.
- Git Best Practices – A Guide to Version Control for Beginners by Adekola Olawale on freeCodeCamp.
- Common Git Issues and How to Troubleshoot Them by Abdelrahman Mohamed Allam on DEV.to, part of the Mastering Git: Essential and Advanced Commands for Developers series.
- The Power of Pre-Commit for Python Developers: Tips and Best Practices by Deep Singh on DEV.to.
- The pre-commit framework for managing and maintaining multi-language pre-commit hooks (written as a Python module) was first mentioned in Git Rev News Edition #45.
- Creating effective pull requests, by madhadron (Frederick J. Ross).
- How to Make your [Python] Code Shine with GitLab CI Pipelines by fernanda rodríguez on Medium.
- CI/CD with KiCad and GitLab by Stefan Schüller on his blog (on GitHub Pages).
- Code review at the speed of email by Drew DeVault on Drew DeVault’s blog (2022).
- See also for example The advantages of an email-driven git workflow by Drew DeVault, mentioned in Git Rev News Edition #41.
- Connect to multiple Git accounts of the same vendor (GitHub, Gitlab) portal by Rahul Mahadik on DEV.to, originally guest-published at theonetechnologies.com.
- Database branching: three-way merge for schema changes: Learn how PlanetScale uses Git-like three-way diff to resolve schema change conflicts across database branches. Article by Shlomi Noach on PlanetScale Blog.
- You can find a few tools to help version-control database schemas in Git Rev News Edition #60; you can also find another article by Shlomi Noach there.
- In Git Rev News Edition #82 you can find articles about different ways version control and databases connect, and tools that can version control database schema (perform database migrations), or version control queries, or version data within schema, etc.
- Version Control Your ML (Machine Learning) Model Deployment With Git using Modelbit by Avi Chawla, published in Towards Data Science, a Medium blog.
- GitOps - Operations by Pull Request (2017) and What Is GitOps (2018) by Alex on Weaveworks blog.
- Another article from Weaveworks about GitOps can be found in Git Rev News Edition #42 (2018).
- You can find more about GitOps / GitDevOps on GitOps.tech and OpenGitOps sites, first mentioned in Git Rev News Edition #62 and #94, respectively.
- 4 Core Principles of GitOps by Alex Williams, and GitOps as an Evolution of Kubernetes by Steven J. Vaughan-Nichols on The New Stack.
- Reproducible Data Dependencies for Python [with Quilt], a guest post by Aneesh Karve published on Jupyter Blog (a Medium-based blog).
- DagsHub, a web platform for storing, versioning and managing data (data hub), similar to Quilt Data mentioned in this blog post, was mentioned in various articles linked to in Git Rev News Edition #72, #85, #96, and tangentially in #97.
- See also links about data versioning in Git Rev News Edition #96.
- GitHub Copilot X CLI is your new GIT assistant by Leonardo Montini for This is Learning, part 3 of the GitHub Copilot X (5 Part Series) on DEV.to. Originally published at leonardomontini.dev.
- A similar article, GitHub Copilot for CLI makes Terminal scripting and Git as easy as asking a question, can be found in Git Rev News Edition #98.
- BranchGPT: The AI-Powered Solution to Branch Names by Sebastian Tiedtke on Stateful.com blog (a tongue-in-cheek, over-the-top take on AI).
Easy watching and listening
- For those who just don’t Git it (The Stack Overflow Podcast | Ep. 573) where Pierre-Étienne Meunier, creator and lead developer of open-source version control system Pijul (mentioned in Git Rev News Edition #9, #24 and #38), talks about version control, functional programming, and OCaml.
Git tools and sites
- Bytebase - database schema change and version control (the GitLab for Database DevOps): web-based collaboration workspace to help DBAs and developers manage the database development lifecycle.
- Quilt Data is a self-organizing data hub, consisting of a Python API, web catalog, and backend to manage data sets in AWS S3. The backend service is based on the open-source Quilt Python package (documentation). The development of the jupyterlab-quilt extension seems to have stalled, though.
- GitOps Principles v0.1.0 published by OpenGitOps.
- GIT Web Terminal (Git in your browser) was created using isomorphic-git. Source code on GitHub: jcubic/git.
- isomorphic-git was first mentioned in Git Rev News Edition #40.
is a declarative, programmatic library in Vitess that can produce a diff, in SQL format, of two entities: tables, views, or full-blown database schemas.
- Compare with sqldiff.exe, a command-line utility program (Windows binary) that displays content differences between two SQLite databases; it was mentioned in Git Rev News Edition #87.
This edition of Git Rev News was curated by Christian Couder, Jakub Narębski, Markus Jansen and Kaartic Sivaraam with help from Bruno Brito.