Git Weekly 2 - magic mirror on the wall

📋 Summary

Since the cleanup of the git repository wasn’t complete, I had to check the changes I was making, understand the differences between branches, and verify that rebasing a branch gave the same result despite many conflicts.

How to display difference statistics between 2 commits?

When I want to compare 2 branches or 2 commits, I always start with the difference statistics. They give me a quick measure of the delta between the 2 versions of the source code.

The first command returns the number of files modified and the number of changes in these files:

$ git diff --shortstat mybranch...mybranch-rebased-master
 2 files changed, 45 insertions(+)

The second command provides the same information, but for each modified file:

$ git diff --stat mybranch...mybranch-rebased-master
                                                    |  7 +
 src/main.c (new)                                   | 38 +++++++++++

How to check that 2 commits contain the same source code?

You can use the previous commands to check that the contents (and not necessarily the history) of 2 branches are the same. But if there are no changes, the command won’t return anything. This may leave some doubt…
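One way to make an empty result explicit is to rely on the exit status of git diff rather than on its output. This is a minimal sketch, assuming a POSIX shell; `report_diff` is a hypothetical helper name, not a git command. With `--quiet`, git diff prints nothing and exits with 0 when there are no differences and 1 when there are some.

```shell
# report_diff is a hypothetical helper, not a git command.
# It turns the exit status of `git diff --quiet` into an explicit message.
report_diff() {
    status=0
    # --quiet disables all output; the exit status is 0 (no differences) or 1 (differences)
    git diff --quiet "$1" "$2" || status=$?
    case $status in
        0) echo "no differences between $1 and $2" ;;
        1) echo "$1 and $2 differ" ;;
        # any other status means git itself failed (unknown revision, not a repository, ...)
        *) echo "git diff failed with status $status" >&2; return 2 ;;
    esac
}
```

Called as `report_diff branch1 branch2`, it always prints one line, so an identical pair is no longer confused with a command that silently did nothing.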

To be sure that the contents of 2 branches or 2 commits are strictly identical, I compare the hashes of the root trees of these commits. Whatever the history, the author, the machine or the date of the commit, the same project content under git always gives the same root tree hash. If you’re wondering how this is possible, I invite you to watch the talk on git internals[1].

Here’s how to do it:

# retrieve the hash of the last commit of the first branch
$ hash_branch1=$(git rev-parse branch1)

# display the hash of the root tree of this commit
$ git cat-file -p "$hash_branch1" | grep '^tree'

# retrieve the hash of the last commit of the second branch
$ hash_branch2=$(git rev-parse branch2)

# display the hash of the root tree of this commit
$ git cat-file -p "$hash_branch2" | grep '^tree'

If the root hashes of the last commits of the 2 branches have the same value, then the 2 branches have the same content.
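The same check can be scripted in a single step. The sketch below assumes a POSIX shell and uses the `^{tree}` revision suffix, which makes `git rev-parse` resolve a commit directly to its root tree; `same_tree` is a hypothetical helper name, not a git command.

```shell
# same_tree is a hypothetical helper, not a git command.
# It succeeds (exit status 0) when both revisions point to the same root tree,
# fails with 1 when the trees differ, and with 2 when a revision cannot be resolved.
same_tree() {
    # <rev>^{tree} resolves a commit to the hash of its root tree
    tree1=$(git rev-parse --verify --quiet "$1^{tree}") || return 2
    tree2=$(git rev-parse --verify --quiet "$2^{tree}") || return 2
    [ "$tree1" = "$tree2" ]
}
```

For example, `same_tree branch1 branch2 && echo "same content"` prints the message only when the 2 branches hold exactly the same files, whatever their histories.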

Why does the branch comparison in GitLab or GitHub show too many differences?

To check the differences between branches, or to verify the changes made, you may want to use a UI such as GitLab or GitHub. However, the result is not necessarily what you might expect: when you compare 2 branches that you think are identical, these tools may report a huge number of changes. But why these differences? To explain this, we need to go back to the basics of git.

There are 3 ways to compare commits in git[2]:

  • <commit1> <commit2>: direct comparison of 2 arbitrary commits
  • <commit1>..<commit2>: same comparison as above
  • <commit1>...<commit2>: comparison between the common ancestor of the 2 commits and the second commit (<commit2>)
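To see the 2 behaviours side by side on your own repository, here is a minimal sketch, assuming a POSIX shell. `compare_notations` is a hypothetical helper name, not a git command. It computes the common ancestor with `git merge-base` and prints the summary that each notation produces.

```shell
# compare_notations is a hypothetical helper, not a git command.
# It prints the --shortstat summary for both ways of comparing 2 revisions.
compare_notations() {
    # common ancestor of the 2 revisions
    base=$(git merge-base "$1" "$2") || return 2
    echo "2 dots ($1 compared directly with $2):"
    git diff --shortstat "$1" "$2"
    # equivalent to: git diff --shortstat "$1...$2"
    echo "3 dots (common ancestor compared with $2):"
    git diff --shortstat "$base" "$2"
}
```

If both branches have received commits since they diverged, `compare_notations branch1 branch2` will usually print 2 different summaries. The 2 summaries match when the first branch still has the same content as the common ancestor.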

GitLab and GitHub use the 3-dot notation (...) by default. This lists all the changes made on a branch since the commit it was created from, which is not the same as the difference between the 2 branches. That seems logical when you consider that these platforms are Merge Request and Pull Request oriented: the reviewer wants to check what a branch has changed since it diverged before accepting its merge.

This corresponds to the “Only incoming changes from source” option in GitLab. If you really want to compare branches, use the “Include changes to target since source was created” option (see the documentation).

Want to learn more about git? Check out the Git Weekly series!

  1. A talk that I gave in French (sorry) at Devoxx France.

  2. We can apply these notations to branches by replacing “commit” with “last commit of the branch”. More details in the git-diff documentation.