Identifying breaking commits
Overview
Teaching: 30 min
Exercises: 0 min
Questions
How can I use git to track down problems in code?
Objectives
Learn to identify when and in what commit problems were introduced
Episode setup
First we need to pull down some code from a remote repository. Let’s change to our working directory
$ cd ~/git-demystified
and clone the code
$ git clone git@github.com:sa2c/example-hello-world.git
and change into the fresh repository
$ cd example-hello-world
Let’s take a look at the contents of this repository
$ ls
We see a small number of files; let’s have a look inside hello.sh.
$ nano hello.sh
Since this is an example, most of the file does nothing. Only one line does any work, and it contains an error. Let’s try to run the code
$ ./hello.sh
This clearly has a problem, as expected. Let’s look at the log history to see if we can spot it.
$ git log --oneline
If we looked at this for a while, we could probably spot the commit that might be causing the issue: the commit labelled “Changed echo to echom”. In reality, however, finding the problem wouldn’t be this simple. In general, we might not know which file the problem is in, or where in that file. We may have hundreds of files with hundreds of lines each, and no idea where to start looking. Let’s start by looking at the initial commit
$ git checkout 1153
And see if the hello.sh script runs here.
$ ./hello.sh
That’s good news. The file runs with no problems in the initial commit, so something went wrong somewhere between the two commits. In this section, we will explore ways in which we can investigate the sources of errors. Let’s move back to the tip of the master branch.
$ git checkout master
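The check above can be replayed end to end in a throwaway repository. The following is only a sketch: the two-commit history, the file contents and the "Demo User" identity are made up for illustration and are not the contents of example-hello-world. It also shows one way to find the initial commit without reading the log by eye, using git rev-list --max-parents=0.

```shell
# Throwaway repository; every name and file content here is illustrative.
repo=$(mktemp -d)
cd "$repo" || exit 1
git init -q
git symbolic-ref HEAD refs/heads/master   # make sure the branch is called master
git config user.name "Demo User"
git config user.email "demo@example.com"

# Commit 1: a working script.
printf '#!/bin/sh\necho "Hello World"\n' > hello.sh
git add hello.sh
git commit -q -m "Initial commit"

# Commit 2: a typo breaks it (echom is assumed not to be a real command).
printf '#!/bin/sh\nechom "Hello World"\n' > hello.sh
git commit -q -am "Changed echo to echom"

# The root commit is the one with no parents.
root=$(git rev-list --max-parents=0 HEAD)

# Test the script at the root commit, then again at the tip of master.
# The script is invoked via sh so the sketch does not rely on the executable bit.
git checkout -q "$root"
if sh ./hello.sh >/dev/null 2>&1; then root_status=good; else root_status=bad; fi
git checkout -q master
if sh ./hello.sh >/dev/null 2>&1; then tip_status=good; else tip_status=bad; fi

echo "root: $root_status, tip: $tip_status"
```

In this toy history the script works at the root commit and fails at the tip, which is the same situation we found in the lesson repository.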
git blame
If we know where the problem is in the file, we might ask ourselves what introduced this problem. Which commit introduced this line? Let’s try this with
$ git blame hello.sh
We see that most lines were created in the same commit, but some were modified in other commits. There are a lot of lines here, so let’s focus on the range 30 to 50
$ git blame -L 30,50 hello.sh
That’s better. Let’s take a closer look at the commit on line 45.
$ git show da86
That’s interesting. We can see that the problematic line was in fact
copied from another file at this commit. This can make git blame a
little less useful: we would like to know the commit in which this set
of lines originally appeared in any file. Fortunately, we can ask git
blame to attempt to track movement between files
$ git blame -C -L 30,50 hello.sh
git blame is a very useful tool if you already know which line causes
the issue and want to look at the commit that generated that line to
check where it came from. Now we can see the lines were actually
introduced in another commit. Let’s take a look at that commit now
$ git show 8f67
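To see the difference between plain git blame and git blame -C in isolation, here is a small sketch in a throwaway repository. The files lib.sh and hello.sh, their contents and the commit messages are invented for the example. The moved line is deliberately long, because git only treats text as moved or copied when it contains enough alphanumeric characters. The --porcelain option prints the full commit hash as the first field of the first output line, which makes it easy to pick out in a script.

```shell
# Throwaway repository; file names and contents are illustrative only.
repo=$(mktemp -d)
cd "$repo" || exit 1
git init -q
git config user.name "Demo User"
git config user.email "demo@example.com"

# Commit 1: lib.sh holds a long greeting line; hello.sh is just a stub.
cat > lib.sh <<'EOF'
#!/bin/sh
echo "Hello World, this greeting line is deliberately long so that copy detection has enough text to match"
echo "A second line that stays behind in lib.sh"
EOF
printf '#!/bin/sh\n' > hello.sh
git add lib.sh hello.sh
git commit -q -m "Add lib.sh and hello.sh"
first=$(git rev-parse HEAD)

# Commit 2: move the greeting line out of lib.sh and into hello.sh.
cat > lib.sh <<'EOF'
#!/bin/sh
echo "A second line that stays behind in lib.sh"
EOF
cat > hello.sh <<'EOF'
#!/bin/sh
echo "Hello World, this greeting line is deliberately long so that copy detection has enough text to match"
EOF
git commit -q -am "Move greeting into hello.sh"
second=$(git rev-parse HEAD)

# Plain blame credits line 2 of hello.sh to the commit that moved it.
plain=$(git blame -L 2,2 --porcelain hello.sh | head -n 1 | cut -d' ' -f1)

# With -C, blame also looks in other files changed by that commit and
# follows the line back to where it first appeared.
copied=$(git blame -C -L 2,2 --porcelain hello.sh | head -n 1 | cut -d' ' -f1)

echo "plain blame: $plain"
echo "blame -C:    $copied"
```

Here plain blame points at the second commit, while blame -C points at the first one, where the line was originally written in lib.sh.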
Binary searching with git
We could check out each commit one at a time and test it, but this is very time consuming. We’d have to step through the history like this
$ git checkout master~7
$ git checkout master~6
$ git checkout master~5
...
$ git checkout master~3
$ git checkout master~2
$ git checkout master~1
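Written as a loop, the one-at-a-time approach looks something like the sketch below. It runs in a throwaway repository with eight made-up commits ("Revision 1" to "Revision 8"), where the script is assumed to break in "Revision 6"; none of this is the real lesson history.

```shell
# Throwaway repository; commit messages and contents are illustrative.
repo=$(mktemp -d)
cd "$repo" || exit 1
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Demo User"
git config user.email "demo@example.com"

# Eight commits; hello.sh breaks in "Revision 6" (echom is not a command).
i=1
while [ "$i" -le 8 ]; do
    if [ "$i" -lt 6 ]; then cmd=echo; else cmd=echom; fi
    printf '#!/bin/sh\n# revision %s\n%s "Hello World"\n' "$i" "$cmd" > hello.sh
    git add hello.sh
    git commit -q -m "Revision $i"
    i=$((i + 1))
done

# Walk the history from oldest to newest, testing every commit in turn.
checked=0
first_bad=""
for commit in $(git rev-list --reverse master); do
    git checkout -q "$commit"
    checked=$((checked + 1))
    if ! sh ./hello.sh >/dev/null 2>&1; then
        first_bad=$(git log -1 --format=%s "$commit")
        break
    fi
done
git checkout -q master

echo "first bad: $first_bad (after $checked checkouts)"
```

The number of checkouts grows in step with the length of the history, which is what makes this approach slow for anything but a handful of commits.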
We can do better than this if we choose a half way point between the
bad and good commit, check if that is good or bad, and keep choosing a
half way point until we find the commit that causes the code to go
from good to bad. Git can actually help us do this with the git bisect
command. Let’s try it
$ git bisect start
We mark the current commit as bad
$ git bisect bad HEAD
Then we can mark the initial commit as good
$ git bisect good 1153
Git will now drop us at a commit half way between the good and the bad commits. We can verify this with
$ git log --oneline master
We see some commits marked as bad and good, and git has placed us in the middle commit. Now we can test this commit
$ ./hello.sh
It works! The code wasn’t broken at this point. Let’s mark this commit as good
$ git bisect good
Great, git has moved us again. Let’s check where we are this time
$ git log --oneline master
The markers for good and bad have moved, because we’ve given bisect more information, and HEAD has been placed between them. We know this is the first bad commit, but git doesn’t know that yet. Let’s test it
$ ./hello.sh
This failed, as expected. Let’s mark this as a bad commit
$ git bisect bad
That’s odd. We found the bad commit, but git kept looking. Let’s take a look
$ git log --oneline master
Git has marked the good and bad commits, but it doesn’t know yet if the previous commit might have been the first bad one. It needs us to check that. Let’s go ahead and do that
$ ./hello.sh
This is a good commit, let’s mark it
$ git bisect good
Finally, git has found the commit we were looking for and told us where it is. Let’s see where we are
$ git log --oneline master
Git has marked the relevant commits as bad, but it hasn’t moved us to the first bad commit. It left us in this pending state. Let’s take a look at the content of the breaking commit
$ git show da86
Git is telling us that the problem was introduced by a change that
happened on line 39 of hello.sh, where echo was changed to echom. For
us, this was probably a problem that is easy enough to resolve without
using bisect, but for a large complex code base when we don’t know
where to start, bisect can instantly point us to the change which
first caused the problem. Let’s exit the bisect state and go back to
master with
$ git bisect reset
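The whole manual session can be condensed into a short script. This is a sketch under the same assumptions as before: a throwaway repository with eight invented commits, with the break assumed to happen in "Revision 6". The loop does by hand what we just did at the prompt: test the commit git has checked out, mark it good or bad, and stop once git reports the first bad commit. The refs/bisect/bad marker is the same one that shows up in the git log output during a bisect.

```shell
# Throwaway repository; commit messages and contents are illustrative.
repo=$(mktemp -d)
cd "$repo" || exit 1
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Demo User"
git config user.email "demo@example.com"

# Eight commits; hello.sh breaks in "Revision 6".
i=1
while [ "$i" -le 8 ]; do
    if [ "$i" -lt 6 ]; then cmd=echo; else cmd=echom; fi
    printf '#!/bin/sh\n# revision %s\n%s "Hello World"\n' "$i" "$cmd" > hello.sh
    git add hello.sh
    git commit -q -m "Revision $i"
    i=$((i + 1))
done
root=$(git rev-list --max-parents=0 master)

# Same moves as the manual session: start, mark the tip bad, mark the root good.
LC_ALL=C; export LC_ALL            # keep git's messages in English for grep
git bisect start >/dev/null 2>&1
git bisect bad master >/dev/null 2>&1
out=$(git bisect good "$root" 2>&1)

# Test and mark until git announces the first bad commit.
steps=0
until printf '%s\n' "$out" | grep -q "is the first bad commit"; do
    [ "$steps" -ge 10 ] && break   # safety net so the sketch cannot loop forever
    steps=$((steps + 1))
    if sh ./hello.sh >/dev/null 2>&1; then
        out=$(git bisect good 2>&1)
    else
        out=$(git bisect bad 2>&1)
    fi
done

# When bisect finishes, refs/bisect/bad points at the first bad commit.
first_bad=$(git log -1 --format=%s refs/bisect/bad)
git bisect reset >/dev/null 2>&1

echo "first bad: $first_bad (found after $steps tests)"
```

With eight commits only two or three tests are needed, compared with six for the one-at-a-time walk, and the gap widens quickly as the history gets longer.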
This worked great, and we can go through large numbers of commits with
this technique, but there was a lot of typing. Can git do a better
job? It turns out that it can. Let’s look at the return value of the
hello.sh script
$ ./hello.sh
$ echo $?
The variable $? is a special shell variable containing the exit status
of the last command that was run. In this case it is non-zero,
indicating an error. Let’s look at the historic commit
$ git log --oneline
$ git checkout 1153
And test the code
$ ./hello.sh
$ echo $?
In this case the script returns 0, indicating success. This is a common convention in Unix scripts, and you can write your own scripts that follow this convention. Git can use this convention to decide if a commit is good or bad. Let’s try it
$ git bisect start HEAD 1153
Once again, git drops us at a commit in the middle of the range. This
time, instead of running hello.sh ourselves, we tell git to run it for us
$ git bisect run './hello.sh'
Git does all the boring work for us. Every time it runs the command we gave and gets a zero return value, it marks the commit as good; every time it sees a non-zero value between 1 and 127, it marks the commit as bad (the one exception is 125, which tells git to skip a commit that cannot be tested). It then tells us the first commit it finds which changes the state of the repository from “good” to “bad”. Now that we’re done, we exit again with
$ git bisect reset
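Both ideas from this section, the exit status in $? and git bisect run, can be tried out together in a throwaway repository. As before, this is only a sketch: the eight commits and their messages are invented, and the break is assumed to happen in "Revision 6".

```shell
# $? holds the exit status of the most recent command: 0 means success.
true
ok_status=$?
false
fail_status=$?
echo "true -> $ok_status, false -> $fail_status"

# Throwaway repository; commit messages and contents are illustrative.
repo=$(mktemp -d)
cd "$repo" || exit 1
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Demo User"
git config user.email "demo@example.com"

# Eight commits; hello.sh breaks in "Revision 6".
i=1
while [ "$i" -le 8 ]; do
    if [ "$i" -lt 6 ]; then cmd=echo; else cmd=echom; fi
    printf '#!/bin/sh\n# revision %s\n%s "Hello World"\n' "$i" "$cmd" > hello.sh
    git add hello.sh
    git commit -q -m "Revision $i"
    i=$((i + 1))
done
root=$(git rev-list --max-parents=0 master)

# Give git the bad and good ends, then let it run the test itself.
# The script is invoked via sh so the sketch does not rely on the executable bit.
git bisect start master "$root" >/dev/null 2>&1
git bisect run sh ./hello.sh >/dev/null 2>&1

# When bisect finishes, refs/bisect/bad points at the first bad commit.
first_bad=$(git log -1 --format=%s refs/bisect/bad)
git bisect reset >/dev/null 2>&1

echo "first bad: $first_bad"
```

The output of git bisect run is discarded here to keep the sketch quiet; when working interactively it is worth reading, since it shows each commit that was tested and the final verdict.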
One caveat
This is a very powerful debugging tool, but it relies on all your code being in a runnable state, such that git can automatically identify when this state changes. It works best when used with a branching and merging strategy, to ensure there are no breaking commits on the master branch.
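When a history does contain commits that cannot be tested at all, one option is a small wrapper script that exits with 125, which git bisect run treats as a request to skip that commit. The sketch below assumes a made-up history in which hello.sh is missing in "Revision 4" and broken from "Revision 6" onwards; the wrapper name test.sh is also hypothetical. Skipping only helps if the commits on either side of the breaking change can be tested, as they can here.

```shell
# Throwaway work area; all names and contents are illustrative.
work=$(mktemp -d)
mkdir "$work/repo"
cd "$work/repo" || exit 1
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Demo User"
git config user.email "demo@example.com"

# Eight commits: hello.sh is missing in "Revision 4" (untestable)
# and broken from "Revision 6" onwards.
i=1
while [ "$i" -le 8 ]; do
    if [ "$i" -lt 6 ]; then cmd=echo; else cmd=echom; fi
    rm -f hello.sh
    if [ "$i" -ne 4 ]; then
        printf '#!/bin/sh\n# revision %s\n%s "Hello World"\n' "$i" "$cmd" > hello.sh
    fi
    git add -A
    git commit -q -m "Revision $i"
    i=$((i + 1))
done
root=$(git rev-list --max-parents=0 master)

# Hypothetical wrapper, kept outside the repository so checkouts cannot touch it.
cat > "$work/test.sh" <<'EOF'
#!/bin/sh
[ -f hello.sh ] || exit 125               # nothing to test here: ask bisect to skip
sh ./hello.sh >/dev/null 2>&1 || exit 1   # any failure counts as "bad"
exit 0                                    # otherwise the commit is "good"
EOF

git bisect start master "$root" >/dev/null 2>&1
git bisect run sh "$work/test.sh" >/dev/null 2>&1
first_bad=$(git log -1 --format=%s refs/bisect/bad)
git bisect reset >/dev/null 2>&1

echo "first bad: $first_bad"
```

If the untestable commit sat directly next to the breaking change, git would not be able to tell the two apart and would report a range of candidate commits instead.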
Key Points
Learnt to use git blame to identify when a problem line was introduced
Learnt to use binary searches with git bisect to identify the commit which first introduced a problem