
Automated Tests: Part 2, Regaining Stability

In part 1 of this mini-series I talked about the importance of automated test stability. However, what do you do if you already have unstable tests and they hinder your continuous integration efforts?

Automated Tests: How to Bring Back Stability

The first and most important step towards fixing unstable tests is to realize that this is a real problem that will likely only get worse over time. You need to communicate and explain this problem to your team. You will need everyone to be on board for the subsequent steps. Do not attempt this if you cannot convince your team; without their support the effort will likely fail.

Failing Tests are Bugs

Once you have the support of your team you can start rolling out the solution, which is based on a simple principle: a failing test is a bug and should be treated just like any other bug in your code base. Why is this so? Simple: when a test fails it may be because the test is badly written, but it may also be failing because it triggers a real bug in your product. You cannot know which a priori; the root cause of a failing test can only be found by analyzing each particular failure.

If we accept this simple fact then we need to also accept that an intermittently failing test may also be caused by an intermittently occurring product bug. The logical conclusion is: test failures should be treated as bugs.

Now, what do we normally do when we encounter a bug? We file a new bug report in whatever bug tracking system we use. That is exactly what we need to do when a test fails: file a new bug in our bug tracker. The bug should contain the details of the failure such as the name of the test that failed, the execution logs, a link to the failing job in your continuous integration system, and so on. Make sure to attach all relevant information, most importantly test execution logs that can be used for post-mortem analysis of the failure. If your tests do not write detailed logs with precise timestamps, it is time to start creating them; they are invaluable for diagnosis.

The bug should then be assigned to someone on your team. You have two choices here: assign it either to the committer, the person who made the last change to the master branch, or to the author of the failing test. The former assumes that whoever made the change actually broke the product; the latter assumes that the committer is innocent and the test author is the culprit. How do you choose? If you know that you have unstable tests (and I assume you do since you are reading this) then I advise assigning it to the test author. You can also CC the committer on the bug.
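To make this concrete, here is a minimal sketch of what such a bug might contain. The tracker object and its create_bug method are hypothetical stand-ins for whatever bug tracker API you actually use:

from dataclasses import dataclass

@dataclass
class TestFailure:
    test_name: str   # e.g. the failing test function name
    job_url: str     # link to the failing CI job
    log_path: str    # path to the detailed execution log
    author: str      # test author, taken from the @author annotation
    committer: str   # author of the last change on the master branch

def file_failure_bug(tracker, failure):
    # tracker.create_bug() is a hypothetical call; substitute your tracker's API.
    return tracker.create_bug(
        summary=f"Automated test failed: {failure.test_name}",
        description=f"CI job: {failure.job_url}\nLast committer: {failure.committer}",
        assignee=failure.author,          # assign to the test author...
        cc=[failure.committer],           # ...and CC the committer
        attachments=[failure.log_path],   # execution log for post-mortem analysis
    )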

Automated Bug Creation

If you are like me you will soon decide that creating all those bugs about failing tests manually is a waste of time. The next step is to write a script that does this for you automatically.

The script should run any time a test job in your continuous integration system fails. It should scan the failures and perform all the steps described above automatically, without human intervention. Note that for this to work, all the information needed to create the bug must be available to your script. One piece of information that is often missing is the author of the test; you will need this recorded somewhere. One possible approach is to have a file that maps test names to author names. A better approach is to embed this information in the test files as annotations or specially formatted comments. In our Python tests we do this:

# @author tnajaryan
def test_tc15371_restart_on_upgrade(self):
    # test code starts here

Our script then scans relevant Python files, finds the function definition that matches the failing test name, then extracts the preceding @author tag.
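For illustration, a simplified version of that scan might look like this; treat it as a sketch rather than our exact implementation:

import re

def find_test_author(source_path, test_name):
    # Return the @author recorded in the comment preceding the test definition,
    # or None if no annotation is found.
    author = None
    with open(source_path) as f:
        for line in f:
            stripped = line.strip()
            match = re.match(r"#\s*@author\s+(\S+)", stripped)
            if match:
                author = match.group(1)          # remember the latest @author tag seen
            elif stripped.startswith(f"def {test_name}("):
                return author                    # the tag preceding this definition wins
            elif stripped and not stripped.startswith("#"):
                author = None                    # other code resets the pending tag
    return None

For the snippet above, calling find_test_author() with the failing test's name would return "tnajaryan" (the file path is whatever your scan discovers).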

Another question is what the priority of the newly created bug should be. You can leave it unset and assign it manually, but we found that automation helps here too. I suggest that when a test fails for the first time you give the bug some non-top priority, such as P3. If the same test fails again, increase the priority, and repeat for subsequent failures. This simply follows from the logic that a more frequently failing test is more important to fix in order to regain stability.
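A rough sketch of this escalation, using the same hypothetical tracker API as above (the priority names and the find_open_bug lookup are placeholders for whatever your tracker provides):

PRIORITY_LADDER = ["P3", "P2", "P1"]  # escalate one step per repeated failure

def next_priority(current):
    # First failure starts at P3; each repeat moves one step up, capped at P1.
    if current is None:
        return PRIORITY_LADDER[0]
    index = PRIORITY_LADDER.index(current)
    return PRIORITY_LADDER[min(index + 1, len(PRIORITY_LADDER) - 1)]

def file_or_escalate(tracker, failure):
    # find_open_bug() is a hypothetical lookup by test name in your tracker.
    bug = tracker.find_open_bug(failure.test_name)
    if bug is None:
        tracker.create_bug(
            summary=f"Automated test failed: {failure.test_name}",
            priority=PRIORITY_LADDER[0],          # first failure: new P3 bug
        )
    else:
        bug.priority = next_priority(bug.priority)  # repeated failure: bump priority
        bug.save()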

There is a special case that you will probably want to address separately: infrastructure-related failures. When something goes wrong in your continuous integration infrastructure, a large number of tests often fail at once. It is best if your automation script detects this situation and, instead of creating a large number of individual bugs, creates just one bug that lists all the failed tests. This bug is then a good action item for fixing the infrastructure problem.
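Continuing the sketch, the grouping can be as simple as a threshold check before filing per-test bugs; the threshold value here is just an assumption to tune for your own test suite:

INFRA_FAILURE_THRESHOLD = 20  # assumption: above this many failures we suspect infrastructure

def file_bugs_for_job(tracker, failures):
    # failures is the list of TestFailure records for one CI job.
    if len(failures) >= INFRA_FAILURE_THRESHOLD:
        names = "\n".join(f.test_name for f in failures)
        tracker.create_bug(
            summary=f"Suspected infrastructure failure: {len(failures)} tests failed",
            description=f"Failed tests:\n{names}",
        )
    else:
        for failure in failures:
            file_or_escalate(tracker, failure)  # per-test bug as described above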

If you implement this automated bug filing correctly and your team is on board (as you can see, I cannot emphasize this enough) then the magic should start happening. Failing tests will result in new bugs, and developers will gradually fix the failing tests. The number of unstable tests will go down and eventually all of the tests will become stable. As an added bonus you may discover that some of the intermittent failures were real product bugs, and you will fix those along the way.

The Results

Here is what your test result trends probably look like if you do not pay much attention to test stability:

Unstable Tests Trend

Notice the dozens of intermittently failing tests (red in the chart). The total number of tests is not increasing; this team has probably already given up on automated tests and no longer believes in them.

And here is what it should look like once you have fixed the instability:

Stable Tests Trend

There are zero failing tests and the total number of tests increases over time. This team believes in automated testing and knows how to practice it. When you get to this point you will probably want to change the policies a bit. Since you are now confident that all tests are stable, a failure is more likely to be the result of the latest committed change (which is what the classic continuous integration school of thought says). In this case it makes more sense to assign bugs triggered by failing tests to the last committer rather than the test author (I would still CC the author anyway).

Hopefully this simple advice will help you fix unstable tests, keep them stable, and reap the benefits of continuous integration and automated testing. I would love to hear your feedback.