Test pollution manifests as seemingly random false negatives or false positives in a test suite. It occurs when shared state is unintentionally modified by one test, or unintentionally read and relied upon by another.
When test pollution builds up, it can mean that a project’s build fails unpredictably, which can stop a whole team from shipping code regularly. This is an expensive way to not build software.
Here’s an example of test pollution. You can save and run it yourself; you shouldn’t need anything but a recent version of Ruby. If you run it several times, it will sometimes fail and sometimes pass:
[gist id=5449597 file=test_pollution.rb]
Why is it so unreliable? MiniTest orders tests randomly by default. When the first test runs before the second, the second test fails: the first test sets the @logged_in class instance variable to true, while the second test implicitly expects it to be false. The code under test has global state: the class instance variable @logged_in.
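In case the gist isn’t inlined where you’re reading this, here’s a minimal sketch of code with the same shape. The names are illustrative, not necessarily the gist’s actual contents, and with older minitest the base class was MiniTest::Unit::TestCase:

    require 'minitest/autorun'

    # Contrived code under test: global state lives in a class instance variable.
    class Session
      def self.log_in
        @logged_in = true
      end

      def self.logged_in?
        @logged_in
      end
    end

    class TestSession < Minitest::Test
      def test_logging_in
        Session.log_in
        assert Session.logged_in?   # pollutes: leaves @logged_in set to true
      end

      def test_logged_out_by_default
        refute Session.logged_in?   # fails whenever the other test ran first
      end
    end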
The problem with the first test is that it’s a bad citizen: it sets global state and doesn’t clean up after itself. The problem with the second test is that it’s presumptuous: it relies on global state having a particular value. As an aside: the code under test is terrible, and the flakiness of these tests should prompt you to change it, but I’ve kept the example contrived for the purpose of demonstration.
Fighting pollution with existing tools
I mentioned that MiniTest orders tests randomly by default. This is a Good Thing: it’s a deliberate ploy to flush out test pollution. If you ran the above code and it failed, MiniTest would print a ‘seed’ number that you could pass back in, so that you could consistently re-run the tests in the failing order. From there, you could hopefully work out why that particular order failed. Both RSpec and MiniTest can run tests in a random order, and both let you re-use the order from a previous failed run.
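For example (these are the flags current at the time of writing; check your versions’ documentation):

    # minitest prints the seed at the top of every run, e.g.
    # "Run options: --seed 1234"; pass it back to reproduce the order:
    ruby test_pollution.rb --seed 1234

    # RSpec equivalents:
    rspec --order random    # shuffle, printing the seed it used
    rspec --seed 1234       # replay the order from a previous run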
These aspects of MiniTest and RSpec are useful. They allow you to fix the order, so that you can find the dirty polluters in your test suite. This is usually a quest to find two items: a polluter and a polluted test. Sometimes you’ll find more than two, or a strange combination of tests that, when run in a particular order, causes a failed build.
On larger projects, these tools aren’t enough. Large codebases tend to have correspondingly large, slow test suites. Finding a source of test pollution in such beasts can involve looking at a lot of code, and can take a very long time.
Why can’t we automate this?
We’ve seen above that tests are reorderable chunks of code. So, you’d think that digging through reams of code just to reduce a pass/fail situation down to two examples would be an automatable process. You’d be right, but it’s not as straightforward as it ought to be.
When my pair and I first set out to write a tool to reduce tests down to their polluting / polluted components, we thought we could get RSpec to output a list of files. We could then feed the list of files back into RSpec in the failing order, and use a binary search to find the offending files.
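The plan was roughly the following. This is a hypothetical sketch, not Scrubber: it assumes exactly one polluting file, and that `victim` is the file containing the failing test:

    # Run rspec with the given files in the given order; true when it passes.
    def run_passes?(files)
      system('rspec', *files)
    end

    # Binary-search the files that ran before the victim for the polluter.
    def find_polluter(candidates, victim)
      return candidates.first if candidates.size <= 1
      first_half  = candidates.take(candidates.size / 2)
      second_half = candidates.drop(candidates.size / 2)
      if run_passes?(first_half + [victim])
        find_polluter(second_half, victim)  # polluter is in the other half
      else
        find_polluter(first_half, victim)
      end
    end

    # polluter = find_polluter(files_run_before_failure, failing_file)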
Unfortunately, it’s not that easy. RSpec isn’t file-centric, but groups-and-examples-centric. For the uninitiated, an RSpec test is composed of groups and examples:
[gist id=5449597 file=rspec.rb]
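The gist illustrates this; in case it isn’t to hand, here’s a minimal sketch. Groups are the describe (or context) blocks, and examples are the it blocks inside them:

    describe 'a group' do              # an example group
      it 'contains examples' do        # an example
        expect(1 + 1).to eq 2
      end

      describe 'a nested group' do     # groups nest arbitrarily deep
        it 'contains more examples' do
          expect([1, 2, 3].size).to eq 3
        end
      end
    end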
Once the test files are loaded into memory, RSpec randomizes groups and examples without regard to which file they came from (though the group hierarchy stays intact). RSpec’s randomization algorithm is pretty simplistic:
- Programmer: Hey RSpec, run the suite with this seed: 123
- RSpec: I found a set of sibling groups. What should I do?
- Randomizer: ‘Randomize’ them according to the number of items in the set, with this seed: 123
- RSpec: OK, I’ve got a set of sibling examples inside one of the groups. What should I do?
- Randomizer: ‘Randomize’ them according to the number of items in the set, with this seed: 123
And so on. This approach is fine so long as you never want to reduce the problem set. If you do reduce the set of examples that RSpec runs, the order is lost, because the shuffle depends on the number of items in each list, and that number has changed.
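Plain Ruby shows the underlying problem (a sketch of the general issue, not RSpec’s actual randomizer):

    seed = 123

    all  = %w[a b c d e]
    some = %w[a b c d]    # the same list with one example removed

    puts all.shuffle(random: Random.new(seed)).join(' ')
    puts some.shuffle(random: Random.new(seed)).join(' ')
    # Same seed, different list sizes: the relative order of a, b, c and d
    # will almost certainly differ between the two lines.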
Enter The Scrubber
I’d ideally like to tell a computer to go and find my test pollution. I’m some of the way there. So far, I’ve managed to create a semi-automatic solution: get RSpec to output the order of a test run to a human-readable file, so that it can be edited and fed back into RSpec to order the next run.
Scrubber is a project I started last week that allows you to persist RSpec run orders, edit them, and replay them. It relies on a relatively new feature of RSpec that lets you define custom ordering strategies. These strategies are just blocks of code that take as input the list of groups or examples in a particular section of your suite, and return the groups or examples you want, in the order you want them.
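From memory, the hook looks something like this (order_groups_and_examples appeared around RSpec 2.8; treat the exact API as an assumption and check your version):

    RSpec.configure do |config|
      # The block receives a list of sibling groups or examples and returns
      # them in whatever order you want them run.
      config.order_groups_and_examples do |list|
        list.sort_by { |item| item.description }
      end
    end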
The main stumbling block when writing the utility has been deriving a unique ID for each example or example group. The ID needs to be human-readable and reproducible across runs. So far I have a simplistic solution that just dumps the group or example description and file location. That isn’t guaranteed to be unique, especially since RSpec has no restriction on groups or examples having duplicate descriptions.
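Something along these lines, using RSpec’s metadata hash (a sketch, not Scrubber’s actual code):

    # Human-readable, but two examples with the same description at the
    # same location would collide.
    def id_for(group_or_example)
      metadata = group_or_example.metadata
      "#{metadata[:full_description]} @ #{metadata[:location]}"
    end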
In future, I hope to use a more robust ID, perhaps a checksum of the example or group’s metadata. Again, this isn’t straightforward, as some of the metadata values are Proc objects, which have no stable serialization. Perhaps the RSpec team would be interested in persistable suite runs as a core feature?
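One possible shape for that, sketched under the assumption that a Proc’s source location is a good enough proxy for its identity:

    require 'digest/sha1'

    # Hypothetical: fingerprint the metadata, replacing Procs with their
    # source location so the digest has a chance of being reproducible.
    def checksum_for(group_or_example)
      safe = group_or_example.metadata.map do |key, value|
        [key, value.is_a?(Proc) ? value.source_location : value]
      end
      Digest::SHA1.hexdigest(safe.sort_by { |key, _| key.to_s }.inspect)
    end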
Anyway, here’s an example of Scrubber in use. If you’re interested, I suggest you clone the repo and have a play with the example to see how editing a file might work. It’s very rough around the edges right now, but serves as a proof of concept. Hopefully I’ll get enough time, or contributors, to make this into something more automated and user-friendly.