It’s something that irks me.
My observation has been that developers are resistant to accepting the challenge of battling the scourge of intermittently failing tests.
What is the reason for this resistance? This is something that has intrigued me for years. And I’d like to delve into it in a little more depth.
I suspect that part of the human impulse to resist addressing non-deterministic tests is the intuition that to accept such a challenge might lead to much time and effort expended for little result. If that is the case, believe me, I understand. Fixing this sort of problem is not for those faint of heart!
Having experienced the struggle of identifying what causes an intermittent failure, I know it isn’t easy to unearth the root cause.
In my experience there is a palpable resistance to Martin Fowler’s suggested approach of quarantining non-deterministic tests. On one level, this bewilders me. As Martin illustrates, if a test cannot be relied upon to pass or fail, it is worse than useless. It is infecting the whole test suite. So, the sane approach is to, at least temporarily, remove it from the test suite.
Whenever I have suggested this, I have been met with resistence. Developers claim that the test is useful, that it provides protection as part of the regression suite against bugs being introduced. I’m still searching for a way of countering this argument, which is clearly fallacious. As Martin quite rightly asserts, if a test cannot be relied upon to pass or fail against the same codebase, it is worse than useless and must be immediately quarantined!
Sure, there is extra effort involved in configuring the quarantining process. The use of RSpec tags is handy for this. Then there is the perceived risk of the team forgetting to fix the quarantined test. Again, it requires some effort to set up, but it is certainly possible to automate warnings to the team about tests that have been quarantined for too long as well as build pipelines that contain too many quarantined tests.
Of course, another possible response to a troublesome non-deterministic test is to simply remove it from the suite.
This may sound radical, but let’s consider the situation from a cost/benefit perspective. If a test cannot be guaranteed to reliably pass or fail then it is clearly not providing much benefit. If it takes considerable effort to debug and still cannot be guaranteed to reliably pass or fail, what should the developers make of it? It has clearly already absorbed considerable cost. This leaves the question of potential benefit.
A related question is: how crucial is this test as part of the suite? Is it a vital part of the test suite? If the answer to this question is “yes” then it is appropriate to continue trying to solve the non-deterministic behaviour. However, if the answer is “no”, there seems little value in keeping the test within the suite.
People are naturally lazy. Why do something that requires effort unless you really have to or there is a clear benefit that will accrue to you?
If a developer is working on a pull request, pushes a commit and the resultant build on the CI server fails with an error that is obviously unrelated to their pull request, what’s the easiest thing to do? Click the button to rebuild the test, of course! The temptation to take this action, even if the developer in question has the appreciation that this may not be a helpful response in the even slightly longer term for his or her colleagues, can be compelling.
I’m not sure of the best way of countering this. Appealing to the greater good?
Usually when a developer notices an intermittently failing test their focus is elsewhere. They may be working on a feature and notice that the build fails due to a non-deterministic test that is unrelated to their feature. Or they may be watching the master build in preparation for a deployment. Understandably the priority in these scenarios is to enable the feature to be merged or the deployment to go ahead.
The key here is to at least take some action to fix the non-determinism, even if it is to schedule some work to rectify the situation later. Unfortunately, in my experience, the tendency is often to ignore the intermittently failing test.
As Martin Fowler points out, there are many causes of non-deterministic tests. Among them are:
Added to this, Keith Pitt has detailed 5 ways we’ve improved flakey test debugging, which focuses more on how to capture the database state when a test fails intermittently.
They are all worth reading. However, what I’m pondering in this article is more to do with motivation.
How can we best encourage developers to tackle non-deterministic tests?
Expecting the person who first notices the failing test to fix it is probably not a good approach. After all, that person is likely to be feeling frustrated or even angry that an unrelated test failure is holding up their progress.
A colleague of mine recently suggested assigning the task of fixing a non-deterministic test to the last person who changed that test. It’s a helpful suggestion that at least circumvents the frustration that I wrote about earlier. It also assumes that the team member assigned to fix the flaky test is willing and capable.
Of course, keeping a CI build healthy is a shared responsibility. The build is more likely to be healthy if all members of the team contribute to meeting the challenge posed by non-determinism.
In my case, one thing I need to be mindful of is to be careful not to let the frustration that I sometimes feel become counter-productive. As Kent Beck implied a while back, as well as working in small increments, it is important to be both kind and honest.
If I follow that advice hopefully I will respond to discovering non-deterministic tests by gradually finding ways to help the team to handle the challenge of fixing them more successfully.
© 2023 Keith Pitty, all rights reserved.