For the cast of thousands who hang off every word that I write on this site, I am delighted to announce that I will be commencing a new role in the new year!
In January I will be starting work as a Senior Software Developer with Birdsnest, an online retailing company based in Cooma, NSW.
Whilst I will be based at my home office on the NSW Central Coast, I will be making regular visits to Cooma to work with staff on-site, especially the software development team.
Birdsnest is an inspiring company that offers a unique online shopping experience.
I look forward to working with the development team to help them build upon their successes to date.
The new year promises to bring new challenges.
Yes, having contributed to The Conversation’s development team as a contractor since June 2015, I will be moving on in late January, or earlier if I am able to provide fair notice.
I’ve very much enjoyed the opportunity to work with The Conversation over the last two and a half years and will have fond memories of collaborating with development colleagues as well as editors of the various editions around the globe.
It’s something that irks me.
My observation has been that developers are resistant to accepting the challenge of battling the scourge of intermittently failing tests.
What is the reason for this resistance? This is something that has intrigued me for years. And I’d like to delve into it in a little more depth.
I suspect that part of the human impulse to resist addressing non-deterministic tests is the intuition that to accept such a challenge might lead to much time and effort expended for little result. If that is the case, believe me, I understand. Fixing this sort of problem is not for those faint of heart!
Having experienced the struggle of identifying what causes an intermittent failure, I know it isn’t easy to unearth the root cause.
In my experience there is a palpable resistance to Martin Fowler’s suggested approach of quarantining non-deterministic tests. On one level, this bewilders me. As Martin illustrates, if a test cannot be relied upon to pass or fail, it is worse than useless. It is infecting the whole test suite. So, the sane approach is to, at least temporarily, remove it from the test suite.
Whenever I have suggested this, I have been met with resistance. Developers claim that the test is useful, that it provides protection as part of the regression suite against bugs being introduced. I’m still searching for a way of countering this argument, which is clearly fallacious. As Martin quite rightly asserts, if a test cannot be relied upon to pass or fail against the same codebase, it is worse than useless and must be immediately quarantined!
Sure, there is extra effort involved in configuring the quarantining process. The use of RSpec tags is handy for this. Then there is the perceived risk of the team forgetting to fix the quarantined test. Again, it requires some effort to set up, but it is certainly possible to automate warnings to the team about tests that have been quarantined for too long as well as build pipelines that contain too many quarantined tests.
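As a rough sketch of the kind of automated warning I have in mind, assume each quarantined test is recorded along with the date it was quarantined (for instance via a date stored in an RSpec tag). The names `QuarantinedTest`, `overdue_quarantines` and `too_many_quarantined?`, the spec paths, and the 14-day and 5-test thresholds are all hypothetical choices for illustration, not part of any existing tool.

```ruby
require "date"

# Hypothetical record of a quarantined test: where it lives and when it was quarantined.
QuarantinedTest = Struct.new(:location, :quarantined_on)

# Returns the tests that have been in quarantine for more than max_days.
def overdue_quarantines(tests, today: Date.today, max_days: 14)
  tests.select { |test| (today - test.quarantined_on).to_i > max_days }
end

# True when the suite holds more quarantined tests than the team has agreed to tolerate.
def too_many_quarantined?(tests, limit: 5)
  tests.size > limit
end

# Illustrative data only; these spec locations are made up.
quarantined = [
  QuarantinedTest.new("spec/models/article_spec.rb:42", Date.new(2017, 11, 1)),
  QuarantinedTest.new("spec/features/login_spec.rb:10", Date.new(2017, 11, 20))
]

overdue_quarantines(quarantined, today: Date.new(2017, 11, 25)).each do |test|
  puts "Quarantined too long: #{test.location} (since #{test.quarantined_on})"
end
```

A check like this could run as a step in the build pipeline and post its output to the team chat, so that quarantine stays a temporary measure rather than a graveyard.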
Of course, another possible response to a troublesome non-deterministic test is to simply remove it from the suite.
This may sound radical, but let’s consider the situation from a cost/benefit perspective. If a test cannot be guaranteed to reliably pass or fail then it is clearly not providing much benefit. If it takes considerable effort to debug and still cannot be guaranteed to reliably pass or fail, what should the developers make of it? It has clearly already absorbed considerable cost. This leaves the question of potential benefit.
A related question is: how crucial is this test as part of the suite? Is it a vital part of the test suite? If the answer to this question is “yes” then it is appropriate to continue trying to solve the non-deterministic behaviour. However, if the answer is “no”, there seems little value in keeping the test within the suite.
People are naturally lazy. Why do something that requires effort unless you really have to or there is a clear benefit that will accrue to you?
If a developer is working on a pull request, pushes a commit and the resultant build on the CI server fails with an error that is obviously unrelated to their pull request, what’s the easiest thing to do? Click the button to rebuild the test, of course! The temptation to take this action, even if the developer in question has the appreciation that this may not be a helpful response in the even slightly longer term for his or her colleagues, can be compelling.
I’m not sure of the best way of countering this. Appealing to the greater good?
Usually when a developer notices an intermittently failing test their focus is elsewhere. They may be working on a feature and notice that the build fails due to a non-deterministic test that is unrelated to their feature. Or they may be watching the master build in preparation for a deployment. Understandably the priority in these scenarios is to enable the feature to be merged or the deployment to go ahead.
The key here is to at least take some action to fix the non-determinism, even if it is to schedule some work to rectify the situation later. Unfortunately, in my experience, the tendency is often to ignore the intermittently failing test.
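One small action that costs little is to record the evidence. The following is a minimal sketch, assuming the CI server can export test outcomes per commit; `TestResult` and `non_deterministic_tests` are hypothetical names, and the commit identifiers and test names are invented. It flags any test that has both passed and failed against the same commit, which is exactly what a "rebuild until green" habit tends to hide.

```ruby
# Hypothetical record of one test outcome from a CI build.
TestResult = Struct.new(:commit, :test_name, :passed)

# A test is treated as non-deterministic when the same commit has produced
# both a pass and a failure for it (for example, after a rebuild).
def non_deterministic_tests(results)
  results
    .group_by { |result| [result.commit, result.test_name] }
    .select { |_key, runs| runs.map(&:passed).uniq.size > 1 }
    .keys
    .map { |_commit, test_name| test_name }
    .uniq
end

# Illustrative data only.
results = [
  TestResult.new("abc123", "publishes an article", false),
  TestResult.new("abc123", "publishes an article", true),
  TestResult.new("abc123", "sends a newsletter", true),
  TestResult.new("def456", "sends a newsletter", false)
]

non_deterministic_tests(results).each do |name|
  puts "Non-deterministic on the same commit: #{name}"
end
```

Note that a test which passes on one commit and fails on another is not flagged, since a genuine code change could explain the difference.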
As Martin Fowler points out, there are many causes of non-deterministic tests. Among them are:
Added to this, Keith Pitt has detailed 5 ways we’ve improved flakey test debugging, which focuses more on how to capture the database state when a test fails intermittently.
They are all worth reading. However, what I’m pondering in this article is more to do with motivation.
How can we best encourage developers to tackle non-deterministic tests?
Expecting the person who first notices the failing test to fix it is probably not a good approach. After all, that person is likely to be feeling frustrated or even angry that an unrelated test failure is holding up their progress.
A colleague of mine recently suggested assigning the task of fixing a non-deterministic test to the last person who changed that test. It’s a helpful suggestion that at least circumvents the frustration that I wrote about earlier. It also assumes that the team member assigned to fix the flaky test is willing and capable.
Of course, keeping a CI build healthy is a shared responsibility. The build is more likely to be healthy if all members of the team contribute to meeting the challenge posed by non-determinism.
In my case, one thing I need to be mindful of is not letting the frustration that I sometimes feel become counter-productive. As Kent Beck implied a while back, as well as working in small increments, it is important to be both kind and honest.
If I follow that advice hopefully I will respond to discovering non-deterministic tests by gradually finding ways to help the team to handle the challenge of fixing them more successfully.
One day last June I was moved to share an opinion on Twitter.
And as the following exchange shows, at least one reader, my former colleague SengMing Tan, expressed a desire to know more.
As I foreshadowed, it has taken me a while to get around to elaborating.
However, the time has come. After nearly a year has elapsed I am fulfilling my promise.
Here is a description of how the team I’ve been working in at The Conversation uses Kanban, Trello, GitHub, Buildkite and Babushka to develop, review and ship software in a way that encourages flow.
Whilst there are several tools that we use which combine to help our team feel that we are constantly making progress towards shipping software, it is the Kanban system which underpins them.
As I understand it, the team started out with a low-fi approach by using a physical card Kanban wall. But by the time I joined, they had moved that wall to Trello. Which was just as well, because I was the first remote member of the development team.
Nevertheless, I think it is the Kanban style of only allowing a specified maximum number of cards in each “swim lane” that is crucial to the overall feeling of progress that our team has.
As anyone who has used Trello knows, the general style is to represent progress by aiming to move cards from left to right across the columns. Whilst there are other columns on our development board, the following image depicts those that are essential to the way Trello supports our Kanban style of development.
I’ve blurred out the details of the cards but the key things to notice are the column headings. Notice that we have upper limits set for how many cards should be Queued, In Progress or under Review. This helps us each individually focus on completing a piece of work. As a team, it draws attention to ensuring that work is reviewed in a timely manner. If more than six cards are in the Review column, we consider that to be a broken state. It is a prompt to the team to give more focus to reviewing pull requests until we can merge enough of them and move the corresponding Trello cards to the Ready column.
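To make the idea of a "broken state" concrete, here is a minimal sketch of a limit check. Only the Review limit of six comes from our board as described above; the limits for Queued and In Progress, the card counts, and the names `WIP_LIMITS` and `columns_over_limit` are assumptions for illustration. It does not call the Trello API.

```ruby
# Work-in-progress limits per column.
# Only the Review limit (6) is taken from the post; the others are assumed.
WIP_LIMITS = { "Queued" => 8, "In Progress" => 5, "Review" => 6 }.freeze

# Returns the names of columns holding more cards than their limit allows.
# Columns without a limit (such as Ready) are never reported.
def columns_over_limit(card_counts, limits = WIP_LIMITS)
  limits.select { |column, limit| card_counts.fetch(column, 0) > limit }.keys
end

# Illustrative card counts only.
board = { "Queued" => 4, "In Progress" => 3, "Review" => 7, "Ready" => 2 }

columns_over_limit(board).each do |column|
  puts "#{column} is over its limit: time to help clear it"
end
```

With seven cards under Review, the check reports that column, which is the prompt for the team to switch attention to reviewing pull requests.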
There are other columns on our development board such as Confirmed to the left and Complete to the right of those shown in the screenshot. And we have other boards. However, in the context of how we use Trello to help the team achieve flow, the four columns shown demonstrate what is at the heart of how we use Trello.
One of the things I like about the paid version of Trello is the various Power-Ups. I find the GitHub Power-Up particularly useful. As I have written elsewhere, when I’m working towards a solution I prefer to share code via a pull request as early as possible. Fortunately our team has a strong culture of providing respectful feedback via pull requests.
There are times when I feel the need to gently prod my teammates to provide feedback on my pull requests. However, once the conversation within the context of a GitHub PR starts, it is usually very helpful. I like the way the tone of our comments tends to be questioning and curious rather than judgemental.
Once a Trello card, with a linked PR, is designated as available for review, it is important for the team to give it timely attention. Occasionally attention is diverted elsewhere. For this reason our team programmed our Slackbot to inform us if a card has been in the Review column for too long. To me this is a helpful prompt to keep contributing to the effort that will result in shipping software. Speaking of Slack, its integration with GitHub is an obvious boon. Being able to see via our main #dev Slack channel when a PR is created or merged certainly helps teamwork.
Being able to easily trace code changes in a PR that result from a Trello card is wonderful for maintaining flow. Did I mention how much I love the Trello GitHub Power-Up?
The ways in which Buildkite can help teams achieve flow are worthy of a post in themselves. For the purposes of this discussion, I’ll confine myself to the way Buildkite is integrated into our team workflow.
We use Buildkite to automate our continuous integration builds. Unsurprisingly it is integrated with GitHub so that it’s easy to see whether or not the build has passed for the latest commit on a PR. Then there is the Slack integration, which I find useful as another prompt about the success or failure of builds. There are times when the first place I’ll notice a build failure is via our #buildkite Slack channel.
Obviously part of achieving flow is ensuring that the build for a PR is successful before that PR is merged. And, of course, the build for the master branch must be passing before we can deploy.
Whilst we have not yet reached the point where we continuously deploy our software, we do typically deploy applications several times each day. The tool that helps us deploy software is Babushka, courtesy of an alumnus of The Conversation, Ben Hoskings.
Babushka may not be as well known as other tools which support deployment but it has served our team well so far. Once a master build is passing for an application and we are ready to deploy, it is a simple matter of entering babushka ‘SHIP IT’ at the command line and the defined dependencies will enable babushka to deploy our software.
It’s all about flow. As emphasised by psychologist Mihály Csíkszentmihályi, in a personal context flow is “the mental state of operation in which a person performing an activity is fully immersed in a feeling of energized focus, full involvement, and enjoyment in the process of the activity.”
Translating this concept into a software team, we can see that it is important to remove as many barriers as we can to the team being fully immersed in shipping quality software. The Kanban approach supported by various integrated tools can certainly help in this regard.
And, in my experience, it can be fun too!
Fast forward to the present and I find myself reflecting on Ruby Conf AU, held on the Gold Coast recently.
My gut feeling is that the conference would score fairly well if evaluated against Kent’s criteria. But we programmers are analytic beings, so let me reflect on the talks and see how well they stack up.
Whilst there were many excellent talks amongst the 24 that were included, I’m going to pick some of them that resonated with me and place them each in one of Kent’s categories.
Gradual refactoring, as illustrated by Katrina Owen in her One Undo talk, is a vital aspect of working in small increments. Katrina showed us an example of how to initially place some poorly optimised code under test and then progressively improve it by teasing out the abstractions. To me, it was a brilliant example of how to “remove duplication and improve names in small cycles”, as Joe Rainsberger would say.
To work in small increments, refactoring as you go, requires effective testing strategies. To this end, Tom Ridge gave a thoughtful talk which focussed on the readability of RSpec code. In Explicit Tests Tell a Better Tale, he challenged us to consider how our choices in our RSpec usage affect our cognitive load.
I could have categorised the talk by Elle Meredith as honest, which undoubtedly it was. However, my first inclination about Elle’s talk entitled Feedback Matters was that it was foremost about being kind. As Elle’s talk emphasised, giving and receiving feedback in the right spirit with careful attention to how it affects people is of utmost importance to software development teams.
Ernie Miller struck a chord with his Humane Development – Empathy talk. I’m looking forward to seeing his slides and watching the video to see what else I can glean from what Ernie had to say. But I do recall that I found myself thinking, how good is it to hear someone talking about empathy in the context of software development!
Adam Cuppy may have posed the question, What If Shakespeare Wrote Ruby? and arguably provoked thoughts about abstract commonalities between Shakespearian and Ruby patterns. However, to me there were overriding factors that led me to unquestionably consider his talk as kind. For one thing, this professionally trained actor provided the opportunity for the organisers to schedule his talk as the second last in the conference. A masterstroke! Secondly, Adam describes himself as a Master of Smile Generation. I rest my case.
Jeff Casimir opened the conference with a talk that unquestionably fitted the “honest” bill. Sharing his experiences in the context of 10 Years and 10 Mistakes set a nice tone for the conference. I found it to be a refreshing approach. We all learn from our mistakes but it takes a certain degree of intestinal fortitude to get up on stage and honestly talk about all the ways you have goofed up.
Debugging Diversity, presented by Dan Draper and Catherine Jones, was without a doubt an honest appraisal of the challenges that face the Australian tech community with respect to increasing the opportunities for people who don’t fall into the “white male” stereotype that is predominant.
Given the seriousness of the diversity challenge in tech, it was pleasing to also listen to Jess Rudder give her perspective on the topic. Her presentation, Diversity in Tech – It’s About More than Just the Hiring Process, hit home, focussing on a critically important aspect. The fact that so many women choose to leave the IT industry points to a problem that the community needs to address.
We were privileged to hear Senator Scott Ludlam present the closing keynote of the conference. Honesty is a word that leaps out when I reflect on Scott’s talk, How the government broke the internet. The importance of an honest approach to the Internet and democracy, that is. As Scott illustrated, we certainly cannot assume government honesty when it comes to privacy for individuals and transparency of governments.
I guess it’s fair to say that Paolo Perrotta presented about “details”. After all, his talk delved into Refinements, a Ruby 2 feature. However, when you consider that Paolo’s talk was entitled Refinements – the Worst Feature You Ever Loved, you get an idea that this Italian has a devious sense of humour. And so it proved. To me, this was a great example of the importance of presenting a technical talk as an entertaining story.
When I reflected on what André Arko had to say, I admit that I hesitated about where to place his talk, or talks. His official talk was entitled Security Is Hard, But We Can’t Go Shopping, in which he shared with the audience the importance of handling security vulnerabilities. I guess it’s fair to say that’s an explanation of details. However, that wasn’t the end of his message. André went on to talk about Ruby Together, which to me sits squarely in the honesty category.
Of course, the talks were only part of the conference. I love the way RubyConf AU has evolved to feature plenty of social activities. My impression is that our international guests particularly appreciate these, especially those on the Saturday morning. Kudos to the conference organisers, Jo Cranford, Rob Jacoby and Trish Jacoby, together with the volunteers for their thoughtfulness and kindness.
I think my selection of talks bears out my contention that the success of Ruby Conf AU 2016 coincides with the fact that many of the talks fitted in with Kent’s idea that a good conference focusses on small increments, kindness and honesty.