Learning from the mistakes our customers care about

As mentioned in a previous post, I keep a close watch on customer defects. These are the issues that a customer cared about enough (or was sufficiently annoyed by) to contact us about.
I am focusing on the issues here, not the feature requests or the ‘how do I’ questions, though both can also be regarded as defects: ‘failing to understand or predict the customer’s needs’ and ‘failing to deliver an intuitive product’ respectively.

Being a big fan of ‘prevention is better than cure’, I like to investigate customer issues and perform a root cause analysis, or 5 whys, on the reasons each issue escaped our attention.
Yes, I refer to customer-reported issues as escaped defects, since they escaped our detection. It doesn’t matter how many stages there are in your pipeline, how many automated tests you have at different levels, or even how good the teams are; there will be some issues that escape our attention.
Technically, I also regard issues discovered late in our pipeline, after story acceptance and as part of our release process, as escapes too, as well as any issues we happen to find in production (before a customer reports them).

There are lots of quotes about learning from failure, and about being doomed to repeat your mistakes if you fail to do so. I believe, along with many others, that true learning only comes from failure and from understanding the reasons for it. However, we should take care not to make the same mistake twice, as this indicates a failure to learn. So the reason for analysing these escaped defects is not to apportion blame or point fingers; it is to learn how we can prevent a similar class of issue in future. The preference is to prevent the class of issue from ever being coded again. If that proves more expensive than the impact of the issues themselves, then we should at least be able to prevent that class of issue from escaping our attention again.

I was introduced to Lean software development techniques via Mary and Tom Poppendieck, which led me to learn more about The Toyota Way, where I learnt the 5 whys technique. Prior to this I had been using other root cause analysis techniques, or simply using my QA ability to ask difficult but relevant questions, to achieve the learning and expose the actions.
The 5 whys technique is so simple that anyone can participate in it or facilitate it; you don’t need to be a QA or have a background in problem solving or root cause analysis techniques.

So, what does it look like? Here is an example, with some edits to remove any proprietary details (note that 5 is simply a guide; you can use more or fewer whys):

Problem statement: Service proxy was updated in C# provider code but not in consumer code

Why didn’t we catch this?

  • tests run in the consumer pipeline were not sufficient to expose the issue
  • tests were not full contract tests – nothing testing the contract between producer and consumer
  • no communication between the producer and consumer teams on any changes made to the interface

Why didn’t the consumer pipeline tests expose the issue?

  • because the test only exercised the simplest possible scenario, which was not affected by the change (sec call returned minimum possible data)

Why were the contract tests insufficient?

  • contract testing is not very well understood by all teams concerned
  • tests were not reviewed by anyone except developers

Why wasn’t there communication between teams?

  • the producer of the service does not know who is consuming that service
  • tests didn’t relay information of endpoint changes
  • consumer tests were still passing (green)

Why didn’t the tests relay information of endpoint changes?

  • there were no tests asserting or checking the stability of the interface
  • the consumer was coded to de-serialise the entire response when really it only needed to check for ‘success’

Why was the consumer coded to de-serialise the entire response rather than just parse the value of interest?

  • because it was deemed easier to use a standard pattern to de-serialise the entire response than to write code that looks specifically for the value of interest
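
To make that last point concrete, here is a minimal sketch of the difference between de-serialising the whole response and reading only the value of interest. It is written in Python purely for illustration (the original consumer was C#), and the payloads, field names, and the `FullResponse`, `parse_strict` and `parse_tolerant` names are all hypothetical.

```python
import json
from dataclasses import dataclass


@dataclass
class FullResponse:
    """Hypothetical strict model mirroring every field the producer once sent."""
    success: bool
    user_id: int
    display_name: str


def parse_strict(payload: str) -> FullResponse:
    # Couples the consumer to the full response shape: an added, renamed or
    # removed field makes the constructor raise TypeError.
    return FullResponse(**json.loads(payload))


def parse_tolerant(payload: str) -> bool:
    # Reads only the value the consumer actually needs and ignores the rest.
    data = json.loads(payload)
    return bool(data.get("success", False))


# Hypothetical payloads before and after a producer-side change.
old_payload = '{"success": true, "user_id": 7, "display_name": "Ada"}'
new_payload = '{"success": true, "user_id": 7, "full_name": "Ada L.", "locale": "en"}'
```

With these definitions, `parse_tolerant` keeps working on both payloads, while `parse_strict` fails on `new_payload` even though the one field the consumer cares about has not changed. The tolerant version is not free: it hides interface changes from the consumer, which is one more reason to have explicit contract tests.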

Some example actions that were taken as a result of this:

  1. Provide training in contract testing patterns
  2. Producer to add tests to notify producer team of interface change (trigger for investigation or communication of change)
  3. Consumer to provide contract tests for producer to run in producer pipeline to alert of breaking changes for consumer
  4. Audit all cross team interfaces/dependencies to negotiate and add any missing contract tests
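
As an illustration of action 3, a consumer-supplied contract test might look something like the sketch below. It is a simplified, hand-rolled example in Python rather than what we actually ran, and the `CONSUMER_CONTRACT`, `producer_response` and `contract_violations` names are assumptions made up for this post. The idea is that the consumer states only the fields it relies on, and the producer runs that check in its own pipeline.

```python
# Hypothetical contract published by the consumer team: the fields it relies
# on and the type each must have. Anything not listed is free to change.
CONSUMER_CONTRACT = {"success": bool}


def producer_response() -> dict:
    """Stand-in for the producer's real endpoint handler (an assumption)."""
    return {"success": True, "user_id": 7, "display_name": "Ada"}


def contract_violations(response: dict, contract: dict) -> list:
    """Return a readable list of the ways the response breaks the contract."""
    problems = []
    for field, expected_type in contract.items():
        if field not in response:
            problems.append(f"missing field: {field}")
        elif not isinstance(response[field], expected_type):
            problems.append(
                f"{field}: expected {expected_type.__name__}, "
                f"got {type(response[field]).__name__}"
            )
    return problems


def test_producer_honours_consumer_contract():
    # Runs in the producer pipeline, so a breaking change fails the
    # producer's build before it ever reaches the consumer.
    assert contract_violations(producer_response(), CONSUMER_CONTRACT) == []
```

Because the contract lists only what the consumer needs, the producer stays free to add or reshape other fields, and a failing test becomes the trigger for a conversation between the two teams.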

Author: Stuart Ashman

I am currently working as the Director of QA at Vision Critical, a market research software and services company. I have been working in a variety of roles involving testing and quality assurance for over 20 years. I started off testing flight deck instruments and progressed through GSM network operations software, Unix Operating Systems and Lights Out Management Firmware, into Anti-Virus and Anti-Spam software and HW appliances, finally spending a short period of time testing cloud provisioning and control software before entering my current position.