November 2020 – QA Matters

Be careful what expectations your team hears from you

I often say that “you will get what you manage to.” I think of this as my version of Peter Drucker’s “what gets measured gets managed,” and since his quote is somewhat contentious, I will explain what I mean by mine.
In my experience, people and teams will work to produce the results that they are managed to. In other words, if your manager/leadership focuses on the number of lines of code you write (I sincerely hope not!), then your team will focus on producing more lines of code. If your manager/leadership focuses on meeting deadlines, your team will work to meet the deadlines.

We all have deadlines, so this may not seem like a bad thing. But it can be if you are only focused on making the deadline, as this often leads to making choices or decisions that will help meet the deadline but will compromise other things like quality, maintainability, performance, etc.

A warning for managers and leaders: be very careful what you choose to manage your teams to, and be aware of what your teams perceive you care about most. What do your people hear from you most frequently? If all they hear you talk or worry about is deadlines, then this is what they will believe you care about most, and they will try to ‘solve that problem for you,’ sometimes by cutting corners or making compromises to meet them.

I have heard so many managers and leaders say that quality is the most important thing, that it is priority number one, only to have those very same managers and leaders continually ask the team when we will be done and why a feature is not ready yet. They do not ask how well we developed the feature or how we assessed its quality.

People and teams will also develop habits and become habituated to working a certain way and making individual decisions; these habits and working patterns can then be hard to change.

An analogy for this might be how you formed certain habits based on how your parents ‘managed’ you as a child. For example, if your parents insisted that you clean and put away all the dishes and cookware after a family meal and make sure the kitchen and dining room were neat and tidy, then you have probably formed that habit and still do that today as an adult. And you may be uncomfortable in a situation where dishes are left in the sink or on the dining table, as this does not feel ‘right’ to you.

Translating this into the workplace: you will probably be uncomfortable with technical debt. You may prefer to clean up after you have finished some work, perhaps by adding tests and making sure the build pipeline is green after you merge your code. You may also update the operations docs and playbooks to ensure anyone on call knows how to spot issues in the new code you delivered – making sure that there is no metaphorical food left rotting and generating smells.

However, if it was normal for you to leave dirty dishes on the table or in the sink, or if you formed this habit once you were no longer being managed to clean up after a meal, then you will be used to not tidying up when you or others make a mess, and you will be more comfortable with things being untidy, undone, not clean, etc.

Translating this into the workplace: you will probably be comfortable with technical debt. You will prefer to move on to the next fun task after you have finished some work, perhaps signing off or leaving the office after you merge your code, not checking or caring whether the build is successful and the pipeline is green with your changes. You will put off running tests, or expect someone else to run them and tell you if anything is wrong, and you will forget to update the operational docs and playbooks. Metaphorically speaking, you are leaving some uneaten fish to rot and generate smells.

So, as managers, leaders, and teammates, we need to hold each other accountable and set expectations that avoid smells and encourage good code hygiene. Ensure that all code is reviewed and delivered with great tests (don’t just tick a box that there are tests; review and evaluate them), that the build pipeline stays green, that system tests continue to run without failure or regression, and that if issues are found, they are dealt with quickly. Ensure that code and systems are easy to maintain and quick to diagnose so that operational costs are low and customer issues and incident interrupts are very infrequent.

Take the time to change, and start to build good habits today!

When is Root Cause Analysis not a Root Cause Analysis?

When you stop at the Cause and don’t do the Analysis!

I have heard this or similar too many times: “We performed the root cause analysis and found that we were missing a test case; we have added that case now,” or “We found the root cause was this line of code; we have fixed it now.”

I heard similar again today.

Whenever you have a reason to want to perform a Root Cause Analysis, remind yourself to be grateful – a leader I admire referred to incidents as learning gifts – we learn the most and the most effectively from failures.

At my first workplace, we designed and manufactured Flight Deck Instrumentation; you know, that stuff you see in cockpits that pilots rely on to fly planes and get you to your destinations without issue. I will often say I have never since worked on anything that could directly cost human lives if it fails. So, I am always grateful for these learning gifts, as they will help the teams involved learn how to be better, annoy or frustrate our customers less, and delight them more (and not harm them in any way).

Which leads me to the intentions you need to have when performing a Root Cause Analysis. A quick reminder:

Our number one goal is to learn all we can so that we can prevent, or at least catch, future occurrences of issues like this.
An RCA is never about blame.
We have the right people involved – Ensure you have folks representing each area of the product, service, or code involved in the issue.
We are focused on prevention – Ideally, you are gathering to perform the analysis and have already found and resolved the specific issue(s) – now you are focused on analyzing to understand how to prevent similar problems from occurring or going unnoticed in the future.

You will also need someone skilled and experienced at running these types of discussions. The outcomes will depend on the skills, experience, and knowledge of those performing the analysis. An experienced facilitator will know how to get the best out of the participants and call for more or more in-depth SME help when required.

For example, 5-whys is a popular technique to use in an RCA. However, it can easily fail to produce the desired results because:

It is easy to stop at symptoms
You can get stuck not knowing what would be another good ‘why’ question to ask
Sometimes you follow the why questions down a path to a single root cause and are not open to multiple root causes.
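One way to stay open to multiple root causes is to record the ‘why’ answers as a tree rather than a single chain. The following is a minimal sketch, not a standard RCA tool: the `Cause` class, the `root_causes` function, and the incident details are all hypothetical and only illustrate the idea.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Cause:
    """One answer to a 'why?' question, with any deeper answers beneath it."""
    description: str
    whys: list[Cause] = field(default_factory=list)


def root_causes(cause: Cause) -> list[str]:
    """Return every leaf of the tree: causes with no deeper 'why' recorded."""
    if not cause.whys:
        return [cause.description]
    found = []
    for deeper in cause.whys:
        found.extend(root_causes(deeper))
    return found


# Hypothetical incident, for illustration only.
incident = Cause("Customer saw an error", [
    Cause("A test case was missing", [
        Cause("Review checklist does not ask how edge cases are tested"),
        Cause("Requirement did not describe the failing scenario"),
    ]),
    Cause("The failure was not noticed for hours", [
        Cause("No alert exists for this error type"),
    ]),
])

for description in root_causes(incident):
    print(description)
```

In this sketch, “a test case was missing” is only an intermediate answer. Because each answer can branch, the analysis ends with several candidate root causes instead of stopping at the first symptom.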

So, the next time you have an issue or incident that you can learn from, set yourself and your team up for future success by:

Enlisting the help of a skilled facilitator
Getting the right people gathered together
Reminding everyone that this is a no-blame discussion
Ensuring everyone is full of curiosity
Making sure all are in learning mode

Then, see what you can discover together to prevent future problems and lead more productive and less interrupted (by issues) work lives.

A previous post with an example

Beware the Streetlight effect in your feedback mechanisms

Be on the lookout for the Streetlight effect, especially when it comes to a choice between fast feedback and slower feedback.

I recently worked with a junior developer to add a simple test that could be run before code commit by any developer and as a test before the pull request was merged (if the developer forgot to run the tests in their branch before committing 🙂 ).

The intent was to provide fast and reliable feedback to whoever was making a change so that, if the test failed, they were still in context and could very quickly and efficiently resolve the issue and commit passing code.
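As a rough illustration of that kind of fast check, here is a minimal sketch of a small runner that the same developer could invoke before committing and that a pull request check could invoke again. The function name `run_fast_checks` and the stand-in commands are hypothetical; the actual test we added is not shown here.

```python
import subprocess
import sys


def run_fast_checks(commands):
    """Run each command in order and return the commands that failed."""
    failed = []
    for command in commands:
        result = subprocess.run(command)
        if result.returncode != 0:
            failed.append(command)
    return failed


# Stand-in commands for demonstration: one passes, one fails.
# A real list would hold the project's own fast test command(s).
passing = [sys.executable, "-c", "raise SystemExit(0)"]
failing = [sys.executable, "-c", "raise SystemExit(1)"]

failures = run_fast_checks([passing, failing])
print("failed checks:", len(failures))
```

A script like this could be called from a Git `pre-commit` hook and exit with a non-zero status when `failures` is not empty, so the commit is stopped while the developer is still in context.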

I checked back on progress and found the junior developer had been advised to add the test to the CI/CD pipeline at the image build stage.

A senior developer, who would have been the one to help teach the junior developer how to add a test to the CI/CD pipeline, provided this advice.

Image build is the last stage of the pipeline. After several stages of building, testing, and packaging, the pipeline culminates in an image ready to be deployed into production.

At this final stage, the test would be trivial to add and run.

However, if the build failed at this stage, it could be hours or even days after the code change that caused this test to fail was made. The developer who made the offending code change would be in a different context now and would probably not be the person who sees the image build failure.

Now you have very slow feedback.
Another engineer has to diagnose/dig into the reason for the image build failure.
This engineer then needs to find who made the code change and track them down to correct it.
The engineer who made the offending change may not even be available at that time.
Meanwhile, the build is stuck, and if you needed to get this build into production to fix a customer issue, you would be blocked and very unhappy!
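To make the difference in feedback time concrete, here is a small sketch that adds up how long a pipeline runs before a check placed at a given stage can report. The stage names and durations are hypothetical, chosen only for illustration, and the model ignores queueing and time spent waiting for other changes, which is where the hours or days usually come from.

```python
from itertools import accumulate

# Hypothetical stages and durations in minutes; real pipelines will differ.
STAGES = [
    ("pre-commit", 1),
    ("pull request checks", 10),
    ("build", 20),
    ("system tests", 60),
    ("packaging", 15),
    ("image build", 30),
]


def minutes_until_feedback(stage_name, stages=STAGES):
    """Minutes from the start of the pipeline until the named stage finishes."""
    names = [name for name, _ in stages]
    if stage_name not in names:
        raise ValueError(f"unknown stage: {stage_name}")
    elapsed = list(accumulate(minutes for _, minutes in stages))
    return elapsed[names.index(stage_name)]


for name, _ in STAGES:
    print(f"{name}: {minutes_until_feedback(name)} min")
```

Even with these optimistic numbers, the same test reports much later at image build than at pre-commit, and by then the failure lands on whoever is watching the build rather than on the person who made the change.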

So, when you have a choice between something hard (like searching for the key you dropped in a dark place, or adding a test at the earliest possible point of feedback) and something easy (like looking for your dropped key under a streetlight, or adding your test to the end of the pipeline), be aware of the consequences of your choice: not finding your key because this is not where you dropped it, or getting your feedback very late, at a point when it will be more expensive and time-consuming to fix.