How testable is your or your team’s code?

How can you design your code so that it is testable?

Are you struggling to test code because you did not write it with testing in mind?

If so, you and your team need to design your code with testing in mind.

We all need and want code that is easy to test, so that we get faster and cheaper feedback on our changes.

So, here are some tips and techniques to help improve the testability of your code.

First, what do we mean by testability?

A quick Google search will turn up lots of definitions, but a simple starting point is this: testability is the degree to which, and how easily, you can test the software.

If the testability of the software is high, then testing should be easier and more efficient.

In other words, you can think of testability as “how easy is it to test?”

To gauge your software’s testability, consider how efficiently you can test it in terms of two factors: the number of tests you need to run to be confident that it functions as expected and does not exhibit any unexpected issues, and the level or layer at which those tests can be developed and executed.

Needing a large number of tests, or having the majority of your tests at a high level (e.g., UI, E2E, system level) rather than at the unit or integration level, typically indicates that your code is not very testable.

So, how can we improve the testability of our code?

What are the three main factors?

  1. Observability – the extent to which you can see that the code functions as expected and is not doing anything unexpected.
  2. Controllability – the extent to which you can control your software, often by controlling the inputs, state, or data on which each component operates.
  3. Isolate-ability – the extent to which you can test your code in isolation.

In more detail:

Observability

Can you observe the results of any decisions or state changes in your running code?

Good examples:
Code modules or functions that return a clear state or status
Bad examples:
Code modules or functions that modify state or status but do not return or share anything observable with the calling code
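To make the contrast concrete, here is a minimal Python sketch. Python is an assumed choice, and `Account`, `withdraw_silently` and `withdraw` are hypothetical names used only for illustration: the first function changes state but tells its caller nothing, while the second returns an explicit result that a test can assert on directly.

```python
from dataclasses import dataclass
from enum import Enum


class WithdrawalStatus(Enum):
    APPROVED = "approved"
    INSUFFICIENT_FUNDS = "insufficient_funds"


@dataclass
class Account:
    balance: int  # in cents, to avoid floating-point rounding


def withdraw_silently(account: Account, amount: int) -> None:
    # Hard to observe: a decision is made, but nothing is returned, so a
    # test has to inspect internal state and infer what happened.
    if account.balance >= amount:
        account.balance -= amount


@dataclass(frozen=True)
class WithdrawalResult:
    status: WithdrawalStatus
    new_balance: int


def withdraw(account: Account, amount: int) -> WithdrawalResult:
    # Easy to observe: the decision and the resulting state are returned
    # to the caller, so a test can assert on them directly.
    if account.balance >= amount:
        account.balance -= amount
        return WithdrawalResult(WithdrawalStatus.APPROVED, account.balance)
    return WithdrawalResult(WithdrawalStatus.INSUFFICIENT_FUNDS, account.balance)
```

With `withdraw_silently`, a test that sees an unchanged balance cannot tell a declined withdrawal from a bug that skipped the update; `withdraw` makes the decision itself observable.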

What are you logging?

Good examples:
Effective use of logging levels for different information types, e.g., INFO, WARN, DEBUG, ERROR.
Bad examples:
Nothing logged
You are only logging errors.
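As an illustration, here is a sketch using Python’s standard `logging` module with a hypothetical `process_order` function. Note that Python names the warning level `WARNING`; which level each message gets is a judgment call for your team, not a fixed rule.

```python
import logging

logger = logging.getLogger("orders")  # hypothetical component name


def process_order(order_id: str, quantity: int, stock: int) -> bool:
    # DEBUG: detailed values, useful when diagnosing a specific problem.
    logger.debug("Processing order %s: quantity=%d, stock=%d", order_id, quantity, stock)
    if quantity <= 0:
        # ERROR: the request cannot be processed at all.
        logger.error("Order %s rejected: invalid quantity %d", order_id, quantity)
        return False
    if quantity > stock:
        # WARNING: an expected but noteworthy business condition.
        logger.warning("Order %s rejected: insufficient stock (%d > %d)", order_id, quantity, stock)
        return False
    # INFO: a normal, significant event in the life of the system.
    logger.info("Order %s accepted", order_id)
    return True
```

A test, or an engineer investigating an issue, can lower the logger’s level to `DEBUG` to see everything, while a production deployment might run at `INFO` or above.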

Can you query state, status, or data easily?

Good examples:
State or status is stored, or can be queried, at all times in a DB or via an API call.
Bad examples:
State or status is maintained only in the code’s running memory, requiring sophisticated runtime debugging tools to observe it.

Controllability

Can you efficiently drive different data values into your code?

Good examples:
Easy to call your code component and pass the data you want it to use
Easy to provide data via a defined interface or data source, e.g., an API or DB.
Bad examples:
No ability to pass data into your code
Data is provided by a dependency that makes it hard or impossible to control.
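A hedged Python sketch of the difference, using hypothetical function names and a hypothetical file path: the first version reads its data from a hardcoded location, so a test cannot choose the values; the other two accept either the data itself or the data source as a parameter.

```python
import json
from typing import Callable, Iterable


def average_price_hardcoded() -> float:
    # Hard to control: the data source is fixed inside the function, so a
    # test can only run it against whatever happens to be in this file.
    with open("/etc/shop/prices.json") as f:  # hypothetical path
        prices = json.load(f)
    return sum(prices) / len(prices)


def average_price(prices: Iterable[float]) -> float:
    # Easy to control: the caller passes in exactly the data to use.
    values = list(prices)
    if not values:
        raise ValueError("at least one price is required")
    return sum(values) / len(values)


def average_price_from(source: Callable[[], Iterable[float]]) -> float:
    # Also controllable: the data *source* is injected, so production code
    # can pass a function that reads a DB or API while a test passes a lambda.
    return average_price(source())
```

Edge cases such as an empty price list become a one-line test instead of a matter of editing a file on disk.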

Can you quickly change the current state or status of your code?

Good examples:
Easy to call your code component and pass the state you want it to have at any point in time
Easy to provide state via a defined interface or source, e.g., an API or DB
Bad examples:
No ability to pass state into your code or to set state before your code is executed
The state is provided by a dependency that makes it hard or impossible to control.
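For state, one common approach is to let the caller supply the starting state. The hypothetical `Order` class below accepts its initial state in the constructor, so a test for shipping can begin from `PAID` directly instead of first driving the order through every earlier step.

```python
from enum import Enum


class OrderState(Enum):
    NEW = "new"
    PAID = "paid"
    SHIPPED = "shipped"
    CANCELLED = "cancelled"


class Order:
    # The initial state is a constructor argument (defaulting to NEW), so a
    # test can start from any state without replaying earlier transitions.
    def __init__(self, state: OrderState = OrderState.NEW) -> None:
        self.state = state

    def ship(self) -> None:
        if self.state is not OrderState.PAID:
            raise ValueError(f"cannot ship an order in state {self.state.value}")
        self.state = OrderState.SHIPPED

    def cancel(self) -> None:
        if self.state is OrderState.SHIPPED:
            raise ValueError("cannot cancel an order that has already shipped")
        self.state = OrderState.CANCELLED
```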

Can you control your inputs?

Good examples:
Easy to change inputs or to use mock, fake, or stubbed inputs, e.g., switch out the source of input between real/live and fake/controlled ones.
Bad examples:
Inputs are hardcoded or otherwise unchangeable.
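Time is a classic uncontrolled input. This sketch, with hypothetical `SessionToken`, `SystemClock` and `FakeClock` classes, injects the clock, so production code uses the real time while a test switches to a fake clock that it can move forward on demand.

```python
import time
from typing import Protocol


class Clock(Protocol):
    def now(self) -> float: ...


class SystemClock:
    # Real/live input: the actual wall-clock time.
    def now(self) -> float:
        return time.time()


class FakeClock:
    # Fake/controlled input: the test decides exactly what time it is.
    def __init__(self, start: float = 0.0) -> None:
        self._now = start

    def now(self) -> float:
        return self._now

    def advance(self, seconds: float) -> None:
        self._now += seconds


class SessionToken:
    def __init__(self, ttl_seconds: float, clock: Clock) -> None:
        self._clock = clock
        self._expires_at = clock.now() + ttl_seconds

    def is_expired(self) -> bool:
        return self._clock.now() >= self._expires_at
```

A test can check expiry at exactly 299 and 300 seconds without sleeping.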

Isolate-ability

Can you isolate your code components and test them on their own?

Good examples:
Your code component can easily be called or executed on its own, with its inputs controlled and its outputs observed independently of other code components; for example, a unit of code that can be called and passed data or state, and that returns data or state that can be easily verified.
Micro-services tend to be easy to test in isolation.
Bad examples:
Code contains many dependencies that are hard/impossible to control, meaning that it has to be run in an environment where the dependencies are met/running.
Code monoliths tend to be hard to test in isolation.

How easy is it for you to mock, fake, or stub out any dependencies for your code?

Good examples:
Easy to inject data via in-memory DB, file, or substitute a URL or other resource pointer for a fake one
Bad examples:
Hardcoded dependencies.
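One way to apply the in-memory DB idea, sketched with Python’s built-in `sqlite3` module: the hypothetical `UserRepository` receives its connection instead of opening one itself, so a test can hand it a throwaway `:memory:` database. This assumes the production database accepts compatible SQL; if it uses a different engine, dialect differences mean tests like this complement, rather than replace, tests against the real database.

```python
import sqlite3


class UserRepository:
    # The connection is injected, so tests can pass an in-memory database
    # while production code passes a connection to the real one.
    def __init__(self, conn: sqlite3.Connection) -> None:
        self._conn = conn

    def count_admins(self) -> int:
        row = self._conn.execute(
            "SELECT COUNT(*) FROM users WHERE role = ?", ("admin",)
        ).fetchone()
        return row[0]


def make_test_db() -> sqlite3.Connection:
    # ":memory:" creates a database that exists only for this connection.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (email TEXT PRIMARY KEY, role TEXT NOT NULL)")
    conn.executemany(
        "INSERT INTO users (email, role) VALUES (?, ?)",
        [("a@example.com", "admin"), ("b@example.com", "viewer"), ("c@example.com", "admin")],
    )
    return conn
```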

Other attributes that tend to impact testability include:

Separation of concerns – the extent to which each code component you want to test has a single, well-defined responsibility.

Can you easily understand or define what each code component does, or what its responsibility is?

Good examples:
Clear and typically singular responsibility for a piece of code
A single state change, or a single decision made based on data or conditions assessed by the code.
Bad examples:
Code monoliths or code blocks in which you make multiple decisions or changes of state.
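A sketch of separating concerns, with hypothetical names: instead of one function that reads invoices, decides which are overdue, and sends reminders, the decision lives in a pure `is_overdue` function and the side effect (sending) is injected into `send_reminders`.

```python
from dataclasses import dataclass
from datetime import date
from typing import Callable, List


@dataclass(frozen=True)
class Invoice:
    number: str
    due: date
    paid: bool


def is_overdue(invoice: Invoice, today: date) -> bool:
    # One responsibility: a pure decision, trivially unit-testable.
    return not invoice.paid and invoice.due < today


def send_reminders(invoices: List[Invoice], today: date,
                   send: Callable[[str], None]) -> List[str]:
    # One responsibility: act on the decision. The sending mechanism is
    # injected, so a test can capture messages instead of emailing anyone.
    overdue = [inv.number for inv in invoices if is_overdue(inv, today)]
    for number in overdue:
        send(f"Reminder: invoice {number} is overdue")
    return overdue
```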

Understandability or readability – how easy is it to understand the intent of the code and how you achieve that intent.

Can you quickly understand what the code is doing?

Good examples:
Clean code: naming that makes intentions specific and meaningful, and comments that explain any complex or non-obvious code.
Bad examples:
Obfuscated code using short names that are not explicit or clear. Abstractions that don’t help readability or clarity. Enumerations that don’t help readability or clarity.


The key takeaway here is to think about how you will test your code BEFORE you write it. Doing so will help you ensure your code is:
  • Observable – so that you can see the results, state changes, and data that occur when your code executes.
  • Controllable – so that you can easily control the input data or state for your code to use or work with.
  • Isolatable – so that you can easily verify your code in isolation, without needing to execute multiple dependencies together to test it.

I often find myself working with teams and code that was not developed with testability in mind, which means it is hard to add unit tests without first refactoring the code to make it more testable. It can be hard to know whether the code components behave as expected without employing sophisticated, and often expensive, tools or debuggers to observe them or to control state or data during execution. And because it is difficult to isolate the code components, you often need to exercise the code and run tests in an integrated or almost production-like environment. This means your feedback cycle, the time between making a code change and getting feedback on whether that change was right, is much longer.


With reference to my original post on The Streetlight Effect, I still see this effect today.

Recently, I was discussing the testing of a mobile app with a QA who is a very experienced black box tester. I am currently encouraging and coaching this tester to learn more about programming, code design, code architecture, etc., as I think any knowledge in these areas can make you a more effective and efficient tester.

This tester was seeing some problems when interacting with the mobile app UI and was going to raise a bug with the developer. I asked this tester whether they were observing the mobile app’s requests to, and responses from, the backend API, i.e., the requests and data being sent and received over Wi-Fi or cellular data between the mobile app and our cloud-based backend server API. The tester was not sure how to look at that traffic. The streetlight effect – I will look here (the UI) for my keys (any issues) because I know how to look here (the UI). So I shared a how-to I had written years ago for Charles Proxy, which allows the tester to inspect the HTTPS requests and their contents, along with timings and more, by proxying all requests made from, and responses to, the mobile app through their laptop. This enabled the tester to see the requests that were failing as well as the resulting error message. The tester was then able to raise a much more detailed bug, and the developer was able to get straight to the problem and fix it quickly. (As opposed to a bug like “when I do this in the app, I don’t get the UI screen/data I was expecting to see displayed”, where the developer would first have to reproduce the problem and watch the traffic themselves to pinpoint it, taking much longer and possibly involving some back and forth to clarify reproduction steps, etc.)

My second example involves another very talented tester who is also a strong coder, and so has great white box testing skills. This tester was digging into some performance issues, trying to understand why a request for a large dataset from an API was taking so long (when the system was otherwise quiescent), and why it would often fail if there was any other activity (API GETs and POSTs to request and store data). We are using AWS, and there is a myriad of tools and monitoring capabilities to learn and get your head around. This tester was able to extract the time taken to complete the request for data and plot it against the size of the data extracted. If you think about this visually, the tester is looking at the system from a black box perspective: making a request, knowing what was requested, extracting the start time of the request from the log, then extracting the completion/response time from the same log, then plotting this against the size of data returned. (The tester was increasing the amount of data stored, and thus retrieved, between each test/request.) In this case, being capable of white box testing and understanding the code and system architecture, this tester knew that there were several key components involved in servicing this request, but was not observing any of them. Yet observing them is what is needed to understand what is really going on, and in this case to pinpoint which parts of the system were taking a long time to service the request and which would fail when there was any other activity. Most performance issues are the result of some form of resource exhaustion, e.g., CPU, memory, input-output (IO), threads, connections, etc.
So, we really want to be able to see how these resources are being consumed when we interact with the system. This can show us, for example, that our CPU usage spikes to 100% when we do something, so the system cannot handle more requests as they come in, or that our memory spikes and never comes back down to the pre-request level once the request is complete; in other words, some memory is not being freed, leading to a resource leak (we will exhaust this resource over time). In this case, learning how to observe the individual service Docker container resources and the database resources will likely lead us fairly quickly to the problem(s), or the weak link(s) in our data chain, for the request we are making.
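The black box measurement described above can be sketched in a few lines of Python. The log format is an assumption made up for illustration (a timestamp, a `START` or `END` event, a request id, and a byte count on the `END` line); real AWS or application logs will look different. The sketch pairs start and end lines, computes durations, and fits a least-squares slope of seconds per MiB.

```python
from datetime import datetime
from typing import Dict, Iterable, List, Tuple


def parse_durations(lines: Iterable[str]) -> List[Tuple[int, float]]:
    """Pair START/END lines by request id; return sorted (bytes, seconds)."""
    starts: Dict[str, datetime] = {}
    results: List[Tuple[int, float]] = []
    for line in lines:
        timestamp, event, *fields = line.split()
        attrs = dict(field.split("=", 1) for field in fields)
        ts = datetime.fromisoformat(timestamp)
        if event == "START":
            starts[attrs["req"]] = ts
        elif event == "END" and attrs["req"] in starts:
            elapsed = (ts - starts.pop(attrs["req"])).total_seconds()
            results.append((int(attrs["bytes"]), elapsed))
    return sorted(results)


def seconds_per_mib(samples: List[Tuple[int, float]]) -> float:
    """Least-squares slope of duration against size, in seconds per MiB."""
    xs = [size / 2**20 for size, _ in samples]
    ys = [secs for _, secs in samples]
    mean_x, mean_y = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    return cov / var
```

A steadily rising slope tells you that response time grows with data size, but not where the time goes; answering that still requires observing the individual components and their CPU, memory, IO, thread, and connection usage.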

In conclusion, what we need to do more often is to ask questions like;

  • What could I watch or monitor to see, in more detail, what is happening/going wrong?
  • What is the data flow – what path does the data take through our system, and what components are involved?
  • How is the system architected, and how do all of the components communicate?
  • How can I observe the communications between the system components?

If any of these questions results in an “I don’t know” or similar, then ask your colleagues for help. You are likely to learn something new, even if that something is that your colleagues don’t know the answers to some of these questions either.

A BDD worked example – login page

I have used this example as a workshop to introduce BDD to a wide variety of folks at different companies. I like this example because it is deceptively simple: everyone knows how to deliver a login page, right? The reality is that we all have different ideas about what should and shouldn’t be on a login page, how it should look, etc. So it serves as a simple but very illustrative example of how a Behaviour Driven Development approach can really help to clarify requirements and engage the thoughts, experiences, and knowledge of all the participants, to ensure that what you deliver is what was really desired. It also ensures that the result will be both testable and tested, as the high-level acceptance tests are defined up front.

Introduce roles and abbreviations

First of all, I want to introduce the roles that will be part of the discussion along with the abbreviations for those roles used in this example. Each role can then speak their part as an example of how the discussion could go for this example.
  • PM – Product Manager, our proxy for the customer, bringing the ‘what the customer wants or needs’ definitions to the team
  • DL – Development Lead
  • IxD – Interaction Designer, bringing the UI look and feel, the usability and customer workflow understanding to the team, and helping to ensure we have a consistent style, content, and customer workflows.
  • QA – Quality Assurance person (either QAE or SET) who will be ensuring we deliver the story with high quality, building it right and building the right thing
  • Dev – Developer(s), responsible for the actual implementation of the story, the code that will provide the desired functionality.
  • Implementation team – typically composed of a developer and a QA person, but can include the Interaction Designer, development or QA pairs.
  • Amigos – the group of people required to analyse a story, typically the PM as the customer proxy and the implementation team

Introducing the Story

The Login Page
Bring the story into ‘In Analysis’
What does the story look like at this point? (This is an example using a tool called Mingle)

The BDD discussion begins

As is fairly typical at this stage, the story does not contain a lot of detail and is kind of vague in its description.
We start the discussion:
PM or Dev Lead presents the story.
The 4 amigos (PM + Implementation team (IxD, Dev, and QA)) discuss and ask clarifying questions to understand the story in detail, exposing and discussing any business risks as they go.
PM/DL: This story is to deliver a login page. Fairly standard login page: username and password fields and a submit button (the what). This will be the login page for our administrators (the who). Once they log in here they will have access to the dashboard and all the administrator functionality (the why).
QA/Dev: Do we have a mockup?
IxD: Yep, looks like this (I encourage mockups to be cheap, and for me, nothing beats a whiteboard diagram for cheap, flexible and efficient mockups):
QA/Dev: So is the button text ‘login’ or ‘submit’?
IxD: I think ‘login’ is more intuitive
QA/Dev: Is it a username or an email address?
IxD: I was assuming it was an email address
PM: Yep, we will need to use an email address, we will want this to work with our single sign-on feature coming later and that will use an email address
QA: Can I assume we will use our standard code for validating an email address?
Dev: Erm, do we have a standard email address validation code?
QA: Yes, I believe the architecture team has a regular expression they standardised on
QA: Do we want to provide any client-side validation of the password? Or should we just send it to the server for validation against the username? i.e. should we ensure it is at least 8 characters long, contains at least one special character and at least one upper case character?
PM: No, we will have checked that when we set the password at admin user creation time, or when they update it themselves. Let’s just have the server side validate it against the email address. Besides, if we provide guidance on how a password must be composed, then an attacker can update the dictionary they are using for brute forcing so that it follows the rules.
QA: How do we want to tell the user that either their email or password is not valid? Text on the page? Red? A popup? Do we want to clear the fields?
IxD: First, I think we should have ghost text in the email address field to provide an example of a correctly formatted email address. For the error, I think we should have red text above both boxes, and we should leave the fields populated. Let me provide an updated mockup:
QA: Do we want to show a different message for an invalid email address, i.e. one that fails email address validation rather than a check to see if that email address is a user in our system?
IxD: Yes, I think we should help the user to avoid typos. How about red text above the username field for this too? Here is an updated mockup:

QA: Does the error text and the text on the page need to be localised?
PM: Yes, we need to support the existing 14 languages for the Admin users
Dev: So, we should use the browser context to set the locale and display localised text if we support that locale and a fallback if we don’t?
PM: Yep, we might have a different setting later if we allow users to select a preferred locale that we store as part of their profile, but at login time we don’t know who they are so we should just use the browser locale.
QA: Cool, so the localised text will be supplied as part of the page render based on the initial request to the login page URL, including any text for error messages?
Dev: Yeah that’s the way we usually do it, so we can just send error codes back and the client side code can then render the appropriate message. Of course, the email address validation will be checked at both client and server so we can provide quick feedback to the user if they don’t provide a valid email address format in that field, but also guard against someone hitting the server directly with an invalid formatted email address.
QA: Nice, we should make sure we have unit tests covering the validation on both client and server then, I can add a single invalid test for each to show the error message (UI) and return code (server)
Dev: Should we have a ‘forgot password’ link and functionality?
PM: Yes, but I don’t think we have email functionality built yet, so we will defer that to a future story
QA: Should we have a timeout for responses from the server? i.e. how should we deal with the server being busy or unresponsive?
Dev: Yes, we should have a timeout value in the client code that will display a message to try again later, do we have a mockup for that?
IxD: Agreed, let’s allow 10 seconds for the timeout, and I think we should show a message to try again later if we time out or if we get a 5xx back from the server. Here is a mockup for how that text should be displayed:
QA: Do we need to support logging in on mobile devices? i.e. should this page follow a responsive design pattern?
PM: Yes, we need to support tablets right now and may need to support phone devices in the future, if we go responsive now then both should work.
QA: But we will only need to test on tablets for now, right?
PM: Yep, we will add testing stories for phone device testing later if we need them.
IxD: Responsive should be easy enough, but I may need to think about the length of the fields and the text we will need to display, particularly in different languages.
QA: What about accessibility? Do we need to support a WCAG level for this?
PM: Hmm, well we should but I think we will defer that to a future story. Let’s try to keep it in mind so that we don’t have to re-design later
Dev: What about functionality to enable maintenance notifications? i.e. the ability to add text to inform admins of upcoming maintenance or outages?
PM: Again, I want to defer that to a future story, I will sync up with Production IT to understand the requirements for that
QA: Do we need to limit the number of attempts to login so we can avoid brute force security attacks?
PM: Hmm, yes I think we should allow 3 attempts and then lock the account for maybe 5 minutes?
IxD: Actually I think 5 attempts would be better
PM: OK let’s go with 5 attempts and 5 minutes wait time
Dev: Do we want to log each and every login attempt (both successful and unsuccessful) or just the ones that result in a lock on the account?
PM: Hmm, I think we may need to log all attempts along with success, failure or lock so that we can provide an audit log if the customer needs it or if we need to show anything for a security audit.
QA: How do we want to show the lock message when 5 unsuccessful attempts have been made?
IxD: I am thinking red text again, but below the 2 boxes and to the left of the button this time. Here is an updated mockup:
QA: How are we determining 5 attempts to login? Attempts using the same email address? Coming from the same IP? Some form of session identifier e.g. cookie?
Dev: Well the simplest is to set a session id in a cookie when an attempt is made on the server side and then to count how many attempts are made with this session id
QA: I presume that means someone malicious could simply brute force by creating ‘cookie-less’ requests?
Dev: Yeah, maybe we need to think about that one some more or talk to the security team.
QA: What should happen if the user attempts another login when we have locked them out?


QA: So based on all of that, what do we need as examples to accept this story with? I am thinking something like this:
Given a valid email address and password when I select login then I should be authenticated and taken to the dashboard page
Given an invalid email address or password when I select login then I should see a message indicating that my login attempt was unsuccessful
Given a badly formatted email address when I focus outside of the email address text field then I should see a message indicating that I have entered an incorrectly formatted email address
Given I am entering an invalid email address or password for the 5th time, when I select login then I should see a message indicating that I must wait 5 minutes before trying to login again
Dev: Should we include an AT for the server busy/down error message too?
QA: Is that required for acceptance? It is not something the user is in control of or can directly impact (without removing their network connection)
PM: I agree, we should test for it but I don’t think we need to include that in the Acceptance Tests
PM: I want to be sure this will look and work well on a tablet, so can we make sure we test that?
QA: Sure, we can do desk checks on an iPad if you like? But we will automate the ATs using Selenium and test with our most popular customer browsers for the admin interface
PM: Great
IxD: I am a bit concerned with how the localised text will look, can we make sure we test that too?
QA: Sure, we will test that the locale gets set and falls back as expected, and we will do some basic checks with pseudo-localisation to make sure we don’t have overlap, truncation, etc. But how about we include some different languages in the desk checks on the iPad, to make sure you are OK with the look and feel?
IxD: Sounds good
Dev: What about an AT for the audit logging?
PM: This is not a formal requirement from our customers or security team yet, so I want it tested but it does not need to be an AT as it is not a must-have part of the specification.
QA: No need for any regression tests here as this is all new code and not dependent on anything else.
PM: Do you need a new environment to test with?
QA: We already have the pipeline setup so we can just deploy to that from the CI system and test there, so no, we should be ok.
PM: What do we think are the biggest risks?
QA: Well, I think security is the biggest, as this is a login page, but we will mitigate that with validation on the client and server side, plus testing focused on circumventing security, including the lockout to prevent brute forcing. The next biggest is email validation; this is notoriously problematic, as most email clients do not conform to the RFCs. We will mitigate this by using our standard email validation, to be consistent and to have one place to change if customers complain. We can also monitor the audit logs to see if people regularly use different methods of commenting or ‘tagging’ their emails that we should allow for. I am not really concerned about performance (very little traffic between server and client), and we will do regular desk checks for usability and style, including the error messages, localised text, and responsive design.
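The Dev’s approach, where the server returns error codes and the client renders localised text based on the browser locale with a fallback, can be sketched as follows. Everything here is hypothetical: the error-code names, the two locales (the team supports 14), the French strings, and the simplified `Accept-Language` parsing, which honours quality (`q`) values but ignores wildcards.

```python
from typing import List

FALLBACK_LOCALE = "en"

# Hypothetical error codes and illustrative translations; a real system
# would load every supported locale from resource files.
MESSAGES = {
    "en": {
        "INVALID_CREDENTIALS": "Invalid email or password",
        "INVALID_EMAIL_FORMAT": "invalid email address",
        "ACCOUNT_LOCKED": "Too many failed login attempts, please wait 5 minutes before trying to login again",
    },
    "fr": {
        "INVALID_CREDENTIALS": "Adresse e-mail ou mot de passe incorrect",
        "INVALID_EMAIL_FORMAT": "Adresse e-mail non valide",
        "ACCOUNT_LOCKED": "Trop de tentatives de connexion infructueuses, veuillez patienter 5 minutes avant de réessayer",
    },
}


def preferred_locales(accept_language: str) -> List[str]:
    """Parse an Accept-Language header into tags, most preferred first."""
    weighted = []
    for index, part in enumerate(accept_language.split(",")):
        tag, _, params = part.strip().partition(";")
        if not tag:
            continue
        q = 1.0
        params = params.strip()
        if params.startswith("q="):
            try:
                q = float(params[2:])
            except ValueError:
                q = 0.0
        if q > 0:  # q=0 means "not acceptable"
            weighted.append((-q, index, tag.strip().lower()))
    return [tag for _, _, tag in sorted(weighted)]


def resolve_locale(accept_language: str) -> str:
    for tag in preferred_locales(accept_language):
        # Try the exact tag ("fr-ca"), then its primary language ("fr").
        for candidate in (tag, tag.split("-")[0]):
            if candidate in MESSAGES:
                return candidate
    return FALLBACK_LOCALE


def message_for(error_code: str, accept_language: str) -> str:
    locale = resolve_locale(accept_language)
    return MESSAGES[locale].get(error_code, MESSAGES[FALLBACK_LOCALE][error_code])
```

Because the locale logic is a pure function of the header, the fallback behaviour the QA plans to test can be covered by fast unit tests, leaving the desk checks to focus on look and feel.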
What does the story card look like now?
QA: We need to get more specific with our examples, we have captured the top level ideas and behaviours but we really want to provide examples (or specification by example) so that it is 100% clear how this will behave and so we know how we will demonstrate this to you for acceptance, so how about;
Given I have entered the email address of a registered admin user and Password1 (valid password) as the password
When I login
Then I should be presented with the dashboard page
Given I have entered an unregistered email address and Password1
When I login
Then I am presented with an error message in red text saying “Invalid email or password”
Given I have entered a badly formatted email address as the email address
When I change focus from the email field
Then I am presented with an error message in red text saying “invalid email address”
Given I am entering invalid credentials for the 5th time in a row
When I login
Then I am presented with an error message in red text saying “Too many failed login attempts, please wait 5 minutes before trying to login again”
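The specification-by-example scenarios above translate naturally into executable checks. Below is a hedged sketch of a hypothetical `LoginService`, plus those examples expressed as assertions. Several choices are assumptions rather than decisions from the discussion: failed attempts are counted per email address (one of the options the QA raised; the team had not settled it), the email pattern is a deliberately simplified stand-in for the architecture team’s standard expression, a successful login resets the count, and attempts made while locked are rejected without checking the password (the discussion leaves that question open).

```python
import hashlib
import hmac
import os
import re
from typing import Callable, Dict, Tuple

# Hypothetical, deliberately simplified pattern: NOT the architecture
# team's standard email expression referred to in the discussion.
EMAIL_PATTERN = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")
MAX_FAILED_ATTEMPTS = 5   # from the discussion: 5 attempts...
LOCKOUT_SECONDS = 5 * 60  # ...then a 5 minute wait


def _hash(password: str, salt: bytes) -> bytes:
    return hashlib.pbkdf2_hmac("sha256", password.encode(), salt, 100_000)


class LoginService:
    """Returns an error code (or DASHBOARD) that a client could localise."""

    def __init__(self, clock: Callable[[], float]) -> None:
        self._clock = clock  # injected so tests control time
        self._users: Dict[str, Tuple[bytes, bytes]] = {}  # email -> (salt, hash)
        self._failures: Dict[str, int] = {}  # email -> consecutive failures
        self._locked_until: Dict[str, float] = {}  # email -> unlock time

    def add_user(self, email: str, password: str) -> None:
        salt = os.urandom(16)
        self._users[email.lower()] = (salt, _hash(password, salt))

    def login(self, email: str, password: str) -> str:
        if not EMAIL_PATTERN.match(email):
            return "INVALID_EMAIL_FORMAT"
        key = email.lower()
        now = self._clock()
        if self._locked_until.get(key, 0.0) > now:
            return "ACCOUNT_LOCKED"  # assumption: no check or count while locked
        record = self._users.get(key)
        if record and hmac.compare_digest(record[1], _hash(password, record[0])):
            self._failures.pop(key, None)  # assumption: success resets the count
            return "DASHBOARD"
        self._failures[key] = self._failures.get(key, 0) + 1
        if self._failures[key] >= MAX_FAILED_ATTEMPTS:
            del self._failures[key]
            self._locked_until[key] = now + LOCKOUT_SECONDS
            return "ACCOUNT_LOCKED"
        return "INVALID_CREDENTIALS"


def check_acceptance_examples() -> None:
    now = [0.0]  # a controllable clock
    service = LoginService(clock=lambda: now[0])
    service.add_user("admin@example.com", "Password1")
    # Valid credentials -> dashboard
    assert service.login("admin@example.com", "Password1") == "DASHBOARD"
    # Unregistered email -> "Invalid email or password"
    assert service.login("nobody@example.com", "Password1") == "INVALID_CREDENTIALS"
    # Badly formatted email -> "invalid email address"
    assert service.login("admin.example.com", "Password1") == "INVALID_EMAIL_FORMAT"
    # 5th consecutive failure -> lockout message
    results = [service.login("admin@example.com", "wrong") for _ in range(5)]
    assert results == ["INVALID_CREDENTIALS"] * 4 + ["ACCOUNT_LOCKED"]
    # After 5 minutes the user can log in again
    now[0] += LOCKOUT_SECONDS
    assert service.login("admin@example.com", "Password1") == "DASHBOARD"
```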
PM: OK, we know the scope now and we have Acceptance Tests defined, so what do we think the size of this story is?
At this point, we have clarified and agreed on the scope, and have a common understanding of what ‘done’ looks like in the form of some high-level acceptance tests. It is reasonable to guesstimate the size of the story at this point. But note we are much more likely to be able to guess more accurately once we have talked through the design in a bit more detail – the how we will solve this need in code discussion.

Testing vs Checking

There has been a lot of discussion over the last couple of years about test automation and in particular the varying definitions of testing vs checking and how that applies to test automation.

I broadly agree there is a difference. Here is my paraphrased understanding of each definition:

Testing – the art and science of conducting experiments and carefully observing the results, all the while making multiple evaluations against explicit and implicit expectations. A fundamentally human, (or manual if you prefer), exercise.

Checking – the deterministic evaluation of the outcome of an action or step such that a pass or fail is recorded.

But there seems to be an underlying theme to most of these discussions, almost a fear. It is as if someone has threatened the existence of manual or human testing.

I do agree that there has been a general drive towards more automation of ‘tests’, and that this has been largely associated with the adoption of agile practices. I myself have encouraged, and in some cases demanded, more investment in, and thus more, automation of tests in companies I have worked for. However, I have also encouraged and hired for manual testing, and have coached and mentored folks to be better exploratory testers (what I call brain engaged testing).
So I don’t subscribe to the fear that manual testing is a thing of the past or an unnecessary overhead. Perhaps this is why I don’t share what seems to be an attempt at a sharp delineation between automation and testing?

Like Michael Bolton, I do see automation as a tool and as something that supports testing.
I often use the phrase automation assisted testing, to refer to exploratory or other manual testing where the test setup or initial test data has been achieved using automated tools or scripts.

My preference is to develop automation code in a re-usable fashion, producing a library of re-usable code that is easy to ‘glue’ together in different ways such that different automated tests (or checks if you will) are achievable quickly and efficiently. But this approach also lends itself well to re-using these library ‘functions’ to assist with manual testing. If developed well then anyone with fairly basic coding skills should be able to combine some of these together in order to ‘drive’ a system under test to the point where you want to start your exploration or manual testing. Or as mentioned before, to prime the system under test with the exact data you want or need, in order to conduct the exploratory or manual testing you wish to execute next.
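A sketch of that style of library in Python, with hypothetical endpoints (`/users`, `/projects`, `/projects/<id>/tasks`) and a hypothetical `post(path, payload)` client interface: small single-purpose helpers, plus a glue function that composes them into a scenario. The same `seed_project_with_tasks` call could back an automated check or prime the system before an exploratory session. `RecordingClient` is only an in-memory stand-in so the sketch runs without a real system.

```python
from typing import Any, Dict, List, Protocol, Tuple


class ApiClient(Protocol):
    def post(self, path: str, payload: Dict[str, Any]) -> Dict[str, Any]: ...


# Small, single-purpose building blocks.
def create_user(client: ApiClient, email: str, role: str = "member") -> str:
    return client.post("/users", {"email": email, "role": role})["id"]


def create_project(client: ApiClient, owner_id: str, name: str) -> str:
    return client.post("/projects", {"owner": owner_id, "name": name})["id"]


def add_task(client: ApiClient, project_id: str, title: str, done: bool = False) -> str:
    return client.post(f"/projects/{project_id}/tasks", {"title": title, "done": done})["id"]


# Glue: compose the building blocks into a reusable scenario.
def seed_project_with_tasks(client: ApiClient, open_tasks: int, done_tasks: int) -> Dict[str, str]:
    owner = create_user(client, "owner@example.com", role="admin")
    project = create_project(client, owner, "Exploratory session")
    for i in range(open_tasks):
        add_task(client, project, f"Open task {i + 1}")
    for i in range(done_tasks):
        add_task(client, project, f"Done task {i + 1}", done=True)
    return {"owner": owner, "project": project}


class RecordingClient:
    """In-memory stand-in for a real API client; records calls, returns ids."""

    def __init__(self) -> None:
        self.calls: List[Tuple[str, Dict[str, Any]]] = []

    def post(self, path: str, payload: Dict[str, Any]) -> Dict[str, Any]:
        self.calls.append((path, payload))
        return {"id": f"id-{len(self.calls)}"}
```

Before an exploratory session, someone with basic coding skills could call `seed_project_with_tasks(real_client, open_tasks=50, done_tasks=0)` to start from a busy project, where `real_client` is whatever client talks to the system under test.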

Layers of Test Automation

Also referred to as the Test Automation Pyramid

The intention of this post is to get across the idea that your testing strategy should include many layers of testing.

I am talking mostly about automation here, and for the purposes of this post I will ignore the discussion around testing vs checking when it comes to automation, so I will continue to use the common terms: tests and test automation.

My first introduction to the formal concept of the ideal test automation pyramid was courtesy of Mike Cohn of Mountain Goat Software (I read his blog post on this many years ago).
The idea he discussed resonated so well with me that I have been trying to follow this strategy ever since. Of course, I have experienced a few different companies with very different shapes to their automated testing. I intend to share some of those experiences with you, along with some ideas for how to adjust your strategy in each of those cases, and of course to help you avoid the mistake that Mike was referring to: forgetting about the middle layer.

The test automation pyramid concept has been adopted quite broadly and adapted for many different scenarios too. But it is definitely not a silver bullet and there are times when this approach is not appropriate for your environment, technology or simply the way you work.
That said, most of the companies, technology stacks and teams that I have worked with can and have benefited from this strategy.

So, what is it?
Well, here is the most basic version of the pyramid that I typically draw on a whiteboard:


One of the variants that I will often draw, when I feel the need to point out that we still need to do manual testing, (preferably exploratory), is shown below. Because this manual testing is somewhat variable in size or content I add it as a cloud to the top of the pyramid. There are many others who use this style (I don’t claim to have been the first but I cannot remember where I initially saw this in order to provide appropriate credit).

Test Pyramid with Manual Test Cloud

But the variant I use most often is one where I split the integration section in two, and talk about code component integration and system component integration:

Test Pyramid with 2 Integration Layers

Having done that I feel that I really need to explain my layers more clearly;

Unit tests – tests that are designed to ensure the smallest divisible pieces of code (units or components) are working the way they were intended. These are typically written by developers (though I encourage QA folks with development skills to at least review if not write some of them). They are typically written to make use of a unit test framework. They are often written after the code that they are intended to test is written, though in most cases I would prefer them to be written first (in a TDD manner). They should be executable by a developer at any time and are typically the first tests run in a CI system (Continuous Integration System).
A web-based application may have unit tests in more than one code base; for example, you may have JavaScript unit tests in addition to those in the back-end or server-side code, or even the API code.
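For example, a unit test written with Python’s built-in `unittest` framework might look like this; `apply_discount` is a hypothetical unit used only for illustration. Each test exercises one small behaviour of one unit, with no other components involved, so the whole file runs in milliseconds on a developer machine or as the first stage in CI.

```python
import unittest


def apply_discount(price_cents: int, percent: int) -> int:
    """Unit under test: the discounted price, rounded down to a whole cent."""
    if not 0 <= percent <= 100:
        raise ValueError("percent must be between 0 and 100")
    return price_cents * (100 - percent) // 100


class ApplyDiscountTest(unittest.TestCase):
    def test_no_discount_leaves_price_unchanged(self):
        self.assertEqual(apply_discount(1999, 0), 1999)

    def test_discount_rounds_down_to_whole_cent(self):
        self.assertEqual(apply_discount(999, 15), 849)  # 849.15 -> 849

    def test_full_discount_is_free(self):
        self.assertEqual(apply_discount(1999, 100), 0)

    def test_out_of_range_percent_is_rejected(self):
        with self.assertRaises(ValueError):
            apply_discount(1000, 101)


if __name__ == "__main__":
    unittest.main()
```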

Integration tests at the code component level – tests that are designed to ensure that the code units or code components that need to work with each other (one calls another, passes data on to another, etc.) do so in the expected way(s). These are typically written by developers (though again I encourage QA folks with development skills to review and perhaps add tests here too). These will also often make use of a unit test framework, but will typically be run after the unit tests have run (and passed).
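By contrast, a code component integration test wires real components together, without mocks, to check that they cooperate as expected. In this hypothetical example, `Checkout` calls `TaxRules`; the test would catch, for instance, a mismatch in how the two components represent tax rates. The rates are illustrative only.

```python
import unittest


class TaxRules:
    """Component A: looks up a tax rate, in basis points, by region."""
    RATES_BP = {"UK": 2000, "US-OR": 0}  # illustrative rates only

    def rate_bp(self, region: str) -> int:
        try:
            return self.RATES_BP[region]
        except KeyError:
            raise ValueError(f"unknown region: {region}") from None


class Checkout:
    """Component B: calls component A to compute an order total."""

    def __init__(self, tax_rules: TaxRules) -> None:
        self._tax = tax_rules

    def total_cents(self, subtotal_cents: int, region: str) -> int:
        rate = self._tax.rate_bp(region)
        return subtotal_cents + subtotal_cents * rate // 10_000


class CheckoutTaxIntegrationTest(unittest.TestCase):
    # Real components wired together: this checks the contract between
    # them, e.g. that both agree rates are expressed in basis points.
    def setUp(self):
        self.checkout = Checkout(TaxRules())

    def test_uk_total_includes_tax(self):
        self.assertEqual(self.checkout.total_cents(1000, "UK"), 1200)

    def test_zero_tax_region(self):
        self.assertEqual(self.checkout.total_cents(1000, "US-OR"), 1000)

    def test_unknown_region_error_propagates(self):
        with self.assertRaises(ValueError):
            self.checkout.total_cents(1000, "XX")
```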

Integration tests at the system component level – tests designed to ensure that the system components that need to interact with each other can do so as intended. These may be written either by developers or by QA folks with programming skills. They are designed and executed against APIs, Windows services, or any other interfaces exposed between system components. Sometimes third-party services or components are involved in this layer; for example, we are currently using some cloud-based services in our application. Often the UI is built on top of an API, and by focusing on testing at this layer you can test the variations and permutations of API calls more efficiently and more robustly. This provides a solid, well-tested (or checked) API layer on which a much smaller set of UI tests can build: those UI tests just need to prove that the UI interacts as expected with the code layers below, and that in turn those layers all work well together (the broad variations will already have been covered in the layer below). These tests need to run against a deployed build in the CI pipeline, because they typically need the application to be installed or deployed in an environment similar to the way it will be delivered in production. As such, they are normally run after the code component integration tests have run and passed.
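A system component level test talks to a real interface rather than calling code directly. The sketch below assumes a hypothetical /orders API that accepts a JSON quantity from 1 to 99. To keep the example self-contained, a tiny in-process HTTP server stands in for the deployed build; in a real pipeline you would point the tests at the deployed environment instead. It also shows how cheaply many input permutations, including invalid ones, can be covered at this layer.

```python
import json
import threading
import unittest
import urllib.error
import urllib.request
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


class OrdersHandler(BaseHTTPRequestHandler):
    """Hypothetical stand-in for a deployed API: POST accepts {"qty": 1..99}."""

    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        try:
            qty = json.loads(self.rfile.read(length))["qty"]
            valid = type(qty) is int and 1 <= qty <= 99
        except (ValueError, KeyError, TypeError):  # bad JSON or wrong shape
            valid = False
        self.send_response(201 if valid else 400)
        self.end_headers()

    def log_message(self, *args):  # keep test output quiet
        pass


class OrdersApiTests(unittest.TestCase):
    @classmethod
    def setUpClass(cls):
        # Port 0 lets the OS pick a free port. In a real pipeline this would be
        # the URL of the build deployed to a test environment instead.
        cls.server = ThreadingHTTPServer(("127.0.0.1", 0), OrdersHandler)
        threading.Thread(target=cls.server.serve_forever, daemon=True).start()
        cls.url = f"http://127.0.0.1:{cls.server.server_port}/orders"
        # Bypass any proxy settings so requests go straight to the local server.
        cls.opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))

    @classmethod
    def tearDownClass(cls):
        cls.server.shutdown()
        cls.server.server_close()

    def post(self, body: bytes) -> int:
        """POST a raw body and return the HTTP status code."""
        request = urllib.request.Request(
            self.url, data=body, method="POST",
            headers={"Content-Type": "application/json"})
        try:
            with self.opener.open(request, timeout=5) as response:
                return response.status
        except urllib.error.HTTPError as error:  # raised for 4xx/5xx responses
            return error.code

    def test_quantity_permutations(self):
        # Many input variations are cheap to cover here, so the UI tests
        # only need to exercise one or two of them.
        cases = {
            b'{"qty": 1}': 201,
            b'{"qty": 99}': 201,
            b'{"qty": 0}': 400,
            b'{"qty": 100}': 400,
            b'{"qty": "5"}': 400,  # wrong type
            b'not json': 400,      # malformed body
        }
        for body, expected in cases.items():
            with self.subTest(body=body):
                self.assertEqual(self.post(body), expected)
```

The invalid-input cases, the kind of thing a UI might send by mistake, are checked here directly rather than through the UI.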

UI tests – tests designed to ensure that the user interface works the way it was intended to. Keep in mind that the user interface is not necessarily a web page or a GUI; it could just as easily be the command line interface of a tool. Typically, though, we are talking about a web-based UI or a GUI of some kind. Test automation at this layer is often expensive both to produce and to maintain over time, so the focus here should be to minimise these automated tests by relying and building on the successes of the testing in the layers below. Focus on simple end-to-end workflows through the UI, and make sure your tests cover only the sections of the UI that you want to prove are working well. In other words, use the lower levels of testing to prime the system under test with appropriate test data and so on; for example, use the API test code to enter test data or to get the system into the state you need before starting a UI workflow. These are normally the last tests run in the CI system, and sometimes they are not run continuously at all. For example, if your UI tests take 4 hours to run, you usually won’t be able to run them on every check-in and will instead need to run them periodically, say once or twice a day.
(We can talk about opportunities to reduce this time later, but the best one is simply to reduce the number of tests you need to run at this level by ensuring you have most of the coverage you need in the lower levels.)

So, why are the layers ordered and sized the way they are?
Well, I typically think of the width of each layer as the number of tests in it. This gives a relatively easy way to check whether you are approximating the right shape. As with most metrics, I would caution against applying this too strictly; really you just want to see that you are trending the right way, or are in a position to discuss why not (and perhaps understand that you have valid reasons).
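Because width here simply means the number of tests per layer, the check is easy to automate. A minimal sketch, assuming you can already count the tests in each layer (the layer names and counts below are illustrative, not real data): it flags any layer that is wider than the one beneath it, as a prompt for discussion rather than a hard gate.

```python
def shape_warnings(layers_bottom_to_top: list[tuple[str, int]]) -> list[str]:
    """Flag any layer that has more tests than the layer directly below it."""
    warnings = []
    # Pair each layer with the one above it: (lower, upper).
    pairs = zip(layers_bottom_to_top, layers_bottom_to_top[1:])
    for (lower, lower_n), (upper, upper_n) in pairs:
        if upper_n > lower_n:
            warnings.append(f"{upper} ({upper_n}) is wider than {lower} ({lower_n})")
    return warnings


# Illustrative counts only.
healthy = [("unit", 1200), ("code integration", 300), ("system integration", 150), ("UI", 40)]
inverted = [("unit", 50), ("code integration", 20), ("system integration", 80), ("UI", 600)]

print(shape_warnings(healthy))  # []
for warning in shape_warnings(inverted):
    print(warning)
# system integration (80) is wider than code integration (20)
# UI (600) is wider than system integration (80)
```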
The reason for this order is really the building analogy: the bottom layer of unit tests forms the foundation on which the rest are built. You want a very broad bottom layer (a large number of unit tests, because the scope of each test is very small but the permutations and variations you need to cover may be broad).
As you move up the pyramid you need fewer tests, because each test increases in scope (covers more with one test) and because you don’t need to cover every permutation or variation, as most should already have been covered in the layer below.
You will have noticed that in the definitions I mention CI systems and when the tests typically run. This follows the same pattern: you only run the tests of a higher layer once the tests that provide the foundation for that layer have run and passed. If there are failures, you typically want to stop and resolve those issues before moving on.
It is also worth mentioning that the lower the layer, the ‘cheaper’ the testing; for example, unit tests are typically quick to write and very fast to run. So testing lots of permutations or variations of, say, data or parameters at this layer is relatively cheap and easy (much cheaper than at the UI layer). This layer can therefore provide a very solid foundation for the higher layers, where you may only need to test one or two permutations or variations of data or parameters, because you know the rest have already been covered and the higher-level tests are focused on proving the interaction between parts of the system, or the system as a whole.

One of my former colleagues, Caroline, always preferred to think of the layers of testing as the tiers of a multi-tiered cake, like a wedding cake.
I prefer the pyramid shape myself so I continue to use that as my illustration.

Here are some of the shapes I have experienced and some approaches we have used to improve the situation:

Single Layer

One company I worked at did not really have a pyramid at all; it was more like a unit test cake with a manual smoke test cherry on top.
This was a very developer-heavy company where developers were expected to deliver production-ready code and therefore to test their own code, which typically meant they wrote unit tests and not much more.
If the code compiled and could be installed, it was largely assumed to be good.
The unit testing was not, in my humble opinion, great or consistently applied. It showed the usual patterns and problems: some developers did a better job than others, and there was little or no measurement of coverage.
The tests were also typically written after the code (so not TDD), which meant they usually just confirmed that the code did what the developer wrote it to do, rather than trying to ensure that the solution was robust and would handle interesting or unusual cases appropriately.

If you find yourself in this situation and you have quality problems (if it is working for you, there is no need to fix it), I would suggest looking for examples of product failures caused by failures in system component or code component level integration, for example an API that accepted invalid input from the UI and failed as a result. Use these to encourage the developers to add integration tests, by helping them understand which tests were missing (the ones that could have exposed these issues early).
You will also need to seek management support to ensure that new code is delivered with code and system component level tests as well as unit tests. It should be fairly easy to monitor and show that this is happening, and to provide feedback on the issues these extra tests expose.
Once you start seeing automated tests running and passing at the code and system component levels, you can start to add UI level tests (probably best to start by automating those smoke tests).

Inverted pyramid

A common scenario (in my experience, and in the experiences others have shared with me) is an upside-down or inverted pyramid, where the testers have focused on adding automation at the UI layer and very little has been done at the lower layers. There may be some automation focused on the service or API layers. The developers have not been encouraged or managed to produce much in the way of unit tests, so this is the smallest of all the layers.
Sometimes this happens when an organisation purchases an expensive test automation tool and wants to see a return on that investment, so it focuses the team on using that tool, resulting in lots of UI-centric automated tests.

Inverted Pyramid

The way to ‘flip’ the pyramid in this situation is to set the expectation that all new code is delivered with unit tests, and that any existing or legacy code that is changed also has unit tests added where this is possible and cheap enough to do. (Code that was not written with unit tests in mind can sometimes be very hard to add unit tests to. If new code is written using a TDD approach, or at least with the intention that unit tests will be delivered with it, it will be more ‘unit testable’ by design.) Again, you will need management support or buy-in for this, as some may question the value of the extra time or investment required to provide these tests. Try to find some existing issues that could have been easily and cheaply exposed at this layer, or pay attention to those exposed by your new tests and celebrate them. Assuming that you see unit tests being added and passing, you can then start to encourage code and system component level tests by looking at the important interactions in both of those layers and focusing on those first (critical components at both the code and system levels). You should also look at your UI tests and see whether they can be refactored to use more API or service level integration, or perhaps even be replaced by tests at that layer.
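One common reason legacy code is hard to unit test is a hidden, hard-wired dependency such as the system clock. A minimal sketch, assuming a hypothetical 30-day trial rule: passing the dependency in (one technique among several, and one that writing tests first tends to push you towards) lets a unit test control it directly, without patching.

```python
import unittest
from collections.abc import Callable
from datetime import datetime, timedelta


# Harder to unit test: the current time is hard-wired inside the function,
# so a test cannot control it without patching datetime.
def trial_expired_hardwired(started: datetime) -> bool:
    return datetime.now() - started > timedelta(days=30)


# Easier to unit test: the clock is passed in, defaulting to the real one.
def trial_expired(started: datetime,
                  now: Callable[[], datetime] = datetime.now) -> bool:
    return now() - started > timedelta(days=30)


class TrialExpiryTests(unittest.TestCase):
    START = datetime(2024, 1, 1)

    def test_not_expired_at_exactly_30_days(self):
        self.assertFalse(trial_expired(self.START, now=lambda: datetime(2024, 1, 31)))

    def test_expired_just_after_30_days(self):
        self.assertTrue(trial_expired(self.START, now=lambda: datetime(2024, 1, 31, 0, 0, 1)))
```

Production callers keep using trial_expired(started) unchanged; only the tests supply a fixed clock.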

Trapezoidal pyramid

An interesting variation is one I call trapezoidal, since it looks like two trapezoidal sections of tests with a thin band of integration tests in between (in extreme cases, perhaps none at all). It depicts a reasonable number of unit tests along with a focus on UI tests, and very few integration tests of any sort. This, I feel, is the very problem Mike was focused on in his original blog post, and it is a shame that this pattern can still be seen today.

Trapezoidal Pyramid

At this company, it seemed that automated testing was divided between dev and QA by asking, “is it a unit test?” If so, dev would provide it; anything else must be a QA test, which typically meant testing as the customer, and so a UI test.

There are many reasons why we seem to ignore integration tests; here are a couple of the most common ones I have witnessed:

  1. How many people can adequately understand, and thus define or explain, what an integration test is? (In my experience, not many.)
  2. Even if the team does have a good understanding, integration testing seems to be something of a “no man’s land”: it is not always clear who should own it, so it simply doesn’t get owned, and thus doesn’t get done.

So, to combat this shape, the team(s) really need to focus on adding integration tests at both the code component and system component levels. This will again require investment and support, so as before, try to identify the critical code and system components and focus your efforts on these first, or simply start applying this to all new code and only tackle existing code if it is changing significantly. Once you see these code and system components being covered more efficiently and effectively by integration tests, you can probably reduce the number of UI test variations that interact with them. It may be possible to divide the effort here neatly: QA folks with programming experience can tackle the system component integration points (and gain a greater understanding of them in the process), while developers, who can more readily identify the critical code components, develop the tests for those.
Make sure your tests are identifiable by layer in some way, so that progress can be shown and measured, and so that issues found can be attributed to the layer in which they were found. That way, these successes can be shared and celebrated, validating the effort it took to add the tests.
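One lightweight way to make tests identifiable by layer, assuming a unittest code base, is to tag each test class and count the tests per tag. The layer class attribute below is a hypothetical convention, not a unittest feature; naming conventions or pytest’s custom markers could serve the same purpose. Untagged tests are reported separately so that gaps stay visible.

```python
import unittest
from collections import Counter


def iter_tests(suite: unittest.TestSuite):
    """Flatten nested TestSuites into individual TestCase instances."""
    for item in suite:
        if isinstance(item, unittest.TestSuite):
            yield from iter_tests(item)
        else:
            yield item


def count_by_layer(suite: unittest.TestSuite) -> Counter:
    # Tests without a 'layer' attribute are counted as 'untagged'.
    return Counter(getattr(test, "layer", "untagged") for test in iter_tests(suite))


# Hypothetical tagging convention: a 'layer' class attribute on each TestCase.
class DiscountUnitTests(unittest.TestCase):
    layer = "unit"

    def test_typical_discount(self):
        pass

    def test_zero_discount(self):
        pass


class QuantityApiTests(unittest.TestCase):
    layer = "system integration"

    def test_rejects_bad_quantity(self):
        pass


class LegacyTests(unittest.TestCase):
    def test_old_behaviour(self):
        pass


loader = unittest.defaultTestLoader
suite = unittest.TestSuite(
    loader.loadTestsFromTestCase(case)
    for case in (DiscountUnitTests, QuantityApiTests, LegacyTests)
)
for layer, count in count_by_layer(suite).most_common():
    print(f"{layer}: {count}")
# unit: 2
# system integration: 1
# untagged: 1
```

The same per-layer counts can feed the pyramid-shape check and can be recorded over time to show progress.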