Series: QA Leadership · Article 6 of 9

Picture two teams. Both report: "This quarter we had 3 bugs in production." One deserves a bonus for it, the other should stop shipping immediately and run a deep retrospective. What separates them? The number of releases - the one metric neither of them put in the report.

Crisis: 3 production bugs · 2 releases
On average 1.5 bugs per release. Practically every deployment hands the customer problems.

vs

Success: 3 production bugs · 15 releases
Just 0.2 bugs per release. The vast majority of releases ship without the slightest issue.

The same absolute number can describe two completely different realities. That is exactly why Number of Releases is not a metric you simply present - it is the foundation through which you judge every other data point.

It is the most overlooked of the five metrics in this series, because on the surface it seems trivial (“Just count how many releases we shipped”). Its role, however, is critical. Without it, indicators like DDR, Escaped Bugs or Issues per Release are merely dry numbers, stripped of any real scale.

The metric that does not shine on its own - but lights up the rest

Picture the other four metrics in the series as satellites. Each one orbits a single reference point - the number of releases. Without that center, each one drifts without context.

Common denominator: Number of Releases (gives scale to each of the metrics below)
  • Escaped Bugs: escaped ÷ releases = escaped per release → real customer impact
  • Issues per Release: issues ÷ releases = maturity per release → comparability over time
  • DDR: context for how many chances there were to detect or escape → scale of the process
  • Confidence Score: trend across N releases = prediction stability → repeatability

An absolute number tells you how much happened. A normalized number tells you whether that is a lot. And "is that a lot" is the only question the business truly cares about.

How to normalize every metric in the series

Normalization is simply dividing the absolute number by the number of releases. But the effect is transformative - it turns a number that swings with your pace of work into a quality indicator independent of that pace.

| Metric | Absolute | Normalized | What you gain |
| --- | --- | --- | --- |
| Escaped Bugs | 12 / quarter | ÷ releases | Comparability across quarters with a different cadence |
| Issues found | 96 / quarter | ÷ releases | Code-maturity trend independent of deployment count |
| QA time | 320h / quarter | ÷ releases | Cost of quality per release - an argument for budget |
| Hotfixes | 8 / quarter | ÷ releases | Stability of the release process, not a raw failure count |
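
The division itself can be sketched in a few lines of Python. This is a minimal illustration rather than the series' official tooling: the quarterly totals come from the table, while the release count of 10 and the names `per_release` and `quarter_totals` are assumptions made up for the example.

```python
def per_release(total: float, releases: int) -> float:
    """Divide an absolute quarterly total by the number of releases."""
    if releases <= 0:
        raise ValueError("releases must be positive to normalize")
    return total / releases


# Quarterly totals taken from the table.
quarter_totals = {
    "escaped_bugs": 12,
    "issues_found": 96,
    "qa_hours": 320,
    "hotfixes": 8,
}
releases = 10  # assumed value, not given in the article

normalized = {
    name: per_release(total, releases)
    for name, total in quarter_totals.items()
}

for name, value in normalized.items():
    print(f"{name}: {quarter_totals[name]} total, {value:.1f} per release")
```

The guard against a zero release count matters in practice: a quarter with no releases has no meaningful per-release rate, and it is safer to fail loudly than to report a misleading zero.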

A concrete example - the same team, two quarters

Watch how normalization completely flips the conclusion. Without it, Q4 looks worse than Q3. With it - you see a clear improvement.

[Chart: Escaped bugs - absolute vs normalized, Q3 vs Q4. Series: escaped total (count) and escaped per release. Absolute numbers rise (more releases). Per release - they fall. Which conclusion is true?]

Reading the absolute numbers: “The number of production bugs rose from 7 to 12 - our quality is dropping.” Reading them after normalization: “Bugs per release fell from 1.4 to 0.8. We tripled our delivery pace and the quality of our software clearly improved.” The second conclusion is the true one. The first is a business trap.
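
The arithmetic behind the two readings can be checked with a short sketch. The bug totals (7 and 12) and the rates (1.4 and 0.8) come from the text; the release counts of 5 and 15 are not stated in the article and are inferred here from those rates, so treat them as an assumption. The `Quarter` class and `trend` helper are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Quarter:
    name: str
    escaped_bugs: int
    releases: int

    @property
    def escaped_per_release(self) -> float:
        return self.escaped_bugs / self.releases


def trend(before: float, after: float) -> str:
    """Describe the direction of change between two values."""
    if after > before:
        return "up"
    if after < before:
        return "down"
    return "flat"


# Release counts are inferred from the quoted rates (7 / 1.4 and 12 / 0.8).
q3 = Quarter("Q3", escaped_bugs=7, releases=5)
q4 = Quarter("Q4", escaped_bugs=12, releases=15)

absolute_trend = trend(q3.escaped_bugs, q4.escaped_bugs)
normalized_trend = trend(q3.escaped_per_release, q4.escaped_per_release)

print(f"Escaped total: {q3.escaped_bugs} -> {q4.escaped_bugs} ({absolute_trend})")
print(
    f"Escaped per release: {q3.escaped_per_release:.1f} -> "
    f"{q4.escaped_per_release:.1f} ({normalized_trend})"
)
```

The same data produces opposite trend directions depending on the view, which is why a report should carry both.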

Number of Releases vs Deployment Frequency

The number of releases is a very close cousin of the industry’s most famous delivery-pace metric - Deployment Frequency from the DORA research. It is the first of the four key DevOps metrics and a direct indicator of how often your organization actually delivers value to users.

Keep in mind that the latest DORA report moved away from the classic four-tier split in favor of seven archetypes, but the data in the classic layout still makes an excellent reference point. Only about 16% of teams can deploy changes “on demand”, while 24% do so less than once a month. The maturity gap is enormous.

  • Top tier (on demand): multiple times a day, small code batches · ~16% of teams
  • High (daily - weekly): at least once a week, often more
  • Medium (weekly - monthly): sprint cycles, end of sprint every 1-2 weeks
  • Low (less than monthly): large batches, high risk on every deployment · ~24% of teams
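
To place your own release count on this scale, a rough classifier is enough. The sketch below is an approximation under stated assumptions: DORA describes the tiers in words, not as numeric cutoffs, so the thresholds (more than 30 releases a month for "on demand", at least 4 a month for "about weekly", at least 1 a month for "monthly") and the function name `deployment_tier` are illustrative choices.

```python
def deployment_tier(releases: int, months: int = 3) -> str:
    """Map a release count over a period to an approximate frequency tier.

    The numeric cutoffs are assumptions chosen for illustration.
    """
    if months <= 0:
        raise ValueError("months must be positive")
    if releases < 0:
        raise ValueError("releases cannot be negative")

    per_month = releases / months
    if per_month > 30:   # more than once a day on average
        return "Top tier"
    if per_month >= 4:   # roughly weekly or better
        return "High"
    if per_month >= 1:   # at least monthly
        return "Medium"
    return "Low"         # less than monthly


for count in (2, 3, 12, 200):
    print(f"{count} releases per quarter -> {deployment_tier(count)}")
```

An average over a quarter hides bursts and freezes, so the result is best read as a conversation starter rather than a benchmark score.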

Why does this matter for QA? Because delivery pace and quality are not opposites - that is one of the most important findings from years of DORA research. In that research, the fastest teams also tend to be the most stable. More frequent, smaller releases mean a smaller blast radius for every change, easier diagnosis and faster rollback. The number of releases is not just the denominator for your metrics - it is a signal of how mature the whole process is.

A rising number of releases alongside a falling escaped-per-release is the strongest evidence QA can present: we ship faster and more safely at the same time. That is precisely what DORA calls an elite-performance trait.

Deployment ≠ Release - and why it changes the counting

Before you start collecting data, you have to be clear: are you counting deployments or releases? They are not the same thing, and it trips up even experienced teams - especially those working with feature flags.

🚀 Deployment
A technical act. Code lands on the production environment, but it can stay hidden from the user - e.g. behind a disabled flag.
Example: a new feature's code is deployed, but the flag is off

🎁 Release
The business moment of making a new feature available to users. It can happen weeks after the deployment - e.g. by simply turning the flag on for 100% of the user base.
Example: turning the flag on for 100% of users

For this series’ QA metrics, we count what reaches the user, that is, releases in the business sense. An escaped bug is a problem the customer felt - so the denominator has to be the number of moments at which anything could have reached the customer. If your team separates deployment from release with feature flags, decide clearly: is an escaped bug counted from the moment the code is deployed, or from the moment the flag is turned on? Consistency in this definition is critical - just as with escaped bugs in article 3.
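
One way to make the definition concrete is to count both numbers from the same event log. The sketch below assumes a hypothetical log with two event kinds, `deploy` and `flag_on`; the `Event` structure, the feature names and the dates are invented for illustration and do not correspond to any particular feature-flag tool.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Event:
    day: date
    kind: str  # "deploy" or "flag_on" (hypothetical event names)
    feature: str
    behind_flag: bool = False  # only meaningful for "deploy" events


def is_release(event: Event) -> bool:
    """A release is any event that makes a change visible to users."""
    if event.kind == "flag_on":
        return True
    return event.kind == "deploy" and not event.behind_flag


def count_deployments(events: list[Event]) -> int:
    return sum(1 for event in events if event.kind == "deploy")


def count_releases(events: list[Event]) -> int:
    return sum(1 for event in events if is_release(event))


events = [
    Event(date(2024, 10, 2), "deploy", "checkout-v2", behind_flag=True),
    Event(date(2024, 10, 9), "deploy", "search-fix"),
    Event(date(2024, 10, 30), "flag_on", "checkout-v2"),
    Event(date(2024, 11, 5), "deploy", "invoices", behind_flag=True),
]

deployments = count_deployments(events)
releases = count_releases(events)
print(f"{deployments} deployments, {releases} releases")
```

In this log the hidden deployments do not count as releases, while the flag switch does. A gradual rollout (10%, then 50%, then 100%) would produce several `flag_on` events, so the team still has to decide whether that is one release or three.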

Normalization calculator

[Interactive calculator: QA metrics normalizer. Enter the absolute numbers and the number of releases to see how the number of releases changes the interpretation of your data, with a verdict for each metric.]

Sample output:
  • Escaped / release: 0.60 (Good)
  • Issues / release: 8.0 (Needs work)
  • QA time / release: 24h (cost of quality)
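
If you prefer a script to a web form, the calculator's logic might look roughly like the sketch below. Two things are assumptions: the verdict thresholds (`ESCAPED_GOOD_BELOW`, `ISSUES_GOOD_BELOW`) are hypothetical values picked only to agree with the sample verdicts, and the inputs (6 escaped bugs, 80 issues, 240 QA hours, 10 releases) were chosen to reproduce the sample output, not taken from the article.

```python
# Hypothetical thresholds; calibrate them against your own history.
ESCAPED_GOOD_BELOW = 1.0
ISSUES_GOOD_BELOW = 5.0


def verdict(value: float, good_below: float) -> str:
    return "Good" if value < good_below else "Needs work"


def normalize_report(
    escaped: int, issues: int, qa_hours: float, releases: int
) -> dict[str, str]:
    """Return per-release values with a verdict, formatted for display."""
    if releases <= 0:
        raise ValueError("releases must be a positive integer")

    escaped_rate = escaped / releases
    issues_rate = issues / releases
    hours_rate = qa_hours / releases

    return {
        "Escaped / release": (
            f"{escaped_rate:.2f} ({verdict(escaped_rate, ESCAPED_GOOD_BELOW)})"
        ),
        "Issues / release": (
            f"{issues_rate:.1f} ({verdict(issues_rate, ISSUES_GOOD_BELOW)})"
        ),
        "QA time / release": f"{hours_rate:.0f}h (cost of quality)",
    }


report = normalize_report(escaped=6, issues=80, qa_hours=240, releases=10)
for label, text in report.items():
    print(f"{label}: {text}")
```

QA time gets no verdict on purpose: hours per release describe the cost of quality, and whether that cost is acceptable is a budget discussion rather than a pass or fail.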

How to start counting - and do it right

It is the easiest metric in the whole series to collect - but it has a few definitional traps worth settling from the start.

1. Decide what counts as a "release"
Deployment or availability to the user? A hotfix - does it count as a separate release? A rollback and re-deploy - one release or two? Write the definition down and stick to it. For quality metrics I recommend counting what reaches the user.

2. Pull the data from what you already have
Git tags, CI/CD history (Jenkins, GitHub Actions, GitLab), the changelog, the version list in Jira. The number of releases is one of the most readily available data points in the whole series - usually counting the production tags is enough.

3. Add the number of releases as context to EVERY report
This is the heart of it. Never report escaped bugs, issues or DDR without the number of releases beside them. One sentence - "across 10 releases this quarter" - changes the interpretation of every other number.

4. Normalize all the other metrics - and show both views
Show both the absolute number and the normalized one. The absolute one speaks to the scale of work, the normalized one to quality. Together they give the full picture - and protect you from wrong conclusions in either direction.
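
For step 2, counting production tags per quarter can be sketched as follows. The sketch assumes production releases are tagged with a `v` prefix, which is a convention your repository may not follow, and it relies on `git for-each-ref` with the `creatordate` field. The function names are hypothetical, and the sample lines stand in for real output so the example runs without a repository.

```python
import subprocess
from collections import Counter
from datetime import date


def quarter_of(day: date) -> str:
    return f"{day.year}-Q{(day.month - 1) // 3 + 1}"


def releases_per_quarter(tag_lines: list[str], prefix: str = "v") -> Counter:
    """Count tags per quarter from lines shaped like '2024-11-05 v1.4.0'."""
    counts: Counter = Counter()
    for line in tag_lines:
        parts = line.split(maxsplit=1)
        if len(parts) != 2:
            continue  # skip blank or malformed lines
        created, tag = parts
        if tag.startswith(prefix):
            counts[quarter_of(date.fromisoformat(created))] += 1
    return counts


def read_tag_lines(repo_path: str = ".") -> list[str]:
    """List tags with their creation date, one 'YYYY-MM-DD tag' per line."""
    result = subprocess.run(
        ["git", "-C", repo_path, "for-each-ref",
         "--format=%(creatordate:short) %(refname:short)", "refs/tags"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout.splitlines()


# Sample lines in the same shape read_tag_lines() would return.
sample = [
    "2024-07-15 v1.0.0",
    "2024-09-30 v1.1.0",
    "2024-10-01 v1.2.0",
    "2024-11-20 nightly-2024-11-20",  # ignored: not a production tag
    "2024-12-05 v1.3.0",
]
counts = releases_per_quarter(sample)
for quarter, total in sorted(counts.items()):
    print(f"{quarter}: {total} releases")
```

For lightweight tags, `creatordate` falls back to the commit date, which can differ from the day the release actually went out. If your definition of a release is "flag turned on", tags alone will not capture it.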

Three pitfalls when using the number of releases

01. The number of releases as a goal in itself
More releases is not the goal - it is the means. If a team starts artificially splitting one release into five to "improve" the normalized metrics, that is gaming the system. The number of releases should reflect the real rhythm of delivering value, not be optimized for prettier charts.

02. Comparing teams with different delivery models
A team deploying on demand and a team releasing once a sprint are two different worlds. Normalized metrics help, but they do not erase contextual differences - regulation, product type, architecture. Use normalization to compare a single team over time, not to rank teams against one another.

03. Ignoring release size
10 small releases are not the same as 10 big ones. The number alone does not account for size. For more precise normalization, consider weighting by story points or the number of changes - especially when releases differ wildly in scale. The number of releases is a good default denominator, but not a perfect one for every case.
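
The size problem from pitfall 03 can be illustrated with a second denominator. In the sketch below, "changes" stands for whatever unit you track per release (merged pull requests, story points); the quarters, the sizes and the function names are invented for the example.

```python
def escaped_per_release(escaped: int, release_sizes: list[int]) -> float:
    """Plain normalization: every release counts the same."""
    if not release_sizes:
        raise ValueError("at least one release is required")
    return escaped / len(release_sizes)


def escaped_per_100_changes(escaped: int, release_sizes: list[int]) -> float:
    """Size-weighted normalization: the denominator is total changes shipped."""
    total_changes = sum(release_sizes)
    if total_changes <= 0:
        raise ValueError("total number of changes must be positive")
    return 100 * escaped / total_changes


# Same team, two hypothetical quarters with 10 releases and 4 escaped bugs each.
small_releases = [5] * 10    # 50 changes in total
large_releases = [40] * 10   # 400 changes in total

for label, sizes in (("small", small_releases), ("large", large_releases)):
    print(
        f"{label}: {escaped_per_release(4, sizes):.1f} per release, "
        f"{escaped_per_100_changes(4, sizes):.1f} per 100 changes"
    )
```

Both quarters show 0.4 escaped bugs per release, yet per 100 changes the rates are 8.0 and 1.0. Whether that gap matters depends on how much your release sizes actually vary.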

The number of releases in a conversation with the business

Sprint Review: "This quarter we shipped 12 releases - 4 more than in the previous quarter. Despite such a big jump in pace, the bug-per-release rate dropped from 1.0 to 0.5. We are shipping faster and more safely."

1:1 with EM: "The absolute number of production bugs went up because we doubled the number of releases. But if we look at the per-release rate, our quality improved significantly. We are achieving exactly what the DORA research calls elite performance: pace and stability rising at the same time."

Leadership: "We increased deployment frequency from 3 to 12 releases per quarter, moving up a tier on the DORA benchmarks. More importantly, the defect rate per release fell by half. We deliver business value faster while drastically lowering risk."

Why this “trivial” metric is the foundation

Number of Releases gives you
  • A common denominator for all the other metrics in the series
  • Protection against wrong conclusions drawn from absolute numbers
  • Comparability across quarters with a different cadence
  • A bridge to Deployment Frequency - the language of DORA and the boardroom
  • Proof that pace and quality can rise at the same time
Number of Releases is not
  • A goal in itself - more does not always mean better
  • A measure of release size (the count alone ignores scale)
  • A tool for ranking different teams
  • Sufficient on its own - it only shines together with the other metrics
Number of Releases is a metric you do not present - it is the metric through which you present all the others. The quietest hero of the entire series.

In the next article

Five metrics behind us. Article seven ties them all into a single decision indicator - the Release Confidence Score. Three calculation models, from a simple traffic light to a weighted model, a step-by-step rollout and examples from practice. This is the moment the whole series starts working as a system - a single number that answers the business’s most important question: can we release?

Series: QA metrics the business wants to hear
  • 01
    Diagnosis, three pillars, five metrics, the QA → KPI mapping model
  • 02
    Formula, thresholds, historical data, seasonality, pitfalls
  • 03
    Taxonomy, data collection, the cost of each type, how to report
  • 04
    Rollout from scratch, the link to the development process, the EM conversation
  • 05
    Pinpointing problems, not just watching trends
  • 06
    Number of Releases (you are here)
    The context metric, normalization, the link to Deployment Frequency
  • 07
    Release Confidence Score step by step
    Three calculation models, rollout, examples from practice
  • 08
    Storytelling with metrics - building a narrative
    How to turn a table of numbers into a business argument
  • 09
    3 anti-patterns that destroy QA credibility
    Too many metrics, no context, jargon - and how to avoid each