Defect Detection Ratio - how to measure effectiveness before anything reaches production

Series: QA Leadership · Article 2 of 9

You walk into a 1:1 with your Engineering Manager. One question lands: "How many of those bugs do you catch before they reach the customer?" - and the conversation starts to fall apart. Not because you test badly. Because you don't have one single number.

EM: "Listen - how effective are you actually at this testing? How many of those bugs do you catch before they reach the customer?"

QA: "Well... we found 47 bugs this sprint."

EM: "Yes, but how many escaped?"

QA: "Uh... twenty-two."

EM: "So half of them bypass you?"

QA: "Well... not exactly, because those were smaller..."

And the conversation falls apart. Not because you test badly. Because you don’t have a number that answers that question directly. One. Concrete. Ready.

That number exists. It’s called Defect Detection Ratio - and it’s the topic of this article.

What DDR is - and what it isn’t

Defect Detection Ratio is the share of defects caught by QA before reaching production, against all defects found in total - both before and after release. In other words: out of all the problems that ultimately surfaced - how many did you catch yourselves, before the customer saw them?

This is a metric of testing process effectiveness. Not activity. It answers the question: how well do we work as a filter before production?

DDR asks something fundamentally different from pass rate or coverage: does your testing process actually catch what matters?

DDR is not the same as pass rate. You can have a 99% pass rate and a 50% DDR - if your tests don’t cover the areas where the bugs live.

DDR is not the same as coverage. You can hit 90% of the code and not check a single critical business scenario. Touching is not the same as verifying.

From simple to advanced formula

Basic version

Basic formula

DDR = Pre-release bugs ÷ (Pre-release + Post-release)

DDR = 40 ÷ (40 + 10) = 40 ÷ 50 = 80%

Eight out of ten problems caught before the customer. One out of five escaped. That's your starting point.
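The basic formula can be sketched in a few lines of Python. The function name `ddr` and the choice to raise an error when no defects were recorded are assumptions made for illustration, not part of any standard tool.

```python
def ddr(pre_release: int, post_release: int) -> float:
    """Share of all known defects caught before release, as a fraction 0.0-1.0."""
    total = pre_release + post_release
    if total == 0:
        # Assumption: with no recorded defects the ratio is undefined, not 100%.
        raise ValueError("DDR is undefined when no defects were recorded")
    return pre_release / total


# The example from the text: 40 bugs caught pre-release, 10 escaped.
print(f"{ddr(40, 10):.0%}")
```

Treating "zero bugs recorded" as undefined rather than as a perfect score keeps an empty tracker from looking like an excellent process.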

Weighted version

The basic formula treats every bug equally. But a payments-blocking bug weighs more than a typo in a tooltip. It’s worth extending the formula with weights.

Weighted formula

DDR(weighted) = Σ(weight × bugs_pre) ÷ Σ(weight × bugs_pre + weight × bugs_post)

Start with the basic version. Introduce weights once you have a stable measurement rhythm and historical data.

Priority	Pre	Post	Weight	Weighted pre	Weighted post
Critical	2	3	×4	8	12
High	8	5	×2	16	10
Medium	20	2	×1	20	2
Low	10	0	×0.5	5	0
Sum	40	10	-	49	24

DDR(weighted) = 49 ÷ (49 + 24) = 49 ÷ 73 = 67%

Look at the result. The basic DDR was 80% - and it looked good. The weighted one is 67% - and it reveals that three of the five critical bugs escaped to production. That's a completely different story. And that's the story worth telling.
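The weighted calculation from the table can be reproduced with a short Python sketch. The weights and counts are copied from the table. The dictionary layout and the function name `weighted_ddr` are illustrative assumptions.

```python
# Weights per priority, as used in the table above.
WEIGHTS = {"Critical": 4, "High": 2, "Medium": 1, "Low": 0.5}

# priority -> (bugs found pre-release, bugs found post-release)
BUGS = {
    "Critical": (2, 3),
    "High": (8, 5),
    "Medium": (20, 2),
    "Low": (10, 0),
}


def weighted_ddr(bugs: dict, weights: dict) -> float:
    """Weighted pre-release share of all weighted defects, as a fraction."""
    pre = sum(weights[p] * n_pre for p, (n_pre, _) in bugs.items())
    post = sum(weights[p] * n_post for p, (_, n_post) in bugs.items())
    if pre + post == 0:
        raise ValueError("weighted DDR is undefined when no defects were recorded")
    return pre / (pre + post)


print(f"{weighted_ddr(BUGS, WEIGHTS):.0%}")  # 49 / (49 + 24)
```

Setting every weight to 1 turns this back into the basic formula, which is a quick way to check that both versions read the same data.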

How to read the score - thresholds and context

DDR is not absolute truth. It’s an indicator - and like every indicator, it requires interpretation. But certain industry thresholds are worth knowing as a reference point.

Reference points on the scale: 70% alarm threshold, 85% good threshold, 95% excellent threshold.

Below 70%

Alarm signal. More than 3 in 10 bugs reach production. Investigate causes.

70-85%

Average level. A good starting point. There's room to grow.

85-95%

Solid process. The question: what's hiding in those few percent that escape?

Above 95%

Excellent score - but check if the data is complete. High DDR can be an artifact of incomplete data.

Industry context matters. In financial and medical systems 90%+ is a minimum, not an aspiration. In a fast-iterating startup, 80% at high release frequency may be a conscious, acceptable tradeoff.
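The bands above can be turned into a small lookup, sketched here in Python. The source ranges share their edge values (85% appears in two bands), so the boundary handling is an assumption: exactly 85% is treated as "solid", and exactly 95% stays "solid" because the top band says "above 95%". The function name and the labels are illustrative.

```python
def ddr_band(ddr_percent: float) -> str:
    """Map a DDR percentage onto the reference bands from the text."""
    if not 0 <= ddr_percent <= 100:
        raise ValueError("DDR must be between 0 and 100 percent")
    if ddr_percent < 70:
        return "alarm"
    if ddr_percent < 85:
        return "average"
    if ddr_percent <= 95:
        return "solid"
    # Above 95%: a great score, but worth checking that post-release data is complete.
    return "excellent - verify data completeness"


for value in (62, 80, 90, 97):
    print(value, ddr_band(value))
```

The thresholds are reference points, so a team in a regulated industry would likely shift them upward rather than reuse them as they are.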

Why you can’t start today - historical data

One of the most common mistakes when rolling out DDR: the team starts measuring from the current sprint and after a month has one data point. One. From which no conclusion can be drawn.

DDR without history is like a map without a scale. You know you're somewhere - but you don't know which direction you're heading and how fast.

Before you start measuring “from now”, do something much more valuable: reconstruct the data backwards. Most organizations already have all the data they need - it's just that nobody has connected it in this specific way yet.

Where to find historical data

Jira / tracker

Bug history with date and environment. Export to CSV + JQL by date and type.

Main source

Support tickets

Freshdesk, Zendesk, ServiceNow. This is where the problems that never made it to Jira live.

Supplement

Monitoring / alerts

PagerDuty, Datadog, Grafana. Incidents with exact timestamps.

Supplement

Deployment history

Git tags, CI/CD pipeline, changelog. When each release shipped.

Context

project = MYAPP AND issuetype = Bug AND created >= "2025-01-01"
ORDER BY created ASC

Seasonality and patterns

With 12 months of data, you start seeing patterns your intuition won’t catch.

Release seasonality

Release peaks before Q4, Black Friday, year-end. Knowing the rhythm, you plan testing capacity ahead instead of putting out fires.

Turnover and onboarding

A new QA catches fewer issues than a senior for the first two months. Without data you don't know if a DDR drop is a process problem or an onboarding effect.

Feature type

New integrations, big refactors, new modules - DDR drops with specific change types. You can predict and direct testing effort.

First-release pattern

The first deployment of the month statistically has more escaped bugs. Accumulated changes + production drift from the test state.

Minimum viable approach - how to collect the data practically

Export bugs from Jira to CSV

You need: ID, created date, environment (test/staging/prod), priority. JQL above + export.

Build a release table

Date + version number. If you don't have it collected - git tags or CI/CD history will give it to you.

Assign each bug to a release

A bug found in test or staging between release A and B → pre-release for B. A bug found in production after B and before C (reported by support or monitoring) → escaped from B.

Calculate DDR per release and draw the chart

4-6 releases in one table. You have history, trend, and first patterns. This exercise takes 2-4 hours. Worth every minute.
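The four steps above can be sketched in Python. This is a minimal sketch under stated assumptions: the sample releases and bugs are invented for illustration, the environment values `test`, `staging` and `prod` follow the export fields listed in the first step, and a real version would read the Jira CSV export instead of hard-coded lists.

```python
from bisect import bisect_left, bisect_right
from collections import defaultdict
from datetime import date


def ddr_per_release(releases, bugs):
    """releases: (version, release_date) pairs; bugs: (created_date, environment) pairs."""
    releases = sorted(releases, key=lambda r: r[1])
    dates = [d for _, d in releases]
    counts = defaultdict(lambda: [0, 0])  # version -> [pre-release, post-release]
    for created, env in bugs:
        if env == "prod":
            # Escaped bug: belongs to the latest release shipped on or before this date.
            i = bisect_right(dates, created) - 1
            if i < 0:
                continue  # older than the first known release
            counts[releases[i][0]][1] += 1
        else:
            # Pre-release bug: belongs to the next release shipping on or after this date.
            i = bisect_left(dates, created)
            if i == len(dates):
                continue  # no later release recorded yet
            counts[releases[i][0]][0] += 1
    return {v: pre / (pre + post) for v, (pre, post) in counts.items() if pre + post}


# Hypothetical sample data, not taken from the article.
SAMPLE_RELEASES = [("1.0", date(2025, 1, 15)), ("1.1", date(2025, 2, 15))]
SAMPLE_BUGS = [
    (date(2025, 1, 5), "test"), (date(2025, 1, 10), "staging"),
    (date(2025, 1, 12), "test"), (date(2025, 1, 20), "prod"),
    (date(2025, 1, 25), "test"), (date(2025, 2, 1), "staging"),
    (date(2025, 2, 20), "prod"), (date(2025, 2, 22), "prod"),
]

for version, value in ddr_per_release(SAMPLE_RELEASES, SAMPLE_BUGS).items():
    print(version, f"{value:.0%}")
```

The sketch has no time window for post-release bugs: a production bug counts against the most recent release no matter how late it is found. If your written definition caps that window, the production branch is where the cap would go.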

Case study - from 74% to 94% in four quarters

A seven-person team (5 devs, QA, automation engineer), SaaS platform for enterprise customers. At the start of the year DDR was 74% - roughly one in four bugs reached production.

Chart: DDR trend across four quarters (DDR in %, one concrete process change per quarter), +20 pp in a year.

Starting point

Diagnosis - before changing anything, they measured 74%

An analysis of 6 months of history surfaced 3 clusters of escapees: payments API integration, reporting module edge cases, post-deployment configuration errors. Unit tests - beautiful. But none touched those areas.

First intervention

Contract tests + E2E expansion: 84%

Contract tests rolled out for the payments API, E2E expanded with reporting module scenarios. A 10pp jump in one quarter - just from knowing where they weren't testing.

Process change

New definition of "done": 90%

No feature enters QA without a minimal set of integration tests written by the developer. QA stopped being the gatekeeper at the end - it became a partner throughout the sprint.

Full picture

Monitoring incidents counted in the denominator: 94%

A seemingly small change - support and monitoring incidents added to "post-release bugs". The post-release count went up, but DDR held - because pre-release detection was growing in parallel. Now they had a full, credible picture.

"Every 5 percentage points of DDR is on average 4 fewer escaped bugs per quarter, at 8 hours each - that's 32 senior hours. Per quarter." - Budget approved.

When DDR lies - three traps

Every metric has weaknesses. DDR has three specific ones - and it’s worth knowing them before you start trusting it blindly.

Incomplete "post-release" definition

If the post-release count only includes Jira tickets marked by QA - you underestimate escaped defects. What about support incidents? Monitoring alerts? Splunk errors? Incomplete denominator = inflated DDR = false excellence.

Code bugs ≠ all problems

Bad production config. A broken integration. A wrong feature flag. None of them is a "code bug" - but each one hit customers. If DDR measures only code defects - you're not measuring the whole risk. (More in article 3: Escaped Bugs & Problems.)

High DDR, but only on trivial bugs

You can have DDR 95% and regularly ship critical bugs - if your tests are great at catching typos but weak on critical business paths. That's why you should always pair DDR with priority distribution. If your 95% is mostly Medium and Low - go back to the weighted formula.
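One way to pair DDR with priority distribution is to compute it per priority, sketched here in Python. The counts are the ones from the weighted-formula table earlier in this article (overall DDR 80%, not the 95% of this scenario), and the function name `ddr_by_priority` is an illustrative assumption.

```python
# priority -> (bugs found pre-release, bugs found post-release)
BUGS = {
    "Critical": (2, 3),
    "High": (8, 5),
    "Medium": (20, 2),
    "Low": (10, 0),
}


def ddr_by_priority(bugs: dict) -> dict:
    """DDR as a fraction for each priority that has at least one recorded bug."""
    return {
        priority: pre / (pre + post)
        for priority, (pre, post) in bugs.items()
        if pre + post > 0
    }


for priority, value in ddr_by_priority(BUGS).items():
    print(f"{priority:<8} {value:.0%}")
```

With these counts the Low and Medium rows sit above 90% while Critical sits at 40%, which is exactly the pattern a single overall number hides.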

DDR in business hands - the most dangerous trap

You won’t find this trap in the ISTQB syllabus. And it’s the most dangerous, because it touches not the measurement method but how it’s interpreted by people who don’t know the context.

Picture this: you show your Product Owner DDR 94%. They’re happy. They say: “great, we’re safe, we’re shipping.” But they don’t know that over the same four quarters the number of releases per quarter went from 3 to 10.

Quarter	DDR	Releases	Escaped / Release	Escaped total
Q1	88%	3	2.4	7
Q2	90%	5	2.1	10
Q3	92%	8	1.8	14
Q4	94%	10	1.2	12

DDR rises across all four quarters. Looks great. But the absolute count of escaped bugs grew through Q1-Q3. For three quarters the customer experienced more problems in production - despite rising DDR.

⚠️ High DDR without context gives a false sense of safety

DDR never works alone. It only fully makes sense alongside Escaped per Release (article 5) and Number of Releases (article 6). When presenting DDR to stakeholders - always show it with at least one context metric.
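The check behind this warning can be automated with a short Python sketch: flag every quarter in which DDR rose while the total number of escaped bugs also rose. The rows are copied from the table above. The tuple layout and the function name `misleading_quarters` are illustrative assumptions.

```python
# (quarter, DDR in %, releases, escaped bugs in total), copied from the table above.
QUARTERS = [
    ("Q1", 88, 3, 7),
    ("Q2", 90, 5, 10),
    ("Q3", 92, 8, 14),
    ("Q4", 94, 10, 12),
]


def misleading_quarters(rows):
    """Quarters where DDR improved but customers still saw more escaped bugs."""
    flagged = []
    for previous, current in zip(rows, rows[1:]):
        _, prev_ddr, _, prev_escaped = previous
        name, cur_ddr, _, cur_escaped = current
        if cur_ddr > prev_ddr and cur_escaped > prev_escaped:
            flagged.append(name)
    return flagged


print(misleading_quarters(QUARTERS))
```

For the table above this flags Q2 and Q3. Q4 is not flagged because its total drops from 14 to 12, although it is still above the Q1 total of 7.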

How to roll out DDR in four steps

Enough theory. Here’s what to do in the coming week.

Define it and write it down

Answer three questions in writing: what counts as "pre-release bug" (all test environments? only staging?), what counts as "post-release bug" (only Jira? also monitoring and support?), what's the time window for "post-release" bugs (week? sprint? quarter?). Without this, DDR of two teams isn't comparable - even in the same organization.

Pick a data source

Ideally: Jira + monitoring (Datadog/PagerDuty) + support tickets. To start: Jira + a manual incident log in Google Sheets. Sounds primitive - it works. What matters is to start.

Set a measurement cadence

Per sprint - good start, fast feedback, lots of noise. Per release - more natural, better for trends and business reporting. Recommendation: per sprint internally, per release for stakeholders.

First presentation - start with the story

Don't open with a single quarter's DDR. Do a retroactive calculation for the last 3 quarters. A trend is a much stronger argument than a single point. "Looking back at the last three quarters, our defect detection ratio looked like this: [chart]. The trend is rising - and I now want to agree on how to keep improving it."

DDR in conversation with the business

Three contexts. Three levels of detail. One indicator at the base of every conversation.

Sprint Review: "This sprint's Defect Detection Ratio is 88% - that means roughly 9 in 10 found issues were caught before reaching customers. One escaped and is already being addressed."

1:1 with EM: "Over the last year DDR has risen from 74% to 94%. Every percentage point is, in real terms, a few hours less on hotfixes. I want to propose a concrete change that should push it up another 3-4 points."

Board: "Over the past four quarters we improved pre-production defect detection effectiveness from 74% to 94%. That translated into a 60%+ drop in escaped bugs - I estimate this as 200+ saved senior hours per year."

What DDR tells you - and what it doesn’t

✓ DDR tells you

How effective your testing process is as a whole
Whether you're improving over time (quarterly trend)
Where the line is between what you catch and what escapes
How to justify investment in automation or extra capacity

✗ DDR doesn't tell you

Whether the customer feels the improvement (without release count context)
Where in the system bugs are escaping
Whether the code reaching tests is good quality (that's Issues per Release)
How fast and efficient your process is (that's a different metric)

Use DDR as one of the five letters of the alphabet. Together they form a word. Alone - they're just letters.

In the next article

The third article in the series covers Escaped Bugs & Problems - and it starts with a question most QA teams ask too rarely: are we really measuring everything that escapes to production?

Spoiler: almost never. And what we leave out is often more important than what we count.

Series links

Series: QA metrics the business actually wants to hear

01 QA metrics the business actually wants to hear - the complete guide
Diagnosis, three pillars, five metrics, QA → KPI mapping model

02 Defect Detection Ratio - deep guide (this article)
Formula, thresholds, historical data, seasonality, traps, ready lines

03 Escaped Bugs & Problems - full spectrum
Taxonomy, data collection, the cost of each type, how to report

04 Issues per Release - a code-maturity gauge
Rollout from scratch, the link to the development process, the EM conversation

05 Escaped Bugs per Release - find the risky release
Spike detection, the investigation framework, preventive actions

06 Number of Releases - the context metric
Why 3 bugs with 2 releases is a disaster, and with 15 - a success

07 Release Confidence Score step by step
Three calculation models, rollout, concrete examples from practice

08 Storytelling with metrics - building a narrative
How to turn a table of numbers into a business argument

09 3 anti-patterns that destroy QA credibility
Too many metrics, no context, jargon - and how to avoid each
