Series: QA Leadership · Article 2 of 9

You walk into a 1:1 with your Engineering Manager. One question lands: "How many of those bugs do you catch before they reach the customer?" - and the conversation starts to fall apart. Not because you test badly. Because you don't have one single number.

EM: "Listen - how effective are you actually at this testing? How many of those bugs do you catch before they reach the customer?"
QA: "Well... we found 47 bugs this sprint."
EM: "Yes, but how many escaped?"
QA: "Uh... twenty-two."
EM: "So half of them bypass you?"
QA: "Well... not exactly, because those were smaller..."

And the conversation falls apart. Not because you test badly. Because you don’t have a number that answers that question directly. One. Concrete. Ready.

That number exists. It’s called Defect Detection Ratio - and it’s the topic of this article.

What DDR is - and what it isn’t

Defect Detection Ratio is the share of defects caught by QA before reaching production, against all defects found in total - both before and after release. In other words: out of all the problems that ultimately surfaced - how many did you catch yourselves, before the customer saw them?

This is a metric of testing process effectiveness. Not activity. It answers the question: how well do we work as a filter before production?

DDR asks something fundamentally different from pass rate or coverage: does your testing process actually catch what matters?

DDR is not the same as pass rate. You can have a 99% pass rate and a 50% DDR - if your tests don’t cover the areas where the bugs live.

DDR is not the same as coverage. You can hit 90% of the code and not check a single critical business scenario. Touching is not the same as verifying.

From simple to advanced formula

Basic version

Basic formula
DDR = Pre-release bugs ÷ (Pre-release + Post-release)
DDR = 40 ÷ (40 + 10) = 40 ÷ 50 = 80%
Eight out of ten problems caught before the customer. One out of five escaped. That's your starting point.
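
The basic formula can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function name `ddr` and the choice to raise an error when no defects were recorded are assumptions made for this example.

```python
def ddr(pre_release: int, post_release: int) -> float:
    """Share of all known defects that were caught before release (0.0-1.0)."""
    total = pre_release + post_release
    if total == 0:
        # Assumption: with no recorded defects the ratio is undefined,
        # so we refuse to report a number rather than claim 0% or 100%.
        raise ValueError("no defects recorded, DDR is undefined")
    return pre_release / total


# The example from the text: 40 pre-release bugs, 10 post-release bugs.
print(f"{ddr(40, 10):.0%}")  # 80%
```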

Weighted version

The basic formula treats every bug equally. But a payments-blocking bug weighs more than a typo in a tooltip. It’s worth extending the formula with weights.

Weighted formula
DDR(weighted) = Σ(weight × bugs_pre) ÷ Σ(weight × bugs_pre + weight × bugs_post)
Start with the basic version. Introduce weights once you have a stable measurement rhythm and historical data.
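| Priority | Pre | Post | Weight | Weighted pre | Weighted post |
|----------|-----|------|--------|--------------|---------------|
| Critical | 2   | 3    | ×4     | 8            | 12            |
| High     | 8   | 5    | ×2     | 16           | 10            |
| Medium   | 20  | 2    | ×1     | 20           | 2             |
| Low      | 10  | 0    | ×0.5   | 5            | 0             |
| Sum      | 40  | 10   | -      | 49           | 24            |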
DDR(weighted) = 49 ÷ (49 + 24) = 49 ÷ 73 = 67%
Look at the result. The basic DDR was 80% - and it looked good. The weighted one is 67% - and it reveals that most critical bugs were escaping to production. That's a completely different story. And that's the story worth telling.
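
The weighted calculation can be reproduced with a short Python sketch. The weights are the ones used in the example above. The names `WEIGHTS` and `weighted_ddr` are made up for illustration, and your own priority scheme may need different weights.

```python
# Weights taken from the example above; adjust them to your own priority scheme.
WEIGHTS = {"Critical": 4, "High": 2, "Medium": 1, "Low": 0.5}


def weighted_ddr(pre: dict, post: dict, weights: dict = WEIGHTS) -> float:
    """Weighted DDR: each bug counts as much as its priority weight."""
    weighted_pre = sum(weights[priority] * count for priority, count in pre.items())
    weighted_post = sum(weights[priority] * count for priority, count in post.items())
    total = weighted_pre + weighted_post
    if total == 0:
        raise ValueError("no defects recorded, DDR is undefined")
    return weighted_pre / total


# Bug counts per priority from the example: 40 pre-release, 10 post-release.
pre = {"Critical": 2, "High": 8, "Medium": 20, "Low": 10}
post = {"Critical": 3, "High": 5, "Medium": 2, "Low": 0}

print(f"{weighted_ddr(pre, post):.0%}")  # 49 / (49 + 24) -> 67%
```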

How to read the score - thresholds and context

DDR is not absolute truth. It’s an indicator - and like every indicator, it requires interpretation. But certain industry thresholds are worth knowing as a reference point.

Scale: 0% - 70% (alarm threshold) - 85% (good threshold) - 95% (excellent threshold) - 100%

  • Below 70%: Alarm signal. More than 3 in 10 bugs reach production. Investigate causes.
  • 70-85%: Average level. A good starting point. There's room to grow.
  • 85-95%: Solid process. The question: what's hiding in those few percent that escape?
  • Above 95%: Excellent score - but check if the data is complete. High DDR can be an artifact of incomplete data.
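
If you report DDR automatically, the bands above can be expressed as a small lookup. This is only a sketch: the function name `ddr_band` is hypothetical, and how the exact boundary values (70, 85, 95) are assigned is an assumption, since the bands in the text share their endpoints.

```python
def ddr_band(ddr_percent: float) -> str:
    """Map a DDR value in percent to the interpretation bands described above."""
    if not 0 <= ddr_percent <= 100:
        raise ValueError("DDR must be between 0 and 100 percent")
    if ddr_percent < 70:
        return "alarm"
    if ddr_percent < 85:       # assumption: exactly 70 counts as "average"
        return "average"
    if ddr_percent <= 95:      # assumption: exactly 85 and 95 count as "solid"
        return "solid"
    return "excellent - check data completeness"


print(ddr_band(80))  # average
print(ddr_band(67))  # alarm
```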

Industry context matters. In financial and medical systems 90%+ is a minimum, not an aspiration. In a fast-iterating startup, 80% at high release frequency may be a conscious, acceptable tradeoff.

Why you can’t start today - historical data

One of the most common mistakes when rolling out DDR: the team starts measuring from the current sprint and after a month has one data point. One. From which no conclusion can be drawn.

DDR without history is like a map without a scale. You know you're somewhere - but you don't know which direction you're heading and how fast.

Before you start measuring “from now”, do something much more valuable: reconstruct the data backwards. Most organizations already have all the data they need - it's just that nobody has connected it in this specific way yet.

Where to find historical data

  • Jira / tracker (main source): Bug history with date and environment. Export to CSV + JQL by date and type.
  • Support tickets (supplement): Freshdesk, Zendesk, ServiceNow. This is where the problems that never made it to Jira live.
  • Monitoring / alerts (supplement): PagerDuty, Datadog, Grafana. Incidents with exact timestamps.
  • Deployment history (context): Git tags, CI/CD pipeline, changelog. When each release shipped.

Example JQL for the Jira export:
project = MYAPP AND issuetype = Bug AND created >= "2025-01-01"
ORDER BY created ASC

Seasonality and patterns

With 12 months of data, you start seeing patterns your intuition won’t catch.

  • Release seasonality: Release peaks before Q4, Black Friday, year-end. Once you know the rhythm, you can plan testing capacity ahead instead of putting out fires.
  • Turnover and onboarding: A new QA catches fewer issues than a senior for the first two months. Without data you don't know if a DDR drop is a process problem or an onboarding effect.
  • Feature type: New integrations, big refactors, new modules - DDR drops with specific change types. Knowing which ones lets you predict the drop and direct testing effort.
  • First-release pattern: The first deployment of the month statistically has more escaped bugs, because changes accumulate and production drifts away from the tested state.

Minimum viable approach - how to collect the data practically

1. Export bugs from Jira to CSV
   You need: ID, created date, environment (test/staging/prod), priority. Use the JQL above, then export.
2. Build a release table
   Date + version number. If you haven't collected it, git tags or CI/CD history will give it to you.
3. Assign each bug to a release
   A bug found in a test environment between releases A and B counts as pre-release for B. A bug found in production, or reported through monitoring, after B and before C counts as escaped from B.
4. Calculate DDR per release and draw the chart
   4-6 releases in one table give you history, a trend, and the first patterns. This exercise takes 2-4 hours. Worth every minute.
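
The assignment rule from step 3 can be sketched in Python. Everything concrete here is an assumption for illustration: the sample releases and bugs are invented, the environment label `"prod"` stands for whatever your tracker uses, and a test-environment bug created on a release day is attributed to the next release. In practice you would load both tables from your CSV exports.

```python
from bisect import bisect_right
from collections import defaultdict
from datetime import date

# Hypothetical sample data: (release date, version) and (bug id, created, environment).
releases = [(date(2025, 1, 15), "1.0"), (date(2025, 2, 12), "1.1"), (date(2025, 3, 12), "1.2")]
bugs = [
    ("MYAPP-1", date(2025, 1, 10), "test"),
    ("MYAPP-2", date(2025, 1, 12), "staging"),
    ("MYAPP-3", date(2025, 1, 20), "prod"),
    ("MYAPP-4", date(2025, 2, 3), "test"),
    ("MYAPP-5", date(2025, 2, 5), "test"),
    ("MYAPP-6", date(2025, 2, 10), "staging"),
    ("MYAPP-7", date(2025, 2, 20), "prod"),
    ("MYAPP-8", date(2025, 3, 1), "test"),
]


def ddr_per_release(releases, bugs):
    """Return {version: DDR or None} using the created date and environment of each bug."""
    releases = sorted(releases)
    release_dates = [shipped for shipped, _ in releases]
    counts = defaultdict(lambda: {"pre": 0, "post": 0})
    for _bug_id, created, environment in bugs:
        # Number of releases shipped on or before the day the bug was created.
        shipped_so_far = bisect_right(release_dates, created)
        if environment == "prod":
            if shipped_so_far == 0:
                continue  # escaped from a release outside our table
            counts[releases[shipped_so_far - 1][1]]["post"] += 1
        else:
            if shipped_so_far == len(releases):
                continue  # belongs to a release that has not shipped yet
            counts[releases[shipped_so_far][1]]["pre"] += 1
    result = {}
    for _, version in releases:
        total = counts[version]["pre"] + counts[version]["post"]
        result[version] = counts[version]["pre"] / total if total else None
    return result


for version, value in ddr_per_release(releases, bugs).items():
    print(version, "n/a" if value is None else f"{value:.0%}")
```

Note that the most recent release will always look too good in such a table, because its escaped bugs have not had time to surface yet.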

Case study - from 74% to 94% in four quarters

A seven-person team (5 devs, QA, automation engineer), SaaS platform for enterprise customers. At the start of the year DDR stood at 74% - roughly one in four bugs reached production.

[Chart: DDR (%) trend across four quarters, with one concrete process change per quarter. +20 pp in a year.]

Q1 · Starting point · DDR 74%
Diagnosis - before changing anything, they measured
An analysis of 6 months of history surfaced 3 clusters of escapees: payments API integration, reporting module edge cases, post-deployment configuration errors. Unit tests - beautiful. But none touched those areas.

Q2 · First intervention · DDR 84%
Contract tests + E2E expansion
Contract tests rolled out for the payments API, E2E expanded with reporting module scenarios. A 10pp jump in one quarter - just from knowing where they weren't testing.

Q3 · Process change · DDR 90%
New definition of "done"
No feature enters QA without a minimal set of integration tests written by the developer. QA stopped being the gatekeeper at the end - it became a partner throughout the sprint.

Q4 · Full picture · DDR 94%
Monitoring incidents counted in the denominator
A seemingly small change - support and monitoring incidents added to "post-release bugs". The number went up, but DDR held - because pre-release was growing in parallel. Now they had a full, credible picture.
"Every 5 percentage points of DDR is on average 4 fewer escaped bugs per quarter, at 8 hours each - that's 32 senior hours. Per quarter." - Budget approved.

When DDR lies - three traps

Every metric has weaknesses. DDR has three specific ones - and it’s worth knowing them before you start trusting it blindly.

1. Incomplete "post-release" definition
   If the counter only includes Jira tickets marked by QA - you underestimate escaped defects. What about support incidents? Monitoring alerts? Splunk errors? Incomplete denominator = inflated DDR = false excellence.
2. Code bugs ≠ all problems
   Bad production config. A broken integration. A wrong feature flag. None of them is a "code bug" - but each one hit customers. If DDR measures only code defects - you're not measuring the whole risk. (More in article 3: Escaped Bugs & Problems.)
3. High DDR, but only on trivial bugs
   You can have DDR 95% and regularly ship critical bugs - if your tests are great at catching typos but weak on critical business paths. That's why you should always pair DDR with priority distribution. If your 95% is mostly Medium and Low - go back to the weighted formula.

DDR in business hands - the most dangerous trap

You won’t find this trap in the ISTQB syllabus. And it’s the most dangerous, because it touches not the measurement method but how it’s interpreted by people who don’t know the context.

Picture this: you show your Product Owner DDR 94%. They’re happy. They say: “great, we’re safe, we’re shipping.” But they don’t know that over the same four quarters the number of releases per quarter went from 3 to 10.

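| Quarter | DDR | Releases | Escaped / Release | Escaped total |
|---------|-----|----------|-------------------|---------------|
| Q1      | 88% | 3        | 2.4               | 7             |
| Q2      | 90% | 5        | 2.1               | 10            |
| Q3      | 92% | 8        | 1.8               | 14            |
| Q4      | 94% | 10       | 1.2               | 12            |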

DDR rises across all four quarters. Looks great. But the absolute count of escaped bugs grew through Q1-Q3. For three quarters the customer experienced more problems in production - despite rising DDR.
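
A simple guard against this trap is to flag every quarter where DDR improved while the absolute number of escaped bugs grew. The sketch below uses the figures from the table above. The function name `misleading_quarters` is invented for this example, and the check is only one possible heuristic.

```python
# Rows copied from the table above: (quarter, DDR %, releases, escaped total).
quarters = [
    ("Q1", 88, 3, 7),
    ("Q2", 90, 5, 10),
    ("Q3", 92, 8, 14),
    ("Q4", 94, 10, 12),
]


def misleading_quarters(rows):
    """Quarters where DDR rose versus the previous quarter but escaped bugs rose too."""
    flagged = []
    for previous, current in zip(rows, rows[1:]):
        ddr_up = current[1] > previous[1]
        escaped_up = current[3] > previous[3]
        if ddr_up and escaped_up:
            flagged.append(current[0])
    return flagged


print(misleading_quarters(quarters))  # ['Q2', 'Q3']
```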

⚠️ High DDR without context gives a false sense of safety

DDR never works alone. It only fully makes sense alongside Escaped per Release (article 5) and Number of Releases (article 6). When presenting DDR to stakeholders - always show it with at least one context metric.

How to roll out DDR in four steps

Enough theory. Here’s what to do in the coming week.

1. Define it and write it down
   Answer three questions in writing: what counts as a "pre-release bug" (all test environments? only staging?), what counts as a "post-release bug" (only Jira? also monitoring and support?), and what the time window for "post-release" bugs is (week? sprint? quarter?). Without this, the DDR of two teams isn't comparable - even in the same organization.
2. Pick a data source
   Ideally: Jira + monitoring (Datadog/PagerDuty) + support tickets. To start: Jira + a manual incident log in Google Sheets. Sounds primitive - it works. What matters is to start.
3. Set a measurement cadence
   Per sprint - good start, fast feedback, lots of noise. Per release - more natural, better for trends and business reporting. Recommendation: per sprint internally, per release for stakeholders.
4. First presentation - start with the story
   Don't start with just one quarter's DDR. Do a retroactive calculation for the last 3 quarters. A trend is a much stronger argument than a single point. "Looking back at the last three quarters, our defect detection ratio looked like this: [chart]. The trend is rising - and I'd like to agree on how to keep improving it."
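
One way to make the written definition from step 1 hard to misread is to keep it next to the calculation as a small piece of configuration. The following is a sketch under assumptions: the class name `DDRDefinition`, the environment and source labels, and the 14-day window are all placeholders, not recommendations.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass(frozen=True)
class DDRDefinition:
    """The three answers from step 1, written down once and reused everywhere."""
    pre_release_envs: frozenset       # which environments count as "pre-release"
    post_release_sources: frozenset   # which sources count as "post-release"
    post_release_window: timedelta    # how long after a release a bug still counts

    def is_pre_release(self, environment: str) -> bool:
        return environment in self.pre_release_envs

    def is_escaped(self, source: str, found_on: date, released_on: date) -> bool:
        in_window = released_on <= found_on <= released_on + self.post_release_window
        return source in self.post_release_sources and in_window


# Placeholder values; replace them with what your team agreed on.
definition = DDRDefinition(
    pre_release_envs=frozenset({"test", "staging"}),
    post_release_sources=frozenset({"jira", "monitoring", "support"}),
    post_release_window=timedelta(days=14),
)

print(definition.is_escaped("support", date(2025, 3, 20), date(2025, 3, 12)))  # True
```

Two teams that share the same definition object are, by construction, counting the same things.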

DDR in conversation with the business

Three contexts. Three levels of detail. One indicator at the base of every conversation.

Sprint Review: "This sprint's Defect Detection Ratio is 88% - that means 9 in 10 found issues were caught before reaching customers. One escaped and is already being addressed."
1:1 with EM: "DDR trend over the last year is rising from 74% to 94%. Every percentage point is, in real terms, a few hours less on hotfixes. I want to propose a concrete change that should push it up another 3-4 points."
Board: "Over the past four quarters we improved pre-production defect detection effectiveness from 74% to 94%. That translated into a 60%+ drop in escaped bugs - I estimate this as 200+ saved senior hours per year."

What DDR tells you - and what it doesn’t

✓ DDR tells you
  • How effective your testing process is as a whole
  • Whether you're improving over time (quarterly trend)
  • Where the line is between what you catch and what escapes
  • How to justify investment in automation or extra capacity
✗ DDR doesn't tell you
  • Whether the customer feels the improvement (without release count context)
  • Where in the system bugs are escaping
  • Whether the code reaching tests is good quality (that's Issues per Release)
  • How fast and efficient your process is (that's a different metric)
Use DDR as one of the five letters of the alphabet. Together they form a word. Alone - they're just letters.

In the next article

The third article in the series covers Escaped Bugs & Problems - and it starts with a question most QA teams ask too rarely: are we really measuring everything that escapes to production?

Spoiler: almost never. And what we leave out is often more important than what we count.

Series: QA metrics the business actually wants to hear
  • 01: Diagnosis, three pillars, five metrics, QA → KPI mapping model
  • 02: Defect Detection Ratio - deep guide (this article). Formula, thresholds, historical data, seasonality, traps, ready lines
  • 03: Taxonomy, data collection, the cost of each type, how to report
  • 04: Rollout from scratch, the link to the development process, the EM conversation
  • 05: Spike detection, the investigation framework, preventive actions
  • 06: Number of Releases - the context metric. Why 3 bugs with 2 releases is a disaster, and with 15 - a success
  • 07: Release Confidence Score step by step. Three calculation models, rollout, concrete examples from practice
  • 08: Storytelling with metrics - building a narrative. How to turn a table of numbers into a business argument
  • 09: 3 anti-patterns that destroy QA credibility. Too many metrics, no context, jargon - and how to avoid each