Escaped Bugs and Problems - the full spectrum of what reaches production

Series: QA Leadership · Article 3 of 9

It was a Friday evening. The team's DDR was 91%. Regression passed beautifully. Confidence Score: 89% - GO. The release shipped. And forty minutes later the alerts started rolling in.

Friday · 18:47 · Production

18:47 ALERT: Timeouts on connections to the external payments API - 503 for 34% of requests

18:51 DevOps: checking the logs... this isn't our code. Something with the SSL config in the new environment.

19:03 QA Lead: but it wasn't a bug - everything passed in the tests.

19:04 PM: the client just wrote in. They haven't been able to process transactions for 17 minutes.

19:22 Rollback complete. Downtime: 35 minutes. The production SSL certificate differed from staging.

The next day at the retrospective, one question came up: “How is this possible - DDR 91%, and the client couldn’t pay for half an hour?”

The answer is both simple and painful: because DDR measured only bugs in the code. And the problem was in the infrastructure configuration. And that is exactly the gap this article is about.

The client doesn't distinguish whether the service went down because of a code bug, a bad SSL certificate, or a wrong feature flag. To them - and to your business - it's all the same thing: production is down.

Escaped Problem - a broader definition

In the previous article we talked about DDR - the metric for defect detection effectiveness. DDR asks: how many bugs do we catch before they reach production? But that definition assumes the only problems are bugs in the application code.

Reality is different. An Escaped Problem is any problem discovered by a customer or by monitoring after deployment - regardless of its source. Four categories, four entirely different ways of arising, four different ways of preventing them.

Four types - one shared consequence

Before you start measuring, you need to know what you’re measuring. Here is the full taxonomy of escaped problems with the typical percentage share in the organizations I’ve worked with.

🐛

Code defects

The classic bug - incorrect application behavior caused by an error in the programming logic. This is exactly what DDR from article 2 measures.

Examples: wrong price calculation after a discount; NullPointerException on an edge case; incorrect form validation.

⚙️

Infrastructure problems

The production environment behaves differently from the test one. The code is correct - but it doesn't work in the target context.

Examples: SSL certificate differs from staging; insufficient server resources under load; library version mismatch between environments.

🔗

Integration failures

External APIs, third-party systems, internal microservices - something that worked in tests fails in production because of a different call context.

Examples: payments API returns a different format in prod; a timeout different from staging; missing permissions in a service integration.

↩️

Post-deployment regressions

A feature worked before the release - after deployment it stopped. The cause: an unexpected interaction with new changes or configuration changes.

Examples: a feature flag overrode production settings; cache wasn't cleared after deployment; a database migration changed the behavior of old records.

The sum doesn’t add up to 100% - because a few percent are mixed situations, hard to classify cleanly. The proportions will differ in your organization - but the taxonomy itself is almost universal.
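To make the taxonomy concrete, here is a minimal sketch of how one escaped problem could be recorded. The class and field names (`ProblemType`, `EscapedProblem`, `needs_verification`) are my own assumptions for illustration, not part of any tool, and the date is a placeholder - only the time and the downtime come from the Friday story.

```python
from dataclasses import dataclass
from datetime import datetime
from enum import Enum


class ProblemType(Enum):
    """The four categories of the taxonomy."""
    CODE = "code"
    INFRA = "infra"
    INTEGRATION = "integration"
    REGRESSION = "regression"


@dataclass
class EscapedProblem:
    """One production event, whatever its source (hypothetical record shape)."""
    detected_at: datetime
    description: str
    problem_type: ProblemType
    downtime_minutes: int = 0         # user-facing impact, if any
    needs_verification: bool = False  # True when the type is only a best guess


# The Friday incident from the opening, recorded as an infra problem.
# The calendar date is a hypothetical placeholder.
friday_incident = EscapedProblem(
    detected_at=datetime(2025, 1, 10, 18, 47),
    description="Production SSL certificate differed from staging",
    problem_type=ProblemType.INFRA,
    downtime_minutes=35,
)
```

One record, one type: the enum makes it impossible to log an event without deciding which of the four categories it belongs to.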

Code vs infra vs integration - the key differences

Each type of escaped problem has a different source, a different warning signal and a different prevention method. The table below is your navigation map.

Type	Who owns it	Where to look for signals	How to prevent it
Code	Dev + QA	Jira, automated tests, code review	Test coverage, DDR, definition of done
Infra	DevOps + QA	Monitoring, environment diffs, IaC review	Environment parity, infrastructure-as-code tests
Integrations	Dev + QA + vendor	API logs, contract tests, alerting	Contract tests, mocking with prod-like data
Regressions	QA + DevOps	Post-deployment monitoring, smoke tests	Post-deploy smoke suite, canary deployments
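The "environment parity" entry in the table can start very small. The sketch below compares two configuration snapshots and reports every key that differs. It assumes you can export each environment's settings as a flat dictionary; the keys and values shown are invented for illustration.

```python
def config_diff(staging: dict, production: dict) -> dict:
    """Return keys whose values differ or that exist in only one environment."""
    missing = object()  # sentinel, so a real None value is still compared
    diffs = {}
    for key in sorted(staging.keys() | production.keys()):
        s = staging.get(key, missing)
        p = production.get(key, missing)
        if s != p:
            diffs[key] = (
                "<missing>" if s is missing else s,
                "<missing>" if p is missing else p,
            )
    return diffs


# Hypothetical snapshots - every key and value here is made up.
staging = {
    "tls_cert_fingerprint": "aa:11",
    "payments_timeout_s": 30,
    "lib_version": "2.4.1",
}
production = {
    "tls_cert_fingerprint": "bb:22",
    "payments_timeout_s": 30,
    "lib_version": "2.4.1",
    "feature_flag_x": True,
}

drift = config_diff(staging, production)
for key, (stg, prod) in drift.items():
    print(f"{key}: staging={stg!r} production={prod!r}")
```

Run as a pre-release gate, a check like this would surface a certificate mismatch before the deployment rather than forty minutes after it.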

Figure: Distribution of escaped problem types - a sample year (Q1-Q4). Code dominates, but infra and integrations together make up ~35% of problems and are often left out of reports.

How to collect and categorize - a practical guide

Most teams collect only bugs from Jira. That’s like measuring the temperature in one room and claiming you know the climate of the whole building. Here’s what to add and how to connect it.

Data sources

🗂️

Jira / tracker (mandatory)

Code defects reported by QA and devs. An "environment" field or a "production" tag lets you isolate the escaped ones.

📡

Alert monitoring (mandatory)

PagerDuty, Datadog, Grafana. Production incidents with a timestamp - the source for infra and integration problems.

🎧

Support tickets (important)

Freshdesk, Zendesk. Problems reported by customers that never reach Jira as a bug.

🔖

Post-deploy logs (important)

The first 30 minutes after deployment is the regression window. Splunk, ELK, CloudWatch - logs from that window.

💬

Slack / Teams (supplementary)

The #incidents or #prod-issues channel. This is often where problems land before anyone logs them officially.
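Connecting these sources mostly means bringing them into one shape. Here is a rough sketch that merges tracker issues, monitoring alerts and support tickets into a single chronological log. All field names (`environment`, `tags`, `created`, `fired_at`, `opened` and so on) are assumptions for illustration; real exports from your tracker or monitoring tool will look different.

```python
from datetime import datetime


def escaped_from_tracker(issues):
    """Keep only tracker issues marked as found in production."""
    return [
        issue for issue in issues
        if issue.get("environment") == "production"
        or "production" in issue.get("tags", [])
    ]


def build_event_log(tracker_issues, alerts, support_tickets):
    """Merge events from several sources into one chronological log."""
    log = []
    for issue in escaped_from_tracker(tracker_issues):
        log.append({"source": "tracker", "at": issue["created"],
                    "summary": issue["summary"]})
    for alert in alerts:
        log.append({"source": "monitoring", "at": alert["fired_at"],
                    "summary": alert["title"]})
    for ticket in support_tickets:
        log.append({"source": "support", "at": ticket["opened"],
                    "summary": ticket["subject"]})
    return sorted(log, key=lambda event: event["at"])


# Hypothetical sample data.
tracker = [
    {"summary": "Wrong discount price", "environment": "production",
     "created": datetime(2025, 3, 4, 10, 15)},
    {"summary": "Typo found on staging", "environment": "staging",
     "created": datetime(2025, 3, 4, 9, 0)},
]
alerts = [{"title": "Payments API 503", "fired_at": datetime(2025, 3, 3, 18, 47)}]
tickets = [{"subject": "Cannot log in", "opened": datetime(2025, 3, 5, 8, 30)}]

log = build_event_log(tracker, alerts, tickets)
for event in log:
    print(event["at"], event["source"], event["summary"])
```

The staging-only issue is dropped, and everything else ends up in one list sorted by time, ready for categorization.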

The categorization process - step by step

1. Collect every production event from the week / sprint

Keep one log, regardless of source: date, short description, downtime or user impact. At this stage you don't categorize - you only collect.

2. Assign a type to each event

Code / infra / integration / regression. One event - one type. If you're not sure - pick the most likely one and mark it "to verify".

3. Map it to a release

Which deployment brought the problem in? Sometimes it's obvious - an incident 30 minutes after deployment. Sometimes you have to look at the change history. Without this step you lose the ability to tie escaped problems to specific releases (the metric from article 5).

4. Compute the cost and log the resolution time

Time to detect, time to fix, who was involved. Even an approximation (DevOps ~3h, Dev ~1h) is enough - the cost section below covers the details.
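The "map it to a release" step can be sketched as a lookup in the deployment history: take the latest deployment at or before the event and check whether the event falls inside the 30-minute regression window. This is only a first guess, not a root-cause analysis. The release names and dates below are hypothetical; the 18:07 deployment simply reflects the forty minutes between release and first alert in the opening story.

```python
from bisect import bisect_right
from datetime import datetime, timedelta

REGRESSION_WINDOW = timedelta(minutes=30)


def map_to_release(event_time, deployments):
    """Return (release, in_regression_window) for an event.

    deployments: list of (deployed_at, release_name), sorted by deployed_at.
    Picks the latest deployment at or before the event.
    """
    times = [deployed_at for deployed_at, _ in deployments]
    idx = bisect_right(times, event_time) - 1
    if idx < 0:
        return None, False  # event predates every known deployment
    deployed_at, release = deployments[idx]
    return release, (event_time - deployed_at) <= REGRESSION_WINDOW


# Hypothetical deployment history.
deployments = [
    (datetime(2025, 1, 8, 14, 0), "release-41"),
    (datetime(2025, 1, 10, 18, 7), "release-42"),
]

print(map_to_release(datetime(2025, 1, 10, 18, 20), deployments))
print(map_to_release(datetime(2025, 1, 10, 18, 47), deployments))
```

The second call shows why the window alone is not enough: an alert forty minutes after deployment falls outside it, yet still belongs to that release. Events outside the window are the ones to mark "to verify" and check against the change history.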

Implementation checklist

Check which data sources you already have connected in your team.

Jira - "environment" field or "production" tag configured

Lets you filter bugs found in production down to the release.

Monitoring alerts land in one place (Slack / PagerDuty)

Every production alert should leave a trace you can analyze later.

Support tickets linked to Jira or logged separately

Without this you lose problems the customer reports directly - often the most serious ones.

Deployment history with exact dates and times

Essential for attributing incidents to specific releases.

Smoke tests run automatically after every deployment

They catch regressions in the first minutes - before they reach the customer.

A weekly incident review with type classification

A 15-minute ritual that turns raw data into a categorized history.

Cost

How much does each escaped problem type cost?

Each type of escaped problem has a different cost profile - a different detection time, a different fix time, different people involved. Below are estimates based on the median from typical enterprise organizations. Your numbers will differ - but the proportions are surprisingly consistent.

Code: application code defect

Dev: 2-3h analysis + fix
QA: 1h verification
DevOps: 1h hotfix deploy
PM: 0.5h coordination

The most common type. A well-defined fix process. Lower escalation cost.

Cost: 5-6h per incident. Risk: medium.

Infra: infrastructure / configuration problem

DevOps: 3-5h diagnosis + fix
Dev: 1h support
QA: 1h environment verification
PM: 1h + client communication
Often: a rollback of the whole release

Harder to diagnose. Often requires a rollback - not just a fix.

Cost: 8-12h per incident. Risk: high.

Integration: external integration failure

Dev: 2-4h diagnosis + workaround
DevOps: 2h configuration
PM: 2-3h vendor communication
Often: SLA breach with an external vendor

Part of the problem sits with the vendor. Resolution time depends on an external SLA.

Cost: 8-16h per incident. Risk: critical.

Regression: post-deployment regression

QA: 2h scope identification
Dev: 2-3h interaction analysis
DevOps: 2h rollback or hotfix
Often: impact on several features at once

Insidious - because "the previous version worked". Requires deeper root-cause analysis.

Cost: 7-10h per incident. Risk: high.

Average cost across all types: ~8h per single escaped problem

Most expensive type: Integration (8-16h + external SLA)

Most common type: Code (~55% of all cases)
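With per-type ranges in hand, a rough cost estimate for any mix of incidents is simple arithmetic. The sketch below uses the hour ranges from the estimates above; the function name is my own, and the mix of 5 code defects, 2 infra problems and 1 integration failure is the one used in the reporting example later in this article.

```python
# Hours per incident as (low, high), taken from the estimates above.
COST_RANGE_HOURS = {
    "code": (5, 6),
    "infra": (8, 12),
    "integration": (8, 16),
    "regression": (7, 10),
}


def estimate_cost(counts):
    """Return (low, high) total hours for a dict of {type: incident count}."""
    low = sum(COST_RANGE_HOURS[kind][0] * n for kind, n in counts.items())
    high = sum(COST_RANGE_HOURS[kind][1] * n for kind, n in counts.items())
    return low, high


low, high = estimate_cost({"code": 5, "infra": 2, "integration": 1})
print(f"Estimated cost: {low}-{high} hours")  # Estimated cost: 49-70 hours
```

The result is a range, not a number, which is the honest form of this estimate. Once you log real hours per incident, replace the table with your own medians.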

Data that says more than a single counter

Instead of one number “escaped bugs = 12” - two charts that give a completely different level of insight into what’s really going on.

Figure: Escaped problems by type - quarterly trend (Q1-Q4 2025; series: Code, Infra, Integrations, Regressions). Code shrinks faster - because it's better tested. Infra and integrations hold steady - they need different actions.

Figure: Cost by type - Q4 2025 (work hours). Integrations are only 15% of cases - but they consume disproportionately more time and budget.
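The data behind a quarterly trend chart is just a count per quarter and type. A minimal sketch, assuming each logged event carries a date and one of the four type labels; the sample events are invented.

```python
from collections import Counter
from datetime import date


def quarter(d):
    """Label a date with its calendar quarter, e.g. 2025-Q1."""
    return f"{d.year}-Q{(d.month - 1) // 3 + 1}"


def trend_by_type(events):
    """Count events per (quarter, type). events: iterable of (date, type)."""
    return Counter((quarter(d), kind) for d, kind in events)


# Hypothetical events.
events = [
    (date(2025, 2, 3), "code"),
    (date(2025, 2, 20), "code"),
    (date(2025, 3, 11), "infra"),
    (date(2025, 5, 6), "code"),
    (date(2025, 6, 17), "integration"),
]

trend = trend_by_type(events)
for (q, kind), n in sorted(trend.items()):
    print(q, kind, n)
```

Feed the same counts into whatever charting tool you already use; the point is that the split by type exists before anyone draws the chart.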

How to present this to the business

The number of escaped problems alone stops being enough once you have the type distribution and the cost of each. Here’s how to turn that data into a narrative.

Instead of: "we had 8 escaped bugs." Say: "we had 8 escaped problems - 5 code defects, 2 configuration problems and 1 integration failure. Total cost: about 68 hours. Infra and integrations need a separate strategy."

Sprint review: "This sprint we had 3 escaped problems: 2 code defects and 1 environment configuration problem. Cost: about 22 hours. The configuration problem was the most expensive - and we have a plan to not repeat it."

1:1 with EM: "Looking at the trend - code defects are dropping. But infra and integration problems hold at a steady level. That needs a different intervention than more testing - we need better environment parity and contract tests."

Board: "In Q4 we had 8 escaped problems at a total cost of about 68 work hours. For comparison - in Q1 there were 18 at about 160 hours. The biggest saving came from contract tests rolled out in Q2."
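If the log is categorized and costed, the opening sentence of such a report can be generated rather than hand-counted. A small sketch, assuming each event is a pair of type and logged hours; the labels and the per-event hours are my own illustration, chosen so the total matches the sprint review example (22 hours).

```python
from collections import Counter

# Hypothetical display labels for the four types.
LABELS = {
    "code": "code defect",
    "infra": "configuration problem",
    "integration": "integration failure",
    "regression": "post-deployment regression",
}


def plural(n, noun):
    """Format a count with a naive English plural."""
    return f"{n} {noun}" + ("" if n == 1 else "s")


def summarize(events):
    """events: list of (type, hours). Returns a one-line summary."""
    if not events:
        return "No escaped problems."
    counts = Counter(kind for kind, _ in events)
    total_hours = sum(hours for _, hours in events)
    parts = [plural(n, LABELS[kind]) for kind, n in counts.most_common()]
    if len(parts) > 1:
        breakdown = ", ".join(parts[:-1]) + " and " + parts[-1]
    else:
        breakdown = parts[0]
    return (f"We had {plural(len(events), 'escaped problem')}: {breakdown}. "
            f"Total cost: about {total_hours} hours.")


sprint = [("code", 6), ("code", 5), ("infra", 11)]
print(summarize(sprint))
```

The generated line covers the what and the how much. The why, and the plan to not repeat it, still has to come from you.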

What the full taxonomy changes

Once you start categorizing escaped problems instead of just counting them - the conversation changes fundamentally. You stop saying how many and start saying what and why.

4 types of escaped problems to track

5× cost difference: code vs integration

35% of problems missed when you measure only bugs in the code

15 min - a weekly review is enough for full categorization

The client doesn't report a problem labeled "type: infrastructure". To them - and to your business - one thing matters: does it work. Measure everything that can stop working.

In the next article

Article four covers Issues per Release - a code-maturity metric that reshapes the conversation with the Engineering Manager. It doesn’t ask how many bugs you found - it asks how clean the code you received for testing was.

Spoiler: this is the metric that often reveals the problem lies not with QA but with the development process - and it gives you the data to have that conversation from a position of facts, not opinions.

Series: QA metrics the business wants to hear

01 The complete guide: diagnosis, three pillars, five metrics, the QA → KPI mapping model
02 Defect Detection Ratio: formula, thresholds, historical data, seasonality, pitfalls
03 Escaped Bugs & Problems (this article): taxonomy, data collection, the cost of each type, how to report
04 Issues per Release - a code-maturity gauge: how this metric reshapes the conversation with the Engineering Manager
05 Escaped Bugs per Release - find the risky release: pinpointing problems, not just watching trends
06 Number of Releases - the context metric: why 3 bugs with 2 releases is a disaster, and with 15 - a success
07 Release Confidence Score step by step: three calculation models, rollout, concrete examples from practice
08 Storytelling with metrics - building a narrative: how to turn a table of numbers into a business argument
09 3 anti-patterns that destroy QA credibility: too many metrics, no context, jargon - and how to avoid each
