Detection Window

The Time Between Broken and Known

This interval determines how much damage a failure can do before the team even starts repairing it.

The moment a system breaks and the moment a team knows it broke are different events. The space between them is often invisible in dashboards and postmortems, but it is where impact grows. Data gets older, queues get deeper, customers repeat actions, and teams keep trusting outputs that may no longer be valid.

Reducing repair time is valuable, but repair cannot start before discovery. If a failure stays hidden for six hours and takes ten minutes to fix, detection accounts for roughly 97% of the incident, and the dominant reliability problem was not repair speed.
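The six-hours-versus-ten-minutes case can be worked through as a small calculation. This is a minimal sketch, not part of any product: the function name `incident_breakdown` and its three timestamp parameters are hypothetical, and it assumes you can establish after the fact when the system actually broke.

```python
from datetime import datetime, timedelta


def incident_breakdown(broke_at, detected_at, repaired_at):
    """Split one incident into detection time and repair time.

    Hypothetical helper: all three arguments are datetimes in the same
    timezone convention. Returns (time_to_detect, time_to_repair,
    detect_share), where detect_share is the fraction of the incident
    spent before anyone knew about it.
    """
    if not (broke_at <= detected_at <= repaired_at):
        raise ValueError("expected broke_at <= detected_at <= repaired_at")
    total = repaired_at - broke_at
    if total == timedelta(0):
        raise ValueError("incident has zero duration")
    time_to_detect = detected_at - broke_at
    time_to_repair = repaired_at - detected_at
    # Dividing one timedelta by another yields a plain float.
    detect_share = time_to_detect / total
    return time_to_detect, time_to_repair, detect_share


# Illustrative timestamps only: hidden for six hours, fixed in ten minutes.
broke = datetime(2024, 1, 1, 0, 0)
detected = broke + timedelta(hours=6)
repaired = detected + timedelta(minutes=10)

to_detect, to_repair, share = incident_breakdown(broke, detected, repaired)
print(f"detect: {to_detect}, repair: {to_repair}, hidden share: {share:.1%}")
# hidden share: 97.3%
```

Halving the repair time in this example saves five minutes. Halving the detection time saves three hours.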

Instrument the interval

OpenTrace gives operational processes a timeline. That timeline makes it easier to see when a process last reported, what phase it reached, and whether an expected next event failed to arrive. With those timestamps, the broken-to-known interval becomes measurable.
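The idea of noticing an event that never arrived can be sketched generically. The code below is not OpenTrace's API. `ProcessStatus`, `find_overdue`, and every field name are hypothetical, and the sketch assumes each process has a known expected reporting interval.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class ProcessStatus:
    # Hypothetical record of the last thing a process told us.
    name: str
    last_phase: str
    last_reported_at: datetime
    expected_interval: timedelta  # how often the process should report


def find_overdue(statuses, now, grace=timedelta(0)):
    """Return (name, last_phase, lateness) for each process whose next
    expected report has not arrived by `now`, allowing `grace` slack."""
    overdue = []
    for status in statuses:
        deadline = status.last_reported_at + status.expected_interval + grace
        if now > deadline:
            overdue.append((status.name, status.last_phase, now - deadline))
    return overdue


# Illustrative data only.
now = datetime(2024, 1, 1, 12, 0)
statuses = [
    ProcessStatus("nightly-export", "upload", now - timedelta(hours=30), timedelta(hours=24)),
    ProcessStatus("queue-drain", "done", now - timedelta(minutes=3), timedelta(minutes=5)),
]

for name, phase, lateness in find_overdue(statuses, now, grace=timedelta(minutes=1)):
    print(f"{name}: last phase '{phase}', overdue by {lateness}")
```

The design choice here is to alert on silence instead of waiting for an explicit error: a process that stalls or never starts produces no failure event, so the missing report is the only signal available.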