
Telemetry as Code

Telemetry should be designed with the same care as the behaviour it describes.

Operational telemetry is often treated as something added after the real work is done. A job is written, a failure happens, and only then do people add logs, counters, or dashboard fields. That makes visibility reactive: the system learns to explain itself only after it has already confused someone.

Telemetry as code means making those signals part of the implementation instead of a separate cleanup pass. When a batch job changes phases, that transition belongs in the code. When a worker skips an item, records a warning, or reaches a milestone, that fact should be emitted where the decision happens.
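To make the idea concrete, here is a minimal Python sketch of a batch job that reports its own phase changes, skips, and milestones at the point where each decision is made. The `Telemetry` class, the event kinds (`phase`, `skip`, `milestone`), and the `import_rows` job are all hypothetical. The sink simply appends to a list so the example runs without any service, whereas a real client would forward each event to a backend.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class Telemetry:
    """Hypothetical in-memory event sink standing in for a real transport."""
    events: list[dict[str, Any]] = field(default_factory=list)

    def emit(self, kind: str, **fields: Any) -> None:
        # Record one structured event; a real client would send it somewhere.
        self.events.append({"kind": kind, **fields})


def import_rows(rows: list[dict[str, Any]], telemetry: Telemetry) -> int:
    """Hypothetical batch job: validate rows, then 'load' the valid ones."""
    telemetry.emit("phase", name="validate")
    valid = []
    for row in rows:
        if "id" not in row:
            # The skip is reported exactly where the decision to skip is made.
            telemetry.emit("skip", reason="missing id", row=row)
            continue
        valid.append(row)

    # Phase transition emitted at the boundary in the code, not inferred later.
    telemetry.emit("phase", name="load")
    loaded = len(valid)  # stand-in for writing rows to a real destination
    telemetry.emit("milestone", name="load complete", rows=loaded)
    return loaded
```

Because each `emit` call sits on the branch it describes, renaming a phase or adding a new skip condition naturally updates the telemetry in the same edit.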

Signals are part of behaviour

A process does more than produce final output. It validates input, makes choices, handles exceptions, retries, skips, waits, and completes. Those actions are operational behaviour. If they matter to the people responsible for the system, they deserve first-class representation in the code.

Keep instrumentation close

The best telemetry is close to the logic it describes. A progress event near the loop, a milestone near the phase boundary, or a note near a fallback path is easier to keep correct than a separate observer trying to infer meaning from the outside.
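The sketch below, again hypothetical Python rather than any OpenTrace API, places a progress event inside the loop, a note on the fallback branch, and a milestone at the phase boundary. The `emit` callable is an assumed injection point for whatever client the application uses; the pricing logic and the `progress_every` parameter exist only for illustration.

```python
from typing import Any, Callable

# Hypothetical signature: emit(kind, **fields) for whatever client is in use.
Emit = Callable[..., None]


def price_items(
    items: list[str],
    live_prices: dict[str, float],
    cached_prices: dict[str, float],
    emit: Emit,
    progress_every: int = 2,
) -> dict[str, float]:
    """Price each item, preferring live data and falling back to a cache."""
    priced: dict[str, float] = {}
    for index, item in enumerate(items, start=1):
        if item in live_prices:
            priced[item] = live_prices[item]
        else:
            # Fallback path (assumes the cache covers every item); the note
            # sits right next to the decision it explains.
            priced[item] = cached_prices[item]
            emit("note", message=f"used cached price for {item}")
        if index % progress_every == 0 or index == len(items):
            # Progress is reported from inside the loop that does the work.
            emit("progress", done=index, total=len(items))
    # Milestone at the phase boundary, once every item has a price.
    emit("milestone", name="pricing complete", items=len(priced))
    return priced
```

An outside observer watching only the returned dictionary could not tell which prices came from the cache; the note emitted on the fallback branch makes that visible without any inference.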

Review the explanation

Code review should ask whether the system explains its important work. If a change adds a new state, branch, or failure mode, it may also need a new event. That does not mean instrumenting everything. It means making the important parts visible by design.

The application defines its own telemetry

OpenTrace is deliberately code-first. The application source code is the authoritative definition of its telemetry because the application already knows the work it performs, which metrics matter, how progress is measured, what milestones exist, what operations can safely be performed, and what success looks like.

The dashboard should not be a second system where developers recreate that knowledge by hand. The application describes what it knows, and OpenTrace renders it. In that sense, the dashboard is a projection of the application, not a separately maintained configuration.

Less configuration drift

Traditional monitoring setups often spread operational meaning across application code, dashboards, alert rules, documentation, and runbooks. Each layer may be correct on the day it is written, but they can drift apart as the system changes.

A code-first approach keeps telemetry close to the behaviour it describes. If a metric changes, its definition changes in the same commit as the code that produces it. If a workflow gains a new phase, the milestone can be added where that phase is implemented. The operational view evolves with the application instead of chasing it later.
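One way to keep a metric's definition in the same commit as its producer is to declare it as a value in the same module. In this minimal Python sketch, `MetricDefinition`, the metric name `import.rejection_rate`, and its fields are assumptions chosen for illustration, not a schema that OpenTrace defines.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MetricDefinition:
    """Hypothetical metric declaration kept beside the code that produces it."""
    name: str
    unit: str
    description: str


# Declared next to the logic below, so a change to how rejections are counted
# and a change to this definition land in the same commit and the same review.
REJECTION_RATE = MetricDefinition(
    name="import.rejection_rate",
    unit="percent",
    description="Share of input rows rejected during validation.",
)


def rejection_rate(total_rows: int, rejected_rows: int) -> dict[str, object]:
    """Compute the metric and package it with its own definition."""
    value = 0.0 if total_rows == 0 else 100.0 * rejected_rows / total_rows
    return {
        "name": REJECTION_RATE.name,
        "unit": REJECTION_RATE.unit,
        "value": round(value, 2),
    }
```

Searching the codebase for `import.rejection_rate` leads straight to both the declaration and the calculation, so the origin of the number never has to be reconstructed from dashboard settings.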

Version control becomes telemetry history

When telemetry lives in source code, it gets the same lifecycle as the software itself. Code reviews include telemetry changes. Pull requests show operational changes. Git history explains why metrics evolved. Reverting telemetry is simply reverting code.

That matters because telemetry is part of the design. A new progress event, expectation, or diagnostic payload is not just a dashboard tweak. It is a decision about how the system should explain itself to the people operating it.

Self-documenting systems

Developers should be able to search the codebase to discover where a metric originates, how it is calculated, and why it exists. They should not have to hunt through dashboard configuration or a separate portal to understand the connection between implementation and operational view.

This works especially well for batch processing, ETL pipelines, scheduled jobs, automation, integrations, data engineering, and long-running workflows. In those systems, the code often has the clearest understanding of progress, expectations, milestones, and meaningful notes.

OpenTrace renders what already exists

OpenTrace is opinionated about this boundary. The application defines metrics, progress, milestones, expectations, skills, and notes. OpenTrace renders dashboards, charts, timelines, incident views, progress displays, and investigation tools from what the application declares.

That keeps adoption small. A script, batch job, or service can become observable by making OpenTrace SDK calls or plain HTTP calls where the work happens. There is no requirement to design dashboards first or configure visualisations before the system can say something useful.

The application is the source of truth. OpenTrace does not ask developers to recreate their system inside a dashboard. The application tells OpenTrace what it knows, and OpenTrace turns that into something operators can understand.

Small contracts scale

OpenTrace keeps the contract small so this habit is easy to adopt: send events, metrics, notes, payloads, durations, and milestones over plain HTTP. The goal is not a giant observability framework. The goal is code that reports what operators would otherwise have to ask about later.
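As a rough illustration of how small that contract can be, the Python sketch below posts one JSON event using only the standard library. The endpoint URL and the body shape (a `kind` field, a `timestamp`, and a free-form payload) are placeholders assumed for this example, not OpenTrace's documented routes or field names. Building the request separately from sending it keeps the payload easy to inspect and test.

```python
import json
import time
import urllib.request
from typing import Any

# Placeholder endpoint: an assumption for illustration only.
EVENTS_URL = "http://localhost:8080/hypothetical/events"


def build_event_request(
    kind: str, payload: dict[str, Any], url: str = EVENTS_URL
) -> urllib.request.Request:
    """Package one telemetry event as a JSON POST request (hypothetical shape)."""
    body = json.dumps({"kind": kind, "timestamp": time.time(), **payload}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send_event(kind: str, payload: dict[str, Any], url: str = EVENTS_URL) -> int:
    """Send the event over plain HTTP and return the response status code."""
    request = build_event_request(kind, payload, url)
    with urllib.request.urlopen(request, timeout=5) as response:
        return response.status
```

A job could time its work with `time.perf_counter()` and report the result with something like `send_event("duration", {"name": "nightly-import", "seconds": elapsed})`. Since `urlopen` raises `urllib.error.URLError` when the endpoint is unreachable, a production job would likely catch it so that a telemetry failure never stops the work itself.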