Scenario

A Day in the Life of a Batch Job

The most expensive failures are often the ones that look quiet from the outside.

On day one, the job is simple. Every night it imports the latest product data, matches it to existing records, applies promotions, and prepares a report for the sales team. It runs after midnight, finishes before anyone starts work, and only gets attention when someone changes the code.

By day seven, it has become part of the business routine. Sales expects fresh data in the morning. Operations assumes the schedule is working. The infrastructure dashboard is green. CPU is normal, memory is stable, and the server is reachable.

Then the supplier changes a file format. The job starts, reads the first file, skips more rows than usual, and exits early. It does not crash loudly. It writes a few log lines, leaves yesterday's report in place, and disappears until the next scheduled run.

Everything looks healthy

The machine is still fine. The database is up. The queue is not backed up. No exception crosses the threshold that would wake anyone. From an infrastructure point of view, there is not much to see.

The business sees something different. A sales manager notices stale numbers. A customer asks why a promotion is missing. Someone posts in Teams: "Has the import run today?"

The search begins

Operations checks the scheduler. Engineering checks the logs. Someone asks whether the supplier sent the file. Someone else asks whether the data warehouse updated. The answer can be found, but it is scattered across systems that were not designed to explain the process as a process.

The useful questions are straightforward: did the job start, which file did it read, how many products matched, how many rows were skipped, which milestone failed, and what output was generated? Those are business telemetry questions.

The missing layer

If the job had reported its own progress, the conversation would be shorter. It could have emitted a milestone when the import started, a metric for rows processed, a note when the supplier format looked different, a count of skipped records, and a final status showing that no customer-ready report was produced.
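As a rough illustration of that kind of self-reporting, here is a minimal sketch in Python. Everything in it is hypothetical: the `RunReport` class, the `run_import` function, the column names, and the rule that a run produces no report when more rows are skipped than processed are all assumptions made for the example, not part of any real job or of OpenTrace.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class RunReport:
    """Hypothetical in-memory record of one batch run (not an OpenTrace API)."""
    job: str
    events: list = field(default_factory=list)
    metrics: dict = field(default_factory=dict)
    status: str = "running"

    def _add(self, kind, text):
        # Every event carries a UTC timestamp so the run can be replayed in order.
        self.events.append({
            "kind": kind,
            "text": text,
            "at": datetime.now(timezone.utc).isoformat(),
        })

    def milestone(self, name):
        self._add("milestone", name)

    def note(self, text):
        self._add("note", text)

    def count(self, name, amount=1):
        self.metrics[name] = self.metrics.get(name, 0) + amount

    def finish(self, status):
        self.status = status
        self._add("status", status)


def run_import(rows, expected_columns, report):
    """Process rows (dicts), skipping any that lack the expected columns."""
    report.milestone("import started")
    for row in rows:
        if set(expected_columns) <= set(row):
            report.count("rows_processed")
        else:
            report.count("rows_skipped")

    skipped = report.metrics.get("rows_skipped", 0)
    processed = report.metrics.get("rows_processed", 0)
    if skipped:
        report.note(f"supplier format looked different: {skipped} rows skipped")

    # Assumed rule for illustration: more skipped than processed means no report.
    if skipped > processed:
        report.finish("no customer-ready report produced")
    else:
        report.milestone("report generated")
        report.finish("ok")
    return report


# The supplier renamed "sku" to "product_code" in two of three rows.
rows = [
    {"sku": "A1", "price": "9.99"},
    {"product_code": "B2", "price": "4.50"},
    {"product_code": "C3", "price": "7.25"},
]
report = run_import(rows, ["sku", "price"], RunReport(job="nightly-import"))
print(report.status)   # no customer-ready report produced
print(report.metrics)  # {'rows_processed': 1, 'rows_skipped': 2}
```

The design choice worth noting is that the job ends with an explicit status even when nothing crashed. A quiet early exit becomes a recorded outcome rather than an absence of evidence.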

That does not replace logs or infrastructure monitoring. It gives people a live, domain-level view of the work they care about. Logs can still explain the details. Infrastructure monitoring can still explain the platform. Business telemetry explains what happened to the operation.

Where OpenTrace fits

OpenTrace is designed for this kind of visibility. A batch job can report progress, metrics, notes, payloads, durations, and milestones over a small HTTP contract. The point is not to instrument everything. The point is to make the questions people ask in chat visible before they have to ask.
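To make "a small HTTP contract" concrete, the sketch below shows what reporting one metric over HTTP could look like from a Python batch job. It is not the OpenTrace API: the URL, the `/events` path, and the payload fields (`job`, `kind`, `name`, `value`) are placeholders invented for this example, so check the actual OpenTrace documentation for the real contract.

```python
import json
import urllib.request

# Assumption: placeholder URL and path; the real OpenTrace endpoint may differ.
EVENTS_URL = "http://localhost:8080/events"


def build_event_request(job, kind, name, value=None, url=EVENTS_URL):
    """Build (but do not send) a JSON POST describing one telemetry event."""
    payload = {"job": job, "kind": kind, "name": name}
    if value is not None:
        payload["value"] = value
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request, timeout=5):
    """Send the event; return the HTTP status, or None if the call failed."""
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except OSError:
        # Telemetry problems should not take the batch job down with them.
        return None


# Build a request for a skipped-row count; call send(request) to deliver it.
request = build_event_request("nightly-import", "metric", "rows_skipped", 1200)
print(request.get_method(), request.full_url)
print(request.data.decode("utf-8"))
```

Building the request separately from sending it keeps the payload easy to inspect and test, and swallowing network errors in `send` reflects the idea that reporting is a side channel, not a new way for the job to fail.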
