Rare intermittent write failures in Tempo


Incident resolved in 147h7m41s

Resolved

This incident has been resolved.

1737673457

Update

Multiple issues were identified with our rollout process that caused instability on the write path. We have pushed out three fixes that drastically reduce the number of dropped writes and make all failed writes retryable. We will continue to closely monitor this issue and make improvements where necessary.

1737583907

Update

We are continuing to diagnose and improve this situation. Fixes were rolled out on Monday, January 20th, and again today, Wednesday, January 22nd, to reduce dropped writes during rollouts.

1737575532

Update

The issue has been identified and a fix is being implemented.

1737159403

Investigating

We are continuing to investigate this issue.

1737143805

Investigating

Users may see intermittent write failures on collectors, with a log message stating: "Exporting failed. Will retry the request after interval. {"kind": "exporter", "data_type": "traces", "name": "otlp/cloud", "error": "rpc error: code = Unavailable desc = unexpected HTTP status code received from server: 502 (Bad Gateway)"

The agent/collector will automatically retry the write attempt and succeed, so there is no loss in data collection.

1737143796
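
For readers who send writes from their own client code rather than through an agent/collector, the retry behavior described above can be approximated with exponential backoff. The following is a minimal sketch only, not the collector's actual implementation: `UnavailableError`, `send_with_retry`, and all parameter names and defaults are hypothetical and chosen for illustration.

```python
import random
import time


class UnavailableError(Exception):
    """Hypothetical stand-in for a retryable failure, e.g. a 502 (Bad Gateway)."""


def send_with_retry(send, max_attempts=5, initial_interval=1.0,
                    max_interval=30.0, sleep=time.sleep):
    """Call send() until it succeeds or max_attempts is reached.

    Only UnavailableError is treated as retryable; any other exception
    propagates immediately. The wait doubles after each failure, is capped
    at max_interval, and is jittered so many clients do not retry in lockstep.
    """
    interval = initial_interval
    for attempt in range(1, max_attempts + 1):
        try:
            return send()
        except UnavailableError:
            if attempt == max_attempts:
                raise  # out of attempts: surface the failure to the caller
            sleep(interval * random.uniform(0.5, 1.5))
            interval = min(interval * 2, max_interval)
```

Passing `sleep` as a parameter is a design choice that lets the backoff be exercised in tests without actually waiting.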