No services are reachable in prod-us-central-7
This incident has been resolved.
From approximately 16:30 to 8:15 UTC, a configuration change inadvertently removed a required headless service for hosted traces in one of our production regions. This caused elevated error rates and increased service-level objective (SLO) burn on the trace ingestion path. The underlying issue was a mismatch in internal configuration references following an earlier migration. Re-enabling the headless service restored normal operation.
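For context, a headless service in Kubernetes is one declared with clusterIP set to "None". The sketch below is not part of the incident report; it is a minimal, hypothetical check (assuming the Kubernetes Python client, with made-up namespace and service names) of the kind that could catch a configuration change removing such a service.

```python
# Hypothetical sketch: verify that a required headless Service still exists.
# Namespace and service name are illustrative assumptions, not real identifiers.
from kubernetes import client, config


def headless_service_present(namespace: str, name: str) -> bool:
    """Return True if the named Service exists and is headless (clusterIP: None)."""
    config.load_kube_config()  # use config.load_incluster_config() when running in-cluster
    v1 = client.CoreV1Api()
    try:
        svc = v1.read_namespaced_service(name=name, namespace=namespace)
    except client.exceptions.ApiException as exc:
        if exc.status == 404:
            return False  # the Service is gone, e.g. removed by a config change
        raise
    # A headless Service is declared with clusterIP set to "None".
    return svc.spec.cluster_ip == "None"


if __name__ == "__main__":
    # Illustrative names only.
    print(headless_service_present("tracing", "trace-ingester-headless"))
```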
The issue affecting Tempo (prod-us-east-0 & prod-us-west-0) and Loki (prod-us-central-5) has been fully resolved.
Metrics generation is now operating normally across all regions, and we continue to monitor for stability.
This incident has been resolved.
We now consider this incident resolved. Regarding the cause: a slow physical partition of the backend database used by the control plane of a critical component caused increased latency and occasional overload, leading to failures on the write path. Once writes switched to a different partition, latency dropped and the error rate returned to normal.
This incident has been resolved.