From approximately 16:30 to 8:15 UTC, a configuration change inadvertently removed a required headless service for hosted traces in one of our production regions. This caused elevated error rates and increased service-level objective (SLO) burn on the trace ingestion path. The underlying issue was a mismatch in internal configuration references following a prior migration. Re-enabling the headless service restored normal operation.
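For readers unfamiliar with the term, a headless service is a Kubernetes Service with no cluster IP: clients resolve it directly to the backing pod addresses through DNS, so removing it silently breaks discovery for anything that depends on it. The sketch below shows, using the official Kubernetes Python client, the kind of post-rollout check that would catch an accidentally removed headless service; the service and namespace names are hypothetical and not taken from this incident.

```python
from kubernetes import client, config


def headless_service_present(name: str, namespace: str) -> bool:
    """Return True if the named Service exists and is headless (clusterIP: None)."""
    config.load_kube_config()  # use config.load_incluster_config() when running in-cluster
    v1 = client.CoreV1Api()
    try:
        svc = v1.read_namespaced_service(name=name, namespace=namespace)
    except client.exceptions.ApiException as exc:
        if exc.status == 404:
            return False  # the service is gone, e.g. removed by a bad config change
        raise
    # Headless services report their cluster IP as the literal string "None".
    return svc.spec.cluster_ip == "None"


if __name__ == "__main__":
    # Hypothetical names, for illustration only.
    if not headless_service_present("trace-ingest-headless", "tracing"):
        raise SystemExit("required headless service is missing or not headless")
```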
We now consider the incident resolved. Regarding the cause: a slow physical partition of the backend database used by the control plane of a critical component caused increased latency and occasional overload, which in turn led to failures in the write path. Once writes were switched to a different partition, latency dropped and the error rate returned to normal.
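As a simplified illustration of the remediation described above, the sketch below shows one generic way a write path can route around a partition whose latency has degraded. All names, thresholds, and the partition abstraction are hypothetical and do not describe the actual system, which relies on its own control-plane failover.

```python
import time


class PartitionRouter:
    """Illustrative sketch: move writes off a partition whose write latency is too high."""

    def __init__(self, partitions, latency_threshold_s=0.5):
        self.partitions = partitions            # callables, each performing a write against one partition
        self.latency_threshold_s = latency_threshold_s
        self.active = 0                         # index of the partition currently taking writes

    def write(self, record):
        start = time.monotonic()
        try:
            self.partitions[self.active](record)
        finally:
            elapsed = time.monotonic() - start
            if elapsed > self.latency_threshold_s:
                # Fail over: send subsequent writes to the next partition.
                self.active = (self.active + 1) % len(self.partitions)
```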