Incident History

k6 test data processing severely degraded

This incident has been resolved.

1732052746 - 1732054779 Resolved

Write path outage in us-central1 region

Because of a Kubernetes bug reported in https://github.com/kubernetes/kubernetes/issues/127370, K8S service endpoints were not updated when pods stopped or started if more than 1,000 pods matched the service. This caused a temporary outage in Mimir gossiping services, which in turn caused failures to ingest and query metrics for a short time. This issue has been resolved.

1732036353 - 1732036353 Resolved

Issues with new stack creation

This incident has been resolved.

1731973594 - 1731982110 Resolved

Adaptive Metrics Degraded Performance

We continue to observe a sustained period of recovery. At this time, we are considering this issue resolved. No further updates.

1731945686 - 1731970691 Resolved

Grafana Cloud Portal Accessibility Issues

This incident has been resolved.

1731926863 - 1731928352 Resolved

Degraded dashboard performance due to an erroneous security policy

Rollback has been completed as of 17:20 UTC. At this time, we are considering this issue resolved. No further updates.

1731673295 - 1731691382 Resolved

Tempo Ingestion Disruption

This incident has been resolved.

1731593509 - 1731595453 Resolved

k6 browser tests aborted by system

This incident has been resolved.

1731517136 - 1731525172 Resolved

Grafana Cloud Prometheus - Unhealthy Ingesters

We continue to observe a sustained period of recovery. At this time, we are considering this issue resolved. No further updates.

1731009251 - 1731009810 Resolved

Confluent Cloud Latency and High Error Rates

Confluent Cloud has resolved the issue with their Metrics API, which was causing gaps in our metric data. As a result, our service is now fully restored, and data flow is back to normal. Thank you for your patience.

1730744600 - 1730798959 Resolved