Incident History

Degraded Performance in Mimir Ingestion Path

This incident has been resolved.

Read & Write Outage in Prod-GB-South-1

We observed a brief read & write outage in the prod-gb-south-1 region. This lasted from approximately 11:51-12:14 UTC.

Synthetic Monitoring: Spain public probe failing intermittently.

Spain public probe was facing issues and had intermittent failures between June 21st 20:00 UTC to June 22nd 8:40 UTC. Synthetic monitoring checks using Spain public probe have been failing intermittently during that window. The issue is now resolved - the probe doesn't suffer any issues anymore and is stable.

Degraded Write Performance on Tempo Metrics Generator

This incident has been resolved.

High query latency for us-prod-east-0 hosted datasources

This incident has been resolved. No further issues were seen since adjusting the backend configuration on Friday June 20th. (22:22 UTC)

The root cause has been identified as node CPU saturation, causing high latency on ingesters.

Cloudwatch/Athena Integrations - Partial Outage

We continue to observe a continued period of recovery. At this time, we are considering this issue resolved. No further updates.

Slow user queries exceed threshold

There was a read outage impacting Loki tenants in the prod-us-east-0 cluster. This issue has been resolved.

Metrics Drilldown Issues

This incident has been resolved.

Logs query rate metric unavailable

Between 9:40 to 10:55 AM UTC, Cloud Logs service briefly experienced issue with providing data to query rate metrics only. You may experience gaps in the results for query rates panel in the billing dashboards for the given period. The situation is now mitigated, we apologize for the inconvenience.

Brief Write Latency in prod-us-west-0 Loki Cell

From 18:15 to 18:25 UTC, our prod-us-west-0 Loki cell experienced a period of degraded write performance. The issue resolved quickly without requiring manual intervention, and the system has remained stable since.

⮜ Previous