Hosted Grafana Outage (prod-us-west-0)
Engineering has released a fix and as of 22:20 UTC, customers should no longer experience any issues. At this time, we are considering this issue resolved. No further updates.
The incident was caused by multiple ingesters being unavailable at the same time while ingester pods were being moved between nodes. Moving pods is a routine operation, but in this case one ingester took an unexpectedly long time to restart, which overlapped with another ingester restarting at the same time, causing the outage.
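For illustration only, and not Grafana Labs' actual tooling: a minimal Python sketch, assuming the Kubernetes Python client, a hypothetical "mimir" namespace, and a hypothetical "name=ingester" label selector, of the kind of pre-move check that avoids taking a second ingester down while another one is still restarting.

```python
# Illustrative sketch only: before moving an ingester pod to another node,
# verify that no other ingester in the same namespace is already unready.
# The "mimir" namespace and "name=ingester" selector are assumptions for
# this example, not the operator's real configuration.
from kubernetes import client, config


def ingesters_all_ready(namespace: str = "mimir", selector: str = "name=ingester") -> bool:
    """Return True only if every ingester pod reports the Ready condition."""
    config.load_kube_config()  # use config.load_incluster_config() when running in-cluster
    v1 = client.CoreV1Api()
    pods = v1.list_namespaced_pod(namespace, label_selector=selector)
    for pod in pods.items:
        conditions = pod.status.conditions or []
        ready = any(c.type == "Ready" and c.status == "True" for c in conditions)
        if not ready:
            print(f"{pod.metadata.name} is not ready; postpone the pod move")
            return False
    return True


if __name__ == "__main__":
    if ingesters_all_ready():
        print("all ingesters ready; safe to move one pod")
```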
This incident has been resolved.
We observed a brief read and write outage in the prod-gb-south-1 region, lasting from approximately 11:51 to 12:14 UTC.
The Spain public probe experienced intermittent failures from June 21st 20:00 UTC to June 22nd 08:40 UTC, and synthetic monitoring checks using that probe failed intermittently during that window. The issue is now resolved and the probe is stable.
This incident has been resolved.
This incident has been resolved. No further issues have been seen since adjusting the backend configuration on Friday, June 20th. (22:22 UTC)
The root cause has been identified as node CPU saturation, which caused high latency on the ingesters.
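For illustration only: a small sketch of how node CPU saturation of the kind described above might be spotted, assuming a reachable Prometheus endpoint (the URL below is a placeholder) and the standard node_exporter metric node_cpu_seconds_total.

```python
# Illustrative sketch only: query Prometheus for per-node CPU busyness and flag
# nodes that are nearly saturated. The endpoint URL and the 90% threshold are
# assumptions for this example.
import requests

PROM_URL = "https://prometheus.example.com/api/v1/query"  # placeholder endpoint
QUERY = '1 - avg by (instance) (rate(node_cpu_seconds_total{mode="idle"}[5m]))'

resp = requests.get(PROM_URL, params={"query": QUERY}, timeout=10)
resp.raise_for_status()

for result in resp.json()["data"]["result"]:
    instance = result["metric"]["instance"]
    busy = float(result["value"][1])
    if busy > 0.9:
        print(f"{instance}: CPU {busy:.0%} busy -- possible ingester latency impact")
```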
We continue to observe sustained recovery. At this time, we are considering this issue resolved. No further updates.
There was a read outage impacting Loki tenants in the prod-us-east-0 cluster. This issue has been resolved.