Incident History

New Pods Failing to Start for a Small Subset of Instances

At approximately 13:50 UTC, a new deployment pointed pinned instances to a non-existent plugin version. As a result, new pods for those instances failed on startup. Because older pods were still running, users should not have experienced a disruption.

1752787343 - 1752787343 Resolved

IRM (OnCall + Incident) plugin not available

This incident has been resolved.

1752581421 - 1752605550 Resolved

Degraded performance on Email notifications across all regions

Hosted Grafana experienced degraded performance on the Email notification service on 15 July, from 11:45 UTC to 13:00 UTC. This affected the sending and receiving of emails used by modules such as alerting and reporting. Service was restored shortly after the issue was identified.

1752578446 - 1752578446 Resolved

Most Prometheus Queries Failing in Prod-US-East-0

This incident has been resolved.

1752155527 - 1752161785 Resolved

Grafana fast release channel error

This incident has been resolved.

1751633050 - 1751874595 Resolved

Hosted Grafana Outage (prod-us-west-0)

Engineering has released a fix and as of 22:20 UTC, customers should no longer experience any issues. At this time, we are considering this issue resolved. No further updates.

1751581046 - 1751581696 Resolved

Read and write path outage in Hosted Logs ap-south-1 region cells

The incident was caused by multiple ingesters being unavailable at the same time while ingester pods were being moved between nodes. This is a regular operation, but in this case one ingester took an unexpectedly long time to restart, and its restart overlapped with that of another ingester, leaving both unavailable at once.

1751534633 - 1751553088 Resolved

cortex-prod-05 cell partial read-path outage

This incident has been resolved.

1751534309 - 1751577871 Resolved

Degraded Performance in Mimir Ingestion Path

This incident has been resolved.

1751308063 - 1751316773 Resolved

Read & Write Outage in Prod-GB-South-1

We observed a brief read & write outage in the prod-gb-south-1 region. It lasted from approximately 11:51 to 12:14 UTC.

1751038907 - 1751038907 Resolved