Incident History

Grafana Cloud instances unavailable

This incident has been resolved.

Increased write error rate for logs in prod-us-west-0

We experienced an increased write error rate for logs in prod-us-west-0 from 06:55 to 07:15 UTC. We have since observed continued stability and are marking this incident as resolved.

Upgrade from Free → Pro failing for users

Engineering has released a fix, and as of 00:13 UTC, customers should no longer experience issues upgrading from Free to Pro subscriptions. We now consider this issue resolved. No further updates will be posted.

Investigating Issues with Email Delivery

This incident has been resolved.

Synthetic monitoring secrets - proxy URL changes

The incident is resolved. We are in contact with customers affected by this change.

Hosted Traces elevated write latency in prod-us-central-0 region

We consider this incident resolved, as write latency has not been elevated since the fix was applied. The issue was caused by a latency spike in a downstream dependency, which increased backpressure on the Hosted Traces ingestion path, degraded gateway performance, and resulted in elevated write latency. After the affected gateway services were cleared, the degraded state ended and normal operation was restored.

Incident: Metrics Querying Unavailable in EU (Resolved)

Impact: Between 14:30 and 14:38 UTC, some customers in prod-eu-west-2 may have experienced issues querying metrics. During this time, read requests to the metrics backend were unavailable, resulting in failed or incomplete query responses. The root cause of the issue was identified and addressed.

Resolution: The affected components were restored, and service was fully available by 14:38 UTC. We have taken additional steps to prevent this type of disruption from occurring in the future.

Next Steps: We are reviewing monitoring and safeguards around this failure mode to further improve reliability.
