Synthetic monitoring secrets - proxy URL changes
The incident is resolved. We are in contact with customers affected by this change.
We consider this incident resolved, as latency has not been elevated since the fix was applied. The issue was caused by a latency spike in a downstream dependency, which increased backpressure on the Hosted Traces ingestion path, degraded gateway performance, and resulted in elevated write latency. After clearing the affected gateway services, the degraded state went away and normal operation was restored.
Impact: Between 14:30 and 14:38 UTC, some customers in prod-eu-west-2 may have experienced issues querying metrics. During this time, read requests to the metrics backend were unavailable, resulting in failed or incomplete query responses. The root cause of the issue was identified and addressed.
Resolution: The affected components were restored, and service was fully available by 14:38 UTC. We have taken additional steps to prevent this type of disruption from occurring in the future.
Next Steps: We are reviewing monitoring and safeguards around this failure mode to further improve reliability.
This incident has been resolved.
Read and write 5xx errors and increased latency occurred during two periods: 23:56:15 to 00:32:45 UTC and 00:55:30 to 01:36:15 UTC.
The scope of this incident was smaller than originally anticipated.
As of 16:27 UTC, our engineering team has merged a fix for those affected, and we consider this incident resolved.
We continue to observe a period of recovery. At this time, we consider this issue resolved. No further updates.
Engineering has released a fix, and we continue to observe a period of recovery. As of 15:12 UTC, we consider this incident resolved.
Between 20:23 UTC and 20:53 UTC, Grafana Cloud Logs in prod-us-east-3 experienced a write degradation, which may have resulted in delayed or failed log ingestion for some customers.
The issue has been fully resolved, and the cell is currently operating normally. We are continuing to investigate the root cause and will provide additional details if relevant.