Incident History

Metrics Drilldown Issues

This incident has been resolved.

1750256189 - 1750270467 Resolved

Logs query rate metric unavailable

Between 9:40 and 10:55 AM UTC, the Cloud Logs service experienced an issue providing data to the query rate metrics only. You may see gaps in the results of the query rates panel in the billing dashboards for that period. The issue is now mitigated. We apologize for the inconvenience.

1750247145 - 1750247145 Resolved

Brief Write Latency in prod-us-west-0 Loki Cell

From 18:15 to 18:25 UTC, our prod-us-west-0 Loki cell experienced a period of degraded write performance. The issue resolved quickly without requiring manual intervention, and the system has remained stable since.

1750186332 - 1750186332 Resolved

Ingestion errors for Traces on cluster AWS Germany (prod-eu-west-2)

The Tempo service on the EU west cluster experienced a traffic increase over the weekend, which caused an elevated error rate in Tempo's write path (ingestion). Our engineering team identified the root cause and implemented measures to mitigate and resolve the problem.

Trace ingestion problems may have occurred from 15:30 UTC on the 13th until 19:30 UTC on the 15th.

1750066626 - 1750066626 Resolved

Major GCP Incident Affecting Multiple Grafana Cloud Components (Including AWS and Azure deployed Instances)

We continue to observe a period of recovery. At this time, we are considering this issue resolved. No further updates.

1749753396 - 1749765102 Resolved

Loki - Slow Queries

We continue to observe a period of recovery. At this time, we are considering this issue resolved. No further updates.

1749673936 - 1749679047 Resolved

Instances on the "Slow" Release Channel Receiving Unexpected Errors.

At approximately 12:00 UTC, a feature toggle was rolled out that negatively impacted instances on the slow release channel. Users on this release channel began to receive an "AlertStatesDataLayer" error. A workaround was quickly identified and applied for the users who reported the issue. The feature toggle in question was fully reverted by 18:00 UTC.

1749074414 - 1749074414 Resolved

Private Datasource Connect - New Agents Failing To Get SSH Certificates Signed

We continue to observe a period of recovery. At this time, we are considering this issue resolved. No further updates.

1748964192 - 1748967164 Resolved

Scheduled maintenance caused a temporary issue with logging in to Grafana Cloud stacks.

Due to scheduled maintenance (https://status.grafana.com/incidents/rz7nt6cs4prb), some users were unable to log in to their Grafana Cloud stacks. The issue affected only users who:

1748951344 - 1748951344 Resolved

Elevated latency in prod-us-east-0 cluster.

This incident has been resolved.

1748876166 - 1748887123 Resolved