Incident History

Ingestion errors for Traces on cluster AWS Germany (prod-eu-west-2)

The Tempo service on cluster EU west experienced a traffic increase over the weekend, which caused an elevated error rate in Tempo's write path (ingestion). Our engineering team identified the root cause and implemented measures to mitigate and resolve the problem.

Trace ingestion problems may have occurred from 15:30 UTC on the 13th until 19:30 UTC on the 15th.

1750066626 - 1750066626 Resolved

Major GCP Incident Affecting Multiple Grafana Cloud Components (Including AWS and Azure deployed Instances)

We have observed a sustained period of recovery. At this time, we are considering this issue resolved. No further updates.

1749753396 - 1749765102 Resolved

Loki - Slow Queries

We have observed a sustained period of recovery. At this time, we are considering this issue resolved. No further updates.

1749673936 - 1749679047 Resolved

Instances on the "Slow" Release Channel Receiving Unexpected Errors.

At approximately 12:00 UTC, a feature toggle was rolled out that negatively impacted instances on the slow release channel. Users on this release channel began to receive an "AlertStatesDataLayer" error. A workaround was quickly identified and applied for the users who reported the issue. The feature toggle in question was fully reverted by 18:00 UTC.

1749074414 - 1749074414 Resolved

Private Datasource Connect - New Agents Failing To Get SSH Certificates Signed

We have observed a sustained period of recovery. At this time, we are considering this issue resolved. No further updates.

1748964192 - 1748967164 Resolved

Scheduled maintenance caused a temporary issue with logging in to Grafana Cloud stacks.

Due to scheduled maintenance (https://status.grafana.com/incidents/rz7nt6cs4prb), some users were unable to log in to their Grafana Cloud stacks. The issue affected only users who:

1748951344 - 1748951344 Resolved

Elevated latency in prod-us-east-0 cluster.

This incident has been resolved.

1748876166 - 1748887123 Resolved

Push errors and elevated latency in prod-us-central-0 cluster.

This incident has been resolved.

1748874048 - 1748887107 Resolved

Some Pages in the User Portal Showing Errors.

This incident has been resolved.

1748529881 - 1748538708 Resolved

Issues with Hosted Grafana in some Regions.

This incident has been resolved.

1748353634 - 1748375260 Resolved