Incident History

Complete outage in prod-me-central-1

We have not received any further updates from AWS. We are actively monitoring the outage and will provide additional information as it becomes available. Please continue to refer to the AWS status page for more detailed updates: https://health.aws.amazon.com/health/status

All the guidance previously included about stack migration is still relevant. Please reach out to our Support team if you have any questions.

1772433809 Ongoing

Increased Latency for Small Subset of Customers

A recent rollout caused the AuthZ (RBAC) service to perform many redundant folder-tree fetches for each authorization check. For a small number of tenants in the prod-us-east-0 and prod-eu-west-2 regions with very large folder trees, this added a few milliseconds to every check, which increased request latency.

The approximate timeframe of the impact is:

2026-02-26 17:24:43 UTC to 2026-02-27 14:33:53 UTC.

This has now been resolved.

1772209516 - 1772209516 Resolved

Trace querying issue in all Tempo clusters

This incident has been resolved.

1772199975 - 1772235481 Resolved

Incorrect pipeline assignment after custom attributes are assigned

This incident has been resolved.

1772197067 - 1772205879 Resolved

Grafana Cloud Faro slowness when listing and uploading sourcemaps in all regions

This incident has been resolved.

1772110806 - 1772160545 Resolved

Grafana Cloud Metrics - Intermittent Write Latency in prod-us-central, prod-us-central-5, and prod-eu-west-0

This incident is now resolved.

During the incident, the Cloud Metrics platform experienced intermittent latency spikes when communicating with a backend cloud service in the prod-us-central-0 and prod-us-central-5 regions, and the internal CSP-facing issue was escalated to a P1. After determining that the latency spikes were limited to one availability zone, the team mitigated the situation by migrating all write traffic to the single, nearly unaffected availability zone.

As the CSP service team attempted to remedy the situation, it worsened and began affecting the previously unaffected zone, so another mitigation path was needed. We deployed a change to all environments that switched Cloud Metrics to a different connection method. This stabilized the write path again, as the new method proved more reliable and was not affected by the latency increases.

We have migrated all tenants back to multi-zone write paths and are confident in the current method of connectivity to the backend cloud service, which is the one we migrated to during the incident. We have no plans to return to the previous, problematic connectivity method for the foreseeable future.

1772049255 - 1773771739 Resolved

Issues Loading Dashboards and Alert Folders in Hosted Grafana

This incident has been resolved.

1772041482 - 1772049066 Resolved

Partial Write & Rule Evaluation Outage in prod-eu-west-3

This incident has been resolved.

1772031918 - 1772040054 Resolved

Grafana Cloud Traces prod-eu-west-6 region (AWS Ireland) wrong URL endpoint shown for traces ingestion.

This incident has been resolved.

1772023299 - 1772031949 Resolved

Some Alert Rule Evaluations Failing

This incident has been resolved.

1771943483 - 1771952985 Resolved