Incident History

Grafana Cloud Partial Outage

Between 0016Z and 0120Z, and again between 0433Z and 0530Z, a cloud networking component that facilitates cross-region communication to and from this region experienced an outage. Users experienced errors when modifying access policies, as well as elevated error rates if they had recently modified an access policy. The outage also triggered false-positive synthetic monitoring alerts for probes in this region.

We're continuing to work with our cloud service provider to determine the root cause of this component's outage.

1727414032 - 1727428068 Resolved

Grafana Cloud Partial Outage

Engineering deployed a change that broke a component managed by one of our cloud service providers. The change has been rolled back and the component is working again; customers should no longer see nodes failing to fully boot. The cloud service provider is still investigating why the component failed. At this time, we consider this issue resolved. No further updates.

1727189109 - 1727193872 Resolved

Grafana Cloud k6 Outage

From 15:15 to 15:31 UTC, the Grafana Cloud k6 App could not load. New tests could not be started from the App or the k6 CLI, and already running tests were aborted or timed out.

The immediate issue has been resolved. We are investigating the root cause.

1727168871 - 1727168871 Resolved

Tempo Ingester Restarting

From 18:46:16 to 18:46:26 UTC, we were alerted to an issue that caused a restart of Tempo Ingesters in the US-East region.

During this time, users may have noticed 404 or 500 errors in their agent logs, potentially resulting in a small number of discarded Tempo traces while the ingesters were unavailable.

Our engineers identified the cause and implemented a solution to resolve the issue. Please contact our support team if you notice any discrepancies or have questions.

1727127322 - 1727127322 Resolved

Temporary disruption of service for MultiHTTP and Scripted synthetic monitoring checks in all regions

A transient error in our infrastructure caused all public probes to report MultiHTTP and Scripted checks as failures for roughly 5 minutes, from 9:55 UTC to 10:00 UTC. The error has been addressed and all probes should now be operating normally.

1726654312 - 1726654312 Resolved

Synthetic monitoring DNS failures, Frankfurt probe location

From 13:49 to 13:54 UTC, a deployment to the Frankfurt probe location caused DNS resolution timeouts affecting DNS, HTTP, MultiHTTP, and Scripted checks, with failure rates of 20-50% during this period. After a rollback completed by 13:54, synthetic checks returned to normal.

1725922477 - 1725922477 Resolved

Latency and connectivity issues, synthetic monitoring, Seoul probe location

From 21:30 to 22:45 UTC, the Seoul public probe location experienced connectivity problems. We observed failure rates of 10-15% across ping, DNS, HTTP, MultiHTTP, and k6 scripted checks due to connection timeouts. The issue has cleared, but we continue to monitor.

1725672253 - 1725672253 Resolved

Issues accessing SLOs, ops-labels, and log export configs

This incident has been resolved.

1724776842 - 1724783904 Resolved

Issue with on-call permissions

This incident has been resolved.

1724700329 - 1724708723 Resolved

Synthetic Monitoring: New York probe service degradation

This incident has been resolved.

1724663172 - 1724669069 Resolved