Outage due to DNS problems on AWS
Resolved
This incident has been resolved.
Update
Our engineering teams have applied fixes to address the issues, and all services on the AWS clusters should now be operational.
We will continue monitoring them.
Update
Update in affected components: Prometheus Metrics on us-east-0 and the services that depend on it are operational again.
Update
Update in affected components: Synthetics Monitoring components are operational again; services depending on Prometheus Metrics on us-east-0 are still under examination.
We are continuing to monitor all services across the AWS clusters. Please note that some services may still show degraded performance until the infrastructure has fully stabilized.
Update
We are continuing to work on a fix for this issue.
Update
Update in affected components: IRM components have partially recovered; Oncall services are fully operational and Incident services are recovering. Prometheus services are almost fully operational (we are monitoring recovery on us-east-0).
Update
Update in affected components: Tempo services and Asserts services have been restored; alerting services have been partially restored.
We are currently monitoring all operational services.
Update
We have identified the issue and are bringing most services back to an operational state, including Loki services, Pyroscope services, and AI/ML services. We are monitoring these services.
Investigating
Update in affected components: OTLP Endpoint and Graphite proxy for querying and ingesting are fully operational.
Investigating
We are continuing to investigate this issue and are working to reestablish service.
Investigating
Update in affected components: Hosted Grafana instances (stacks) are operational.
Investigating
Update in affected components: Grafana Cloud k6 (and legacy app.k6.io) are fully operational.
Investigating
We are continuing to investigate this issue and to determine its full impact.
Investigating
Update in component scope: all of our services running on AWS may be affected.
Investigating
We are continuing to investigate this issue.
Investigating
We are currently experiencing an outage on our instances located on AWS due to DNS problems. We are actively working to reestablish service and determine the full impact of the issue. All of our services running on this provider may be affected.