Incident History

Read errors in prod-eu-west-0

Many ingesters were evicted from nodes in cortex-prod-01 at once causing a read path outage. Once the ingesters were rescheduled the read path recovered. The errors lasted about 10 minutes.

Degraded performance due to overloaded internal queues

Our internal queues were affected by a scheduled data migration. For around three hours asynchronous scheduled tasks were affected and their processing delayed, but none cancelled or lost. Test scheduled particularly may have been not run at their intended time during that period.

Degraded UX with IRM Slack Integration.

This incident has been resolved.

New instance creation taking longer than expected

This incident has been resolved.

Longer than expected load times in multiple AWS regions

This incident has been resolved.

New Instances failing to create across all regions.

Our Engineering team has identified that this issue is related to: https://status.grafana.com/incidents/b6yc10yk8kv7 We'll be marking this incident as resolved and will provide updates for this event as well in the status page mentioned above.

Degraded query performance as a result of elevated read latency on the trace by id path

This incident has been resolved.

Logs read and write outage in prod-us-east-0 (including OTLP endpoint)

This incident has been resolved.

OTEL Gateways Failing

This incident has been resolved.

Outage due to DNS problems on AWS

This incident has been resolved.

⮜ Previous Next ⮞