Incident History

Some Grafana Instances Taking Longer to Initialize

This incident has been resolved.

1742228889 - 1742232770 Resolved

Test failures on k6

This incident has been resolved.

1742227931 - 1742234192 Resolved

Issue with Grafana Access in prod-ap-south-1 and prod-ap-northeast-0

We have continued to observe a sustained period of recovery. At this time, we are considering this issue resolved.

1741835958 - 1741845572 Resolved

Degraded performance when querying metrics on AWS Germany

On the 12th of March, between 13:30 UTC and 13:50 UTC, we experienced degraded performance when querying metrics on the AWS Germany cluster. During that period, metrics could not be viewed on dashboards and alert queries returned errors. The issue has been resolved.

1741794389 - 1741794389 Resolved

Delay in aggregated series through adaptive metrics

All series are now caught up and the incident is resolved.

1741726317 - 1741727173 Resolved

Write path down in prod-us-east-2-prometheus-prod-56

This incident has been resolved.

1741300876 - 1741313751 Resolved

Read errors in prod-eu-west-0

Many ingesters were evicted from nodes in cortex-prod-01 at once, causing a read path outage. Once the ingesters were rescheduled, the read path recovered. The errors lasted about 10 minutes.

1741277774 - 1741279759 Resolved

Degraded performance due to overloaded internal queues

Our internal queues were affected by a scheduled data migration. For around three hours, asynchronous scheduled tasks were delayed in processing, but none were cancelled or lost. Scheduled tests in particular may not have run at their intended time during that period.

1740748196 - 1740748196 Resolved

Degraded UX with IRM Slack Integration

This incident has been resolved.

1740593594 - 1740660926 Resolved

New instance creation taking longer than expected

This incident has been resolved.

1740522461 - 1740552759 Resolved