Incident with Actions


Incident resolved in 38m8s

Resolved

On August 22, 2024, between 16:10 UTC and 17:28 UTC, Actions experienced degraded performance that led to failed workflow runs. On average, 2.5% of workflow runs failed to start, with the failure rate peaking at 6%. In addition, we saw a 1% error rate for Actions API endpoints. This was caused by an Actions service being deployed to faulty hardware with an incorrect memory configuration, which led to significant performance degradation of those pods due to insufficient memory. The impact was mitigated when the pods were automatically evicted and moved to healthy hosts. The faulty hardware was disabled to prevent a recurrence. We are improving our health checks to ensure that unhealthy hardware is consistently and automatically marked offline. We are also improving our monitoring and deployment practices to reduce our time to detection and to strengthen automated mitigation at the service layer for issues like this in the future.

Aug 22, 2024 - 17:28:06 UTC

Investigating

We are investigating issues with failed workflow runs due to internal errors. We are seeing signs of recovery and continuing to monitor the situation.

Aug 22, 2024 - 17:21:55 UTC

Investigating

We are investigating reports of degraded performance for Actions.

Aug 22, 2024 - 16:49:58 UTC