Multiple services are affected, service degradation


Incident resolved in 2h55m22s

Resolved

On Mar 5, 2026, between 16:24 UTC and 19:30 UTC, Actions was degraded. During this time, 95% of workflow runs failed to start within 5 minutes, with an average delay of 30 minutes, and 10% of workflow runs failed with an infrastructure error. This was due to Redis infrastructure updates that were being rolled out to production to improve our resiliency. These updates introduced a set of incorrect configuration changes into our Redis load balancer, causing internal traffic to be routed to an incorrect host and leading to two incidents. We mitigated this incident by correcting the misconfigured load balancer. Actions jobs were running successfully starting at 17:24 UTC; the remaining time until we closed the incident was spent working through the queue of jobs. We immediately rolled back the updates that were a contributing factor and have frozen all changes in this area until we have completed the follow-up work from this incident. We are working to improve our automation to ensure incorrect configuration changes cannot propagate through our infrastructure. We are also working on improved alerting to catch misconfigured load balancers before they cause an incident. Additionally, we are updating the Redis client configuration in Actions to improve resiliency to brief cache interruptions.
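
As an illustration of what client-side resiliency to brief cache interruptions can look like, here is a minimal sketch. It is not the actual Actions change, which the summary above does not describe. It assumes a cache read that can fail transiently and a slower fallback source. The names `CacheUnavailable`, `read_with_retry`, `fetch`, and `fallback` are hypothetical and do not come from any Redis client library.

```python
import random
import time


class CacheUnavailable(Exception):
    """Hypothetical error raised when a cache read fails transiently."""


def read_with_retry(fetch, fallback, attempts=3, base_delay=0.05, sleep=time.sleep):
    """Try fetch() up to `attempts` times, then fall back.

    Between failed attempts, wait a random time of up to
    base_delay * 2**attempt seconds (exponential backoff with jitter),
    so that many clients do not retry in lockstep.
    """
    for attempt in range(attempts):
        try:
            return fetch()
        except CacheUnavailable:
            if attempt == attempts - 1:
                break  # out of attempts; use the fallback below
            sleep(random.uniform(0, base_delay * (2 ** attempt)))
    # Every attempt failed: serve from the slower source instead of erroring.
    return fallback()
```

The `sleep` parameter is injectable only so the behavior can be exercised without real delays. A production client would more likely rely on the retry and timeout settings its Redis library already provides.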

1772739054

Investigating

Webhooks is operating normally.

1772738268

Investigating

Actions is operating normally.

1772737524

Investigating

Actions is now fully recovered.

1772737147

Investigating

The queue of requested Actions jobs continues to make progress. Job delays are now approximately 6 minutes and continuing to decrease.

1772734556

Investigating

We are back to queueing Actions workflow runs at nominal rates, and we are monitoring as the runs that queued up during the incident are cleared.

1772732906

Investigating

We have applied mitigations for connection failures across backend resources and we are observing a recovery in queueing Actions workflow runs.

1772731559

Investigating

We are observing delays in queueing Actions workflow runs. We’re still investigating the causes of these delays.

1772729573

Investigating

Webhooks is experiencing degraded availability. We are continuing to investigate.

1772729231

Investigating

Actions is experiencing degraded availability. We are continuing to investigate.

1772728874

Investigating

We are investigating reports of degraded performance for Actions.

1772728532