Disruption in service with some Redis clusters


Incident resolved in 1h20m21s

Investigating

Actions is operating normally.

1722417609

Resolved

On July 31, 2024, between 07:05 UTC and 09:01 UTC, the Actions service experienced degradation that prevented it from processing API requests and executing jobs, in particular Pages builds. On average, 2% of jobs run during the incident window were affected. This was due to some nodes in one of our partner services experiencing connectivity issues in the East US2 region. We mitigated the incident by failing over the impacted service and re-routing the service’s traffic out of that region. We are working to improve our monitoring and failover processes to reduce our time to detection and mitigation of issues like this one in the future.

1722417609

Investigating

We are continuing to see improvements in queuing and running Actions jobs and are monitoring for full recovery.

1722417227

Investigating

We've applied a mitigation to fix the issues with queuing and running Actions jobs. We are seeing improvements in telemetry and are monitoring for full recovery.

1722414515

Investigating

Actions is experiencing degraded performance. We are continuing to investigate.

1722413269

Investigating

We are investigating reports of degraded performance in some Redis clusters.

1722412954

Investigating

We are currently investigating this issue.

1722412788