Disruption with some GitHub services


Incident resolved in 27m50s

Resolved

Between Dec 3 03:35 UTC and 04:35 UTC, availability of large hosted runners for Actions was degraded due to failures in background VM provisioning jobs. This was a shorter recurrence of the issue that occurred the previous day. Users would see workflows queued waiting for a large runner. On average, 13.5% of all workflows requiring large runners over the incident time were affected, peaking at 46% of requests. Standard and Mac runners were not affected.Following the Dec 1 incident, we had disabled non-critical paths in the provisioning job and believed that would eliminate any impact while we understood and addressed the timeouts. Unfortunately, the timeouts were a symptom of broader job health issues, so those changes did not prevent this second occurrence the following day. We now understand that other jobs on these agents had issues that resulted in them hanging and consuming available job agent capacity. The reduced capacity led to saturation of the remaining agents and significant performance degradation in the running jobs.In addition to the immediate improvements shared in the previous incident summary, we immediately initiated regular recycles of all agents in this area while we continue to address the issues in both the jobs themselves and the resiliency of the agents. We also continue to improve our detection to ensure we are automatically detecting these delays.

1733200787

Investigating

We saw a recurrence of the large hosted runner incident (https://www.githubstatus.com/incidents/qq1m7mqcl6zk) from 12/1/2024. We've applied the same mitigation and see improvements. We will continue to work on a long term solution.

1733200705

Investigating

We are investigating reports of degraded performance for Hosted Runners

1733199398

Investigating

We are currently investigating this issue.

1733199117