Disruption with some GitHub services
Resolved
Between Dec 3 03:35 UTC and 04:35 UTC, availability of large hosted runners for Actions was degraded due to failures in background VM provisioning jobs. This was a shorter recurrence of the issue that occurred the previous day. Users would see workflows queued waiting for a large runner. On average, 13.5% of all workflows requiring large runners over the incident time were affected, peaking at 46% of requests. Standard and Mac runners were not affected.Following the Dec 1 incident, we had disabled non-critical paths in the provisioning job and believed that would eliminate any impact while we understood and addressed the timeouts. Unfortunately, the timeouts were a symptom of broader job health issues, so those changes did not prevent this second occurrence the following day. We now understand that other jobs on these agents had issues that resulted in them hanging and consuming available job agent capacity. The reduced capacity led to saturation of the remaining agents and significant performance degradation in the running jobs.In addition to the immediate improvements shared in the previous incident summary, we immediately initiated regular recycles of all agents in this area while we continue to address the issues in both the jobs themselves and the resiliency of the agents. We also continue to improve our detection to ensure we are automatically detecting these delays.
Investigating
We saw a recurrence of the large hosted runner incident (https://www.githubstatus.com/incidents/qq1m7mqcl6zk) from 12/1/2024. We've applied the same mitigation and see improvements. We will continue to work on a long term solution.
Investigating
We are investigating reports of degraded performance for Hosted Runners
Investigating
We are currently investigating this issue.