Incident History

Intermittent networking issues across hosted runners

During an investigation into an unrelated issue, this incident was elevated to public status by mistake, with a title intended for the other incident. We resolved it immediately to ensure that our internal investigation was aligned with the correct public status.

1766075536 - 1766076179 Resolved

Incident With Copilot

From 11:50 to 12:25 UTC, Copilot Coding Agent was unable to process new agent requests. This affected all users creating new jobs during this timeframe, while existing jobs remained unaffected. The cause was a change to the Actions configuration where Copilot Coding Agent runs, which caused the setup of the Actions runner to fail. The issue was resolved by rolling back this change. As a short-term solution, we hope to expand our alerting criteria so that we are alerted more quickly when an incident occurs. In the long term, we hope to harden our runner configuration to be more resilient against errors.

1765998274 - 1765998274 Resolved

Copilot Code Review is degraded, and not returning responses to users

On December 15, 2025, between 15:15 UTC and 18:22 UTC, Copilot Code Review experienced a service degradation that caused 46.97% of pull request review requests to fail, requiring users to re-request a review. Impacted users saw the error message: “Copilot encountered an error and was unable to review this pull request. You can try again by re-requesting a review.” The remaining requests completed successfully.

The degradation was caused by elevated response times in an internal, model-backed dependency, which led to request timeouts and backpressure in the review processing pipeline, resulting in sustained queue growth and failed review completion.

We mitigated the issue by temporarily bypassing fix suggestions to reduce latency, increasing worker capacity to drain the backlog, and rolling out a model configuration change that reduced end-to-end latency. Queue depth and request success rates returned to normal and remained stable through peak traffic.

Following the incident, we increased baseline worker capacity, added instrumentation for worker utilization and queue health, and are improving automatic load-shedding, fallback behavior, and alerting to reduce time to detection and mitigation for similar issues.

1765820623 - 1765822932 Resolved

Incident with Copilot Grok Code Fast 1

On Dec 15th, 2025, between 14:00 UTC and 15:45 UTC, the Copilot service was degraded for the Grok Code Fast 1 model. On average, 4% of the requests to this model failed due to an issue with our upstream provider. No other models were impacted.

The issue was resolved after the upstream provider fixed the problem that caused the disruption. GitHub will continue to enhance our monitoring and alerting systems to reduce the time it takes to detect and mitigate similar issues in the future.

1765807969 - 1765813552 Resolved

Webhooks delivery degradation

On December 3, 2025, between 22:21 UTC and 23:44 UTC, the Webhooks service experienced a degradation that delayed writes of webhook delivery records to our database. During this period, many webhook deliveries were not visible in the webhook delivery UI or API for more than an hour after they were sent. As a result, customers were temporarily unable to request redeliveries for those delayed records. The underlying cause was throttling of database writes due to high replication lag.

We mitigated the incident by temporarily disabling delivery history for a small number of very high‑volume webhook owners to reduce write pressure and stabilize the service. We are contacting the affected customers directly with more details.

We are improving our webhook delivery storage architecture so it can scale with current and future webhook traffic, reducing the likelihood and impact of similar issues.

1765575278 - 1765575278 Resolved

Disruptions in Login and Signup Flows

Between 13:25 UTC and 18:35 UTC on Dec 11th, GitHub experienced an increase in scraper activity on public parts of our website. This scraper activity caused a low priority web request pool to grow and eventually exceed total capacity, resulting in users experiencing 500 errors. In particular, this affected Login, Logout, and Signup routes, along with fewer than 1% of requests from within Actions jobs. At the peak of the incident, 7.6% of login requests were impacted, which was the most significant impact of this scraping attack.

Our mitigation strategy identified the scraping activity and blocked it. We also increased the capacity of the impacted web request pool, and lastly we upgraded key user login routes to higher priority queues.

In future, we’re working to more proactively identify this particular scraper activity and have faster mitigation times.

1765478412 - 1765483546 Resolved

We are investigating a rise in request failures on several services

Between 13:25 UTC and 18:35 UTC on December 11th, GitHub experienced elevated traffic to portions of GitHub.com that exceeded previously provisioned capacity for specific request types. As a result, users encountered intermittent 500 errors. Impact was most pronounced on Login, Logout, and Signup pages, peaking at 7.6% of login requests. Additionally, fewer than 1% of requests originating from GitHub Actions jobs were affected. This incident was driven by the same underlying factors as the previously reported disruption to Login and Signup flows.

Our immediate response focused on identifying and mitigating the source of the traffic increase. We increased available capacity for web request handling to relieve pressure on constrained pools. To reduce recurrence risk, we also re-routed critical authentication endpoints to a different traffic pool, ensuring sufficient isolation and headroom for login-related traffic.

In future, we’re working to more proactively identify these large changes in traffic volume and improve our time to mitigation.

1765468056 - 1765475602 Resolved

Some macOS Actions jobs routing to Ubuntu instead

Between December 9th, 2025 21:07 UTC and December 10th, 2025 14:52 UTC, 177 macos-14-large jobs were run on an Ubuntu larger runner VM instead of macOS runner VMs. The impacted jobs were routed to a larger runner with incorrect metadata. We mitigated this by deleting the runner.

The routing configuration is not something controlled externally. A manual override was done previously for internal testing, but it left incorrect metadata on a larger runner instance. An infrastructure migration caused this misconfigured runner to come online, which started the incorrect assignments. We are removing the ability to manually override this configuration entirely, and are adding alerting to identify possible OS mismatches for hosted runner jobs.

As a reminder, hosted runner VMs are secure and ephemeral, with every VM reimaged after every single job. All jobs impacted here were originally targeted at a GitHub-owned VM image and were run on a GitHub-owned VM image.
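For illustration only, an OS-mismatch check of the kind mentioned above might look roughly like the sketch below. It is not GitHub's implementation. The `JobAssignment` type, the label-prefix mapping, and the `find_os_mismatches` function are hypothetical names introduced here, and the sketch assumes the requested label's prefix (for example `macos` in `macos-14-large`) identifies the expected OS family.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional

# Hypothetical mapping from a runner label prefix to the OS family we
# would expect the assigned VM to report. Assumed for illustration.
EXPECTED_OS_BY_LABEL_PREFIX = {
    "macos": "macos",
    "ubuntu": "linux",
    "windows": "windows",
}


@dataclass(frozen=True)
class JobAssignment:
    """One job-to-runner assignment (hypothetical shape)."""
    job_id: str
    requested_label: str  # e.g. "macos-14-large"
    runner_os: str        # OS family reported by the runner VM, e.g. "linux"


def expected_os(label: str) -> Optional[str]:
    """Return the expected OS family for a label, or None if unknown."""
    prefix = label.split("-", 1)[0].lower()
    return EXPECTED_OS_BY_LABEL_PREFIX.get(prefix)


def find_os_mismatches(assignments: Iterable[JobAssignment]) -> List[JobAssignment]:
    """Return assignments whose runner OS differs from the label's expected OS.

    Labels with an unrecognized prefix are skipped rather than flagged,
    so custom labels do not generate false alerts.
    """
    mismatches = []
    for assignment in assignments:
        want = expected_os(assignment.requested_label)
        if want is not None and want != assignment.runner_os.lower():
            mismatches.append(assignment)
    return mismatches


if __name__ == "__main__":
    sample = [
        JobAssignment("job-1", "macos-14-large", "macos"),
        JobAssignment("job-2", "macos-14-large", "linux"),  # misrouted
        JobAssignment("job-3", "custom-label", "linux"),    # unknown prefix
    ]
    for bad in find_os_mismatches(sample):
        print(f"OS mismatch: {bad.job_id} wanted {bad.requested_label}, got {bad.runner_os}")
```

A check like this would typically feed an alert rather than block the job, since it only compares metadata that the scheduler already has.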

1765373686 - 1765378362 Resolved

Some Actions customers experiencing run start delays

On December 10, 2025, between 08:50 UTC and 11:00 UTC, some GitHub Actions workflow runs experienced longer-than-normal wait times for jobs starting or completing. All jobs successfully completed despite the delays. At peak impact, approximately 8% of workflow runs were affected.

During this incident, some nodes received a spike in workflow events that led to queuing of event processing. Because runs are pinned to nodes, runs being processed by these nodes saw delays in starting or showing as completed. The team was alerted to this at 08:58 UTC. Impacted nodes were disabled from processing new jobs to allow queues to drain.

We have increased overall processing capacity and are implementing safeguards to better balance load across all nodes when spikes occur. This is important to ensure our available capacity can always be fully utilized.

1765357877 - 1765364735 Resolved

Disruption with some GitHub services

On December 8, 2025, between 21:15 and 22:24 UTC, Copilot code completions experienced a significant service degradation. During this period, up to 65% of code completion requests failed.

The root cause was an internal feature flag that caused the primary model supporting Copilot code completions to appear unavailable to the backend service. The issue was resolved once the flag was disabled.

To prevent recurrence, we expanded test coverage for Copilot code completion models and are strengthening our detection mechanisms to better identify and respond to traffic anomalies.

1765229307 - 1765233203 Resolved