Incident with Issues and Pull Requests
This incident has been resolved. Thank you for your patience and understanding as we addressed this issue. A detailed root cause analysis will be shared as soon as it is available.
On December 18, 2025, between 16:25 UTC and 19:09 UTC, the service underlying Copilot policies was degraded, and users, organizations, and enterprises were not able to update any policies related to Copilot. No other GitHub services, including other Copilot services, were impacted. This was due to a database migration causing schema drift. We mitigated the incident by synchronizing the schema. We have hardened the service to ensure schema drift does not cause further incidents, and we will investigate improvements in our deployment pipeline to shorten time to mitigation in the future.
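For readers unfamiliar with the term, "schema drift" means the live database schema no longer matches what the application and its migrations expect. GitHub has not described its tooling, so the Python sketch below is purely illustrative: the table name, column names, and PRAGMA-based probe are invented, and it only shows the general idea of diffing expected columns against the columns actually present.

```python
# Illustrative sketch only: the table, column names, and expected schema below
# are hypothetical. The idea is simply that schema drift can be detected by
# diffing the columns a service expects against the columns the database has.
import sqlite3

EXPECTED_COLUMNS = {"id", "enterprise_id", "policy_key", "policy_value", "updated_at"}

def live_columns(conn: sqlite3.Connection, table: str) -> set[str]:
    """Return the column names currently present on `table`."""
    return {row[1] for row in conn.execute(f"PRAGMA table_info({table})")}

def check_drift(conn: sqlite3.Connection, table: str) -> None:
    actual = live_columns(conn, table)
    missing = EXPECTED_COLUMNS - actual   # columns the code expects but the DB lacks
    extra = actual - EXPECTED_COLUMNS     # columns present but unknown to the code
    if missing or extra:
        raise RuntimeError(f"schema drift on {table}: missing={missing}, extra={extra}")

if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    # Simulate a migration that drifted: `policy_value` was never added.
    conn.execute("CREATE TABLE copilot_policies (id, enterprise_id, policy_key, updated_at)")
    try:
        check_drift(conn, "copilot_policies")
    except RuntimeError as err:
        print(err)  # schema drift on copilot_policies: missing={'policy_value'}, extra=set()
```

A check along these lines, run after each migration deploy, would surface a drifted table before dependent writes begin to fail.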
On December 18th, 2025, from 08:15 UTC to 17:11 UTC, some GitHub Actions runners experienced intermittent timeouts for GitHub API calls, which led to failures during runner setup and workflow execution. This was caused by network packet loss between runners in the West US region and one of GitHub’s edge sites. Approximately 1.5% of jobs on larger and standard hosted runners in the West US region were impacted, representing 0.28% of all Actions jobs during this period. By 17:11 UTC, all traffic was routed away from the affected edge site, mitigating the timeouts. We are working to improve early detection of cross-cloud connectivity issues and to build faster mitigation paths to reduce the impact of similar issues in the future.
During an investigation into an unrelated issue, this incident was elevated to public status by mistake, with a title intended for the other incident. We immediately resolved it to ensure that our internal investigation was aligned with the correct public status.
From 11:50 to 12:25 UTC, Copilot Coding Agent was unable to process new agent requests. This affected all users creating new jobs during this timeframe, while existing jobs remained unaffected. The cause was a change to the Actions configuration where Copilot Coding Agent runs, which caused setup of the Actions runner to fail; the issue was resolved by rolling back this change. In the short term, we plan to expand our alerting criteria so that we are alerted more quickly when an incident occurs, and in the long term we plan to harden our runner configuration to be more resilient against errors.
On December 15, 2025, between 15:15 UTC and 18:22 UTC, Copilot Code Review experienced a service degradation that caused 46.97% of pull request review requests to fail, requiring users to re-request a review. Impacted users saw the error message: “Copilot encountered an error and was unable to review this pull request. You can try again by re-requesting a review.” The remaining requests completed successfully.
The degradation was caused by elevated response times in an internal, model-backed dependency, which led to request timeouts and backpressure in the review processing pipeline, resulting in sustained queue growth and failed review completions.
We mitigated the issue by temporarily bypassing fix suggestions to reduce latency, increasing worker capacity to drain the backlog, and rolling out a model configuration change that reduced end-to-end latency. Queue depth and request success rates returned to normal and remained stable through peak traffic.
Following the incident, we increased baseline worker capacity, added instrumentation for worker utilization and queue health, and are improving automatic load-shedding, fallback behavior, and alerting to reduce time to detection and mitigation for similar issues.
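The queue growth described here is the standard consequence of arrival rate outpacing service rate once per-request latency rises. The sketch below is a hypothetical illustration with invented numbers (not measurements from this incident): a fixed worker pool drains the queue comfortably at normal latency, then falls steadily behind when a downstream dependency slows down.

```python
# Hypothetical illustration of the backpressure described above; all numbers are
# invented. With a fixed worker pool, effective throughput is workers / latency.
# Once downstream latency pushes throughput below the arrival rate, the queue
# grows until requests time out or capacity is added.

def queue_depth_over_time(arrival_rate, workers, latency_s, minutes):
    """Track queue depth minute by minute for a fixed-size worker pool."""
    throughput = workers / latency_s              # completions per second
    depth, history = 0, []
    for _ in range(minutes):
        depth += (arrival_rate - throughput) * 60  # net growth per minute
        depth = max(depth, 0)
        history.append(round(depth))
    return history

# Normal operation: 10 req/s arriving, 40 workers, 3 s per review -> queue stays empty.
print(queue_depth_over_time(arrival_rate=10, workers=40, latency_s=3, minutes=5))
# Degraded dependency: latency rises to 8 s -> only 5 req/s drain, backlog keeps growing.
print(queue_depth_over_time(arrival_rate=10, workers=40, latency_s=8, minutes=5))
```

This is also why the mitigations listed above all work on the same two levers: lowering per-request latency (bypassing fix suggestions, the model configuration change) and raising service rate (more workers).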
On Dec 15th, 2025, between 14:00 UTC and 15:45 UTC the Copilot service was degraded for Grok Code Fast 1 model. On average, 4% of the requests to this model failed due to an issue with our upstream provider. No other models were impacted.The issue was resolved after the upstream provider fixed the problem that caused the disruption. GitHub will continue to enhance our monitoring and alerting systems to reduce the time it takes to detect and mitigate similar issues in the future.
On December 3, 2025, between 22:21 UTC and 23:44 UTC, the Webhooks service experienced a degradation that delayed writes of webhook delivery records to our database. During this period, many webhook deliveries were not visible in the webhook delivery UI or API for more than an hour after they were sent. As a result, customers were temporarily unable to request redeliveries for those delayed records. The underlying cause was throttling of database writes due to high replication lag.
We mitigated the incident by temporarily disabling delivery history for a small number of very high‑volume webhook owners to reduce write pressure and stabilize the service. We are contacting the affected customers directly with more details.
We are improving our webhook delivery storage architecture so it can scale with current and future webhook traffic, reducing the likelihood and impact of similar issues.
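For context on the throttling mechanism mentioned in this incident, lag-based write throttling generally means deferring non-critical writes while replicas are behind the primary. The sketch below is illustrative only; the replication_lag_seconds() probe, thresholds, and batching are hypothetical and are not GitHub's implementation.

```python
# Illustrative sketch of lag-based write throttling, the general mechanism
# referenced above. The probe, thresholds, and batch sizes are hypothetical.
import random
import time

MAX_LAG_SECONDS = 2.0   # above this, delivery-record writes are deferred

def replication_lag_seconds() -> float:
    """Stand-in for a real probe of replica lag (e.g. from the database's status tables)."""
    return random.uniform(0.0, 5.0)

def write_delivery_records(batch: list[dict]) -> None:
    """Pretend to persist a batch of webhook delivery records."""
    print(f"persisted {len(batch)} delivery records")

def flush_with_throttle(batch: list[dict], max_wait_s: float = 10.0) -> bool:
    """Write the batch only when replicas have caught up; give up after max_wait_s."""
    deadline = time.monotonic() + max_wait_s
    while time.monotonic() < deadline:
        if replication_lag_seconds() <= MAX_LAG_SECONDS:
            write_delivery_records(batch)
            return True
        time.sleep(0.5)   # back off while replicas catch up
    return False          # caller can shed load, e.g. skip history for high-volume owners

if __name__ == "__main__":
    flush_with_throttle([{"delivery_id": i} for i in range(100)])
```

Under sustained lag the throttle keeps deferring writes, which is consistent with delivery records appearing in the UI and API more than an hour late, and with shedding history writes for the highest-volume owners as the fastest way to relieve pressure.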
Between 13:25 UTC and 18:35 UTC on Dec 11th, GitHub experienced an increase in scraper activity on public parts of our website. This scraper activity drove up load on a low-priority web request pool until it exceeded total capacity, resulting in users experiencing 500 errors. In particular, this affected the Login, Logout, and Signup routes, along with less than 1% of requests from within Actions jobs. At the peak of the incident, 7.6% of login requests were impacted, which was the most significant impact of this scraping attack. We mitigated the incident by identifying and blocking the scraping activity. We also added capacity to the impacted web request pool, and we upgraded key user login routes to higher-priority queues. Going forward, we are working to identify this particular scraper activity more proactively and to shorten mitigation times.
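As an illustration of the request-pool separation described above, the sketch below routes authentication routes to a dedicated high-priority pool so that scraper traffic exhausting the low-priority pool cannot starve logins. The routes, pool sizes, and responses are invented for the example and are not GitHub's production configuration.

```python
# Hypothetical sketch of priority-based request pools; routes, pool sizes, and
# queue limits are invented for illustration. The point is that exhausting the
# low-priority pool no longer takes login/logout/signup down with it.
from queue import Full, Queue

HIGH_PRIORITY_ROUTES = {"/login", "/logout", "/signup"}

POOLS = {
    "high": Queue(maxsize=1000),   # reserved for authentication routes
    "low": Queue(maxsize=200),     # everything else, including scrapable public pages
}

def enqueue_request(path: str) -> str:
    pool = "high" if path in HIGH_PRIORITY_ROUTES else "low"
    try:
        POOLS[pool].put_nowait(path)
        return "202 queued"
    except Full:
        # Only the pool a request belongs to can reject it; scraper traffic
        # filling the low-priority pool returns errors there without touching logins.
        return "500 pool exhausted"

if __name__ == "__main__":
    for _ in range(250):
        enqueue_request("/some/public/page")        # scraper traffic saturates the low pool
    print(enqueue_request("/another/public/page"))  # -> 500 pool exhausted
    print(enqueue_request("/login"))                # -> 202 queued
```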