On November 12, 2025, between 22:10 UTC and 23:04 UTC, Codespaces used internally at GitHub were impacted. There was no impact to external customers. Because the scope of impact was not clear in the initial stages of incident response, we treated the incident as public until we confirmed otherwise. As a follow-up, we are improving how we distinguish internal from public impact for similar failures to better inform our status decisions going forward.
On November 12, 2025, from 13:10 to 17:40 UTC, the notifications service was degraded, with increased web notifications latency and growing delays in notification deliveries. A change to the notifications settings access path introduced additional load to the settings system, degrading its response times. This impacted both web notifications requests (with p99 response times as high as 1.5s, while lower percentiles remained stable) and notification deliveries, which reached a peak average delay of 24 minutes. We increased system capacity around 15:10 UTC and fully reverted the problematic change soon after, restoring web notifications latency and increasing delivery throughput, which brought delivery delays back down. The notification queue was fully drained around 17:40 UTC. We are working to adjust capacity in the affected systems and to reduce the time needed to address these capacity issues.
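As a hedged illustration of monitoring that can shorten time to mitigation for a backlog like this, the sketch below alerts when the oldest queued notification exceeds a delay threshold. The helper names and paging hook are assumptions for illustration, not GitHub's internal tooling.

```typescript
// Minimal sketch of a delivery-delay check, assuming hypothetical helpers:
// fetchOldestQueuedAgeSeconds() reads the age of the oldest queued notification,
// and page() notifies on-call. Both are illustrative, not real internal APIs.
const MAX_DELIVERY_DELAY_SECONDS = 5 * 60; // alert well before a 24-minute backlog

export async function checkNotificationDeliveryDelay(
  fetchOldestQueuedAgeSeconds: () => Promise<number>,
  page: (message: string) => Promise<void>,
): Promise<void> {
  const delaySeconds = await fetchOldestQueuedAgeSeconds();
  if (delaySeconds > MAX_DELIVERY_DELAY_SECONDS) {
    await page(
      `Notification delivery delay is ${Math.round(delaySeconds / 60)} min, ` +
        `above the ${MAX_DELIVERY_DELAY_SECONDS / 60} min threshold`,
    );
  }
}
```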
On November 11, 2025, between 16:28 UTC and 20:54 UTC, GitHub Actions larger hosted runners experienced degraded performance, with 0.4% of overall workflow runs and 8.8% of larger hosted runner jobs failing to start within 5 minutes. The majority of impact was mitigated by 18:44 UTC, with a small tail of organizations taking longer to recover. The impact was caused by the same database infrastructure issue behind the similar larger hosted runner degradation on October 23rd, 2025; in this case it was triggered by a brief infrastructure event rather than a database change. Through this incident, we identified and implemented a better solution for both prevention and faster mitigation, and a durable fix for the underlying database issue is rolling out soon.
Between November 5, 2025 23:27 UTC and November 6, 2025 00:06 UTC, ghost text requests experienced errors from upstream model providers. This was a continuation of the service disruption for which we statused Copilot earlier that day, although more limited in scope. During the disruption, users were again automatically re-routed to healthy model hosts, minimizing impact. We are updating our monitors and failover mechanism to mitigate similar issues in the future.
On November 5, 2025, between 21:46 and 23:36 UTC, ghost text requests experienced errors from upstream model providers, resulting in elevated error rates for 0.9% of users. During the disruption, users were automatically re-routed to healthy model hosts, but may have experienced increased response latency as a result of the re-routing. We are updating our monitors and tuning our failover mechanism to mitigate issues like this more quickly in the future.
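To illustrate the re-routing behavior described above, here is a minimal sketch of client-side failover across model hosts. The host URLs, endpoint path, and timeout are assumptions for illustration and do not reflect Copilot's actual routing implementation.

```typescript
// Minimal failover sketch: try each model host in order, falling over to the
// next host on error or timeout. The URLs and endpoint are illustrative only.
const MODEL_HOSTS = [
  "https://model-host-a.example.com",
  "https://model-host-b.example.com",
];

export async function completeWithFailover(prompt: string): Promise<string> {
  let lastError: unknown;
  for (const host of MODEL_HOSTS) {
    try {
      const res = await fetch(`${host}/v1/completions`, {
        method: "POST",
        headers: { "Content-Type": "application/json" },
        body: JSON.stringify({ prompt }),
        signal: AbortSignal.timeout(2_000), // fail fast so re-routing adds little latency
      });
      if (!res.ok) throw new Error(`Upstream returned ${res.status}`);
      return await res.text();
    } catch (err) {
      lastError = err; // fall through and try the next host
    }
  }
  throw lastError;
}
```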
On November 3, 2025, between 14:10 UTC and 19:20 UTC, GitHub Packages experienced degraded performance, resulting in failures for 0.5% of NuGet package download requests. The incident resulted from an unexpected change in usage patterns affecting rate limiting infrastructure in the Packages service. We mitigated the issue by scaling up services and refining our rate limiting implementation to ensure more consistent and reliable service for all users. To prevent similar problems, we are improving our resilience to shifts in usage patterns, improving capacity planning, and implementing better monitoring to accelerate detection and mitigation in the future.
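As a hedged sketch of the rate limiting idea referenced above, the token bucket below absorbs short bursts while capping sustained throughput. The capacity and refill rate are illustrative, not the limits used by the Packages service.

```typescript
// Minimal token-bucket sketch: a request is allowed only if a token is
// available; tokens refill continuously up to a fixed capacity.
class TokenBucket {
  private tokens: number;
  private lastRefill = Date.now();

  constructor(private capacity: number, private refillPerSecond: number) {
    this.tokens = capacity;
  }

  tryAcquire(): boolean {
    const now = Date.now();
    const elapsedSeconds = (now - this.lastRefill) / 1000;
    this.tokens = Math.min(this.capacity, this.tokens + elapsedSeconds * this.refillPerSecond);
    this.lastRefill = now;
    if (this.tokens >= 1) {
      this.tokens -= 1;
      return true;
    }
    return false;
  }
}

// Illustrative limits only: bursts of up to 100 downloads, refilling at 10 per second.
const nugetDownloadLimiter = new TokenBucket(100, 10);
```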
On November 1, 2025, between 2:30 UTC and 6:14 UTC, Actions workflows could not be triggered manually from the UI. This impacted all customers queueing workflows from the UI for most of the impact window. The issue was caused by a faulty code change in the UI, which was promptly reverted once the impact was identified. Detection was delayed by an alerting gap for UI-only failures in this area when all underlying APIs remain healthy. We are implementing enhanced alerting and additional automated tests to prevent similar regressions and reduce detection time in the future.
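One way to close an alerting gap like this is a synthetic browser check that exercises the UI path itself rather than the underlying APIs. The Playwright sketch below is an assumption-laden illustration: the repository path and button label are placeholders, and a real check would need an authenticated session.

```typescript
// Minimal Playwright sketch of a synthetic check for the manual
// "Run workflow" UI path; the repository URL and selector are illustrative.
import { test, expect } from "@playwright/test";

test("workflow_dispatch button renders and is enabled", async ({ page }) => {
  await page.goto("https://github.com/example-org/example-repo/actions/workflows/ci.yml");
  const runButton = page.getByRole("button", { name: "Run workflow" });
  await expect(runButton).toBeVisible();
  await expect(runButton).toBeEnabled();
});
```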
On October 30th, 2025, we shipped a change that broke three links in the "Solutions" dropdown of the marketing navigation on https://github.com/home. We noticed the broken links internally and declared an incident so our users would know that no other functionality was impacted. We reverted the change and are evaluating our testing and rollout processes to prevent similar incidents in the future.
A cloud resource used by the Copilot bing-search tool was deleted as part of a resource cleanup operation. Once this was discovered, the resource was recreated. Going forward, we are putting more effective monitoring in place to catch issues like this earlier.
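As a hedged illustration of monitoring that would catch a deleted dependency sooner, the probe below checks that a critical resource still responds. The health URL and 404 interpretation are assumptions, not the actual bing-search tool's configuration.

```typescript
// Minimal existence probe: returns false if the resource is gone or unreachable,
// so a scheduler can page on repeated failures. The URL is illustrative.
export async function searchResourceExists(healthUrl: string): Promise<boolean> {
  try {
    const res = await fetch(healthUrl, { signal: AbortSignal.timeout(5_000) });
    return res.status !== 404; // a 404 here suggests the resource was deleted
  } catch {
    return false; // network failure: treat as missing and let the scheduler retry
  }
}
```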
On October 29th, 2025, between 14:07 UTC and 23:15 UTC, multiple GitHub services were degraded due to a broad outage in one of our service providers:

- Users of Codespaces experienced failures connecting to new and existing Codespaces through VS Code Desktop or Web. The Codespace connection error rate averaged 90% and peaked at 100% across all regions throughout the incident period.
- GitHub Actions larger hosted runners experienced degraded performance, with 0.5% of overall workflow runs and 9.8% of larger hosted runner jobs failing or not starting within 5 minutes. These recovered by 20:40 UTC.
- The GitHub Enterprise Importer service was degraded, with some users experiencing migration failures during git push operations and most users experiencing delayed migration processing.
- Initiation of new trials for GitHub Enterprise Cloud with Data Residency was also delayed during this time.
- Copilot Metrics requests via the API could not access the download link during this time; approximately 100 requests during the incident would have failed the download. Recovery began around 20:25 UTC.

We applied a number of mitigations to reduce impact over the course of the incident, but we did not achieve full recovery until our service provider's incident was resolved. We are working to reduce critical path dependencies on the service provider and to gracefully degrade experiences where possible so that we are more resilient to future dependency outages.
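To illustrate one common pattern for reducing critical path dependencies on an external provider, here is a minimal circuit-breaker sketch that falls back to a degraded experience while the provider recovers. The thresholds and structure are illustrative assumptions, not GitHub's implementation.

```typescript
// Minimal circuit-breaker sketch: after repeated failures, skip the provider
// for a cooldown period and serve a fallback instead. Thresholds are illustrative.
class CircuitBreaker<T> {
  private failures = 0;
  private openedAt = 0;

  constructor(private maxFailures = 5, private cooldownMs = 30_000) {}

  async call(primary: () => Promise<T>, fallback: () => Promise<T>): Promise<T> {
    const isOpen =
      this.failures >= this.maxFailures &&
      Date.now() - this.openedAt < this.cooldownMs;
    if (isOpen) return fallback(); // degrade gracefully while the provider recovers

    try {
      const result = await primary();
      this.failures = 0; // a healthy call closes the breaker
      return result;
    } catch {
      this.failures += 1;
      this.openedAt = Date.now();
      return fallback();
    }
  }
}
```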