On October 29th, 2025, between 14:07 UTC and 23:15 UTC, multiple GitHub services were degraded due to a broad outage at one of our service providers:

- Users of Codespaces experienced failures connecting to new and existing Codespaces through VS Code Desktop or Web. On average, the Codespaces connection error rate was 90%, peaking at 100% across all regions during the incident.
- GitHub Actions larger hosted runners experienced degraded performance, with 0.5% of overall workflow runs and 9.8% of larger hosted runner jobs failing or not starting within 5 minutes. These recovered by 20:40 UTC.
- The GitHub Enterprise Importer service was degraded, with some users experiencing migration failures during git push operations and most users experiencing delayed migration processing.
- Initiation of new trials for GitHub Enterprise Cloud with Data Residency was also delayed during this time.
- Users of the Copilot Metrics API could not access the downloadable link during this time; approximately 100 requests made during the incident would have failed to download. Recovery began around 20:25 UTC.

We applied a number of mitigations to reduce impact over the course of the incident, but we did not achieve full recovery until our service provider's incident was resolved. We are working to reduce critical-path dependencies on the service provider and to gracefully degrade experiences where possible, so that we are more resilient to future dependency outages.
From October 28th at 16:03 UTC until 17:11 UTC, the Copilot service experienced degradation due to an infrastructure issue that impacted the Claude Haiku 4.5 model, leading to a spike in errors affecting 1% of users. No other models were impacted. The incident was caused by an outage at an upstream provider. We are working to improve redundancy so that we can better handle future occurrences.
Between October 23, 2025 19:27:29 UTC and October 27, 2025 17:42:42 UTC, users experienced timeouts when viewing repository landing pages. We observed the timeouts for approximately 5,000 users across fewer than 1,000 repositories, including forks. The impact was limited to logged-in users accessing repositories in organizations with more than 200,000 members; forks of repositories from affected large organizations were also impacted. Git operations were functional throughout this period.

This was caused by feature-flagged changes impacting organization membership. The changes caused unintended timeouts in organization membership count evaluations, which led to repository landing pages not loading.

The flag was turned off and a fix addressing the timeouts was deployed, including additional optimizations to better support organizations of this size. We are reviewing related areas and will continue to monitor for similar performance regressions.
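The report does not detail the optimizations, but one common pattern for avoiding a full membership count on every page load is to cache the count for a short interval. A minimal sketch under that assumption, with hypothetical names and thresholds rather than GitHub's actual implementation:

```python
import time

CACHE_TTL_SECONDS = 300  # assumed refresh interval; a slightly stale count beats a timeout
_count_cache: dict[int, tuple[int, float]] = {}  # org_id -> (count, cached_at)

def cached_member_count(org_id: int, count_members_in_db) -> int:
    """Return the organization's member count, recomputing at most once per TTL."""
    now = time.monotonic()
    hit = _count_cache.get(org_id)
    if hit is not None and now - hit[1] < CACHE_TTL_SECONDS:
        return hit[0]
    # the expensive path: counting members for a 200,000+ member organization
    count = count_members_in_db(org_id)
    _count_cache[org_id] = (count, now)
    return count
```

With a scheme like this, the landing page only pays the cost of the expensive count once per TTL per organization instead of on every request.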
On October 24, 2025, between 02:55 UTC and 03:15 UTC, githubstatus.com was unreachable due to a service interruption with our status page provider.
During this time, GitHub systems were not experiencing any outages or disruptions.
We are working with our vendor to understand how to improve the availability of githubstatus.com.
From Oct 22, 2025 15:00 UTC to Oct 24, 2025 14:30 UTC, Git operations over SSH saw periods of increased latency and failed requests, with failure rates ranging from 1.5% to a single spike of 15%. Git operations over HTTP were not affected. This was due to resource exhaustion on our backend SSH servers. We mitigated the incident by increasing the available resources for SSH connections. We are improving the observability and dynamic scalability of our backend to prevent issues like this in the future.
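As a rough illustration of the dynamic-scalability direction (hypothetical thresholds and helper names, not our actual tooling), capacity decisions can be driven by per-host SSH session counts rather than by waiting for failed connections:

```python
import math

SESSIONS_PER_HOST_LIMIT = 2000  # assumed safe ceiling per backend SSH host
SCALE_UP_UTILIZATION = 0.8      # assumed target: add capacity at 80% of the ceiling

def extra_hosts_needed(session_counts: list[int]) -> int:
    """Return how many additional SSH backends to provision for the current load."""
    total = sum(session_counts)
    capacity = len(session_counts) * SESSIONS_PER_HOST_LIMIT
    if capacity == 0 or total / capacity < SCALE_UP_UTILIZATION:
        return 0
    # provision enough hosts to bring utilization back under the target
    target_hosts = math.ceil(total / (SCALE_UP_UTILIZATION * SESSIONS_PER_HOST_LIMIT))
    return max(0, target_hosts - len(session_counts))

# e.g. three hosts near their ceiling -> provision one more
print(extra_hosts_needed([1900, 1850, 1700]))  # -> 1
```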
On October 23, 2025, between 15:54 UTC and 19:20 UTC, GitHub Actions larger hosted runners experienced degraded performance, with 1.4% of overall workflow runs and 29% of larger hosted runner jobs failing to start or timing out within 5 minutes.

The full set of contributing factors is still under investigation, but the customer impact was due to database performance degradation: routine database changes produced a load profile that triggered a bug in the underlying database platform used for larger runners.

Impact was mitigated through a combination of scaling up the database and reducing load. We are working with partners to resolve the underlying bug and have paused similar database changes until it is resolved.
On October 22, 2025, between 14:06 UTC and 15:17 UTC, less than 0.5% of web users experienced intermittent slow page loads on GitHub.com. During this time, API requests showed increased latency, with up to 2% timing out. The issue was caused by elevated load on one of our databases from a poorly performing query, which degraded performance for a subset of requests.

We identified the source of the load and optimized the query to restore normal performance. We have added monitors for early detection of query performance issues, and we continue to monitor the system closely to ensure ongoing stability.
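The monitors themselves are not described in the report; a minimal sketch of the idea, with assumed names and thresholds, is to alert on the tail latency of each query family before it grows slow enough to saturate the database:

```python
from statistics import quantiles

P95_THRESHOLD_MS = 250  # assumed alert threshold
MIN_SAMPLES = 20        # avoid alerting on too few data points

def check_query_latency(query_name: str, latencies_ms: list[float]) -> bool:
    """Return True (and emit an alert) if the query's p95 latency is too high."""
    if len(latencies_ms) < MIN_SAMPLES:
        return False
    p95 = quantiles(latencies_ms, n=20)[-1]  # 19 cut points; the last is the 95th percentile
    if p95 > P95_THRESHOLD_MS:
        print(f"ALERT: {query_name} p95={p95:.0f}ms exceeds {P95_THRESHOLD_MS}ms")
        return True
    return False
```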
On October 21, 2025, between 13:30 and 17:30 UTC, GitHub Enterprise Cloud Organization SAML Single Sign-On (SSO) experienced degraded performance. Customers may have been unable to authenticate into their GitHub Organizations during this period; at most 0.4% of Organization SAML SSO requests failed during this timeframe.

This incident stemmed from a failure in a read replica database partition responsible for storing license usage information for GitHub Enterprise Cloud Organizations. A successful SSO sign-in requires an available license for the user accessing a GitHub Enterprise Cloud Organization backed by SSO, so users from affected organizations, whose license usage information was stored on this partition, were unable to complete SSO during this window.

The failing partition was taken out of service, which mitigated the issue. Remedial actions are underway to ensure that a read replica failure does not compromise overall service availability.
On October 21, 2025, between 07:55 UTC and 12:20 UTC, GitHub Actions experienced degraded performance. During this time, 2.11% of workflow runs failed to start within 5 minutes, with an average delay of 8.2 minutes. The root cause was increased latency on a node in one of our Redis clusters, triggered by resource contention after a patching event became stuck. Recovery began once the patching process was unblocked and normal connectivity to the Redis cluster was restored at 11:45 UTC, but it took until 12:20 UTC to clear the backlog of queued work. We are implementing safeguards to prevent this failure mode and enhancing our monitoring to detect and address problems like this more quickly in the future.
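A minimal sketch of the kind of monitoring that would surface a degraded node sooner, assuming the redis-py client and using illustrative hostnames and thresholds (not our actual topology):

```python
import time
import redis  # assumes the redis-py client is installed

LATENCY_THRESHOLD_MS = 50  # assumed alert threshold for a single PING round trip

def ping_latency_ms(host: str, port: int = 6379) -> float:
    """Time a PING against one cluster node."""
    client = redis.Redis(host=host, port=port, socket_timeout=1)
    start = time.monotonic()
    client.ping()
    return (time.monotonic() - start) * 1000

# hypothetical node names for illustration
for node in ("redis-actions-01.internal", "redis-actions-02.internal"):
    latency = ping_latency_ms(node)
    if latency > LATENCY_THRESHOLD_MS:
        print(f"ALERT: {node} PING latency {latency:.1f} ms")
```

Probing each node individually, rather than the cluster endpoint alone, is what distinguishes one slow node from a healthy cluster average.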
From October 20th at 14:10 UTC until 16:40 UTC, the Copilot service experienced degradation due to an infrastructure issue that impacted the Grok Code Fast 1 model, leading to a spike in errors affecting 30% of users. No other models were impacted. The incident was caused by an outage at an upstream provider.