Incident History

Inconsistent results when using the Haiku 4.5 model

From October 28 at 16:03 UTC until 17:11 UTC, the Copilot service was degraded by an infrastructure issue affecting the Claude Haiku 4.5 model, leading to a spike in errors for 1% of users. No other models were impacted. The incident was caused by an outage at an upstream provider. We are working to improve redundancy so that future occurrences have less impact.

Disruption with viewing some repository pages from large organizations

Between October 23, 2025 19:27:29 UTC and October 27, 2025 17:42:42 UTC, users experienced timeouts when viewing repository landing pages. We observed the timeouts for approximately 5,000 users across fewer than 1,000 repositories. The impact was limited to logged-in users accessing repositories in organizations with more than 200,000 members; forks of repositories from affected large organizations were also impacted. Git operations were functional throughout this period. The incident was caused by feature-flagged changes to organization membership, which introduced unintended timeouts in organization membership count evaluations and prevented repository landing pages from loading. The flag was turned off and a fix addressing the timeouts was deployed, including additional optimizations to better support organizations of this size. We are reviewing related areas and will continue to monitor for similar performance regressions.

githubstatus.com was unavailable on October 24, 2025 from 02:55 to 03:13 UTC

From 02:55 to 03:15 UTC on October 24, 2025, githubstatus.com was unreachable due to a service interruption with our status page provider. During this time, GitHub systems were not experiencing any outages or disruptions. We are working with our vendor to understand how to improve the availability of githubstatus.com.

Git operations over SSH seeing increased latency on github.com

From Oct 22, 2025 15:00 UTC to Oct 24, 2025 14:30 UTC, Git operations over SSH saw periods of increased latency and failed requests, with failure rates ranging from 1.5% to a single spike of 15%. Git operations over HTTP were not affected. This was due to resource exhaustion on our backend SSH servers. We mitigated the incident by increasing the resources available for SSH connections. We are improving the observability and dynamic scalability of our backend to prevent issues like this in the future.

Incident with Actions - Larger hosted runners

On October 23, 2025, between 15:54 UTC and 19:20 UTC, GitHub Actions larger hosted runners experienced degraded performance, with 1.4% of overall workflow runs and 29% of larger hosted runner jobs failing to start or timing out within 5 minutes. The full set of contributing factors is still under investigation, but the customer impact was due to database performance degradation: routine database changes produced a load profile that triggered a bug in the underlying database platform used for larger runners. Impact was mitigated through a combination of scaling up the database and reducing load. We are working with partners to resolve the underlying bug and have paused similar database changes until it is resolved.

Incident with API Requests

On October 22, 2025, between 14:06 UTC and 15:17 UTC, less than 0.5% of web users experienced intermittently slow page loads on GitHub.com. During this time, API requests showed increased latency, with up to 2% timing out. The issue was caused by elevated load on one of our databases from a poorly performing query, which degraded performance for a subset of requests. We identified the source of the load and optimized the query to restore normal performance. We have added monitoring for earlier detection of query performance issues, and we continue to monitor the system closely to ensure ongoing stability.

Disruption with some GitHub services

On October 21, 2025, between 13:30 and 17:30 UTC, GitHub Enterprise Cloud Organization SAML Single Sign-On (SSO) experienced degraded performance. Customers may have been unable to successfully authenticate into their GitHub Organizations during this period, with at most 0.4% of Organization SAML SSO requests failing during this timeframe. The incident stemmed from the failure of a read replica database partition responsible for storing license usage information for GitHub Enterprise Cloud Organizations. Because a successful SSO sign-in requires an available license for the user accessing an SSO-backed GitHub Enterprise Cloud Organization, users from affected organizations whose license usage information was stored on this partition were unable to access SSO during the aforementioned window. The failing partition was subsequently taken out of service, mitigating the issue. Remedial actions are underway to ensure that a read replica failure does not compromise overall service availability.

Incident with Actions

On October 21, 2025, between 07:55 UTC and 12:20 UTC, GitHub Actions experienced degraded performance. During this time, 2.11% of workflow runs failed to start within 5 minutes, with an average delay of 8.2 minutes. The root cause was increased latency on a node in one of our Redis clusters, triggered by resource contention after a patching event became stuck. Recovery began once the patching process was unstuck and normal connectivity to the Redis cluster was restored at 11:45 UTC, but it took until 12:20 UTC to clear the backlog of queued work. We are implementing safeguards to prevent this failure mode and enhancing our monitoring to detect and address problems like this more quickly in the future.

Disruption with Grok Code Fast 1 in Copilot

From October 20 at 14:10 UTC until 16:40 UTC, the Copilot service was degraded by an infrastructure issue affecting the Grok Code Fast 1 model, leading to a spike in errors for 30% of users. No other models were impacted. The incident was caused by an outage at an upstream provider.

Codespaces creation failing

On October 20, 2025, between 08:05 UTC and 10:50 UTC, the Codespaces service was degraded, with users experiencing failures creating new codespaces and resuming existing ones. The error rate for codespace creation averaged 39.5% and peaked at 71% of requests during the incident window; resume operations averaged a 23.4% error rate with a peak of 46%. This was due to a cascading failure triggered by an outage in a 3rd-party dependency required to build devcontainer images. The impact was mitigated when the 3rd-party dependency recovered. We are investigating opportunities to remove this dependency from the critical path of our container build process and working to improve our monitoring and alerting to reduce our time to detection of issues like this in the future.
