Incident History

Claude Opus 4 is experiencing degraded performance

On September 24, 2025, between 08:02 UTC and 09:11 UTC, the Copilot service was degraded for Claude Opus 4 and Claude Opus 4.1 requests. On average, 22% of Claude Opus 4 requests and 80% of Claude Opus 4.1 requests failed. This was due to an upstream provider returning elevated errors for Claude Opus 4 and Opus 4.1.

We mitigated the issue by directing users to select other models while monitoring recovery. To reduce the impact of similar issues in the future, we are expanding our failover capabilities by integrating with additional infrastructure providers.

September 24, 2025, 09:08 - 09:18 UTC · Resolved

Incident with Copilot

Between 20:06 UTC on September 23 and 04:58 UTC on September 24, 2025, the Copilot service experienced degraded availability for Claude Sonnet 4 and Claude Sonnet 3.7 model requests. During this period, 0.46% of Claude Sonnet 4 requests and 7.83% of Claude Sonnet 3.7 requests failed. The reduced availability resulted from Copilot disabling routing to an upstream provider that was experiencing issues and reallocating capacity to other providers to handle requests for Claude Sonnet 3.7 and 4. We are continuing to investigate the source of the issues with this provider and will provide an update as more information becomes available.

September 23, 2025, 22:22 UTC - September 24, 2025, 00:26 UTC · Resolved

Incident with Pages and Actions

On September 23, between 17:11 and 17:40 UTC, customers experienced failures and delays when running workflows on GitHub Actions and building or deploying GitHub Pages. The issue was caused by a faulty configuration change that disrupted service-to-service communication in GitHub Actions. During this period, in-progress jobs were delayed and new jobs would not start due to a failure to acquire runners, and about 30% of all jobs failed. GitHub Pages users were unable to build or deploy their Pages during this period.

The offending change was rolled back within 15 minutes of its deployment, after which Actions workflows and Pages deployments began to succeed. Actions customers continued to experience delays for about 15 minutes after the rollback was completed while services worked through the backlog of queued jobs. We are planning to implement additional rollout checks to help detect and prevent similar issues in the future.

September 23, 2025, 17:28 - 17:41 UTC · Resolved

Disruption with some GitHub services

On September 23, 2025, between 15:29 UTC and 17:38 UTC, and again on September 24, 2025, between 14:02 UTC and 15:12 UTC, email deliveries were delayed by up to 50 minutes, which resulted in significant delays for most types of email notifications. This occurred due to an unusually high volume of traffic, which caused resource contention on some of our outbound email servers. We have updated the configuration we use to better allocate capacity during periods of high traffic, and we are updating our monitors so we can detect this type of issue before it becomes a customer-impacting incident.

September 23, 2025, 16:46 - 17:40 UTC · Resolved

Incident with Codespaces

On September 17, 2025, between 13:23 and 16:51 UTC, some users in West Europe experienced issues with Codespaces that had shut down due to network disconnections and subsequently failed to restart. Codespace creations and resumes were failed over to another region at 15:01 UTC. While many of the impacted instances self-recovered after mitigation efforts, approximately 2,000 codespaces remained stuck in a "shutting down" state while the team evaluated possible methods to recover unpushed data from the latest active session of affected codespaces. Unfortunately, recovery of that data was not possible. We unblocked shutdown of those codespaces, with all instances either shut down or available by 08:26 UTC on September 19.

The disconnects were triggered by an exhaustion of resources in the network relay infrastructure in that region, but the lack of self-recovery was caused by an unhandled error impacting the local agent, which led to an unclean shutdown. We are improving the resilience of the local agent to disconnect events to ensure shutdown of codespaces is always clean and without data loss. We have also addressed the exhausted resources in the network relay and will be investing in improved detection and resilience to reduce the impact of similar events in the future.

September 17, 2025, 15:04 - 17:55 UTC · Resolved

Unauthenticated LFS requests for public repos are returning unexpected 401 errors

Between 16:26 UTC on September 15th and 18:30 UTC on September 16th, anonymous REST API calls to approximately 20 endpoints were incorrectly rejected as unauthenticated. All authenticated requests were unaffected, and no protected endpoints were exposed. At peak, 100% of requests to these endpoints failed, which represented less than 0.1% of GitHub’s overall request volume. On average, the error rate for these endpoints was below 50% over the roughly 26-hour impact window, peaking at 100% on September 16th. API requests to the impacted endpoints were rejected with a 401 error code. The cause was a mismatch in authentication policies for specific endpoints during a system migration; the errors went undetected because they affected only a small percentage of overall traffic.

We mitigated the incident by reverting the policy in question and correcting the logic for the degraded endpoints. We are working to improve our test suite to better validate policy mismatches and refining our monitors for proactive detection.

September 16, 2025, 17:55 - 18:30 UTC · Resolved

Creating GitHub apps using the REST API will fail with a 401 error

Between 16:26 UTC on September 15th and 18:30 UTC on September 16th, anonymous REST API calls to approximately 20 endpoints were incorrectly rejected as unauthenticated. All authenticated requests were unaffected, and no protected endpoints were exposed. At peak, 100% of requests to these endpoints failed, which represented less than 0.1% of GitHub’s overall request volume. On average, the error rate for these endpoints was below 50% over the roughly 26-hour impact window, peaking at 100% on September 16th. API requests to the impacted endpoints were rejected with a 401 error code. The cause was a mismatch in authentication policies for specific endpoints during a system migration; the errors went undetected because they affected only a small percentage of overall traffic.

We mitigated the incident by reverting the policy in question and correcting the logic for the degraded endpoints. We are working to improve our test suite to better validate policy mismatches and refining our monitors for proactive detection.

September 16, 2025, 17:14 - 17:45 UTC · Resolved

Disruption with some GitHub services

On September 15th, between 17:55 and 18:20 UTC, Copilot experienced degraded availability for all features. This was due to a partial deployment of a feature flag to a global rate limiter. The flag triggered behavior that unintentionally rate limited all requests, resulting in 100% of them returning 403 errors. The issue was resolved by reverting the feature flag, which resulted in immediate recovery.

The root cause of the incident was an undetected edge case in our rate-limiting logic. The flag was meant to scale down rate limiting for a subset of users, but it unintentionally put our rate-limiting configuration into an invalid state. To prevent this from happening again, we have addressed the bug in our rate limiting. We are also adding additional monitors to detect anomalies in our traffic patterns, which will allow us to identify similar issues during future deployments. Furthermore, we are exploring ways to test our rate limit scaling in our internal environment to enhance our pre-production validation process.

September 15, 2025, 18:21 - 18:28 UTC · Resolved

Repository search is degraded

At around 18:45 UTC on Friday, September 12, 2025, a change was deployed that unintentionally affected search index management. As a result, approximately 25% of repositories were temporarily missing from search results. By 12:45 UTC on September 14, most missing repositories were restored from an earlier search index snapshot, and repositories updated between the snapshot and the restoration were reindexed. This backfill was completed at 21:25 UTC. After these repairs, about 98.5% of repositories were once again searchable. We are performing a full reconciliation of the search index, and customers can expect to see records being updated and content becoming searchable for all repositories again between now and September 25.

NOTE: Users who notice missing or outdated repositories in search results can force reindexing by starring or un-starring the repository (see the example below). Other repository actions, such as adding topics or updating the repository description, will also result in reindexing. In general, changes to searchable artifacts in GitHub update their respective search index in near real time.

User impact has been mitigated with the exception of the 1.5% of repositories that are still missing from the search index. The change responsible for the search issue has been reverted, and full reconciliation of the search index is underway, expected to complete by September 23. We have added additional checks to our indexing model to ensure this failure does not happen again, and we are investigating faster repair alternatives. To avoid resource contention and possible further issues, we are not repairing repositories or organizations individually at this time. No repository data was lost, and other search types were not affected.
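As a rough illustration of the starring workaround above, the sketch below toggles a star through the GitHub REST API starring endpoints (PUT and DELETE /user/starred/{owner}/{repo}). The owner, repository name, and token are placeholders, and this is an unofficial sketch rather than a GitHub-provided tool.

    # Sketch: force search reindexing of one repository by starring and
    # immediately un-starring it via the GitHub REST API.
    # OWNER, REPO, and GITHUB_TOKEN are placeholders you must supply.
    import os
    import requests

    OWNER = "octocat"       # hypothetical repository owner
    REPO = "hello-world"    # hypothetical repository name
    TOKEN = os.environ["GITHUB_TOKEN"]

    url = f"https://api.github.com/user/starred/{OWNER}/{REPO}"
    headers = {
        "Authorization": f"Bearer {TOKEN}",
        "Accept": "application/vnd.github+json",
    }

    # Star the repository (204 No Content on success) ...
    requests.put(url, headers=headers, timeout=10).raise_for_status()
    # ... then remove the star so the repository ends up in its original state.
    # Per the note above, either action should trigger reindexing.
    requests.delete(url, headers=headers, timeout=10).raise_for_status()

Toggling the star in this way leaves the repository's starred state as it was while still registering the update that prompts reindexing.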

September 13, 2025, 12:44 UTC - September 15, 2025, 21:01 UTC · Resolved

Incident with Actions

On September 10, 2025, between 13:00 and 14:15 UTC, Actions users experienced failed jobs and run start delays for Ubuntu 24 and Ubuntu 22 jobs on standard runners in private repositories. Additionally, larger runner customers experienced run start delays for runner groups with private networking configured in the eastus2 region. This was due to an outage in an underlying compute service provider in eastus2. 1.06% of Ubuntu 24 jobs and 0.16% of Ubuntu 22 jobs failed during this period, and jobs for larger runners using private networking in the eastus2 region were unable to start for the duration of the incident. We have identified, and are working on, improvements to our resilience to outages in a single partner region so that impact to standard runners is reduced in similar scenarios in the future.

September 10, 2025, 13:23 - 14:02 UTC · Resolved