Incident History

Degraded Performance for GitHub Actions macOS Runners

On October 1, 2025, between 07:00 UTC and 17:20 UTC, Mac hosted runner capacity for Actions was degraded, leading to timed-out jobs and long queue times. On average, the error rate was 46% and peaked at 96% of requests to the service. XL and Intel runners recovered by 10:10 UTC, with the other types taking longer to recover.

The degraded capacity was triggered by a scheduled event at 07:00 UTC that led to a permission failure on Mac runner hosts, blocking reimage operations. The permission issue was resolved by 09:41 UTC, but the recovery of available runners took longer than expected due to a combination of backoff logic slowing backend operations and some hosts needing state resets.

We deployed changes immediately following the incident to address the scheduled event and ensure that similar failures will not block critical operations in the future. We are also working to reduce the end-to-end time for self-healing of offline hosts, enabling quicker full recovery from future capacity or host events.
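The slow recovery is consistent with how capped exponential backoff behaves once many hosts have accumulated long retry delays: even after the underlying permission failure is fixed, each host waits out its current interval before retrying. The sketch below is a minimal, hypothetical illustration of that effect; the reimage operation and the retry parameters are assumptions, not GitHub's actual implementation.

```python
import random
import time

# Hypothetical sketch: capped exponential backoff around a reimage operation.
# After repeated failures the delay saturates at MAX_DELAY, so a fleet that has
# been failing for hours keeps waiting close to MAX_DELAY between retries even
# once the underlying fault (here, a permission error) is fixed.

BASE_DELAY = 30    # seconds before the first retry (assumed value)
MAX_DELAY = 1800   # cap on the backoff interval (assumed value)

def reimage_with_backoff(reimage_host, host_id, max_attempts=20):
    delay = BASE_DELAY
    for attempt in range(1, max_attempts + 1):
        try:
            reimage_host(host_id)   # assumed operation; may raise PermissionError
            return True             # host reimaged and returned to the pool
        except PermissionError:
            # Full jitter keeps hosts from retrying in lockstep, but the average
            # wait still grows toward MAX_DELAY as failures accumulate.
            time.sleep(random.uniform(0, delay))
            delay = min(delay * 2, MAX_DELAY)
    return False                    # host left offline; needs a manual state reset
```

Resetting or re-capping the accumulated backoff once the fault is cleared is one way to shorten the tail of such recoveries, which matches the stated goal of reducing end-to-end self-healing time.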

1759305583 - 1759337759 Resolved

Disruption with Gemini 2.5 Pro and Gemini 2.0 Flash in Copilot

On September 29, 2025, between 17:53 and 18:42 UTC, the Copilot service experienced a degradation of the Gemini 2.5 model due to an issue with our upstream provider. Approximately 24% of requests failed, affecting 56% of users during this period. No other models were impacted.

GitHub notified the upstream provider of the problem as soon as it was detected. The issue was resolved after the upstream provider rolled back a recent change that caused the disruption. GitHub will continue to enhance our monitoring and alerting systems to reduce the time it takes to detect and mitigate similar issues in the future.

1759171187 - 1759173161 Resolved

Disruption with some GitHub services

On September 29, 2025, between 16:26 UTC and 17:33 UTC, the Copilot API experienced a partial degradation, causing intermittent erroneous 404 responses for an average of 0.2% of GitHub MCP server requests, peaking at around 2% of requests. The issue stemmed from an upgrade of an internal dependency, which exposed a misconfiguration in the service.

We resolved the incident by rolling back the upgrade to address the misconfiguration. We have fixed the configuration issue and will improve our documentation and rollout process to prevent similar issues.

1759164311 - 1759167231 Resolved

Disruption with some GitHub services

On September 26, 2025, between 16:22 UTC and 18:32 UTC, raw file access was degraded for a small set of four repositories. On average, the raw file access error rate was 0.01% of requests and peaked at 0.16%. This was due to a caching bug exposed by excessive traffic to a handful of repositories. We mitigated the incident by resetting the state of the cache for raw file access, and we are working to improve cache usage and testing to prevent issues like this in the future.

1758819625 - 1758821778 Resolved

Disruption with some GitHub services

On September 23, 2025, between 15:29 UTC and 17:38 UTC, and again on September 24, 2025, between 15:02 UTC and 15:12 UTC, email deliveries were delayed by up to 50 minutes, which resulted in significant delays for most types of email notifications. This occurred due to an unusually high volume of traffic, which caused resource contention on some of our outbound email servers.

We have updated the configuration we use to better allocate capacity when there is a high volume of traffic, and we are also updating our monitors so we can detect this type of issue before it becomes a customer-impacting incident.

1758725190 - 1758728169 Resolved

Claude Opus 4 is experiencing degraded performance

On September 24, 2025, between 08:02 UTC and 09:11 UTC, the Copilot service was degraded for Claude Opus 4 and Claude Opus 4.1 requests. On average, 22% of requests failed for Claude Opus 4 and 80% for Claude Opus 4.1. This was due to an upstream provider returning elevated errors for Claude Opus 4 and Opus 4.1.

We mitigated the issue by directing users to select other models and by monitoring recovery. To reduce the impact of similar issues in the future, we are expanding failover capabilities by integrating with additional infrastructure providers.

1758704891 - 1758705510 Resolved

Incident with Copilot

Between 20:06 UTC on September 23 and 04:58 UTC on September 24, 2025, the Copilot service experienced degraded availability for Claude Sonnet 4 and Claude Sonnet 3.7 model requests. During this period, 0.46% of Claude Sonnet 4 requests and 7.83% of Claude Sonnet 3.7 requests failed.

The reduced availability resulted from Copilot disabling routing to an upstream provider that was experiencing issues and reallocating capacity to other providers to manage requests for Claude Sonnet 3.7 and 4. We are continuing to investigate the source of the issues with this provider and will provide an update as more information becomes available.
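The summary describes disabling routing to one upstream provider and spreading its traffic across the others. A minimal, hypothetical sketch of that kind of weighted routing follows; the provider names and weights are illustrative only and do not reflect Copilot's actual topology.

```python
import random

# Hypothetical sketch of weighted routing across model providers. Disabling an
# unhealthy provider reallocates its share to the remaining ones, which is why
# availability can dip while capacity rebalances.

providers = {
    "provider_a": {"weight": 0.6, "healthy": True},
    "provider_b": {"weight": 0.3, "healthy": True},
    "provider_c": {"weight": 0.1, "healthy": True},
}

def mark_unhealthy(name):
    providers[name]["healthy"] = False

def pick_provider():
    healthy = {n: p for n, p in providers.items() if p["healthy"]}
    if not healthy:
        raise RuntimeError("no healthy providers for this model")
    total = sum(p["weight"] for p in healthy.values())
    # Renormalize weights over the healthy set, then sample one provider.
    r = random.uniform(0, total)
    cumulative = 0.0
    for name, p in healthy.items():
        cumulative += p["weight"]
        if r <= cumulative:
            return name
    return name  # fallback for floating-point edge cases

# Example: take the provider returning elevated errors out of rotation,
# then route a request to one of the remaining providers.
mark_unhealthy("provider_a")
print(pick_provider())
```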

1758666141 - 1758673589 Resolved

Incident with Pages and Actions

On September 23, 2025, between 17:11 and 17:40 UTC, customers experienced failures and delays when running workflows on GitHub Actions and when building or deploying GitHub Pages. The issue was caused by a faulty configuration change that disrupted service-to-service communication in GitHub Actions. During this period, in-progress jobs were delayed and new jobs would not start due to a failure to acquire runners; about 30% of all jobs failed. GitHub Pages users were unable to build or deploy their Pages during this period.

The offending change was rolled back within 15 minutes of its deployment, after which Actions workflows and Pages deployments began to succeed. Actions customers continued to experience delays for about 15 minutes after the rollback was completed while services worked through the backlog of queued jobs. We are planning to implement additional rollout checks to help detect and prevent similar issues in the future.

1758648491 - 1758649317 Resolved

Disruption with some GitHub services

On September 23, 2025, between 15:29 UTC and 17:38 UTC, and again on September 24, 2025, between 14:02 UTC and 15:12 UTC, email deliveries were delayed by up to 50 minutes, which resulted in significant delays for most types of email notifications. This occurred due to an unusually high volume of traffic, which caused resource contention on some of our outbound email servers.

We have updated the configuration we use to better allocate capacity when there is a high volume of traffic, and we are also updating our monitors so we can detect this type of issue before it becomes a customer-impacting incident.

1758646013 - 1758649225 Resolved

Incident with Codespaces

On September 17, 2025, between 13:23 and 16:51 UTC, some users in West Europe experienced issues with Codespaces that had shut down due to network disconnections and subsequently failed to restart. Codespace creations and resumes were failed over to another region at 15:01 UTC. While many of the impacted instances self-recovered after mitigation efforts, approximately 2,000 codespaces remained stuck in a "shutting down" state while the team evaluated possible methods to recover unpushed data from the latest active session of affected codespaces. Unfortunately, recovery of that data was not possible. We unblocked shutdown of those codespaces, with all instances either shut down or available by 08:26 UTC on September 19.

The disconnects were triggered by an exhaustion of resources in the network relay infrastructure in that region, but the lack of self-recovery was caused by an unhandled error impacting the local agent, which led to an unclean shutdown.

We are improving the resilience of the local agent to disconnect events to ensure that shutdown of codespaces is always clean and without data loss. We have also addressed the exhausted resources in the network relay and will be investing in improved detection and resilience to reduce the impact of similar events in the future.
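The root cause described here, an unhandled error in the local agent leading to an unclean shutdown, maps to a familiar pattern: shutdown work that should run regardless of how the session ends. Below is a minimal, hypothetical sketch of that structure; the flush and release callables are assumptions used only for illustration and are not the Codespaces agent's real interfaces.

```python
import logging

logger = logging.getLogger("agent")

class NetworkDisconnected(Exception):
    """Raised when the relay connection drops mid-session (assumed error type)."""

def shutdown_codespace(flush_unpushed_changes, release_resources):
    # Hypothetical sketch: keep shutdown resilient to disconnect errors so the
    # codespace always reaches a clean "shut down" state, even if persisting
    # session data fails.
    clean = True
    try:
        flush_unpushed_changes()    # best effort: persist the latest session state
    except NetworkDisconnected:
        # Do not let a disconnect abort the rest of the shutdown sequence.
        logger.exception("flush failed after disconnect; continuing shutdown")
        clean = False
    finally:
        release_resources()         # always leave the host in a consistent state
    return clean
```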

1758121488 - 1758131739 Resolved