Between December 9th, 2025 21:07 UTC and December 10th, 2025 14:52 UTC, 177 macos-14-large jobs were run on an Ubuntu larger runner VM instead of macOS runner VMs. The impacted jobs were routed to a larger runner with incorrect metadata. We mitigated this by deleting the runner. The routing configuration is not something controlled externally. A manual override was performed previously for internal testing, but it left incorrect metadata on a larger runner instance. An infrastructure migration caused this misconfigured runner to come online, which started the incorrect assignments. We are removing the ability to manually override this configuration entirely, and we are adding alerting to identify possible OS mismatches for hosted runner jobs. As a reminder, hosted runner VMs are secure and ephemeral, with every VM reimaged after every single job. All jobs impacted here were originally targeted at a GitHub-owned VM image and were run on a GitHub-owned VM image.
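To illustrate the kind of OS-mismatch alerting described above, here is a minimal sketch that compares a job's requested runner label against the OS recorded in the assigned runner's metadata. The label map, field names, and records are hypothetical, not GitHub's internal schema:

```python
# Hypothetical OS-mismatch check for hosted runner job assignments.
# Field names (requested_label, os) and the label map are illustrative only.

LABEL_OS = {
    "macos-14-large": "macos",
    "ubuntu-latest": "linux",
    "windows-latest": "windows",
}

def os_mismatch(requested_label: str, runner_os: str) -> bool:
    """Return True when a job's requested label implies a different OS
    than the metadata of the runner it was routed to."""
    expected = LABEL_OS.get(requested_label)
    return expected is not None and expected != runner_os.lower()

def check_assignment(job: dict, runner: dict) -> None:
    if os_mismatch(job["requested_label"], runner["os"]):
        # In a real system this would emit a metric or page on-call.
        print(f"ALERT: job {job['id']} requested {job['requested_label']} "
              f"but was assigned a {runner['os']} runner {runner['id']}")

# Example: the scenario from this incident.
check_assignment(
    {"id": 1, "requested_label": "macos-14-large"},
    {"id": "runner-42", "os": "linux"},
)
```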
On December 10, 2025 between 08:50 UTC and 11:00 UTC, some GitHub Actions workflow runs experienced longer-than-normal wait times for jobs starting or completing. All jobs successfully completed despite the delays. At peak impact, approximately 8% of workflow runs were affected. During this incident, some nodes received a spike in workflow events that led to queuing of event processing. Because runs are pinned to nodes, runs being processed by these nodes saw delays in starting or showing as completed. The team was alerted to this at 08:58 UTC. Impacted nodes were disabled from processing new jobs to allow queues to drain. We have increased overall processing capacity and are implementing safeguards to better balance load across all nodes when spikes occur. This is important to ensure our available capacity can always be fully utilized.
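As a rough illustration of the load-balancing safeguard described, the sketch below assumes a hypothetical scheduler that tracks per-node queue depth and stops routing new runs to nodes whose backlog exceeds a threshold; all names and thresholds are illustrative:

```python
# Hypothetical per-node backlog guard: stop routing new runs to nodes
# whose event queue has grown past a threshold, so runs pinned to
# healthy nodes are not delayed. Names and thresholds are illustrative.

from dataclasses import dataclass

MAX_QUEUE_DEPTH = 5_000  # illustrative threshold

@dataclass
class Node:
    name: str
    queue_depth: int
    accepting: bool = True

def rebalance(nodes: list[Node]) -> None:
    # Mark overloaded nodes as not accepting new runs until they drain.
    for node in nodes:
        node.accepting = node.queue_depth < MAX_QUEUE_DEPTH

def pick_node(nodes: list[Node]) -> Node:
    candidates = [n for n in nodes if n.accepting]
    if not candidates:
        raise RuntimeError("no node available for new runs")
    # Route the new run to the least-loaded accepting node.
    return min(candidates, key=lambda n: n.queue_depth)

nodes = [Node("node-a", 120), Node("node-b", 9_800), Node("node-c", 450)]
rebalance(nodes)
print(pick_node(nodes).name)  # node-a; node-b is drained before accepting new work
```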
On December 8, 2025, between 21:15 and 22:24 UTC, Copilot code completions experienced a significant service degradation. During this period, up to 65% of code completion requests failed. The root cause was an internal feature flag that caused the primary model supporting Copilot code completions to appear unavailable to the backend service. The issue was resolved once the flag was disabled. To prevent recurrence, we expanded test coverage for Copilot code completion models and are strengthening our detection mechanisms to better identify and respond to traffic anomalies.
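The sketch below illustrates the kind of test coverage that would catch a flag state making the primary completion model unavailable. The flag name, model names, and routing logic are hypothetical stand-ins, not Copilot's actual backend:

```python
# Hypothetical regression check: the primary code-completion model must
# remain visible to the backend under every feature-flag state.
# Flag and model names are illustrative.

def available_models(flags: dict[str, bool]) -> set[str]:
    # Toy stand-in for backend model routing; a bug like this incident's
    # would look like a flag that drops the primary model entirely.
    models = {"primary-completions-model", "fallback-completions-model"}
    if flags.get("disable_primary_completions", False):
        models.discard("primary-completions-model")
    return models

def check_primary_model_always_available() -> list[bool]:
    failures = []
    for flag_value in (True, False):
        models = available_models({"disable_primary_completions": flag_value})
        if "primary-completions-model" not in models:
            failures.append(flag_value)
    return failures

bad_states = check_primary_model_always_available()
if bad_states:
    print(f"primary model unavailable for flag states: {bad_states}")
```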
On November 26th, 2025, between approximately 02:24 UTC and December 8th, 2025 at 20:26 UTC, enterprise administrators experienced a disruption when viewing agent session activities in the Enterprise AI Controls page. During this period, users were unable to list agent session activity in the AI Controls view. This did not impact viewing agent session activity in audit logs, navigating directly to individual agent session logs, or otherwise managing AI Agents. The issue was caused by a misconfiguration in a change deployed on November 25th that unintentionally prevented data from being published to an internal Kafka topic responsible for feeding the AI Controls page with agent session activity information. The problem was identified and mitigated on December 8th by correcting the configuration issue. GitHub is improving monitoring for data pipeline dependencies and enhancing pre-deployment validation to catch configuration issues before they reach production.
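For illustration, here is a minimal sketch of the kind of pre-deployment validation mentioned above, assuming a hypothetical service config that names the Kafka topic feeding the AI Controls page and a flag controlling whether agent session events are published (the config keys and topic name are assumptions):

```python
# Hypothetical pre-deployment check: fail the deploy if the config no
# longer publishes agent session activity to the expected Kafka topic.
# Config keys and the topic name are illustrative, not GitHub's.

EXPECTED_TOPIC = "agent-session-activity"  # illustrative

class ConfigError(Exception):
    pass

def validate_publishing_config(config: dict) -> None:
    publishing = config.get("agent_session_publishing", {})
    if not publishing.get("enabled", False):
        raise ConfigError("agent session activity publishing is disabled")
    if publishing.get("topic") != EXPECTED_TOPIC:
        raise ConfigError(
            f"unexpected topic {publishing.get('topic')!r}; "
            f"expected {EXPECTED_TOPIC!r}"
        )

# A misconfiguration like the one in this incident would fail fast:
try:
    validate_publishing_config({"agent_session_publishing": {"enabled": False}})
except ConfigError as e:
    print(f"deploy blocked: {e}")
```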
On December 5th, 2025, between 12:00 UTC and 21:00 UTC, our Team Synchronization service experienced a significant degradation, preventing over 209,000 organization teams from syncing their identity provider (IdP) groups. The incident was triggered by a buildup of synchronization requests, resulting in elevated Redis key usage and high CPU consumption on the underlying Redis cluster. To mitigate further impact, we proactively paused all team synchronization requests between 15:00 UTC and 20:15 UTC, allowing us to stabilize the Redis cluster. Our engineering team also resolved the issue by flushing the affected Redis keys and queues, which promptly stopped runaway growth and restored service health. Additionally, we scaled up our infrastructure resources to improve our ability to process the high volume of synchronization requests. All pending team synchronizations were successfully processed following service restoration. We are working to strengthen the Team Synchronization service by implementing a killswitch, adding throttling to prevent excessive enqueueing of synchronization requests, and improving the scheduler to avoid duplicate job requests. Additionally, we’re investing in better observability to alert when job drops occur. These efforts are focused on preventing similar incidents and improving overall reliability going forward.
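As a sketch of the throttling and deduplication work described, the example below uses a Redis SET with NX and a TTL so that an identical sync request is enqueued at most once per window. The key format, window length, and helper function are assumptions, and it presumes a reachable local Redis instance:

```python
# Hypothetical dedup/throttle guard for team-sync enqueues: a Redis
# SET NX with a TTL lets the same team/group pair be enqueued at most
# once per window. Key names and the window are illustrative; assumes
# a reachable Redis instance (e.g. localhost:6379).

import redis

r = redis.Redis(host="localhost", port=6379)

SYNC_WINDOW_SECONDS = 300  # illustrative: at most one sync per 5 minutes

def try_enqueue_sync(team_id: int, idp_group_id: str) -> bool:
    """Return True if the sync job should be enqueued, False if an
    identical request was already enqueued within the window."""
    key = f"team-sync:pending:{team_id}:{idp_group_id}"
    # SET ... NX EX succeeds only for the first caller in the window.
    acquired = r.set(key, "1", nx=True, ex=SYNC_WINDOW_SECONDS)
    return bool(acquired)

if try_enqueue_sync(42, "idp-group-abc"):
    print("enqueue sync job")
else:
    print("skip: duplicate sync request within window")
```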
On November 28th, 2025, between approximately 05:51 and 08:04 UTC, Copilot experienced an outage affecting the Claude Sonnet 4.5 model. Users attempting to use this model received an HTTP 400 error, resulting in 4.6% of total chat requests during this timeframe failing. Other models were not impacted. The issue was caused by a misconfiguration deployed to an internal service which made Claude Sonnet 4.5 unavailable. The problem was identified and mitigated by reverting the change. GitHub is working to improve cross-service deploy safeguards and monitoring to prevent similar incidents in the future.
On November 24, 2025, between 12:15 and 15:04 UTC, Codespaces users encountered connection issues when attempting to create a codespace after choosing the recently released VS Code Codespaces extension, version 1.18.1. Users were able to downgrade to version 1.18.0 of the extension during this period to work around the issue. At peak, the error rate was 19% of connection requests. This was caused by mismatched version dependencies in the released VS Code Codespaces extension. The connection issues were mitigated by releasing version 1.18.2 of the VS Code Codespaces extension, which addressed the issue. Users on version 1.18.1 of the extension are advised to upgrade to version 1.18.2 or later. We are improving our validation and release process for this extension to ensure functional issues like this are caught before release to customers and to reduce detection and mitigation times for extension issues in the future.
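To illustrate the kind of release validation mentioned, the sketch below checks shipped component versions against the dependency ranges an extension declares, using the `packaging` library; the dependency names and version ranges are hypothetical:

```python
# Hypothetical pre-release check: block publishing when a shipped
# component does not satisfy the extension's declared dependency range.
# Dependency names and ranges are illustrative, not the real extension's.

from packaging.specifiers import SpecifierSet
from packaging.version import Version

declared_dependencies = {
    # dependency name -> range declared by the extension (hypothetical)
    "codespaces-client": SpecifierSet(">=1.18.0,<1.19.0"),
}

shipped_versions = {
    "codespaces-client": Version("1.17.5"),  # illustrative mismatched build
}

def find_mismatches(declared: dict, shipped: dict) -> list[tuple[str, str, str]]:
    return [
        (name, str(spec), str(shipped[name]))
        for name, spec in declared.items()
        if shipped[name] not in spec
    ]

for name, spec, version in find_mismatches(declared_dependencies, shipped_versions):
    print(f"block release: {name} {version} does not satisfy {spec}")
```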
Between 17:16 UTC and 19:08 UTC on November 20, 2025, some users experienced delayed or failed Git operations for raw file downloads. On average, the error rate was less than 0.2%. This was due to a sustained increase in unauthenticated repository traffic. We mitigated the incident by applying regional rate limiting and are taking steps to improve our monitoring and time to mitigation for similar issues in the future.
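As an illustration of regional rate limiting, the sketch below keeps a token bucket per region for unauthenticated raw-file requests, so a traffic surge from one region is shed without affecting others. The rates, keys, and structure are assumptions, not GitHub's traffic-policy implementation:

```python
# Illustrative token-bucket limiter keyed by region for unauthenticated
# raw-file requests. Rates and key choices are assumptions.

import time
from dataclasses import dataclass, field

@dataclass
class TokenBucket:
    rate: float                  # tokens refilled per second
    capacity: float              # maximum burst size
    tokens: float | None = None  # starts full
    last_refill: float = field(default_factory=time.monotonic)

    def __post_init__(self) -> None:
        if self.tokens is None:
            self.tokens = self.capacity

    def allow(self) -> bool:
        now = time.monotonic()
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last_refill) * self.rate)
        self.last_refill = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False

# One bucket per region; a hot region exhausts only its own budget.
buckets: dict[str, TokenBucket] = {}

def allow_unauthenticated_request(region: str) -> bool:
    bucket = buckets.setdefault(region, TokenBucket(rate=100.0, capacity=200.0))
    return bucket.allow()

print(allow_unauthenticated_request("eu-west"))  # True until the region's bucket drains
```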
On November 19, 2025, between 17:36 UTC and 18:04 UTC, the GitHub Actions service experienced degraded performance that caused excessive latency in queueing and updating workflow runs and job statuses. Operations related to artifacts, cache, job steps, and logs also had significantly increased latency. At peak, 67% of workflow jobs queued during that timeframe were impacted, and the median latency for impacted operations increased by up to 35x. This was caused by a significant change in the load pattern on Actions Cache-related operations, leading to a saturated shared resource on the backend. We mitigated the impact by addressing the new load pattern. To reduce the likelihood of a recurrence, we are improving rate-limiting measures in this area to ensure a more consistent experience for all customers. We are also evaluating changes to reduce the scope of impact.
Between November 19th, 2025 at 16:13 UTC and November 21st, 2025 at 12:22 UTC, the GitHub Enterprise Importer (GEI) service was in a degraded state, during which customers of the service experienced delays when reclaiming mannequins post-migration. We have taken steps to prevent similar incidents from occurring in the future.