On July 2, 2025, between 01:35 UTC and 16:23 UTC, the GitHub Enterprise Importer (GEI) migration service experienced degraded performance and slower-than-normal migration queue processing times. The incident was triggered by a migration that included an abnormally large number of repositories, which overwhelmed the queue and slowed processing for all migrations. We mitigated the incident by removing the problematic migrations from the queue, and service returned to normal operation as the queue volume was reduced. To ensure system stability, we have introduced additional concurrency controls that limit the number of queued repositories per organization migration, helping to prevent similar incidents in the future.
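As a rough illustration of the new safeguard, the sketch below shows one way a per-organization cap on queued repositories could work. The class, limit, and method names are assumptions for illustration only, not GEI's actual implementation.

```python
# Hypothetical sketch of a per-organization cap on queued repository
# migrations. ORG_QUEUE_LIMIT and MigrationQueue are illustrative names,
# not GEI internals.
from collections import defaultdict
from typing import List

ORG_QUEUE_LIMIT = 100  # assumed cap on repositories queued per organization migration


class MigrationQueue:
    def __init__(self, per_org_limit: int = ORG_QUEUE_LIMIT):
        self.per_org_limit = per_org_limit
        self.queued_by_org = defaultdict(int)
        self.pending: List[tuple] = []

    def enqueue(self, org: str, repos: List[str]) -> List[str]:
        """Queue repositories for an org, deferring any beyond the cap."""
        accepted = []
        for repo in repos:
            if self.queued_by_org[org] >= self.per_org_limit:
                # Defer the remainder so one very large migration cannot
                # monopolize the shared queue.
                self.pending.append((org, repo))
                continue
            self.queued_by_org[org] += 1
            accepted.append(repo)
        return accepted

    def mark_done(self, org: str) -> None:
        """Release a slot and promote a deferred repository, if any."""
        self.queued_by_org[org] = max(0, self.queued_by_org[org] - 1)
        for i, (pending_org, repo) in enumerate(self.pending):
            if pending_org == org:
                self.pending.pop(i)
                self.enqueue(org, [repo])
                break
```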
On July 2, 2025, between approximately 08:40 and 10:16 UTC, the Copilot service experienced degradation due to an infrastructure issue that impacted the Claude Sonnet 4 model, leading to a spike in errors. No other models were impacted. The issue was mitigated by rebalancing load within our infrastructure. GitHub is working to further improve the resiliency of the service to prevent similar incidents in the future.
Due to a degradation of one instance of our internal message delivery service, a percentage of jobs started between 19:18 UTC and 19:50 UTC on June 30, 2025 failed and are no longer retryable. Runners assigned to these jobs will automatically recover within 24 hours, but deleting and recreating the runner will free it up immediately.
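For customers who do not want to wait for the automatic recovery, an affected self-hosted runner can be removed and re-registered through the public REST API. The sketch below uses the organization-level self-hosted runner endpoints; the organization, runner name, and token handling are placeholders.

```python
# Minimal sketch: remove a stuck self-hosted runner via the GitHub REST API,
# then fetch a fresh registration token so it can be re-registered on the host.
# ORG, RUNNER_NAME, and the token source are placeholders.
import os
import requests

API = "https://api.github.com"
ORG = "my-org"             # placeholder organization
RUNNER_NAME = "runner-42"  # placeholder runner name
HEADERS = {
    "Authorization": f"Bearer {os.environ['GITHUB_TOKEN']}",
    "Accept": "application/vnd.github+json",
}

# Find the runner by name.
runners = requests.get(f"{API}/orgs/{ORG}/actions/runners", headers=HEADERS).json()
runner = next(r for r in runners["runners"] if r["name"] == RUNNER_NAME)

# Delete it so the stuck job assignment is released immediately.
resp = requests.delete(f"{API}/orgs/{ORG}/actions/runners/{runner['id']}", headers=HEADERS)
resp.raise_for_status()

# Request a registration token, then re-run the runner's config script on the
# host (e.g. ./config.sh --url https://github.com/my-org --token <token>).
token = requests.post(
    f"{API}/orgs/{ORG}/actions/runners/registration-token", headers=HEADERS
).json()["token"]
print("Re-register the runner with token:", token)
```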
On June 30, 2025, between approximately 18:20 and 19:55 UTC, the Copilot service experienced a degradation of the Claude Sonnet 3.7 model due to an issue with our upstream provider. Users encountered elevated error rates when using Claude Sonnet 3.7. No other models were impacted. The issue was resolved by a mitigation put in place by our provider. GitHub is working with our provider to further improve the resiliency of the service to prevent similar incidents in the future.
On June 26, 2025, between 17:10 UTC and 23:30 UTC, around 40% of attempts to create a repository from a template repository failed. The failures were introduced by a deployment and went undetected due to a gap in testing and observability. We mitigated the incident by rolling back the deployment. We are working to improve our testing and automatic detection of errors associated with failed template repository creation.
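For callers that automate template-based repository creation, a simple retry with backoff around the public "create a repository using a template" endpoint can help ride out transient failures like these. The sketch below is illustrative; the owner, template, and retry parameters are assumptions.

```python
# Illustrative retry wrapper around the "create a repository using a template"
# REST endpoint. Owner/template names and retry parameters are placeholders.
import os
import time
import requests

API = "https://api.github.com"
HEADERS = {
    "Authorization": f"Bearer {os.environ['GITHUB_TOKEN']}",
    "Accept": "application/vnd.github+json",
}


def create_from_template(template_owner: str, template_repo: str,
                         owner: str, name: str, retries: int = 3) -> dict:
    """Create a repository from a template, retrying transient server errors."""
    url = f"{API}/repos/{template_owner}/{template_repo}/generate"
    payload = {"owner": owner, "name": name, "private": True}
    for attempt in range(retries):
        resp = requests.post(url, json=payload, headers=HEADERS)
        if resp.status_code == 201:
            return resp.json()
        # Back off and retry on server-side errors such as those seen during
        # this incident; surface client errors immediately.
        if resp.status_code >= 500 and attempt < retries - 1:
            time.sleep(2 ** attempt)
            continue
        resp.raise_for_status()
    raise RuntimeError("repository creation failed after retries")
```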
On June 26, 2025, between 14:42 UTC and 18:05 UTC, the GitHub Enterprise Importer (GEI) service was in a degraded state, and customers of the service experienced extended repository migration durations. Our investigation found that the combined effect of several database updates resulted in severe throttling of GEI to preserve overall database health. We have taken steps to prevent additional impact and are working to implement additional safeguards to prevent similar incidents from occurring in the future.
This incident has been resolved. Thank you for your patience and understanding as we addressed this issue. A detailed root cause analysis will be shared as soon as it is available.
Between June 19, 2025 11:35 UTC and June 20, 2025 11:20 UTC, the GitHub Mobile Android application was unable to log in new users; the iOS app was unaffected. The cause was a new GitHub App feature being tested internally, which was inadvertently enforced for all GitHub-owned applications, including GitHub Mobile. A mismatch in client and server expectations introduced by this feature caused logins to fail. We mitigated the incident by disabling the feature flag controlling the feature. We are working to improve our time to detection and to put in place stronger guardrails that reduce the impact of internal testing on applications used by all customers.
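To illustrate the kind of guardrail involved, the sketch below shows an experimental flag scoped to an explicit allowlist of internal test applications rather than to every first-party app. The flag check and application identifiers are hypothetical.

```python
# Illustrative sketch of scoping an experimental feature flag to an explicit
# allowlist of internal test apps, rather than to every GitHub-owned app.
# Application identifiers here are hypothetical.
INTERNAL_TEST_APPS = {"internal-test-app-1", "internal-test-app-2"}  # hypothetical IDs


def feature_enabled(flag_enabled: bool, app_id: str) -> bool:
    """Enforce the experimental behavior only for allowlisted test apps."""
    # A broad check such as "is this app GitHub-owned?" would also have pulled
    # in GitHub Mobile; the allowlist keeps production clients on the old path.
    return flag_enabled and app_id in INTERNAL_TEST_APPS
```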
On June 18, 2025, between 22:20 UTC and 23:00 UTC, the Claude Sonnet 3.7 and Claude Sonnet 4 models for GitHub Copilot Chat experienced degraded performance. During the impact, some users received an immediate error when making a request to a Claude model. This was due to upstream errors with one of our model providers, which have since been resolved. We mitigated the impact by disabling the affected provider endpoints and redirecting Claude Sonnet requests to additional partners. We are working to update our incident response playbooks for infrastructure provider outages and to improve our monitoring and alerting systems to reduce our time to detection and mitigation of issues like this one in the future.
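The mitigation follows a common failover pattern: take error-prone provider endpoints out of rotation and route requests to the remaining healthy partners. The sketch below is a simplified illustration; the provider names and error-rate threshold are assumptions.

```python
# Hypothetical sketch of the failover pattern: drop provider endpoints that
# are returning errors and route Claude Sonnet requests to the remaining
# healthy partners. Provider names and thresholds are assumptions.
import random

ERROR_RATE_THRESHOLD = 0.5  # assumed cutoff for disabling an endpoint

providers = {
    "partner-a": {"error_rate": 0.92, "enabled": True},  # degraded upstream
    "partner-b": {"error_rate": 0.01, "enabled": True},
    "partner-c": {"error_rate": 0.02, "enabled": True},
}


def healthy_providers() -> list:
    """Disable endpoints above the error threshold; return the rest."""
    for name, state in providers.items():
        if state["error_rate"] > ERROR_RATE_THRESHOLD:
            state["enabled"] = False
    return [n for n, s in providers.items() if s["enabled"]]


def route_claude_request() -> str:
    """Pick a healthy partner endpoint for a Claude Sonnet request."""
    candidates = healthy_providers()
    if not candidates:
        raise RuntimeError("no healthy providers available")
    return random.choice(candidates)


print(route_claude_request())  # routes around partner-a while it is degraded
```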
On June 18, 2025, between 08:21 UTC and 18:47 UTC, some Actions jobs experienced intermittent failures downloading from the Actions Cache service. During the incident, 17% of workflow runs experienced cache download failures, resulting in a warning message in the logs and performance degradation. The disruption was caused by a network issue in our database systems that led to a database replica getting out of sync with the primary. We mitigated the incident by routing cache download URL requests to bypass the out-of-sync replica until it was fully restored. To prevent this class of incident, we are developing the capability in our database systems to more robustly bypass out-of-sync replicas. We are also implementing improved monitoring to help us detect similar issues more quickly going forward.
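The replica bypass can be pictured as a lag-aware routing decision: if the replica has fallen too far behind, reads for cache download URLs go to the primary instead. The sketch below is illustrative only; the node names and lag threshold are assumptions, not GitHub's internal implementation.

```python
# Hypothetical sketch of the bypass: if a replica's replication lag exceeds a
# threshold, serve cache-download-URL reads from the primary instead.
from dataclasses import dataclass

MAX_REPLICA_LAG_SECONDS = 5.0  # assumed tolerance for read staleness


@dataclass
class DatabaseNode:
    name: str
    lag_seconds: float  # 0 for the primary


def pick_read_node(primary: DatabaseNode, replica: DatabaseNode) -> DatabaseNode:
    """Prefer the replica for cache URL lookups unless it has fallen behind."""
    if replica.lag_seconds > MAX_REPLICA_LAG_SECONDS:
        # Out-of-sync replica: route the read to the primary until it recovers.
        return primary
    return replica


primary = DatabaseNode("primary", 0.0)
replica = DatabaseNode("replica-1", 120.0)  # badly lagging, as in the incident
print(pick_read_node(primary, replica).name)  # -> "primary"
```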