Incident History

Incident With Actions

Due to a degradation of one instance of our internal message delivery service, a percentage of jobs started between 06/30/2025 19:18 UTC and 06/30/2025 19:50 UTC failed, and are no longer retry-able. Runners assigned to these jobs will automatically recover within 24 hours, but deleting and recreating the runner will free up the runner immediately.

1751329311 - 1751329311 Resolved

Disruption with Claude 3.7 Sonnet in Copilot Chat

This incident has been resolved. Thank you for your patience and understanding as we addressed this issue. A detailed root cause analysis will be shared as soon as it is available.

1751310836 - 1751313336 Resolved

Disruption with some GitHub services

On June 26, 2025, between 17:10 UTC and 23:30 UTC, around 40% of attempts to create a repository from a template repository failed. The failures were an unexpected result of a gap in testing and observability.We mitigated the incident by rolling back the deployment.We are working to improve our testing and automatic detection of errors associated with failed template repository creation.

1750979116 - 1750980783 Resolved

GitHub Enterprise Importer delays

On June 26th, between 14:42UTC and 18:05UTC, the GitHub Enterprise Importer (GEI) service was in a degraded state, during which time, customers of the service experienced extended repository migration durations.Our investigation found that the combined effect of several database updates resulted in the severe throttling of GEI to preserve overall database health.We have taken steps to prevent additional impact and are working to implement additional safeguards to prevent similar incidents from occurring in the future.

1750948968 - 1750961111 Resolved

Repository Navigation Bar Missing in GitHub Enterprise Cloud

This incident has been resolved. Thank you for your patience and understanding as we addressed this issue. A detailed root cause analysis will be shared as soon as it is available.

1750762500 - 1750767964 Resolved

Disruption with the GitHub mobile android application

Between June 19th, 2025 11:35 UTC and June 20th, 2025 11:20 UTC the GitHub Mobile Android application was unable to login new users. The iOS app was unaffected.This was due to a new GitHub App feature being tested internally, which was inadvertently enforced for all GitHub-owned applications, including GitHub Mobile.A mismatch in client and server expectations due to this feature caused logins to fail. We mitigated the incident by disabling the feature flag controlling the feature.We are working to improve our time to detection and put in place stronger guardrails that reduce impact from internal testing on applications used by all customers.

1750416545 - 1750418430 Resolved

Disruption with some GitHub services

On June 18, 2025 between 22:20 UTC and 23:00 UTC the Claude Sonnet 3.7 and Claude Sonnet 4 models for GitHub Copilot Chat experienced degraded performance. During the impact, some users would receive an immediate error when making a request to a Claude model. This was due to upstream errors with one of our model providers, which have since been resolved. We mitigated the impact by disabling the affected provider endpoints to reduce user impact, redirecting Claude Sonnet requests to additional partners.We are working to update our incident response playbooks for infrastructure provider outages and improve our monitoring and alerting systems to reduce our time to detection and mitigation of issues like this one in the future.

1750286393 - 1750288431 Resolved

Partial Actions Cache degradation

On June 18, 2025, between 08:21 UTC and 18:47 UTC, some Actions jobs experienced intermittent failures downloading from the Actions Cache service. During the incident, 17% of workflow runs experienced cache download failures, resulting in a warning message in the logs and performance degradation. The disruption was caused by a network issue in our database systems that led to a database replica getting out of sync with the primary. We mitigated the incident by routing cache download url requests to bypass the out-of-sync replica until it was fully restored.To prevent this class of incidents, we are developing capability in our database system to more robustly bypass out-of-sync replicas. We are also implementing improved monitoring to help us detect similar issues more quickly going forward.

1750265185 - 1750272465 Resolved

Partial Degradation in Issues Experience

On June 18, 2025, between 15:15 UTC and 19:29 UTC, the Issues service was degraded, and certain GraphQL queries accessing the ReactionGroup.reactors field returned errors. Our query routing infrastructure was impacted by exceptions from a particular database migration, resulting in errors for an average of 0.0097% of overall GraphQL requests (peaking at 0.02%).We mitigated the incident by reverting the migration.We continue to investigate the cause of the exceptions and are holding off on similar migrations until the underlying issue is understood and resolved.

1750263660 - 1750268538 Resolved

Incident with multiple GitHub services

On June 17, 2025, between 19:32 UTC and 20:03 UTC, an internal routing policy deployment to a subset of network devices caused reachability issues for certain network address blocks within our datacenters. Authenticated users of the github.com UI experienced 3-4% error rates for the duration. Authenticated callers of the API experienced 40% error rates. Unauthenticated requests to the UI and API experienced nearly 100% error rates for the duration. Actions service experienced 2.5% of runs being delayed for an average of 8 minutes and 3% of runs failing. Large File Storage (LFS) requests experienced 0.978% errors. At 19:54 UTC, the deployment was rolled back, and network availability for the affected systems was restored. At 20:03 UTC, we fully restored normal operations. To prevent similar issues, we are expanding our validation process for routing policy changes.

1750189376 - 1750191770 Resolved
⮜ Previous