Between December 1 12:20 UTC and December 2 1:05 UTC, availability of large hosted runners for Actions was degraded due to failures in background VM provisioning jobs. Affected users saw workflows queued, waiting for a runner. On average, 8% of all workflows requiring large runners during the incident were affected, peaking at 37.5% of requests. There were also lower levels of intermittent queuing on December 1 beginning around 3:00 UTC. Standard and Mac runners were not affected. The job failures were caused by timeouts to a dependent service in the VM provisioning flow and gaps in the jobs’ resilience to those timeouts. The incident was mitigated by bypassing the dependency, as it was not in the critical path of VM provisioning.

There are a few immediate improvements we are making in response to this. We are addressing the causes of the failed calls to improve the availability of that backend service. Even with those failures, the critical flow of large VM provisioning should not have been affected, so we are improving the client behavior to fail fast and circuit break non-critical calls. Finally, the alerting for this service was not adequate in this scenario to ensure a fast response by our team. We are improving our automated detection to reduce our time to detection and mitigation of issues like this one in the future.
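As an illustration of the fail-fast, circuit-breaking behavior described above, here is a minimal sketch in Go. The `Breaker` type, its thresholds, and the idea of wrapping the non-critical dependency call are assumptions for illustration, not the actual provisioning code.

```go
package provisioner

import (
	"context"
	"errors"
	"sync"
	"time"
)

// ErrCircuitOpen is returned when the breaker is open and the
// non-critical call is skipped entirely.
var ErrCircuitOpen = errors.New("circuit open: skipping non-critical call")

// Breaker is a minimal circuit breaker: after maxFailures consecutive
// failures it rejects calls immediately until cooldown has elapsed.
type Breaker struct {
	mu          sync.Mutex
	failures    int
	maxFailures int
	cooldown    time.Duration
	openedAt    time.Time
}

func NewBreaker(maxFailures int, cooldown time.Duration) *Breaker {
	return &Breaker{maxFailures: maxFailures, cooldown: cooldown}
}

// Do runs fn with a short timeout so the calling flow fails fast
// instead of blocking on a slow dependency.
func (b *Breaker) Do(ctx context.Context, timeout time.Duration, fn func(ctx context.Context) error) error {
	b.mu.Lock()
	if b.failures >= b.maxFailures && time.Since(b.openedAt) < b.cooldown {
		b.mu.Unlock()
		return ErrCircuitOpen
	}
	b.mu.Unlock()

	cctx, cancel := context.WithTimeout(ctx, timeout)
	defer cancel()
	err := fn(cctx)

	b.mu.Lock()
	defer b.mu.Unlock()
	if err != nil {
		b.failures++
		if b.failures >= b.maxFailures {
			b.openedAt = time.Now()
		}
		return err
	}
	b.failures = 0
	return nil
}
```

In a flow like this, the call to the non-critical service would go through `Do` with a short timeout, and a timeout or `ErrCircuitOpen` result would be logged and ignored rather than failing the VM provisioning job.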
Between 13:30 and 15:00 UTC, repository searches were timing out for most users. Ongoing efforts from a similar incident last week helped uncover the main contributing factors. We have deployed short-term mitigations and identified longer-term work to proactively identify and limit resource-intensive searches.
On November 25, 2024 between 10:38 UTC and 12:00 UTC, the Claude model for GitHub Copilot Chat experienced degraded performance. During the impact, all requests to Claude resulted in an immediate error to the user. This was due to upstream errors with one of our infrastructure providers, which have since been mitigated. We are working with our infrastructure providers to reduce time to detection and implement additional failover options to mitigate issues like this one in the future.
Between 2024-11-06 11:14 UTC and 2024-11-08 18:15 UTC, pull requests added to merge queues in some repositories were not processed. This was caused by a bug in a new version of the merge queue code and was mitigated by rolling back a feature flag. Around 1% of enqueued PRs were affected, and around 7% of repositories that use a merge queue were impacted at some point during the incident.
Queues were impacted if their target branch had the “require status checks” setting enabled, but did not have any individual required checks configured. Our monitoring strategy only covered PRs automatically removed from the queue, which was insufficient to detect this issue.
We are improving our monitors to cover anomalous manual queue entry removal rates, which will allow us to detect this class of issue much sooner.
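To make the affected configuration concrete, the sketch below shows the condition described above as a simple predicate in Go. The `BranchProtection` struct and its field names are hypothetical simplifications for illustration, not GitHub's actual data model.

```go
package mergequeue

// BranchProtection is a hypothetical, simplified view of the target
// branch's protection settings relevant to this incident.
type BranchProtection struct {
	RequireStatusChecks bool     // "require status checks" enabled
	RequiredChecks      []string // individually required check names
}

// inAffectedConfiguration reports whether a queue matched the condition
// that triggered the bug: status checks required, but no individual
// required checks configured.
func inAffectedConfiguration(p BranchProtection) bool {
	return p.RequireStatusChecks && len(p.RequiredChecks) == 0
}
```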
On November 21, 2024, between 14:30 UTC and 15:53 UTC, search services at GitHub were degraded and CPU load on some nodes hit 100%. The error rate averaged 22 requests per second and peaked at 83 requests per second. During this incident, Enterprise Profile pages were slow to load and searches may have returned low-quality results. The CPU load was mitigated by redeploying portions of our web infrastructure. We are still working to identify the cause of the increase in CPU usage and are improving our observability tooling to better expose the cause of an incident like this in the future.
On November 19, 2024, between 10:56 UTC and 12:03 UTC, the notifications service was degraded and stopped sending notifications. On average, notification delivery was delayed by about one hour. This was due to a database host coming out of a regular maintenance process in read-only mode. We mitigated the incident by making the host writable again. After that, notification delivery recovered and any delivery job that had failed during the incident was successfully retried. We are working to improve our observability across database clusters to reduce our time to detection and mitigation of issues like this one in the future.
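As a sketch of the kind of post-maintenance check that catches this failure mode, the Go snippet below verifies that a host is writable before traffic is routed back to it. It assumes a MySQL backend and the go-sql-driver/mysql driver; the function name and DSN handling are illustrative, not GitHub's actual maintenance tooling.

```go
package dbhealth

import (
	"context"
	"database/sql"
	"fmt"

	_ "github.com/go-sql-driver/mysql" // MySQL driver (assumed backend)
)

// ensureWritable returns an error if the host is still in read-only
// mode, so automation can refuse to route writes to it after maintenance.
func ensureWritable(ctx context.Context, dsn string) error {
	db, err := sql.Open("mysql", dsn)
	if err != nil {
		return err
	}
	defer db.Close()

	var readOnly int
	if err := db.QueryRowContext(ctx, "SELECT @@global.read_only").Scan(&readOnly); err != nil {
		return err
	}
	if readOnly != 0 {
		return fmt.Errorf("host is still read-only after maintenance")
	}
	return nil
}
```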
On October 30, 2024, between 5:45 and 9:42 UTC, the Actions service was degraded, causing run delays. On average, Actions workflow run, job, and step updates were delayed by as much as one hour. The delays were caused by updates in a dependent service that led to failures in Redis connectivity. Delays recovered once Redis cluster connectivity was restored at 8:16 UTC, and the incident was fully mitigated once the job queue had been processed at 9:24 UTC. This incident followed an earlier short period of impact on hosted runners due to a similar issue, which was mitigated by failing over to a healthy cluster. In response, we are working to improve our observability across Redis clusters to reduce our time to detection and mitigation of issues like this one, where multiple clusters and services are impacted. We will also work to reduce the time to mitigation and improve our general resilience to this dependency.
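A minimal sketch of the failover behavior mentioned above, assuming the go-redis client: ping each candidate cluster with a short timeout and use the first healthy one. The addresses, timeouts, and function name are illustrative; this is not the Actions service's actual Redis handling.

```go
package redisfailover

import (
	"context"
	"errors"
	"time"

	"github.com/redis/go-redis/v9"
)

// pickHealthyCluster pings each candidate cluster with a short timeout
// and returns a client for the first one that responds, so callers can
// fail over instead of waiting on a cluster with broken connectivity.
func pickHealthyCluster(ctx context.Context, addrs []string) (*redis.Client, error) {
	for _, addr := range addrs {
		client := redis.NewClient(&redis.Options{Addr: addr, DialTimeout: 2 * time.Second})
		pctx, cancel := context.WithTimeout(ctx, 2*time.Second)
		err := client.Ping(pctx).Err()
		cancel()
		if err == nil {
			return client, nil
		}
		client.Close()
	}
	return nil, errors.New("no healthy Redis cluster available")
}
```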
On October 24, 2024 at 06:55 UTC, a discussion template YAML config file that was syntactically correct but semantically invalid was committed to the community/community repository. As a result, any user of that repository who tried to access a discussion template or create a discussion received a 500 error response. We mitigated the incident by manually reverting the invalid template changes. We are adding support to detect and prevent invalid discussion template YAML from causing user-facing errors in the future.
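The sketch below illustrates validation that goes beyond YAML syntax, using gopkg.in/yaml.v3 in Go. The `discussionTemplate` and `bodyItem` structs are a simplified, assumed subset of the discussion template schema, and the checks are examples of semantic rules rather than GitHub's actual validation.

```go
package templatecheck

import (
	"errors"
	"fmt"

	"gopkg.in/yaml.v3"
)

// bodyItem is a simplified, assumed subset of a discussion template's
// form elements; the real schema has more fields and rules.
type bodyItem struct {
	Type       string `yaml:"type"`
	Attributes struct {
		Label string `yaml:"label"`
	} `yaml:"attributes"`
}

type discussionTemplate struct {
	Title string     `yaml:"title"`
	Body  []bodyItem `yaml:"body"`
}

// validate rejects templates that parse as YAML but would still break
// rendering, e.g. a body element missing its type.
func validate(raw []byte) error {
	var t discussionTemplate
	if err := yaml.Unmarshal(raw, &t); err != nil {
		return fmt.Errorf("invalid YAML syntax: %w", err)
	}
	if len(t.Body) == 0 {
		return errors.New("template has no body elements")
	}
	for i, item := range t.Body {
		if item.Type == "" {
			return fmt.Errorf("body element %d is missing a type", i)
		}
	}
	return nil
}
```

Running a check like this in CI or at commit time would surface the error to the template author instead of turning it into a 500 for everyone visiting the repository's discussions.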