Incident History

Degraded performance in merge queue

This incident has been resolved. Thank you for your patience and understanding as we addressed this issue. A detailed root cause analysis will be shared as soon as it is available.

February 18, 2026 18:25 - 19:20 UTC Resolved

Intermittent authentication failures on GitHub

On February 17, 2026, between 17:07 UTC and 19:06 UTC, some customers experienced intermittent authentication failures affecting GitHub Actions, parts of Git operations, and other authentication-dependent requests. On average, the Actions error rate was approximately 0.6% of affected API requests. The SSH read error rate for Git operations was approximately 0.29%, while SSH write and HTTP operations were not impacted. During the incident, a subset of requests failed because token verification lookups intermittently failed, leading to 401 errors and degraded reliability for impacted workflows.

The issue was caused by elevated replication lag in the token verification database cluster. In the days leading up to the incident, the token store's write volume grew enough to exceed the cluster's available capacity. Under peak load, older replica hosts were unable to keep up, replica lag increased, and some token lookups became inconsistent, resulting in intermittent authentication failures.

We mitigated the incident by adjusting the database replica topology to route reads away from lagging replicas and by bringing additional replica capacity online. Service health improved progressively after the change, with GitHub Actions recovering by approximately 19:00 UTC and the incident resolved at 19:06 UTC.

We are working to prevent recurrence by improving the resilience and scalability of our underlying token verification data stores to better handle continued growth.
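
The mitigation above amounts to lag-aware read routing. GitHub has not published its implementation, but the sketch below shows the general shape of the technique: reads prefer replicas whose replication lag is under a freshness bound and fall back to the primary rather than risk an inconsistent token lookup. All names and thresholds here are hypothetical.

```python
# Illustrative sketch of lag-aware read routing; not GitHub's actual code.
# `Replica`, `MAX_LAG_SECONDS`, and the topology are hypothetical.
from dataclasses import dataclass
import random

MAX_LAG_SECONDS = 2.0  # assumed freshness bound for token lookups

@dataclass
class Replica:
    host: str
    replication_lag: float  # seconds behind the primary

def pick_read_host(replicas: list[Replica], primary: str) -> str:
    """Prefer replicas within the lag bound; otherwise read from the primary."""
    healthy = [r for r in replicas if r.replication_lag <= MAX_LAG_SECONDS]
    if healthy:
        return random.choice(healthy).host
    # Every replica is stale: serving the read from the primary avoids the
    # inconsistent lookup that would surface as a 401 for a valid token.
    return primary

replicas = [Replica("replica-1", 0.3), Replica("replica-2", 14.8)]
print(pick_read_host(replicas, "primary"))  # never selects the lagging replica-2
```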

February 17, 2026 17:46 - 19:06 UTC Resolved

Disruption with some GitHub services regarding file upload

On February 13, 2026, between 21:46 UTC and 22:58 UTC (72 minutes), the GitHub file upload service was degraded: users uploading from a web browser on GitHub.com were unable to upload files to repositories, create release assets, or upload manifest files. During the incident, successful upload completions dropped by approximately 85% from baseline levels. This was due to a code change that inadvertently modified browser request behavior and violated CORS (Cross-Origin Resource Sharing) policy requirements, causing upload requests to be blocked before reaching the upload service.

We mitigated the incident by reverting the code change that introduced the issue.

We are working to improve automated testing for browser-side request changes and to add monitoring and automated safeguards for upload flows, reducing our time to detect and mitigate similar issues in the future.
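
For readers unfamiliar with this failure mode: a cross-origin browser request that uses a header or method outside the server's advertised CORS allow-list fails its preflight check inside the browser, so it never reaches the backend. A minimal, hypothetical regression test for that class of change might look like the sketch below; the header sets are illustrative, not GitHub's actual policy.

```python
# Hypothetical regression test for the failure mode above; the header sets
# are illustrative and do not reflect GitHub's actual CORS configuration.

# Headers the server would advertise in Access-Control-Allow-Headers.
ALLOWED_HEADERS = {"content-type", "x-requested-with"}

def cors_violations(request_headers: set[str]) -> set[str]:
    """Return any request headers the preflight response would not permit.

    A cross-origin request carrying a disallowed header fails its CORS
    preflight inside the browser, so it never reaches the upload service.
    """
    return {h.lower() for h in request_headers} - ALLOWED_HEADERS

# A code change that quietly adds a custom header is caught before deploy:
assert cors_violations({"Content-Type"}) == set()
assert cors_violations({"Content-Type", "X-Upload-Trace"}) == {"x-upload-trace"}
```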

February 13, 2026 22:30 - 22:58 UTC Resolved

Disruption with some GitHub services

Between February 11th 21:30 UTC and February 12th 15:40 UTC, users in Western Europe experienced degraded quality for all Next Edit Suggestions requests. Additionally, on February 12th, between 18:40 UTC and 20:30 UTC, users in Australia and South America experienced degraded quality and increased latency of up to 500ms for all Next Edit Suggestions requests. The root cause was a newly introduced regression in an upstream service dependency. The incident was mitigated by failing over Next Edit Suggestions traffic to unaffected regions, which caused the increased latency. Once the regression was identified and rolled back, we restored the impacted capacity. We have improved our quality analysis tooling and are working on more robust quality impact alerting to accelerate detection of these issues in the future.
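
The failover described above trades availability for latency: traffic from an unhealthy region is served by a healthy one that is farther away. A minimal sketch of that tradeoff, with entirely hypothetical region names and latency figures:

```python
# Hypothetical failover sketch; region names and latencies are invented.
REGION_LATENCY_MS = {  # added round-trip cost of serving one region from another
    ("eu-west", "us-east"): 90,
    ("australia", "us-west"): 150,
}

def failover_region(home: str, healthy: list[str]) -> tuple[str, int]:
    """Pick the healthy region that adds the least latency for `home` users."""
    candidates = [
        (REGION_LATENCY_MS.get((home, region), 500), region)  # 500 = unknown pair
        for region in healthy
    ]
    added_ms, region = min(candidates)
    return region, added_ms

region, added = failover_region("australia", ["us-west", "us-east"])
print(region, f"+{added}ms")  # us-west +150ms of extra request latency
```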

February 12, 2026 18:36 - 20:34 UTC Resolved

Intermittent disruption with Copilot completions and inline suggestions

Between February 11th 21:30 UTC and February 12th 15:40 UTC, users in Western Europe experienced degraded quality for all Next Edit Suggestions requests. Additionally, on February 12th, between 18:40 UTC and 20:30 UTC, users in Australia and South America experienced degraded quality and increased latency of up to 500ms for all Next Edit Suggestions requests. The root cause was a newly introduced regression in an upstream service dependency. The incident was mitigated by failing over Next Edit Suggestions traffic to unaffected regions, which caused the increased latency. Once the regression was identified and rolled back, we restored the impacted capacity. We have improved our quality analysis tooling and are working on more robust quality impact alerting to accelerate detection of these issues in the future.

February 12, 2026 14:06 - 16:50 UTC Resolved

Disruption with some GitHub services

From February 12, 2026 09:16 UTC to 11:01 UTC, users attempting to download repository archives (tar.gz/zip) that include Git LFS objects received errors. Standard repository archives without LFS objects were not affected. On average, the archive download error rate was 0.0042% of requests to the service, peaking at 0.0339%. The errors were caused by the deployment of a corrupt configuration bundle, which left the service missing data it uses to establish network interface connections.

We mitigated the incident by applying the correct configuration to each site. We have added corruption checks to this deployment, and we will add auto-rollback detection for this service to prevent similar issues in the future.
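
A common form of the corruption check mentioned above is to record a digest of the configuration bundle at build time and refuse to roll out any bundle that no longer matches it. The sketch below is a hypothetical illustration of that pattern, not the actual deployment tooling:

```python
# Hypothetical digest check; file contents and the rollout step are invented.
import hashlib
import pathlib
import tempfile

def sha256_of(path: pathlib.Path) -> str:
    return hashlib.sha256(path.read_bytes()).hexdigest()

def apply_to_sites(bundle: pathlib.Path) -> None:
    print(f"applying {bundle.name} to all sites")  # stand-in for the rollout

def deploy_bundle(bundle: pathlib.Path, expected_digest: str) -> None:
    actual = sha256_of(bundle)
    if actual != expected_digest:
        # A truncated or corrupted bundle never reaches a site; the previous
        # known-good configuration stays in place instead.
        raise RuntimeError(f"bundle digest mismatch: {actual[:12]}...")
    apply_to_sites(bundle)

# Demo: a bundle deploys only when it matches the digest recorded at build time.
with tempfile.NamedTemporaryFile(suffix=".tar.gz", delete=False) as f:
    f.write(b"interface-config-v2")
bundle = pathlib.Path(f.name)
deploy_bundle(bundle, sha256_of(bundle))  # ok
# deploy_bundle(bundle, "0" * 64)         # would raise: digest mismatch
```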

February 12, 2026 10:38 - 11:12 UTC Resolved

Incident with Codespaces

On February 12, 2026, between 00:51 UTC and 09:35 UTC, users attempting to create or resume Codespaces experienced elevated failure rates across Europe, Asia, and Australia, peaking at a 90% failure rate. The failures were triggered by a bad configuration rollout in a core networking dependency, which led to internal resource provisioning failures. We are working to improve our alerting thresholds to catch issues before they impact customers and to strengthen rollout safeguards to prevent similar incidents.

February 12, 2026 07:53 - 09:56 UTC Resolved

Disruption with some GitHub services

On February 11, 2026, between 16:37 UTC and 00:59 UTC the following day, 4.7% of workflows running on GitHub Larger Hosted Runners were delayed by an average of 37 minutes. Standard hosted and self-hosted runners were not impacted. The incident was caused by capacity degradation in the Central US region for Larger Hosted Runners. Workloads not pinned to that region were picked up by other regions, but were delayed as those regions became saturated; workloads configured with private networking in the affected region were delayed until its compute capacity recovered. The issue was mitigated by rebalancing capacity across internal and external workloads and by general capacity increases in the affected regions to speed recovery. In addition to working with our compute partners on the core capacity degradation, we are working to ensure other regions can absorb load with less delay to customer workloads. For pinned workflows using private networking, we will soon ship support for customers to fail over when private networking is configured in a paired region.
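
The queueing behavior described above can be pictured as a small scheduler: pinned jobs must wait for their own region, while unpinned jobs spill over to whichever region has headroom and are delayed only once every region saturates. A hypothetical sketch, with invented region names and capacities:

```python
# Hypothetical scheduler sketch; region names and capacities are invented.
capacity = {"central-us": 0, "east-us": 40, "west-europe": 25}  # free runners

def assign(pinned_region: str | None) -> str | None:
    """Return a region for the job, or None if it has to wait in the queue."""
    # Pinned jobs (e.g. private networking) may only run in their own region;
    # unpinned jobs try regions in order of available headroom.
    regions = [pinned_region] if pinned_region else sorted(
        capacity, key=capacity.get, reverse=True)
    for region in regions:
        if capacity.get(region, 0) > 0:
            capacity[region] -= 1
            return region
    return None  # delayed until some region frees up capacity

print(assign("central-us"))  # None: pinned to the degraded region, must wait
print(assign(None))          # "east-us": unpinned work spills over
```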

February 11, 2026 18:58 UTC - February 12, 2026 00:59 UTC Resolved

Incident with API Requests

On February 11, 2026, between 13:51 UTC and 17:03 UTC, the GraphQL API experienced degraded performance due to elevated resource utilization. This resulted in incoming client requests waiting longer than normal and, in some cases, timing out. During the impact window, approximately 0.65% of GraphQL requests experienced these issues, peaking at 1.06%. The increased load was driven by query patterns that consumed more GraphQL API resources than expected. We mitigated the incident by scaling out resource capacity and limiting the capacity available to those query patterns. We are improving our telemetry to identify gradual usage growth and changes in GraphQL workloads, and we have added capacity safeguards to prevent similar incidents in the future.
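
One way to implement the capacity safeguard mentioned above is a per-query-pattern concurrency budget, so a hot pattern is rejected quickly instead of slowing every other request. The following is a hypothetical sketch of that idea, not GitHub's implementation:

```python
# Hypothetical per-pattern concurrency budget; limits and the fingerprint
# scheme are invented for illustration.
import threading

class PatternLimiter:
    """Caps how many requests of one query pattern may run concurrently."""
    def __init__(self, max_concurrent: int = 50):
        self._slots = threading.BoundedSemaphore(max_concurrent)

    def try_acquire(self) -> bool:
        return self._slots.acquire(blocking=False)

    def release(self) -> None:
        self._slots.release()

limiters: dict[str, PatternLimiter] = {}

def handle_query(pattern_fingerprint: str) -> str:
    limiter = limiters.setdefault(pattern_fingerprint, PatternLimiter())
    if not limiter.try_acquire():
        # Fail fast instead of letting one hot pattern slow every request.
        return "rejected: pattern is over its capacity budget"
    try:
        return "executed"
    finally:
        limiter.release()

print(handle_query("repo-contributors-deeply-nested"))  # executed
```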

February 11, 2026 15:26 - 17:15 UTC Resolved

Incident with Copilot

On February 11, 2026, between 14:30 UTC and 15:30 UTC, the Copilot service experienced degraded availability for requests to Claude Haiku 4.5. During this time, approximately 10% of requests failed, impacting 23% of sessions. The issue was caused by an upstream problem at multiple external model providers that affected our ability to serve requests. The incident was mitigated once one of the providers resolved the issue and we rerouted capacity fully to that provider. We have enhanced our telemetry to improve incident observability and implemented an automated retry mechanism for requests to this model to mitigate similar upstream incidents in the future.
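
An automated retry mechanism of the kind described above typically combines bounded retries with backoff and failover across providers. The sketch below is a hypothetical illustration; the provider names and call signature are invented, and the stub call always fails for demonstration:

```python
# Hypothetical retry-with-failover sketch; provider names and the call
# signature are invented, and the stub call always fails for demonstration.
import time

PROVIDERS = ["provider-a", "provider-b"]  # stand-ins for external model hosts

def call_provider(provider: str, prompt: str) -> str:
    raise TimeoutError(f"{provider} unavailable")  # stub for the real API call

def complete_with_failover(prompt: str, attempts_per_provider: int = 2) -> str:
    delay = 0.2
    for provider in PROVIDERS:
        for _ in range(attempts_per_provider):
            try:
                return call_provider(provider, prompt)
            except TimeoutError:
                time.sleep(delay)
                delay = min(delay * 2, 2.0)  # capped exponential backoff
    raise RuntimeError("all providers exhausted")

# complete_with_failover("hi")  # would raise RuntimeError: the stub call
#                               # never succeeds, so both providers exhaust
```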

February 11, 2026 15:26 - 15:46 UTC Resolved