Incident History

Incident with Actions

This incident has been resolved. Thank you for your patience and understanding as we addressed this issue. A detailed root cause analysis will be shared as soon as it is available.

1757510585 - 1757512961 Resolved
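
The number pairs under each entry appear to be Unix epoch seconds marking the start and end of the status-page incident. The page does not label them, so that reading is an assumption. Under it, a minimal Python sketch can turn such a line into a readable UTC window (`describe_window` is a hypothetical helper name, not part of any GitHub tooling):

```python
from datetime import datetime, timezone


def describe_window(line: str) -> str:
    """Convert '<start> - <end> <status>' into a readable UTC window.

    Assumes both numbers are Unix epoch seconds (not confirmed by the page).
    """
    start_raw, _, rest = line.partition(" - ")
    end_raw, _, status = rest.partition(" ")
    start = datetime.fromtimestamp(int(start_raw), tz=timezone.utc)
    end = datetime.fromtimestamp(int(end_raw), tz=timezone.utc)
    minutes, seconds = divmod(int((end - start).total_seconds()), 60)
    return (
        f"{start:%Y-%m-%d %H:%M:%S} to {end:%H:%M:%S} UTC "
        f"({minutes}m {seconds}s), {status}"
    )


print(describe_window("1757510585 - 1757512961 Resolved"))
# prints: 2025-09-10 13:23:05 to 14:02:41 UTC (39m 36s), Resolved
```

These windows describe when the status entry was open, which can differ from the impact period given in the incident summary.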

Degraded REST API success rates for some customers

On September 4, 2025, between 15:30 UTC and 20:00 UTC, the REST API endpoints git/refs, git/refs/, and git/matching-refs/ were degraded and returned elevated errors for repositories with reference counts over 22k. On average, the request error rate to these specific endpoints was 0.5%. Overall REST API availability remained 99.9999%.

This was due to the introduction of a code change that added latency to reference evaluations and disproportionately affected repositories with many branches, tags, or other references.

We mitigated the incident by reverting the new code.

We are working to improve performance testing and to reduce our time to detection and mitigation of issues like this one in the future.

1757009812 - 1757017547 Resolved

Loading avatars might fail for 0.5% of total users and 100% of users around the Arabian Peninsula. We are investigating.

Between August 21, 2025 at 15:00 UTC and September 2, 2025 at 15:22 UTC, the avatars.githubusercontent.com image service was degraded and failed to display user avatars for users in the Middle East. During this time, avatar images appeared broken on github.com for affected users. On average, this impacted about 82% of users routed through one of our Middle East-based points-of-presence, which represents about 0.14% of global users.

This was due to a configuration change within GitHub's edge infrastructure in the affected region, causing HTTP requests to fail. As a result, image requests returned HTTP 503 errors. The failure to detect the issues was the result of an alerting threshold set too low.

We mitigated the incident by removing the affected site from service, which restored avatar serving for impacted users.

To prevent this from recurring, we have tuned configuration defaults for graceful degradation. We also added new health checks to automatically shift traffic from impacted sites. We are updating our monitoring to prevent undetected errors like this in the future.

1756826228 - 1756827881 Resolved

Disruption with some GitHub services

On August 27, 2025 between 20:35 and 21:17 UTC, Copilot, Web and REST API traffic experienced degraded performance. Copilot saw an average of 36% of requests fail with a peak failure rate of 77%. Approximately 2% of all non-Copilot Web and REST API traffic requests failed.

This incident occurred after we initiated a production database migration to drop a column from a table backing Copilot functionality. While the column was no longer in direct use, our ORM continued to reference the dropped column. This led to a large number of 5xx responses and was similar to the incident on August 5th. At 21:15 UTC, we applied a fix to the production schema and by 21:17 UTC, all services had fully recovered.

While repairs were in progress to avoid this situation, they were not completed quickly enough to prevent a second incident. We have now implemented a temporary block for all drop column operations as an immediate solution while we add more safeguards to prevent similar issues from occurring in the future. We are also implementing graceful degradation so that Copilot issues will not impact other features of our product.
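
One kind of safeguard for this failure mode is a pre-migration check that refuses to drop a column while any ORM model still maps it. The sketch below is illustrative only: `ModelMapping`, `check_drop_column`, and the table and column names are hypothetical, and nothing here reflects how GitHub's ORM or migration tooling actually works.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelMapping:
    """Hypothetical record of the columns an ORM model still reads or writes."""
    name: str
    table: str
    columns: frozenset


def models_blocking_drop(mappings, table, column):
    """Return the names of models that still reference table.column."""
    return [m.name for m in mappings if m.table == table and column in m.columns]


def check_drop_column(mappings, table, column):
    """Raise if any model still maps the column; return None if the drop is safe."""
    blockers = models_blocking_drop(mappings, table, column)
    if blockers:
        raise RuntimeError(
            f"refusing to drop {table}.{column}: still mapped by {', '.join(blockers)}"
        )


# Hypothetical model registry; the names are invented for illustration.
mappings = [
    ModelMapping("Seat", "seats", frozenset({"id", "user_id", "legacy_plan"})),
    ModelMapping("User", "users", frozenset({"id", "login"})),
]

try:
    check_drop_column(mappings, "seats", "legacy_plan")
except RuntimeError as err:
    print(err)
```

A check like this only helps if the mapping list is generated from the same model definitions the running application loads. A hand-maintained list can drift in the same way the schema did.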

1756327299 - 1756330078 Resolved

Incident with Actions

On August 21, 2025, from approximately 15:37 UTC to 18:10 UTC, customers experienced increased delays and failures when starting jobs on GitHub Actions using standard hosted runners. This was caused by connectivity issues in our East US region, which prevented runners from retrieving jobs and sending progress updates. As a result, capacity was significantly reduced, especially for busier configurations, leading to queuing and service interruptions. Approximately 8.05% of jobs on public standard Ubuntu24 runners and 3.4% of jobs on private standard Ubuntu24 runners did not start as expected.

By 18:10 UTC, we had mitigated the issue by provisioning additional resources in the affected region and burning down the backlog of queued runner assignments. By the end of that day, we deployed changes to improve runner connectivity resilience and graceful degradation in similar situations. We are also taking further steps to improve system resiliency by enhancing observability of network connection health with runners and improving load distribution and failover handling to help prevent similar issues in the future.

1755791674 - 1755799982 Resolved

Incident with Issues and Git Operations

On August 21, 2025, between 06:15 UTC and 06:25 UTC, Git and Web operations were degraded and saw intermittent errors. On average, the error rate was 1% for API and Web requests. This was due to database infrastructure automated maintenance reducing capacity below our tolerated threshold.

The incident was resolved when the impacted infrastructure self-healed and returned to normal operating capacity.

We are adding guardrails to reduce the impact of this type of maintenance in the future.

1755757533 - 1755759513 Resolved

Disruption with some GitHub services

Between 15:49 and 16:37 UTC on 20 Aug 2025, creating a new GitHub account via the web signup page consistently returned server errors, and users were unable to complete signup during this 48-minute window. We detected the issue at 16:04 UTC and restored normal signup functionality by 16:37 UTC. A recent change to signup flow logic caused all attempts to error. The change was rolled back to restore service. This exposed a gap in our test coverage that we are fixing.

1755706497 - 1755707836 Resolved

Disruption with some GitHub services

On August 19, 2025, between 13:35 UTC and 14:33 UTC, GitHub search was in a degraded state. When searching for pull requests, issues, and workflow runs, users would have seen some slow, empty or incomplete results. In some cases, pull requests failed to load.

The incident was triggered by intermittent connectivity issues between our load balancers and search hosts. While retry logic initially masked these problems, retry queues eventually overwhelmed the load balancers, causing failure. The incident was mitigated at 14:33 UTC by throttling our search index pipeline. Our automated alerting and internal retries reduced the impact of this event significantly. As a result of this incident we believe we have identified a faster way to mitigate it in the future. We are also working on multiple solutions to resolve the underlying connectivity issues.
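
Retries that pile up until they overwhelm an upstream component are a common pattern. One widely used way to bound it is a retry budget combined with jittered exponential backoff. The sketch below shows that general technique under stated assumptions: `RetryBudget`, `backoff_delay`, and `call_with_budget` are hypothetical names, the 10% ratio is arbitrary, and none of it describes GitHub's actual load balancer or search client behavior.

```python
import random
import time


class RetryBudget:
    """Permit retries only while they stay under a fixed fraction of requests."""

    def __init__(self, ratio=0.1):
        self.ratio = ratio      # allowed retries per original request
        self.requests = 0
        self.retries = 0

    def record_request(self):
        self.requests += 1

    def can_retry(self):
        return self.retries < self.ratio * self.requests

    def record_retry(self):
        self.retries += 1


def backoff_delay(attempt, base=0.1, cap=5.0):
    """Full-jitter backoff: uniform in [0, min(cap, base * 2**attempt)] seconds."""
    return random.uniform(0, min(cap, base * 2 ** attempt))


def call_with_budget(send, budget, max_attempts=3, sleep=time.sleep):
    """Call send(); retry on ConnectionError only while the budget allows it."""
    budget.record_request()
    for attempt in range(max_attempts):
        try:
            return send()
        except ConnectionError:
            # Give up on the last attempt or when the shared budget is spent.
            if attempt == max_attempts - 1 or not budget.can_retry():
                raise
            budget.record_retry()
            sleep(backoff_delay(attempt))
```

Because the budget is shared across callers, a burst of failures stops generating retries once the ratio is exceeded, so retry traffic stays proportional to normal traffic instead of growing with the failure rate.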

1755610793 - 1755614818 Resolved

Incident with Packages

On August 14, 2025, between 17:50 UTC and 18:08 UTC, the Packages NPM Registry service was degraded. During this period, NPM package uploads were unavailable and approximately 50% of download requests failed. We identified the root cause as a sudden spike in Packages publishing activity that exceeded our service capacity limits. We are implementing better guardrails to protect the service against unexpected traffic surges and improving our incident response runbooks to ensure faster mitigation of similar issues.

1755194795 - 1755196632 Resolved

Incident with Actions

On August 14, 2025, between 02:30 UTC and 06:14 UTC, GitHub Actions was degraded. On average, 3% of workflow runs were delayed by at least 5 minutes. The incident was caused by an outage in a downstream dependency that led to failures in backend service connectivity in one region. At 03:59 UTC, we evacuated a majority of services in the impacted region, but some users may have seen ongoing impact until all services were fully evacuated at 06:14 UTC. We are working to improve monitoring and processes of failover to reduce our time to detection and mitigation of issues like this one in the future.

1755147838 - 1755152589 Resolved