Incident History

Disruption with Claude Opus 4.7

On June 8, 2026, between 08:40 UTC and 09:30 UTC, the Claude Opus 4.7 model experienced degraded availability with error rates peaking at 8.4% and averaging 1.9%. This was due to an upstream provider issue that caused temporary unavailability and rate limiting on secondary failover systems. Users selecting Auto or alternative models were unaffected. We are improving provider failover mechanisms and monitoring to prevent similar issues.

1780909544 - 1780913012 Resolved

Pull Requests and Issues unavailable for signed-out users

On June 8, 2026, between approximately 06:30 UTC and 08:36 UTC, signed-out users experienced sustained elevated HTTP 504 errors when accessing Pull Requests, Issues, releases, patch diffs, and other related GitHub.com pages. During the incident, approximately 17% of unauthenticated requests to the affected GitHub.com endpoints returned gateway timeout errors, peaking at roughly 34% of requests at around 06:50 UTC. Some GitHub Actions workflows were also affected when they depended on release downloads or related GitHub.com endpoints. The impact lasted approximately two hours and was isolated to unauthenticated traffic; signed-in users were not affected. The issue was caused by a significant increase in abusive traffic to specific GitHub.com endpoints. This degraded our ability to respond to unauthenticated requests, causing requests to queue beyond timeout thresholds and return gateway timeout errors. We mitigated the incident by identifying the anomalous traffic pattern and applying targeted blocks at the load balancer and application layers. Error rates returned to normal and affected services were fully restored by 08:36 UTC. To reduce the likelihood and impact of similar incidents in the future, we are improving automated detection and blocking for these traffic patterns, improving our emergency traffic-blocking deployment path, and evaluating routing changes for endpoints used by both signed-out users and automated workflows.

1780902697 - 1780907782 Resolved

Disruption with some GitHub services in the EU region

On June 6, 2026, between 16:18 UTC and 17:01 UTC, users experienced elevated error rates when performing Git operations (cloning, fetching, downloading archives) and accessing package registries. The issue affected users whose traffic was routed through our European infrastructure. During this time, on average 0.95% of Codeload requests and 9.2% of Package Registry requests failed with server errors. At peak, the Codeload error rate reached 1.76% and Package Registry errors reached 27%. The root cause was a planned network circuit migration that disrupted connectivity at one of our points of presence. Our process for shifting traffic away from the site did not operate as expected, so a small amount of production traffic continued to be served at the affected site during the maintenance window. The issue was mitigated by rolling back the network change, which restored normal connectivity. Services fully recovered by 17:01 UTC. To reduce the likelihood of similar incidents in the future, we are reviewing our site drain process to make it more verbose and add visibility so any unexpected behavior is caught earlier.

1780764815 - 1780765666 Resolved

EU Network Maintenance

This incident was opened to provide notice of a maintenance event, so there is no specific root cause analysis. Maintenance ran longer than expected (work was complete at 18:48 UTC) but otherwise proceeded as planned.

1780759893 - 1780771755 Resolved

Auth issue resulting in API impacts, including some Slack and Teams channel subscriptions

On June 5, 2026, between 15:35 UTC and 16:45 UTC, 0.11% of authenticated REST API requests incorrectly returned “not found” responses. Impact was concentrated among, and significantly higher for, users authenticating with user-to-server tokens to access organization-owned repositories. Some users of our GitHub for Slack and GitHub for Microsoft Teams integrations saw their channel subscriptions removed because those systems interpreted the transient “not found” response as a durable loss of access. Roughly 12% of organizations with active channel subscriptions were impacted, with ~2% of all channel subscriptions being removed. These issues were triggered by a change to an internal authorization component that did not correctly resolve access for user-to-server tokens against organization-owned repositories. We mitigated the incident by disabling the accompanying feature flag at 16:45 UTC, after which API responses returned to normal. We then restored all impacted Slack and Microsoft Teams channel subscriptions, with restoration completed at 22:21 UTC. We are working to add retry and grace-period logic in the chat integrations so transient errors no longer trigger subscription deletions. In parallel, we are improving observability and gating of authorization changes so downstream impact is detected during scoped, gradual rollouts.

1780680031 - 1780698069 Resolved
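
The grace-period idea mentioned in the auth incident above can be illustrated with a minimal sketch. It is not the actual implementation in the Slack or Teams integrations: the class name, the 24-hour threshold, and the status-code handling are all assumptions chosen for illustration. The point is only that a subscription is removed after "not found" has persisted for a sustained period, not on the first 404.

```python
from dataclasses import dataclass, field
from typing import Dict

# Hypothetical threshold; the incident report does not state a value.
GRACE_PERIOD_SECONDS = 24 * 60 * 60


@dataclass
class SubscriptionAccessTracker:
    """Remembers when each subscription first started returning 404."""

    grace_period: float = GRACE_PERIOD_SECONDS
    first_not_found_at: Dict[str, float] = field(default_factory=dict)

    def record_check(self, subscription_id: str, status_code: int, now: float) -> bool:
        """Record one access check; return True only if removal is warranted."""
        if 200 <= status_code < 300:
            # Access confirmed: forget any earlier "not found" observation.
            self.first_not_found_at.pop(subscription_id, None)
            return False
        if status_code != 404:
            # Other errors (e.g. 5xx) are inconclusive; keep current state.
            return False
        # Start the clock on the first 404 and keep the earliest timestamp.
        first_seen = self.first_not_found_at.setdefault(subscription_id, now)
        return now - first_seen >= self.grace_period
```

With this shape, a 70-minute window of incorrect 404 responses would start the clock for affected subscriptions, and the first successful check after mitigation would reset it without any subscription being deleted.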

Live updates degraded

Everything is operating normally.

1780604439 - 1780605127 Resolved

Copilot Code Review Failing

On June 4, 2026, from 17:30 UTC to 18:55 UTC, Copilot Code Review experienced elevated failures for review requests on GitHub.com. Affected users saw “Copilot ran into an error” on pull requests when requesting a code review. During the incident window, an average of 81.6% of Copilot Code Review requests failed, with a peak failure rate of 93.9%. Approximately 36,800 code review requests failed. GitHub Enterprise Cloud with data residency was not impacted. The issue was caused by a newly released dependency used by the Copilot Code Review processing workflow. The release introduced an incompatibility with the runtime environment. Because the workflow automatically consumed the latest release, the incompatible version was picked up without sufficient compatibility validation and caused review processing to fail. We mitigated the incident by removing the problematic dependency version and redeploying the affected processing service. New code reviews began recovering at 18:44 UTC, and the failure rate returned to baseline by 18:55 UTC. Remaining timed-out work drained by 19:59 UTC. To reduce the risk of recurrence, we are pinning the dependency version instead of automatically consuming the latest release, adding compatibility checks for future releases, improving fast-failure behavior when the review processor cannot start, adding shorter timeout controls for review workflows, and improving monitoring for review completion failures.

1780596174 - 1780603167 Resolved

Disruption with some GitHub services

Between June 1, 2026, 23:00 UTC and June 4, 2026 04:11 UTC, customers experienced delays in Dependabot scheduled version updates. Pull request creation for version updates was delayed, with delays increasing over time and reaching up to two days. Approximately 1.5 million repositories with active Dependabot version update configurations were affected. Dependabot security updates were not affected. The primary cause was changes to an internal platform service that routes requests for Dependabot and other services. We mitigated the incident by deploying a fix that enables batch enqueuing of update jobs, which significantly increased processing throughput. Once the backlog was drained, Dependabot returned to normal processing times. To reduce the risk of recurrence, we are working on tuning batch size and concurrency limits for Dependabot update job processing. We are also adding monitoring for job processing lag to enable earlier detection and faster mitigation of similar issues.

1780515776 - 1780546319 Resolved
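
As a rough illustration of why batch enqueuing raises throughput, the sketch below groups jobs so that each call to the queue carries many jobs instead of one. The report does not describe the real Dependabot queue interface, so `enqueue_batch`, the function names, and the default batch size here are hypothetical.

```python
from itertools import islice
from typing import Callable, Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def batched(items: Iterable[T], batch_size: int) -> Iterator[List[T]]:
    """Yield consecutive lists of at most batch_size items."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    iterator = iter(items)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch


def enqueue_in_batches(
    jobs: Iterable[T],
    enqueue_batch: Callable[[List[T]], None],  # hypothetical queue client call
    batch_size: int = 500,  # illustrative value, to be tuned in practice
) -> int:
    """Send jobs to the queue in batches; return the number of calls made."""
    calls = 0
    for batch in batched(jobs, batch_size):
        enqueue_batch(batch)
        calls += 1
    return calls
```

Under these assumptions, the number of queue round trips falls from one per job to roughly the job count divided by the batch size, which is why batch size and concurrency limits are natural tuning parameters.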

Disruption with some GitHub services

Between June 2, 2026, 21:54 UTC and June 3, 2026, 06:45 UTC, the Spark service was degraded and users were unable to store or retrieve data for their Spark apps in one of our hosting regions. Users could still make changes to their app configuration during this time. The error rate peaked at 25% of affected requests to the service. Impact was limited to users whose requests were served through a single affected region; 43 users experienced errors during this window. The root cause was a configuration that referenced a service component by a fixed address rather than a dynamic service endpoint. When the component was replaced, requests could no longer reach the fixed address and began to fail. We resolved the incident by updating the configuration to use our standard service endpoints, which are resilient to component replacement. Recovery time was extended because replacing the component required overrides to a temporary deployment safeguard. We are working to add validation that prevents fixed infrastructure addresses from being used in application configuration outside of test environments and to improve our monitoring to reduce our time to detect.

1780456408 - 1780469219 Resolved
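
The validation described in the Spark incident above could take many forms. The following is a minimal sketch under stated assumptions: configuration is a nested structure of dicts, lists, and strings, a "fixed address" means a literal IP address, and the environment is identified by a plain string. The function names and the config layout are hypothetical and not taken from the report.

```python
import ipaddress
from typing import Any, Iterator, List, Tuple
from urllib.parse import urlsplit


def _host_of(value: str) -> str:
    """Extract the host from a URL or a bare host[:port] string."""
    candidate = value if "//" in value else "//" + value
    try:
        return urlsplit(candidate).hostname or ""
    except ValueError:
        return ""


def find_fixed_addresses(config: Any, path: str = "") -> Iterator[Tuple[str, str]]:
    """Yield (config path, value) for every string whose host is a literal IP."""
    if isinstance(config, dict):
        for key, value in config.items():
            child = f"{path}.{key}" if path else str(key)
            yield from find_fixed_addresses(value, child)
    elif isinstance(config, list):
        for index, value in enumerate(config):
            yield from find_fixed_addresses(value, f"{path}[{index}]")
    elif isinstance(config, str):
        try:
            ipaddress.ip_address(_host_of(config))
        except ValueError:
            return  # not an IP literal, so treat it as a name-based endpoint
        yield path, config


def validate_config(config: Any, environment: str) -> None:
    """Raise ValueError if a non-test config contains literal IP addresses."""
    if environment == "test":
        return
    findings: List[Tuple[str, str]] = list(find_fixed_addresses(config))
    if findings:
        details = ", ".join(f"{p}={v}" for p, v in findings)
        raise ValueError(f"fixed addresses in {environment} config: {details}")
```

A check like this is deliberately conservative: any string that parses as an IP address is flagged, so a four-part value such as a version string written as `1.2.3.4` would also be reported and might need an allowlist.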

Delays with Code Scanning and Billing

From 13:00 UTC on June 1, 2026, to 00:17 UTC on June 2, 2026, multiple services experienced delayed job processing due to increased latency in our background job queue service. The root cause was insufficient queue processing capacity to handle a large week-over-week increase in total job traffic. Users saw delays of up to 90 minutes for billing usage updates, 30 minutes for webhook notifications, and 15 minutes for email notifications. We mitigated the incident by scaling up our background job service capacity to handle the spike in job traffic. We have added queue capacity monitoring to our background job queue service to stay ahead of weekly growth patterns and to reduce time to detect in the future.

1780327029 - 1780359478 Resolved