Incident History

Delays in UI updates for Actions Runs

On February 3, 2026, between 14:00 UTC and 17:40 UTC, customers experienced delays in Webhook delivery for push events and delayed GitHub Actions workflow runs. During this window, Webhook deliveries for push events were delayed by up to 40 minutes, with an average delay of 10 minutes. GitHub Actions workflows triggered by push events experienced similar job start delays. Additionally, between 15:25 UTC and 16:05 UTC, all GitHub Actions workflow runs experienced status update delays of up to 11 minutes, with a median delay of 6 minutes.The issue stemmed from connection churn in our eventing service, which caused CPU saturation and delays for reads and writes, with subsequent downstream delivery delays for Actions and Webhooks. We have added observability tooling and metrics to accelerate detection, and are correcting stream processing client configuration to prevent recurrence.

Incident with Copilot

On February 3, 2026, between 09:35 UTC and 10:15 UTC, GitHub Copilot experienced elevated error rates, with an average of 4% of requests failing.This was caused by a capacity imbalance that led to resource exhaustion on backend services. The incident was resolved by infrastructure rebalancing, and we subsequently deployed additional capacity.We are improving observability to detect capacity imbalances earlier and enhancing our infrastructure to better handle traffic spikes.

Incident with Codespaces

On February 2, 2026, GitHub Codespaces were unavailable between 18:55 and 22:20 UTC and degraded until the service fully recovered at February 3, 2026 00:15 UTC. During this time, Codespaces creation and resume operations failed in all regions. This outage was caused by a backend storage access policy change in our underlying compute provider that blocked access to critical VM metadata, causing all VM create, delete, reimage, and other operations to fail. More information is available at https://azure.status.microsoft/en-us/status/history/?trackingId=FNJ8-VQZ. This was mitigated by rolling back the policy change, which started at 22:15 UTC. As VMs came back online, our runners worked through the backlog of requests that hadn’t timed out. We are working with our compute provider to improve our incident response and engagement time, improve early detection before they impact our customers, and ensure safe rollout should similar changes occur in the future. We recognize this was a significant outage to our users that rely on GitHub’s workloads and apologize for the impact this had.

Incident with Actions

On February 2, 2026, between 18:35 UTC and 22:15 UTC, GitHub Actions hosted runners were unavailable, with service degraded until full recovery at 23:10 UTC for standard runners and at February 3, 2026 00:30 UTC for larger runners. During this time, Actions jobs queued and timed out while waiting to acquire a hosted runner. Other GitHub features that leverage this compute infrastructure were similarly impacted, including Copilot Coding Agent, Copilot Code Review, CodeQL, Dependabot, GitHub Enterprise Importer, and Pages. All regions and runner types were impacted. Self-hosted runners on other providers were not impacted. This outage was caused by a backend storage access policy change in our underlying compute provider that blocked access to critical VM metadata, causing all VM create, delete, reimage, and other operations to fail. More information is available at https://azure.status.microsoft/en-us/status/history/?trackingId=FNJ8-VQZ. This was mitigated by rolling back the policy change, which started at 22:15 UTC. As VMs came back online, our runners worked through the backlog of requests that hadn’t timed out. We are working with our compute provider to improve our incident response and engagement time, improve early detection before they impact our customers, and ensure safe rollout should similar changes occur in the future. We recognize this was a significant outage to our users that rely on GitHub’s workloads and apologize for the impact this had.

Disruption with some GitHub services

From Jan 31, 2026 00:30 UTC to Feb 2, 2026 18:00 UTC Dependabot service was degraded and failed to create 10% of Automated Pull Requests. This was due to a cluster failover that connected to a read-only database.We mitigated the incident by pausing Dependabot queues until traffic was properly routed to healthy clusters. We’re working on identifying and rerunning all failed jobs during this time.We’re adding new monitors and alerts to reduce our time to detection and prevent this in the future.

Disruption with some GitHub services

From Feb 2, 2026 17:13 UTC to Feb 2, 2026 17:36 UTC we experienced failures on ~0.02% of Git operations. While deploying an internal service, a misconfiguration caused a small subset of traffic to route to a service that was not ready. During the incident we observed the degradation and statused publicly.To mitigate the issue, traffic was redirected to healthy instances and we resumed normal operation.We are improving our monitoring and deployment processes in this area to avoid future routing issues.

Degraded Experience - Failing to finalize some CCA Jobs

Between 2026-01-30 19:06 UTC and 2026-01-30 20:04 UTC, Copilot Coding Agent experienced sessions getting stuck, with a mismatch between the UI-reported session status and the underlying Actions and job execution state. Impacted users could observe Actions finish successfully but the session UI continuing to show in-progress state, or sessions remaining in queued state.The issue was caused by a feature flag that resulted in events being published to a new Kafka topic. Publishing failures led to buffer/queue overflows in the shared event publishing client, preventing other critical events from being emitted. We mitigated the incident by disabling the feature flag and redeploying production pods, which resumed normal event delivery. We are working to improve safeguards and detection around event publishing failures to reduce time to mitigation for similar issues in the future.

Actions Workflows Run Start Delays

On Jan 28, 2026, between 14:56 UTC and 15:44 UTC, GitHub Actions experienced degraded performance. During this time, workflows experienced an average delay of 49 seconds, and 4.7% of workflow runs failed to start within 5 minutes. The root cause was an atypical load pattern that overwhelmed system capacity and caused resource contention.Recovery began once additional resources came online at 15:25 UTC, with full recovery at 15:44 UTC. We are implementing safeguards to prevent this failure mode and enhancing our monitoring to detect and address similar patterns more quickly in the future.

Regression in windows runners for public repositories

On Jan 26, 2026, from approximately 14:03 UTC to 23:42 UTC, GitHub Actions experienced job failures on some Windows standard hosted runners. This was caused by a configuration difference in a new Windows runner type that caused the expected D: drive to be missing. About 2.5% of all Windows standard runners jobs were impacted. Re-run of failed workflows had a high chance of succeeding given the limited rollout of the change.The job failures were mitigated by rolling back the affected configuration and removing the provisioned runners that had this configuration. To reduce the chance of recurrence, we are expanding runner telemetry and improving validation of runner configuration changes. We are also evaluating options to accelerate the mitigation time of any similar future events.

Disruption with repo creation

Between January 24, 2026,19:56 UTC and January 25, 2026, 2:50 UTC repository creation and clone were degraded. On average, the error rate was 25% and peaked at 55% of requests for repository creation. This was due to increased latency on the repositories database impacting a read-after-write problem during repo creation. We mitigated the incident by stopping an operation that was generating load on the database to increase throughput. We have identified the repository creation problem and are working to address the issue and improve our observability to reduce our time to detection and mitigation of issues like this one in the future.

⮜ Previous Next ⮞