Incident History

Disruption with some GitHub services

On February 10th, 2026, between 14:35 UTC and 15:58 UTC, web experiences on GitHub.com were degraded, including Pull Requests and Authentication, resulting in intermittent 5xx errors and timeouts. The error rate on web traffic peaked at approximately 2%. This was due to increased load on a critical database, which caused significant memory pressure resulting in intermittent errors. We mitigated the incident by applying a configuration change to the database to increase available memory on the host. We are working to identify changes in load patterns and are reviewing the configuration of our databases to ensure there is sufficient capacity to meet growth. Additionally, we are improving monitoring and self-healing functionality for database memory issues to reduce our time to detection and mitigation.

1770736025 - 1770739129 Resolved
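
The status lines in this history look like start and end times in Unix epoch seconds. The page does not say so, so treat that reading as an assumption. Under it, a minimal sketch for decoding a line into UTC times and a duration follows; the helper name `decode_status_window` is hypothetical.

```python
from datetime import datetime, timezone


def decode_status_window(start_epoch: int, end_epoch: int) -> tuple:
    """Decode a 'start - end' status line, ASSUMING Unix epoch seconds in UTC."""
    fmt = "%Y-%m-%d %H:%M:%S UTC"
    start = datetime.fromtimestamp(start_epoch, tz=timezone.utc)
    end = datetime.fromtimestamp(end_epoch, tz=timezone.utc)
    # Whole minutes between the two timestamps.
    minutes = (end_epoch - start_epoch) // 60
    return start.strftime(fmt), end.strftime(fmt), minutes


# The two values from the status line above.
window = decode_status_window(1770736025, 1770739129)
print(window)
```

Under this reading, the line above decodes to February 10, 2026, 15:07:05 UTC through 15:58:49 UTC. The decoded window need not match the impact window given in the summary.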

Incident with Issues, Actions and Git Operations

On February 9, 2026, GitHub experienced two related periods of degraded availability affecting GitHub.com, the GitHub API, GitHub Actions, Git operations, GitHub Copilot, and other services. The first period occurred between 16:12 UTC and 17:39 UTC, and the second between 18:53 UTC and 20:09 UTC. In total, users experienced approximately 2 hours and 43 minutes of degraded service across the two incidents.
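
The combined duration can be checked from the two windows given above. This is a small sketch that assumes both periods fall within the same UTC day; the helper name `minutes_between` is illustrative.

```python
from datetime import datetime


def minutes_between(start: str, end: str) -> int:
    """Minutes between two HH:MM times, assuming both are on the same day."""
    fmt = "%H:%M"
    delta = datetime.strptime(end, fmt) - datetime.strptime(start, fmt)
    return int(delta.total_seconds() // 60)


# The two degraded periods reported for February 9, 2026 (UTC).
periods = [("16:12", "17:39"), ("18:53", "20:09")]
total = sum(minutes_between(start, end) for start, end in periods)
hours, minutes = divmod(total, 60)
print(f"{hours} hours and {minutes} minutes")  # 2 hours and 43 minutes
```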

During both incidents, users encountered errors loading pages on GitHub.com, failures when pushing or pulling code over HTTPS, failures starting or completing GitHub Actions workflow runs, and errors using GitHub Copilot. Additional services including GitHub Issues, pull requests, webhooks, Dependabot, GitHub Pages, and GitHub Codespaces experienced intermittent errors. SSH-based Git operations were not affected during either incident.

Our investigation determined that both incidents shared the same underlying cause: a configuration change to a user settings caching mechanism caused a large volume of cache rewrites to occur simultaneously. During the first incident, asynchronous rewrites overwhelmed a shared infrastructure component responsible for coordinating background work, triggering cascading failures. Increased load caused the service responsible for proxying Git operations over HTTPS to exhaust available connections, preventing it from accepting new requests. We mitigated this incident by disabling async cache rewrites and restarting the affected Git proxy service across multiple datacenters.

An additional source of updates to the same cache circumvented our initial mitigations and caused the second incident. This generated a high volume of synchronous writes, causing replication delays that cascaded in a similar pattern and again exhausted the Git proxy’s connection capacity, degrading availability across multiple services. We mitigated by disabling the source of the cache rewrites and again restarting Git proxy.

We know these incidents disrupted the workflows of millions of developers. While we have made substantial, long-term investments in how GitHub is built and operated to improve resilience, GitHub's availability is not yet meeting our expectations. Getting there requires deep architectural work that is already underway, as well as urgent, targeted improvements. We are taking the following immediate steps:

  1. We have already optimized the caching mechanism to avoid write amplification and added self-throttling during bulk updates.
  2. We are adding safeguards to ensure the caching mechanism responds more quickly to rollbacks and strengthening how changes to these caching systems are planned, validated, and rolled out with additional checks.
  3. We are fixing the underlying cause of connection exhaustion in our Git HTTPS proxy layer so the proxy can recover from this failure mode automatically without requiring manual restarts.
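
The report does not describe how the self-throttling in step 1 is implemented. As a hedged illustration of the general idea only, the sketch below paces a bulk run of cache rewrites to a fixed maximum rate. The helper `paced`, the `FakeClock` test double, and the key names are all hypothetical and are not GitHub's code.

```python
import time
from typing import Callable, Iterable, Iterator, TypeVar

T = TypeVar("T")


def paced(
    items: Iterable[T],
    max_per_second: float,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> Iterator[T]:
    """Yield items no faster than max_per_second (hypothetical helper)."""
    interval = 1.0 / max_per_second
    next_allowed = clock()
    for item in items:
        now = clock()
        if now < next_allowed:
            # Wait out the remainder of the current interval.
            sleep(next_allowed - now)
        next_allowed = max(next_allowed, clock()) + interval
        yield item


class FakeClock:
    """Deterministic clock so the pacing can be checked without real waiting."""

    def __init__(self) -> None:
        self.now = 0.0

    def time(self) -> float:
        return self.now

    def sleep(self, seconds: float) -> None:
        self.now += seconds


fake = FakeClock()
keys = ["user:1", "user:2", "user:3", "user:4", "user:5"]  # made-up cache keys
written = list(paced(keys, max_per_second=4, clock=fake.time, sleep=fake.sleep))
print(written, fake.now)  # five rewrites spread over 1.0 simulated seconds
```

Pacing at the source bounds the write rate regardless of how many keys a configuration change touches, which is the property a bulk update needs to avoid overwhelming shared components.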

GitHub is critical infrastructure for your work, your teams, and your businesses. We're focusing on these mitigations and long-term infrastructure work so GitHub is available, at scale, when and where you need it.

1770663693 - 1770667772 Resolved

Copilot Policy Propagation Delays

GitHub experienced degraded Copilot policy propagation from enterprise to organizations from February 3 at 21:00 UTC through February 10 at 16:00 UTC. During this period, policy changes could take up to 24 hours to apply. We mitigated the issue on February 10 at 16:00 UTC after rolling back a regression that caused the delays. The propagation queue fully caught up on the delayed items by February 11 at 10:35 UTC, and policy changes now propagate normally.

During this incident, whenever an enterprise updated a Copilot policy (including model policies), there were significant delays before those policy changes reached their child organizations and assigned users. The delay was caused by a large backlog in the background job queue responsible for propagating Copilot policy updates.

Our investigation determined the incident was caused by a code change shipped on February 3 that increased the number of background jobs enqueued per policy update, in order to accommodate upcoming feature work. When new Copilot models launched on February 5th and 7th, triggering policy updates across many enterprises, the higher job volume overwhelmed the shared background worker queue, resulting in prolonged propagation delays. No policy updates were lost; they were queued and processed once the backlog cleared.

We understand these delays disrupted policy management for customers using Copilot at scale and have taken the following immediate steps:

  1. Restored the optimized propagation path and put tests in place to avoid a regression.
  2. Ensured upcoming features are compatible with this design.
  3. Added alerting on queue depth to detect propagation backlogs immediately.

GitHub is critical infrastructure for your work, your teams, and your businesses. We are focused on these mitigations and continued improvements so Copilot policy changes propagate reliably and quickly.
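
The report mentions alerting on queue depth but gives no details. A minimal sketch of one common shape for such a check is shown below: fire only when the depth stays above a threshold for several consecutive samples, so a brief spike does not page anyone. The function name, threshold, and sample values are all assumptions made up for illustration.

```python
from typing import Sequence


def backlog_alert(depth_samples: Sequence[int], threshold: int, sustained: int) -> bool:
    """Return True when the last `sustained` samples all exceed `threshold`."""
    if sustained <= 0 or len(depth_samples) < sustained:
        return False
    return all(depth > threshold for depth in depth_samples[-sustained:])


# Made-up queue-depth samples, oldest first; not data from the incident.
samples = [120, 150, 9_800, 12_400, 15_100]
print(backlog_alert(samples, threshold=5_000, sustained=3))  # True
print(backlog_alert(samples, threshold=5_000, sustained=4))  # False
```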

1770654581 - 1770717477 Resolved

Incident with Pull Requests

On February 9, 2026, GitHub experienced two related periods of degraded availability affecting GitHub.com, the GitHub API, GitHub Actions, Git operations, GitHub Copilot, and other services. The first period occurred between 16:12 UTC and 17:39 UTC, and the second between 18:53 UTC and 20:09 UTC. In total, users experienced approximately 2 hours and 43 minutes of degraded service across the two incidents.

During both incidents, users encountered errors loading pages on GitHub.com, failures when pushing or pulling code over HTTPS, failures starting or completing GitHub Actions workflow runs, and errors using GitHub Copilot. Additional services including GitHub Issues, pull requests, webhooks, Dependabot, GitHub Pages, and GitHub Codespaces experienced intermittent errors. SSH-based Git operations were not affected during either incident.

Our investigation determined that both incidents shared the same underlying cause: a configuration change to a user settings caching mechanism caused a large volume of cache rewrites to occur simultaneously. During the first incident, asynchronous rewrites overwhelmed a shared infrastructure component responsible for coordinating background work, triggering cascading failures. Increased load caused the service responsible for proxying Git operations over HTTPS to exhaust available connections, preventing it from accepting new requests. We mitigated this incident by disabling async cache rewrites and restarting the affected Git proxy service across multiple datacenters.

An additional source of updates to the same cache circumvented our initial mitigations and caused the second incident. This generated a high volume of synchronous writes, causing replication delays that cascaded in a similar pattern and again exhausted the Git proxy’s connection capacity, degrading availability across multiple services. We mitigated by disabling the source of the cache rewrites and again restarting Git proxy.

We know these incidents disrupted the workflows of millions of developers. While we have made substantial, long-term investments in how GitHub is built and operated to improve resilience, GitHub's availability is not yet meeting our expectations. Getting there requires deep architectural work that is already underway, as well as urgent, targeted improvements. We are taking the following immediate steps:

  1. We have already optimized the caching mechanism to avoid write amplification and added self-throttling during bulk updates.
  2. We are adding safeguards to ensure the caching mechanism responds more quickly to rollbacks and strengthening how changes to these caching systems are planned, validated, and rolled out with additional checks.
  3. We are fixing the underlying cause of connection exhaustion in our Git HTTPS proxy layer so the proxy can recover from this failure mode automatically without requiring manual restarts.

GitHub is critical infrastructure for your work, your teams, and your businesses. We're focusing on these mitigations and long-term infrastructure work so GitHub is available, at scale, when and where you need it.

1770653962 - 1770658857 Resolved

Notifications are delayed

On February 9th, the notifications service started showing degradation around 13:50 UTC, resulting in an increase in notification delivery delays. Our team started investigating. Around 14:30 UTC the service started to recover as the team continued investigating the incident. Around 15:20 UTC degradation resurfaced, with increasing delays in notification deliveries and a small error rate (below 1%) on UI and API endpoints related to notifications. At 16:30 UTC, we mitigated the incident by reducing contention through throttling workloads and performing a database failover. The median delay for notification deliveries was 80 minutes at this point and queues started emptying. Around 19:30 UTC the backlog of notifications was processed, bringing the service back to normal, and we declared the incident closed.

The incident was caused by the notifications database showing degradation under intense load. Most notifications-related asynchronous workloads, including notification deliveries, were stopped to try to reduce the pressure on the database. To ensure system stability, a database failover was executed. Following the failover, we applied a configuration change to improve performance. The service started recovering after these changes.

We are reviewing the configuration of our databases to understand the performance drop and prevent similar issues from happening in the future. We are also investing in monitoring to detect and mitigate this class of incidents faster.

1770652492 - 1770665385 Resolved

Incident with Actions

On February 9th, 2026, between 09:16 UTC and 15:12 UTC GitHub Actions customers experienced run start delays. Approximately 0.6% of runs across 1.8% of repos were affected, with an average delay of 19 minutes for those delayed runs.

The incident occurred when increased load exposed a bottleneck in our event publishing system, causing one compute node to fall behind on processing Actions Jobs. We mitigated by rebalancing traffic and increasing timeouts for event processing. We have since isolated performance critical events to a new, dedicated publisher to prevent contention between events and added safeguards to better tolerate processing timeouts.

1770646622 - 1770651999 Resolved

Degraded performance for Copilot Coding Agent

On February 9, 2026, between ~06:00 UTC and ~12:12 UTC, Copilot Coding Agent and related Copilot API endpoints experienced degraded availability. The primary impact was to agent-based workflows (requests to /agents/swe/*, including custom agent configuration checks), where 154k users saw failed requests and error responses in their editor/agent experience. Impact was concentrated among users and integrations actively using Copilot Coding Agent with VS Code. The degradation was caused by an unexpected surge in traffic to the related API endpoints that exceeded an internal secondary rate limit. That resulted in upstream request denials, which were surfaced to users as elevated 500 errors.

We mitigated the incident by deploying a change that increased the applicable rate limit for this traffic, which allowed requests to complete successfully and returned the service to normal operation.

After the mitigation, we deployed guardrails with applicable caching to avoid a repeat of similar incidents. We also temporarily increased infrastructure capacity to better handle backlog recovery from the rate limiting. We are improving monitoring around growing agentic API endpoints.

1770631278 - 1770639142 Resolved

Degraded Performance in Webhooks API and UI, Pull Requests

On February 9, 2026, between 07:05 UTC and 11:26 UTC, GitHub experienced intermittent degradation across Issues, Pull Requests, Webhooks, Actions, and Git operations. Approximately every 30 minutes, users encountered brief periods of elevated errors and timeouts lasting roughly 15 seconds each. During the incident window, approximately 1–2% of requests were impacted across these services, with Git operations experiencing up to 7% error rates during individual spikes. GitHub Actions saw up to 2% of workflow runs delayed by a median of approximately 7 minutes due to backups created during these periods. This was due to multiple resource-intensive workloads running simultaneously, which caused intermittent processing delays on the data storage layer. We mitigated the incident by scaling storage to a larger compute capacity, which resolved the processing delays. We are working to improve detection of resource-intensive queries, identify changes in load patterns, and enhance our monitoring to reduce our time to detection and mitigation of issues like this one in the future.

1770624933 - 1770636393 Resolved

Incident with Pull Requests

On February 6, 2026, between 17:49 UTC and 18:36 UTC, the GitHub Mobile service was degraded, and some users were unable to create pull request review comments on deleted lines (and in some cases, comments on deleted files). This impacted users on the newer comment-positioning flow available in version 1.244.0 of the mobile apps. Telemetry indicated that the failures increased as the Android rollout progressed. This was due to a defect in the new comment-positioning workflow that could result in the server rejecting comment creation for certain deleted-line positions.

We mitigated the incident by halting the Android rollout and implementing interim client-side fallback behavior while a platform fix is in progress. The client-side fallback is scheduled to be published early this week. We are working to (1) add clearer client-side error handling (avoid infinite spinners), (2) improve monitoring/alerting for these failures, and (3) adopt stable diff identifiers for diff-based operations to reduce the likelihood of recurrence.

1770400153 - 1770403013 Resolved

Incident with Copilot

On February 10, 2026, between 10:28 and 11:54 UTC, Visual Studio Code users experienced a degraded experience on GitHub Copilot when using the Claude Opus 4.6 model. During this time, approximately 50% of users encountered agent turn failures due to the model being unable to serve the volume of incoming requests.

The issue was caused by rate limits that were set too low for actual demand. While the initial deployment showed no concerns, a surge in traffic from Europe on the following day caused Visual Studio Code to begin hitting rate limit errors. Additionally, a degradation message intended to notify users of high usage failed to trigger due to a misconfiguration. We mitigated the incident by adjusting rate limits for the model.

We improved our rate limiting to prevent future models from experiencing similar issues. We are also improving our capacity planning processes to reduce the risk of similar incidents in the future, and enhancing our detection and mitigation capabilities to reduce impact to customers.

1770376561 - 1770379084 Resolved