On February 12, 2026, between 09:16 UTC and 11:01 UTC, users attempting to download repository archives (tar.gz/zip) that include Git LFS objects received errors. Standard repository archives without LFS objects were not affected. On average, 0.0042% of requests to the service failed, peaking at 0.0339%. The incident was caused by the deployment of a corrupt configuration bundle, which left the service missing data needed to establish network interface connections. We mitigated the incident by applying the correct configuration at each site. We have added corruption checks to this deployment process and will add auto-rollback detection for this service to prevent similar issues in the future.
On February 12, 2026, between 00:51 UTC and 09:35 UTC, users attempting to create or resume Codespaces experienced elevated failure rates across Europe, Asia, and Australia, peaking at a 90% failure rate. The failures were triggered by a bad configuration rollout in a core networking dependency, which led to internal resource provisioning failures. We are improving our alerting thresholds to catch issues before they impact customers and strengthening rollout safeguards to prevent similar incidents.
On February 11 between 16:37 UTC and 00:59 UTC the following day, 4.7% of workflows running on GitHub Larger Hosted Runners were delayed by an average of 37 minutes. Standard Hosted and self-hosted runners were not impacted. This incident was caused by capacity degradation in the Central US region for Larger Hosted Runners. Workloads not pinned to that region were picked up by other regions, but were delayed as those regions became saturated. Workloads configured with private networking in that region were delayed until compute capacity there recovered. The issue was mitigated by rebalancing capacity across internal and external workloads and by general capacity increases in affected regions to speed recovery. In addition to working with our compute partners on the core capacity degradation, we are working to ensure other regions are better able to absorb load with less delay to customer workloads. For pinned workflows using private networking, we will soon ship support for customers to fail over when private networking is configured in a paired region.
On February 11, 2026, between 13:51 UTC and 17:03 UTC, the GraphQL API experienced degraded performance due to elevated resource utilization. This resulted in incoming client requests waiting longer than normal and, in certain cases, timing out. During the impact window, approximately 0.65% of GraphQL requests experienced these issues, peaking at 1.06%. The increased load was driven by query patterns that consumed more GraphQL API resources than expected. We mitigated the incident by scaling out resource capacity and limiting the capacity available to these query patterns. We're improving our telemetry to identify slow usage growth and changes in GraphQL workloads. We've also added capacity safeguards to prevent similar incidents in the future.
On February 11, 2026, between 14:30 UTC and 15:30 UTC, the Copilot service experienced degraded availability for requests to Claude Haiku 4.5. During this time, 10% of requests failed on average, with 23% of sessions impacted. The issue was caused by an upstream problem at multiple external model providers that affected our ability to serve requests. The incident was mitigated once one of the providers resolved the issue and we rerouted capacity fully to that provider. We have enhanced our telemetry to improve incident observability and implemented an automated retry mechanism for requests to this model to mitigate similar future upstream incidents.
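A retry mechanism with provider failover, as described above, can be sketched roughly as follows. This is a minimal illustration, not GitHub's implementation; the provider interface, attempt counts, and backoff parameters are all hypothetical:

```python
import random
import time

class ProviderUnavailable(Exception):
    """Raised when a model provider cannot serve the request."""

def complete_with_failover(prompt, providers, max_attempts=3, base_delay=0.1):
    """Try each provider in priority order, retrying transient failures
    with jittered exponential backoff before failing over to the next.

    `providers` is a list of callables, each taking a prompt and either
    returning a completion or raising ProviderUnavailable.
    """
    last_error = None
    for provider in providers:
        for attempt in range(max_attempts):
            try:
                return provider(prompt)
            except ProviderUnavailable as err:
                last_error = err
                # Jittered exponential backoff between retries to avoid
                # hammering a struggling upstream.
                time.sleep(base_delay * (2 ** attempt) * random.random())
    raise ProviderUnavailable(f"all providers failed: {last_error}")
```

Rerouting capacity fully to the healthy provider, as in the mitigation above, corresponds to reordering (or shrinking) the `providers` list.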
On February 10, 2026, between 14:35 UTC and 15:58 UTC, web experiences on GitHub.com, including pull requests and authentication, were degraded, resulting in intermittent 5xx errors and timeouts. The error rate on web traffic peaked at approximately 2%. This was due to increased load on a critical database, which caused significant memory pressure and intermittent errors. We mitigated the incident by applying a configuration change to the database to increase available memory on the host. We are working to identify changes in load patterns and are reviewing the configuration of our databases to ensure there is sufficient capacity to meet growth. Additionally, we are improving monitoring and self-healing functionality for database memory issues to reduce our time to detection and mitigation.
On February 9, 2026, GitHub experienced two related periods of degraded availability affecting GitHub.com, the GitHub API, GitHub Actions, Git operations, GitHub Copilot, and other services. The first period occurred between 16:12 UTC and 17:39 UTC, and the second between 18:53 UTC and 20:09 UTC. In total, users experienced approximately 2 hours and 43 minutes of degraded service across the two incidents.
During both incidents, users encountered errors loading pages on GitHub.com, failures when pushing or pulling code over HTTPS, failures starting or completing GitHub Actions workflow runs, and errors using GitHub Copilot. Additional services including GitHub Issues, pull requests, webhooks, Dependabot, GitHub Pages, and GitHub Codespaces experienced intermittent errors. SSH-based Git operations were not affected during either incident.
Our investigation determined that both incidents shared the same underlying cause: a configuration change to a user settings caching mechanism caused a large volume of cache rewrites to occur simultaneously. During the first incident, asynchronous rewrites overwhelmed a shared infrastructure component responsible for coordinating background work, triggering cascading failures. Increased load caused the service responsible for proxying Git operations over HTTPS to exhaust available connections, preventing it from accepting new requests. We mitigated this incident by disabling async cache rewrites and restarting the affected Git proxy service across multiple datacenters.
An additional source of updates to the same cache circumvented our initial mitigations and caused the second incident. This generated a high volume of synchronous writes, causing replication delays that cascaded in a similar pattern and again exhausted the Git proxy’s connection capacity, degrading availability across multiple services. We mitigated by disabling the source of the cache rewrites and again restarting Git proxy.
We know these incidents disrupted the workflows of millions of developers. While we have made substantial, long-term investments in how GitHub is built and operated to improve resilience, GitHub's availability is not yet meeting our expectations. Getting there requires deep architectural work that is already underway, as well as urgent, targeted improvements. We are taking the following immediate steps:
We have already optimized the caching mechanism to avoid write amplification and added self-throttling during bulk updates.
We are adding safeguards to ensure the caching mechanism responds more quickly to rollbacks and strengthening how changes to these caching systems are planned, validated, and rolled out with additional checks.
We are fixing the underlying cause of connection exhaustion in our Git HTTPS proxy layer so the proxy can recover from this failure mode automatically without requiring manual restarts.
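The self-throttling added for bulk cache updates can be illustrated with a small sketch: instead of rewriting every entry at once, rewrites are processed in bounded batches with a pause between them. The function, batch size, and delay below are illustrative assumptions, not GitHub's actual code:

```python
import time

def rewrite_cache_entries(keys, rewrite, batch_size=500, delay_between_batches=0.05):
    """Rewrite cache entries in small batches, pausing between batches so a
    bulk update cannot flood shared infrastructure with simultaneous writes.

    `rewrite` is a hypothetical callable that refreshes a single cache key.
    Returns the number of entries rewritten.
    """
    rewritten = 0
    for start in range(0, len(keys), batch_size):
        for key in keys[start:start + batch_size]:
            rewrite(key)
            rewritten += 1
        # Self-throttle: yield between batches instead of issuing all
        # rewrites at once, smoothing load on the backing store.
        if start + batch_size < len(keys):
            time.sleep(delay_between_batches)
    return rewritten
```

Spreading the same total work over time trades a slower bulk update for bounded peak load, which is the property that avoids the cascading failures described above.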
GitHub is critical infrastructure for your work, your teams, and your businesses. We're focusing on these mitigations and long-term infrastructure work so GitHub is available, at scale, when and where you need it.
GitHub experienced degraded Copilot policy propagation from enterprises to organizations between February 3 at 21:00 UTC and February 10 at 16:00 UTC. During this period, policy changes could take up to 24 hours to apply. We mitigated the issue on February 10 at 16:00 UTC after rolling back a regression that caused the delays. The propagation queue fully caught up on the delayed items by February 11 at 10:35 UTC, and policy changes now propagate normally.
During this incident, whenever an enterprise updated a Copilot policy (including model policies), there were significant delays before those policy changes reached its child organizations and assigned users. The delay was caused by a large backlog in the background job queue responsible for propagating Copilot policy updates.
Our investigation determined the incident was caused by a code change shipped on February 3 that increased the number of background jobs enqueued per policy update in order to accommodate upcoming feature work. When new Copilot models launched on February 5 and 7, triggering policy updates across many enterprises, the higher job volume overwhelmed the shared background worker queue, resulting in prolonged propagation delays. No policy updates were lost; they were queued and processed once the backlog cleared.
We understand these delays disrupted policy management for customers using Copilot at scale and have taken the following immediate steps:
1. Restored the optimized propagation path and put tests in place to prevent a regression.
2. Ensured upcoming features are compatible with this design.
3. Added alerting on queue depth to detect propagation backlogs immediately.
GitHub is critical infrastructure for your work, your teams, and your businesses. We are focused on these mitigations and continued improvements so Copilot policy changes propagate reliably and quickly.
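Queue-depth alerting of the kind added here is commonly implemented as a sustained-threshold check, so that a brief spike does not page anyone but a growing backlog does. A minimal sketch, with illustrative thresholds that are not GitHub's actual values:

```python
def backlog_alert(depth_samples, threshold=5000, sustained_samples=3):
    """Return True when queue depth exceeds `threshold` for
    `sustained_samples` consecutive samples.

    Requiring several consecutive breaches filters transient spikes
    while still catching a backlog that keeps growing.
    """
    consecutive = 0
    for depth in depth_samples:
        consecutive = consecutive + 1 if depth > threshold else 0
        if consecutive >= sustained_samples:
            return True
    return False
```

In a real monitoring system this condition would typically be expressed as an alert rule over a queue-depth metric rather than hand-rolled code, but the logic is the same.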
On February 9, the notifications service began showing degradation around 13:50 UTC, resulting in increased notification delivery delays, and our team started investigating. Around 14:30 UTC the service started to recover while the investigation continued. Around 15:20 UTC the degradation resurfaced, with increasing delays in notification deliveries and a small error rate (below 1%) on UI and API endpoints related to notifications. At 16:30 UTC, we mitigated the incident by throttling workloads to reduce contention and performing a database failover. The median notification delivery delay was 80 minutes at that point, and the queues started draining. Around 19:30 UTC the backlog of notifications had been processed, returning the service to normal, and we declared the incident closed.
The incident was caused by the notifications database degrading under intense load. Most notifications-related asynchronous workloads, including notification deliveries, were stopped to reduce pressure on the database. To ensure system stability, a database failover was executed. Following the failover, we applied a configuration change to improve performance, and the service started recovering after these changes.
We are reviewing the configuration of our databases to understand the performance drop and prevent similar issues from happening in the future. We are also investing in monitoring to detect and mitigate this class of incidents faster.
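Throttling workloads to reduce database contention, as in the mitigation above, often amounts to capping how many database-bound jobs may run concurrently, with the cap lowered during an incident to shed load. A minimal sketch; the class and its interface are illustrative, not GitHub's implementation:

```python
import threading

class WorkloadThrottle:
    """Cap the number of database-bound jobs running at once.

    During an incident, a new throttle with a smaller limit (or zero,
    to pause a workload entirely) can be swapped in to shed load.
    """

    def __init__(self, max_concurrent):
        self._sem = threading.Semaphore(max_concurrent)

    def run(self, job, *args):
        # Block until a slot is free, run the job, then release the slot.
        with self._sem:
            return job(*args)
```

Workers would call `throttle.run(process_notification, item)` instead of invoking the job directly, so concurrency against the database is bounded regardless of how many workers exist.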