Incident History

Git operations over SSH seeing increased latency on github.com

From Oct 22, 2025 15:00 UTC to Oct 24, 2025 14:30 UTC, Git operations over SSH saw periods of increased latency and failed requests, with failure rates ranging from 1.5% to a single spike of 15%. Git operations over HTTP were not affected. This was due to resource exhaustion on our backend SSH servers. We mitigated the incident by increasing the resources available for SSH connections. We are improving the observability and dynamic scalability of our backend to prevent issues like this in the future.

Oct 24, 2025 09:31 - 10:10 UTC · Resolved

Incident with Actions - Larger hosted runners

On October 23, 2025, between 15:54 UTC and 19:20 UTC, GitHub Actions larger hosted runners experienced degraded performance, with 1.4% of overall workflow runs and 29% of larger hosted runner jobs failing to start or timing out within 5 minutes. The full set of contributing factors is still under investigation, but the customer impact was due to database performance degradation: routine database changes produced a load profile that triggered a bug in the underlying database platform used for larger runners. Impact was mitigated through a combination of scaling up the database and reducing load. We are working with partners to resolve the underlying bug and have paused similar database changes until it is resolved.

Oct 23, 2025 16:33 - 20:25 UTC · Resolved

Incident with API Requests

On October 22, 2025, between 14:06 UTC and 15:17 UTC, less than 0.5% of web users experienced intermittent slow page loads on GitHub.com. During this time, API requests showed increased latency, with up to 2% timing out. The issue was caused by elevated load on one of our databases from a poorly performing query, which impacted performance for a subset of requests. We identified the source of the load and optimized the query to restore normal performance. We’ve added monitors for early detection of query performance regressions, and we continue to monitor the system closely to ensure ongoing stability.

Oct 22, 2025 14:29 - 15:53 UTC · Resolved

Disruption with some GitHub services

On October 21, 2025, between 13:30 and 17:30 UTC, GitHub Enterprise Cloud Organization SAML single sign-on (SSO) experienced degraded performance. Customers may have been unable to successfully authenticate into their GitHub Organizations during this period, with a maximum of 0.4% of Organization SAML SSO requests failing during this timeframe. The incident stemmed from a failure in a read replica database partition responsible for storing license usage information for GitHub Enterprise Cloud Organizations. Because a successful SSO sign-in requires an available license for the user accessing an SSO-backed GitHub Enterprise Cloud Organization, users from affected organizations whose license usage information was stored on this partition were unable to access SSO during this window. The failing partition was taken out of service, which mitigated the issue. Remedial actions are underway to ensure that a read replica failure does not compromise overall service availability.

Oct 21, 2025 16:00 - 17:39 UTC · Resolved

Incident with Actions

On October 21, 2025, between 07:55 UTC and 12:20 UTC, GitHub Actions experienced degraded performance. During this time, 2.11% of workflow runs failed to start within 5 minutes, with an average delay of 8.2 minutes. The root cause was increased latency on a node in one of our Redis clusters, triggered by resource contention after a patching event became stuck. Recovery began once the patching process was unstuck and normal connectivity to the Redis cluster was restored at 11:45 UTC, but it took until 12:20 UTC to clear the backlog of queued work. We are implementing safeguards to prevent this failure mode and enhancing our monitoring to detect and address problems like this more quickly in the future.

Oct 21, 2025 09:12 - 12:28 UTC · Resolved

Disruption with Grok Code Fast 1 in Copilot

From October 20th at 14:10 UTC until 16:40 UTC, the Copilot service experienced degradation due to an infrastructure issue that impacted the Grok Code Fast 1 model, leading to a spike in errors affecting 30% of users. No other models were impacted. The incident was caused by an outage at an upstream provider.

Oct 20, 2025 14:46 - 16:40 UTC · Resolved

Codespaces creation failing

On October 20, 2025, between 08:05 UTC and 10:50 UTC, the Codespaces service was degraded, with users experiencing failures creating new codespaces and resuming existing ones. On average, the error rate for codespace creation was 39.5%, peaking at 71% of requests during the incident window; resume operations averaged a 23.4% error rate with a peak of 46%. This was due to a cascading failure triggered by an outage in a third-party dependency required to build devcontainer images. The impact was mitigated when the third-party dependency recovered. We are investigating opportunities to remove this dependency from the critical path of our container build process and working to improve our monitoring and alerting to reduce our time to detection of issues like this one in the future.

Oct 20, 2025 08:56 - 11:01 UTC · Resolved

Disruption with push notifications

On October 17th, 2025, between 12:51 UTC and 14:01 UTC, mobile push notifications failed to be delivered for a total duration of 70 minutes. This affected github.com and GitHub Enterprise Cloud in all regions. The disruption was caused by an erroneous configuration change to the cloud resources used for mobile push notification delivery. We are reviewing our procedures for managing these cloud resources to prevent such an incident in the future.

Oct 17, 2025 13:11 - 14:12 UTC · Resolved

Disruption with some GitHub services

On October 14th, 2025, between 18:26 UTC and 18:57 UTC, a subset of unauthenticated requests to the commit endpoint for certain repositories received 503 errors. During the event, the average error rate was 3%, peaking at 3.5% of total requests. The event was triggered by a recent configuration change combined with shifts in traffic patterns on the service. We were alerted to the issue immediately and adjusted the configuration to mitigate the problem. We are working on automatic mitigation and better traffic handling to prevent issues like this in the future.

Oct 14, 2025 18:26 - 18:57 UTC · Resolved

Disruption with GPT-5-mini in Copilot

On Oct 14th, 2025, between 13:34 UTC and 16:00 UTC, the Copilot service was degraded for the GPT-5 mini model. On average, 18% of requests to GPT-5 mini failed due to an issue with our upstream provider. We notified the upstream provider as soon as the problem was detected and mitigated the issue by failing over to other providers. The upstream provider has since resolved the issue. We are working to improve our failover logic to mitigate similar upstream failures more quickly in the future.

Oct 14, 2025 14:05 - 16:00 UTC · Resolved