Between July 28, 2025, 22:23:00 UTC and July 29, 2025, 02:06:00 UTC, GitHub experienced degraded performance across multiple services, including API, Issues, GraphQL and Pull Requests. During this time, approximately 4% of Web and API requests resulted in 500 errors. This incident was caused by a DNS resolution failure while decommissioning infrastructure hosts. We resolved the incident by removing references to the stale hosts. We are working to improve our host replacement process by correcting our automatic host ejection behavior and by ensuring configuration is updated before hosts are decommissioned. This will prevent similar issues in the future.
Between approximately 21:41 UTC on July 28th and 03:15 UTC on July 29th, GitHub Enterprise Importer (GEI) operated in a degraded state where migrations could not be processed. Our investigation found that a component of the GEI infrastructure had been improperly taken out of service and could not be restored to its previous configuration. This necessitated the provisioning of new resources to resolve the incident.
As a result, customers will need to add our new IP ranges to the following IP allow lists, if enabled:
- The IP allow list on your destination GitHub.com organization or enterprise
- If you're running migrations from GitHub.com, the IP allow list on your source GitHub.com organization or enterprise
- If you're running migrations from a GitHub Enterprise Server, Bitbucket Server or Bitbucket Data Center instance, the allow list on your configured Azure Blob Storage or Amazon S3 storage account
- If you're running migrations from Azure DevOps, the allow list on your Azure DevOps organization
The new GEI IP ranges for inclusion in applicable IP allow lists are:
- 20.99.172.64/28
- 135.234.59.224/28
The following IP ranges are no longer used by GEI and can be removed from all applicable IP allow lists:
- 40.71.233.224/28
- 20.125.12.8/29
Users who have run migrations using GitHub Enterprise Importer in the past 90 days will receive email alerts about this change.
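For anyone auditing an allow list against the GEI ranges listed above, the check can be sketched in a few lines of Python using the standard `ipaddress` module. This is only an illustrative sketch: the function name `audit_allow_list` and the idea of passing the allow list as a plain list of CIDR strings are assumptions made for the example, not part of any GitHub tooling or API.

```python
import ipaddress

# Ranges copied from the notice above.
NEW_GEI_RANGES = ["20.99.172.64/28", "135.234.59.224/28"]
RETIRED_GEI_RANGES = ["40.71.233.224/28", "20.125.12.8/29"]


def audit_allow_list(entries):
    """Hypothetical helper: compare allow-list CIDR strings to the GEI ranges.

    Returns (missing, stale):
      missing - new GEI ranges not covered by any entry
      stale   - retired GEI ranges still present as exact entries
    """
    # strict=False accepts entries with host bits set, such as a single IP.
    current = {ipaddress.ip_network(e, strict=False) for e in entries}
    new = [ipaddress.ip_network(r) for r in NEW_GEI_RANGES]
    retired = [ipaddress.ip_network(r) for r in RETIRED_GEI_RANGES]

    # A new range counts as covered if it sits inside any existing entry.
    # The version check avoids comparing IPv4 ranges with IPv6 entries.
    missing = [
        str(n)
        for n in new
        if not any(c.version == n.version and n.subnet_of(c) for c in current)
    ]
    stale = [str(r) for r in retired if r in current]
    return missing, stale


if __name__ == "__main__":
    missing, stale = audit_allow_list(["40.71.233.224/28", "20.99.172.64/28"])
    print("Add:", missing)
    print("Remove:", stale)
```

The sketch only reports exact matches for retired ranges, so a broader entry that happens to contain a retired range is left for manual review rather than flagged for removal.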
Between July 28, 2025, 16:31 UTC and July 29, 2025, 12:05 UTC, users saw degraded Git Operations for raw file downloads. On average, the error rate was 0.005%, with a peak error rate of 3.89%. This was due to a sustained increase in unauthenticated repository traffic. We mitigated the incident by applying regional rate limiting, rolling back a service that was unable to scale with the additional traffic, and addressing a bug that impacted the caching of raw requests. Additionally, we horizontally scaled several dependencies of the service to appropriately handle the increase in traffic. We are working on improving our time to detection and have implemented controls to prevent similar incidents in the future.
On July 23rd, 2025, from approximately 14:30 to 16:30 UTC, GitHub Actions experienced delayed job starts for workflows in private repos using Ubuntu-24 standard hosted runners. This was due to resource provisioning failures in one of our datacenter regions. During this period, approximately 2% of Ubuntu-24 hosted runner jobs on private repos were delayed. Other hosted runners, self-hosted runners, and public repo workflows were unaffected. To mitigate the issue, additional worker capacity was added from a different datacenter region at 15:35 UTC and further increased at 16:00 UTC. By 16:30 UTC, job queues were healthy and service was operating normally. Since the incident, we have deployed changes to improve how regional health is accounted for when allocating new runners, and we are investigating further improvements to our automated capacity scaling logic and manual overrides to prevent a recurrence.
On July 22nd, 2025, between 17:58 and 18:35 UTC, the Copilot service experienced degraded availability for Claude Sonnet 4 requests. During this time, 4.7% of Claude Sonnet 4 requests failed. No other models were impacted. The issue was caused by an upstream problem affecting our ability to serve requests. We mitigated by rerouting capacity and monitoring recovery. We are improving detection and mitigation to reduce future impact.
On July 21, 2025, between 07:20 UTC and 08:00 UTC, the Copilot service experienced degraded availability for Claude 4 requests. 2% of Claude 4 requests failed during this time. The issue was caused by an upstream problem affecting our ability to serve requests. We mitigated by rerouting capacity and monitoring recovery. We are improving detection and mitigation to reduce future impact.
On July 21st, 2025, between 07:00 UTC and 09:45 UTC, the API, Codespaces, Copilot, Issues, Package Registry, Pull Requests and Webhook services were degraded and experienced dropped requests and increased latency. At the peak of this incident (a 2-minute period around 07:00 UTC), error rates reached 11% and went down shortly after. Average web request load times rose to 1 second during this same time frame. After this period, traffic gradually recovered, but error rate and latency remained slightly elevated until the end of the incident. This incident was triggered by a kernel bug that caused a crash of some of our load balancers during a scheduled process after a kernel upgrade. To mitigate the incident, we halted the rollout of our upgrades and rolled back the impacted instances. We are working to make sure the kernel version is fully removed from our fleet. As a precaution, we temporarily paused the scheduled process to prevent any unintended use in the affected kernel. We also tuned our alerting so we can more quickly detect and mitigate future instances where we experience a sudden drop in load-balancing capacity.
On 15 July, between 19:55 and 19:58 UTC, requests to GitHub had a high failure rate while successful requests suffered up to 10x expected latency.
Browser-based requests saw a failure rate of up to 20%, GraphQL had up to a 9% failure rate and 2% of REST API requests failed. Any downstream service dependent on GitHub APIs was also affected during this short window.
The failure stemmed from a database query change, which was rolled back by our deployment tooling after it automatically detected the issue. We will continue to invest in automated detection and rollback with a goal of minimizing time to recovery.
On July 16, 2025, between 05:20 UTC and 08:30 UTC, the Copilot service experienced degraded availability for Claude 3.7 requests. Around 10% of Claude 3.7 requests failed during this time. The issue was caused by an upstream problem affecting our ability to serve requests. We mitigated by rerouting capacity and monitoring recovery. We are improving detection and mitigation to reduce future impact.
On July 8, 2025, between 14:20 UTC and 16:30 UTC, the GitHub Actions service experienced degraded performance, leading to delays in updates to Actions workflow runs, including missing logs and delayed run status. During this period, 1.07% of Actions workflow runs experienced delayed updates, while 0.34% of runs completed but showed as canceled in their status. The incident was caused by imbalanced load in our underlying service infrastructure. The issue was mitigated by scaling up our services and tuning resource thresholds. We are working to improve our resilience to load spikes and our capacity planning to prevent similar issues, and we are implementing more robust monitoring to reduce detection and mitigation time for similar incidents in the future.