Between May 14, 2025 14:16 UTC and May 15, 2025 01:02 UTC, the Copilot service was degraded and returned a high volume of internal server errors for requests targeting Gemini 2.5 Pro, a public preview model. On average, the error rate for Gemini 2.5 Pro was 19.6%, peaking at 41%. This was due to internal server errors and rate limiting from the upstream model provider. We mitigated the incident by temporarily disabling Gemini 2.5 Pro for all Copilot Chat experiences, then worked with the model provider to confirm that model health had sufficiently improved before re-enabling it. We are working with our partners to improve the speed of communication and are planning a move to more resilient infrastructure to mitigate issues like this one in the future.
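As a general illustration of this style of mitigation (not Copilot's actual implementation; the names and thresholds below are hypothetical), the sketch shows a kill switch that takes an upstream model out of rotation once its recent error rate crosses a threshold and only re-enables it manually after health is confirmed.

```typescript
// Minimal sketch, assuming a hypothetical error-rate monitor: disable an
// upstream model for chat traffic once its recent error rate crosses a
// threshold, and only re-enable it after health is confirmed manually.
interface ModelHealth {
  requests: number;
  errors: number; // internal server errors plus rate-limited responses
}

const ERROR_RATE_THRESHOLD = 0.2; // hypothetical cutoff
const disabledModels = new Set<string>();

function recordWindow(model: string, health: ModelHealth): void {
  const errorRate = health.requests === 0 ? 0 : health.errors / health.requests;
  if (errorRate >= ERROR_RATE_THRESHOLD) {
    // Kill switch: stop routing chat requests to the degraded model.
    disabledModels.add(model);
  }
}

function isModelAvailable(model: string): boolean {
  return !disabledModels.has(model);
}

// Re-enabling is deliberately manual, mirroring the "confirm health with the
// provider before re-enabling" step described above.
function reenableModel(model: string): void {
  disabledModels.delete(model);
}
```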
This incident has been resolved. Thank you for your patience and understanding as we addressed this issue. A detailed root cause analysis will be shared as soon as it is available.
On May 8, 2025, between 14:40 UTC and 16:27 UTC, the Git Operations service was degraded, causing some pushes and merges to fail. On average, the error rate was 1.4% with a peak error rate of 2.24%. This was due to a configuration change which unexpectedly led a critical service to shut down on a subset of hosts that store repository data. We mitigated the incident by re-deploying the affected service to restore its functionality. In order to prevent similar incidents from happening again, we identified the cause that triggered this behavior and mitigated it for future deployments. Additionally, to reduce time to detection, we will improve monitoring of the impacted service.
On May 1, 2025, from 22:09 UTC to 23:13 UTC, the Issues service was degraded and users were not able to upload attachments. The root cause was a new feature that added a custom header to all client-side HTTP requests, causing CORS errors when uploading attachments to our provider. We mitigated the incident at 22:56 UTC by rolling back the feature flag that added the new header. In order to prevent this from happening again, we are adding new metrics to monitor and ensure the safe rollout of changes to client-side requests.
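For context on this failure mode, the browser-side sketch below (the header name is hypothetical) shows why a globally added custom header can break uploads: any non-standard header forces a CORS preflight, and if the third-party upload endpoint's Access-Control-Allow-Headers does not include it, the browser blocks the request.

```typescript
// Minimal sketch, assuming a hypothetical "X-Client-Feature" header: any
// non-standard request header triggers a CORS preflight (OPTIONS). If the
// storage provider's Access-Control-Allow-Headers does not list the header,
// the browser rejects the request and the attachment upload fails.
async function uploadAttachment(uploadUrl: string, file: Blob): Promise<Response> {
  return fetch(uploadUrl, {
    method: "PUT",
    body: file,
    headers: {
      // Hypothetical header added globally by a new client-side feature.
      // Removing it (or allowing it in the provider's CORS policy) restores
      // the upload path.
      "X-Client-Feature": "enabled",
    },
  });
}
```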
On April 30, 2025, between 8:02 UTC and 9:05 UTC, the Pull Requests service was degraded and failed to update refs for repositories with higher traffic. This was due to a repository migration creating a larger-than-usual number of enqueued jobs. This resulted in an increase in job failures, delays for non-migration-sourced jobs, and delays to tracking refs. We declared an incident once we confirmed that this issue was not isolated to the migrating repository and other repositories were also failing to process ref updates. We mitigated the incident by shifting the migration jobs to a different job queue. To avoid problems like this in the future, we are revisiting our repository migration process and are working to isolate potentially problematic migration workloads from non-migration workloads.
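As a general illustration of this kind of queue isolation (the queue and job names below are hypothetical, not the actual job system), routing migration-sourced jobs to a dedicated queue keeps a migration backlog from delaying latency-sensitive ref updates:

```typescript
// Minimal sketch, assuming hypothetical queue and job names: migration-sourced
// jobs get their own queue and workers so a migration backlog cannot delay
// normal ref-update processing.
type QueueName = "ref_updates" | "migrations";

interface Job {
  name: string;
  payload: Record<string, unknown>;
}

const queues: Record<QueueName, Job[]> = { ref_updates: [], migrations: [] };

function queueFor(job: Job): QueueName {
  // Route by job source: anything produced by a repository migration is
  // isolated from latency-sensitive ref tracking work.
  return job.name.startsWith("migration:") ? "migrations" : "ref_updates";
}

function enqueue(job: Job): void {
  queues[queueFor(job)].push(job);
}

// Example: a burst of migration jobs lands on its own queue, while a ref
// update enqueued at the same time is not stuck behind it.
enqueue({ name: "migration:import_repository", payload: { repoId: 1 } });
enqueue({ name: "track_ref_update", payload: { repoId: 2, ref: "refs/heads/main" } });
```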
On April 29, 2025, between 08:40 UTC and 12:50 UTC, the notifications service was degraded and stopped delivering most web and email notifications, as well as some mobile push notifications. This was due to a large and faulty schema migration that rendered a set of database primaries unhealthy, affecting the notification delivery pipelines and delaying most web and email notification deliveries. We mitigated the incident by stopping the migration and promoting replicas to replace the unhealthy primaries. In order to prevent similar incidents in the future, we are addressing the underlying issues in the online schema tooling and improving the way we interact with the database so that it does not disrupt production workloads.
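As a general illustration (the thresholds and helper functions below are hypothetical, not the actual schema tooling), online schema migrations are commonly made less disruptive by copying rows in small batches and pausing whenever the database shows signs of strain:

```typescript
// Minimal sketch, assuming hypothetical thresholds and helper functions: an
// online schema migration that copies rows in small batches and backs off
// whenever the database shows signs of strain, so it yields to production
// traffic instead of overwhelming the primaries.
interface DbHealth {
  replicationLagSeconds: number;
  activeConnections: number;
}

// Hypothetical probe; a real migration tool would read these from the database.
async function checkHealth(): Promise<DbHealth> {
  return { replicationLagSeconds: 0, activeConnections: 0 };
}

// Copies a bounded chunk of rows and reports whether more work remains.
async function copyNextBatch(batchSize: number): Promise<boolean> {
  return false;
}

const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

async function runOnlineMigration(): Promise<void> {
  const BATCH_SIZE = 1000;            // hypothetical
  const MAX_LAG_SECONDS = 5;          // hypothetical
  const MAX_ACTIVE_CONNECTIONS = 500; // hypothetical

  let hasMore = true;
  while (hasMore) {
    const health = await checkHealth();
    if (
      health.replicationLagSeconds > MAX_LAG_SECONDS ||
      health.activeConnections > MAX_ACTIVE_CONNECTIONS
    ) {
      // Back off rather than pushing an already-strained primary further.
      await sleep(1000);
      continue;
    }
    hasMore = await copyNextBatch(BATCH_SIZE);
  }
}
```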
On April 28, 2025, between 04:00 UTC and 11:00 UTC, approximately 0.5% of customers experienced HTTP 500 or 429 responses for raw file access (via the GitHub website and APIs). Additionally, approximately 0.5% of customers may have seen slow pull request page loads and increased timeouts in the GraphQL API. The incident was caused by queueing in serving systems due to a change in traffic patterns, specifically scraping activity targeting our API. In response, we adjusted limits and added flow control to the affected systems to improve our ability to prevent large queueing issues in the future. We have also updated rate limits for unauthenticated requests to reduce overall load; more details are available here: https://github.blog/changelog/2025-05-08-updated-rate-limits-for-unauthenticated-requests/
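As a general illustration of this kind of flow control (the limits below are hypothetical, not the production configuration), a token-bucket limiter rejects excess unauthenticated requests with HTTP 429 before they can build up queues in serving systems:

```typescript
// Minimal sketch, assuming hypothetical limits: a token-bucket limiter that
// throttles unauthenticated requests before they can queue up in serving systems.
class TokenBucket {
  private tokens: number;
  private lastRefill = Date.now();

  constructor(private capacity: number, private refillPerSecond: number) {
    this.tokens = capacity;
  }

  allow(): boolean {
    const now = Date.now();
    const elapsedSeconds = (now - this.lastRefill) / 1000;
    this.tokens = Math.min(this.capacity, this.tokens + elapsedSeconds * this.refillPerSecond);
    this.lastRefill = now;
    if (this.tokens >= 1) {
      this.tokens -= 1;
      return true; // serve the request
    }
    return false; // respond with HTTP 429 instead of letting queues grow
  }
}

// One bucket per unauthenticated client (for example, keyed by IP address);
// the capacity and refill rate here are illustrative only.
const buckets = new Map<string, TokenBucket>();

function allowUnauthenticated(clientKey: string): boolean {
  let bucket = buckets.get(clientKey);
  if (!bucket) {
    bucket = new TokenBucket(60, 1); // burst of 60, ~1 request/second sustained
    buckets.set(clientKey, bucket);
  }
  return bucket.allow();
}
```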
Starting at 19:13:50 UTC, the service responsible for importing Git repositories began experiencing errors that impacted both GitHub Enterprise Importer migrations and the GitHub Importer; normal operation was restored at 22:11:00 UTC. At the time, 837 migrations across 57 organizations were affected. Impacted migrations would have shown the error message "Git source migration failed. Error message: An error occurred. Please contact support for further assistance." in the migration logs and required a retry. The root cause of the issue was a recent configuration change that caused the workers responsible for syncing the Git repository to lose the access required for the migration. We restored the needed access for the workers, and all dependent services resumed normal operation. We've identified and implemented additional safeguards to help prevent similar disruptions in the future.
On April 23, 2025, between 07:00 UTC and 07:20 UTC, multiple GitHub services experienced degradation caused by resource contention on database hosts. The resulting error rates, which ranged from 2% to 5% of total requests, led to intermittent service disruption for users. The issue was triggered by heavy workloads on the database that led to connection saturation.
The incident was mitigated when database throttling activated, which allowed the system to rebalance connections. This restored traffic flow to the database and returned the service to normal functionality.
To prevent similar issues in the future, we are reviewing the capacity of the database, improving monitoring and alerting systems, and implementing safeguards to reduce time to detection and mitigation.
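As a general illustration of the rebalancing effect described above (the limits below are hypothetical, not the production database layer), capping concurrent connections per workload forces a heavy workload to queue behind its own budget instead of saturating the host:

```typescript
// Minimal sketch, assuming hypothetical limits: a per-workload budget of
// concurrent database connections. Callers wait for a free slot rather than
// opening more connections against an already-saturated primary.
class ConnectionBudget {
  private inFlight = 0;
  private waiters: Array<() => void> = [];

  constructor(private readonly max: number) {}

  async acquire(): Promise<void> {
    if (this.inFlight < this.max) {
      this.inFlight++;
      return;
    }
    // Wait for a slot instead of opening another connection, so this workload
    // backs off here rather than consuming the host's remaining capacity.
    await new Promise<void>((resolve) => this.waiters.push(resolve));
  }

  release(): void {
    const next = this.waiters.shift();
    if (next) {
      next(); // hand the slot directly to the next waiter
    } else {
      this.inFlight--;
    }
  }
}

// Example usage: the heavy workload gets its own bounded budget (size is
// illustrative), leaving headroom for other services on the same hosts.
const heavyWorkloadBudget = new ConnectionBudget(50);

async function runQuery<T>(query: () => Promise<T>): Promise<T> {
  await heavyWorkloadBudget.acquire();
  try {
    return await query();
  } finally {
    heavyWorkloadBudget.release();
  }
}
```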
On April 16, 2025, between 15:22:36 UTC and 17:26:55 UTC, the Pull Requests service was degraded. On average, 0.7% of page views were affected. This primarily affected logged-out users, but some logged-in users were affected as well. This was due to an error in how certain Pull Request timeline events were rendered, and we resolved the incident by updating the timeline event code. We are enhancing test coverage to include additional scenarios and piloting new tools to prevent similar incidents in the future.