Incident History

Disruption with some GitHub services

This incident has been resolved.

1723471393 - 1723472139 Resolved

[Retroactive] Incident with Pull Requests

Beginning at 6:52 PM UTC on August 6, 2024 and lasting until 6:59 PM UTC, some customers of github.com saw errors when navigating to a Pull Request. The error rate peaked at ~5% for logged-in users. This was due to a change that was rolled back after alerts fired during the deployment.

We did not post a status update before we completed the rollback, and the incident is now resolved. We are sorry for the delayed post on githubstatus.com.

1722975754 - 1722975754 Resolved

Disruption with some GitHub services

Beginning at 8:38 PM UTC on July 31, 2024 and lasting until 9:28 PM UTC on July 31, 2024, some customers of github.com saw errors when navigating to the Billing pages and/or when updating their payment method. These errors were caused by a degradation in one of our partner services. The partner service deployed a fix, and the Billing pages are functional again. To improve detection of such issues in the future, we will work with the partner service to identify levers we can use to get an earlier indication of issues.

1722458286 - 1722460889 Resolved

Disruption in service with some Redis clusters

Actions is operating normally.

1722412788 - 1722417609 Resolved

Incident with Codespaces

On July 31st, 2024, between 00:31 UTC and 03:37 UTC, the Codespaces service was degraded and connections through non-web flows such as VS Code or the GitHub CLI were unavailable. Using Codespaces via the web portal was not impacted. This was due to a code change resulting in authentication failures from the Codespaces public API. We mitigated the incident by reverting the change upon discovering the cause of the issue. We are working to improve testing, monitoring, and rollout of new features to reduce our time to detection and mitigation of issues like this one in the future.

1722387177 - 1722397068 Resolved

Actions runs using large hosted runners delayed for some customers

On July 30th, 2024, between 13:25 UTC and 18:15 UTC, customers using Larger Hosted Runners may have experienced extended queue times for jobs that depended on a Runner with VNet Injection enabled in a virtual network within the East US 2 region. Runners without VNet Injection or those with VNet Injection in other regions were not affected. The issue was caused by an outage in a third-party provider that blocked a large percentage of VM allocations in the East US 2 region. Once the underlying issue with the third-party provider was resolved, job queue times went back to normal. We are exploring the addition of support for customers to define VNet Injection Runners with VNets across multiple regions to minimize the impact of outages in a single region.

1722363550 - 1722377420 Resolved

Incident with Codespaces

On July 30th, 2024, between 12:15 UTC and 14:22 UTC, the Codespaces service was degraded in the UK South and West Europe regions. During this time, approximately 75% of attempts to create or resume Codespaces in these regions were failing. We mitigated the incident by resolving networking stability issues in these regions. We are working to improve network resiliency to reduce our time to detection and mitigation of issues like this one in the future.

1722346612 - 1722349335 Resolved

Linking internal teams to external IDP groups was broken for some users between 15:17-20:44 UTC

Between July 24th, 2024 at 15:17 UTC and July 25th, 2024 at 21:04 UTC, the external identities service was degraded and prevented customers from linking teams to external groups on the create/edit team page. Team creation and team edits would appear to function as normal, but the selected group would not be linked to the team after form submission. This was due to a bug in the Primer experimental SelectPanel component that was mistakenly rolled out to customers via a feature flag. We mitigated the incident by scaling the feature flag back down to 0% of actors. We are making improvements to our release process and test coverage to avoid similar incidents in the future.

1721941495 - 1721941502 Resolved

Events are delayed across GitHub

On July 25th, 2024, between 15:30 and 19:10 UTC, the Audit Log service experienced degraded write performance. During this period, Audit Log reads remained unaffected, but customers would have encountered delays in the availability of their current audit log data. There was no data loss as a result of this incident. The issue was isolated to a single partition within the Audit Log datastore. Upon restarting the primary partition, we observed an immediate recovery and a subsequent increase in successful writes. The backlog of log messages was fully processed by approximately 00:40 UTC on July 26th. We are working with our datastore team to ensure mitigation is in place to prevent future impact. Additionally, we will investigate whether there are any actions we can take on our end to reduce the impact and time to mitigate in the future.

1721933096 - 1721935255 Resolved

Disruption with GitHub Copilot Chat

On July 23, 2024, between 21:40 UTC and 22:00 UTC, Copilot Chat experienced errors and service degradation. During this time, the global error rate peaked at 20% of Chat requests. This was due to a faulty deployment in a service provider that caused server errors from a single region. Traffic was routed away from this region at 22:00 UTC, which restored functionality while the upstream service provider rolled back their change. The rollback was completed at 22:38 UTC. We are working to improve our ability to respond more quickly to similar issues through faster regional redirection and by working with our upstream provider on improved monitoring.

1721770828 - 1721774314 Resolved