Incident History

Potential disruption with our Agent Control Plane UI Settings

Between November 26th, 2025 at approximately 02:24 UTC and December 8th, 2025 at 20:26 UTC, enterprise administrators experienced a disruption when viewing agent session activities in the Enterprise AI Controls page. During this period, users were unable to list agent session activity in the AI Controls view. This did not impact viewing agent session activity in audit logs, directly navigating to individual agent session logs, or otherwise managing AI Agents. The issue was caused by a misconfiguration in a change deployed on November 25th that unintentionally prevented data from being published to an internal Kafka topic responsible for feeding the AI Controls page with agent session activity information. The problem was identified and mitigated on December 8th by correcting the configuration issue. GitHub is improving monitoring for data pipeline dependencies and enhancing pre-deployment validation to catch configuration issues before they reach production.

1765223499 - 1765227970 Resolved

Team synchronization is experiencing delays for non enterprise managed users

On December 5th, 2025, between 12:00 pm UTC and 9:00 pm UTC, our Team Synchronization service experienced a significant degradation, preventing over 209,000 organization teams from syncing their identity provider (IdP) groups. The incident was triggered by a buildup of synchronization requests, resulting in elevated Redis key usage and high CPU consumption on the underlying Redis cluster. To mitigate further impact, we proactively paused all team synchronization requests between 3:00 pm UTC and 8:15 pm UTC, allowing us to stabilize the Redis cluster. Our engineering team also resolved the issue by flushing the affected Redis keys and queues, which promptly stopped runaway growth and restored service health. Additionally, we scaled up our infrastructure resources to improve our ability to process the high volume of synchronization requests. All pending team synchronizations were successfully processed following service restoration. We are working to strengthen the Team Synchronization service by implementing a killswitch, adding throttling to prevent excessive enqueueing of synchronization requests, and improving the scheduler to avoid duplicate job requests. Additionally, we’re investing in better observability to alert when job drops occur. These efforts are focused on preventing similar incidents and improving overall reliability going forward.
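The three safeguards named above (a killswitch, enqueue throttling, and duplicate-job avoidance) can be illustrated with a minimal sketch. This is not the Team Synchronization service's implementation: the class name `ThrottledSyncQueue`, its parameters, and the in-memory token-bucket design are all assumptions made for illustration, and a production system would keep this state in a shared store rather than in process memory.

```python
import time
from collections import deque


class ThrottledSyncQueue:
    """Hypothetical queue: one pending job per team, capped enqueue rate."""

    def __init__(self, max_per_second, burst, clock=time.monotonic):
        self._rate = max_per_second      # tokens added per second
        self._capacity = burst           # maximum tokens held at once
        self._tokens = float(burst)
        self._clock = clock              # injectable so tests can control time
        self._last = clock()
        self._pending = set()            # team ids that already have a queued job
        self._jobs = deque()
        self.enabled = True              # killswitch: set to False to pause enqueueing

    def _refill(self):
        # Token bucket: add tokens for the elapsed time, capped at capacity.
        now = self._clock()
        elapsed = now - self._last
        self._tokens = min(self._capacity, self._tokens + elapsed * self._rate)
        self._last = now

    def enqueue(self, team_id):
        """Return 'paused', 'duplicate', 'throttled', or 'queued'."""
        if not self.enabled:
            return "paused"
        if team_id in self._pending:
            return "duplicate"           # a job for this team is already waiting
        self._refill()
        if self._tokens < 1:
            return "throttled"           # caller should retry later
        self._tokens -= 1
        self._pending.add(team_id)
        self._jobs.append(team_id)
        return "queued"

    def dequeue(self):
        """Pop the next team id, or return None when the queue is empty."""
        if not self._jobs:
            return None
        team_id = self._jobs.popleft()
        self._pending.discard(team_id)
        return team_id
```

Returning a status string instead of raising lets a scheduler count throttled and duplicate requests, which is one way to feed the kind of observability the summary mentions.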

1764959935 - 1764973219 Resolved

Incident with Copilot

On November 28th, 2025, between approximately 05:51 and 08:04 UTC, Copilot experienced an outage affecting the Claude Sonnet 4.5 model. Users attempting to use this model received an HTTP 400 error, resulting in 4.6% of total chat requests during this timeframe failing. Other models were not impacted. The issue was caused by a misconfiguration deployed to an internal service which made Claude Sonnet 4.5 unavailable. The problem was identified and mitigated by reverting the change. GitHub is working to improve cross-service deploy safeguards and monitoring to prevent similar incidents in the future.

1764313141 - 1764318198 Resolved

Disruption with some GitHub services

On November 24, 2025, between 12:15 and 15:04 UTC, Codespaces users encountered connection issues when attempting to create a codespace after choosing the recently released VS Code Codespaces extension, version 1.18.1. Users were able to downgrade to version 1.18.0 of the extension during this period to work around this issue. At peak, the error rate was 19% of connection requests. This was caused by mismatching version dependencies for the released VS Code Codespaces extension. The connection issues were mitigated by releasing VS Code Codespaces extension version 1.18.2, which addressed the issue. Users on version 1.18.1 of the VS Code Codespaces extension are advised to upgrade to version 1.18.2 or later. We are improving our validation and release process for this extension to ensure functional issues like this are caught before release to customers and to reduce detection and mitigation times for extension issues like this in the future.
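For anyone scripting a check across many machines, the version guidance above can be sketched as follows. The function names are hypothetical and the sketch assumes plain dotted numeric version strings; it compares versions as integer tuples because a plain string comparison would rank "1.18.10" below "1.18.2".

```python
def parse_version(text):
    """Turn a dotted version such as '1.18.2' into a tuple of integers."""
    return tuple(int(part) for part in text.strip().split("."))


AFFECTED = parse_version("1.18.1")   # release named in the incident summary
FIXED = parse_version("1.18.2")      # first release containing the fix


def is_affected(installed):
    """True only for the release the summary identifies as broken."""
    return parse_version(installed) == AFFECTED


def has_fix(installed):
    """True for 1.18.2 or any later release."""
    return parse_version(installed) >= FIXED
```

Note that 1.18.0 is neither affected nor fixed under this check, which matches the summary: it was usable as a workaround but predates the corrected release.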

1763989853 - 1763996663 Resolved

Disruption with some GitHub services

Between November 20, 2025 17:16 UTC and November, 2025 19:08 UTC, some users experienced delayed or failed Git operations for raw file downloads. On average, the error rate was less than 0.2%. This was due to a sustained increase in unauthenticated repository traffic. We mitigated the incident by applying regional rate limiting and are taking steps to improve our monitoring and time to mitigation for similar issues in the future.

1763661865 - 1763666673 Resolved

Incident with Actions

On November 19, between 17:36 UTC and 18:04 UTC, the GitHub Actions service experienced degraded performance that caused excessive latency in queueing and updating workflow runs and job statuses. Operations related to artifacts, cache, job steps and logs also had significantly increased latency. At peak, 67% of workflow jobs queued during that timeframe were impacted, and the median latency for impacted operations increased by up to 35x. This was caused by a significant change in load pattern on Actions Cache-related operations, leading to a saturated shared resource on the backend. The impact was mitigated by addressing the new load pattern. To reduce the likelihood of a recurrence, we are improving rate-limiting measures in this area to ensure a more consistent experience for all customers. We are also evaluating changes to reduce the scope of impact.

1763574497 - 1763575640 Resolved

Disruption with some GitHub services

Between November 19th, 16:13 UTC and November 21st, 12:22 UTC, the GitHub Enterprise Importer (GEI) service was in a degraded state. During this time, customers of the service experienced a delay when reclaiming mannequins post-migration. We have taken steps to prevent similar incidents from occurring in the future.

1763568801 - 1763684534 Resolved

Git operation failures

From Nov 18, 2025 20:30 UTC to Nov 18, 2025 21:34 UTC we experienced failures on all Git operations, including both SSH and HTTP Git client interactions, as well as raw file access. These failures also impacted products that rely on Git operations. The root cause was an expired TLS certificate used for internal service-to-service communication. We mitigated the incident by replacing the expired certificate and restarting impacted services. Once those services were restarted we saw a full recovery. We have updated our alerting to cover the expired certificate and are performing an audit of other certificates in this area to ensure they also have the proper alerting and automation before expiration. In parallel, we are accelerating efforts to eliminate our remaining manually managed certificates, ensuring all service-to-service communication is fully automated and aligned with modern security practices.
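Alerting ahead of certificate expiration, as described above, can be sketched in a few lines. This is an illustration only and not GitHub's alerting: the function names and the 30-day warning window are assumptions. The input is the `notAfter` string in the format returned by Python's `ssl.SSLSocket.getpeercert()`, parsed with the standard library's `ssl.cert_time_to_seconds`.

```python
import ssl
from datetime import datetime, timedelta, timezone


def days_until_expiry(not_after, now):
    """Days (possibly negative) between `now` and a certificate's notAfter time.

    `not_after` uses the certificate time format, e.g. 'Jan  1 00:00:00 2030 GMT'.
    `now` must be a timezone-aware datetime.
    """
    expires = datetime.fromtimestamp(ssl.cert_time_to_seconds(not_after), tz=timezone.utc)
    return (expires - now) / timedelta(days=1)


def expiry_status(not_after, now, warn_days=30):
    """Classify a certificate as 'expired', 'expiring', or 'ok'.

    The 30-day default is an arbitrary illustrative threshold.
    """
    remaining = days_until_expiry(not_after, now)
    if remaining <= 0:
        return "expired"
    if remaining <= warn_days:
        return "expiring"
    return "ok"
```

Passing `now` explicitly, rather than reading the clock inside the function, keeps the check deterministic and easy to run against an inventory of certificates in an audit.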

1763498386 - 1763503161 Resolved

Disruption with some GitHub services

Between November 17, 2025 21:24 UTC and November 18, 2025 00:04 UTC the gists service was degraded and users were unable to create gists via the web UI. 100% of gist creation requests failed with a 404 response. This was due to a change in the web middleware that inadvertently triggered a routing error. We resolved the incident by rolling back the change. We are working on more effective monitoring to reduce the time it takes to detect similar issues and evaluating our testing approach for middleware functionality.

1763420466 - 1763424650 Resolved

Disruption with some GitHub services

From Nov 17, 2025 00:00 UTC to Nov 17, 2025 15:00 UTC Dependabot was hitting a rate limit in GitHub Container Registry (GHCR) and was unable to complete about 57% of jobs. To mitigate the issue we lowered the rate at which Dependabot started jobs and increased the GHCR rate limit. We’re adding new monitors and alerts and looking into more ways to decrease load on GHCR to help prevent this in the future.

1763398338 - 1763406521 Resolved