On April 22, 2026, between 09:00 UTC and 22:05 UTC, the Copilot coding agent and pull request comment event processing were degraded. During this period, approximately 0.5% of total pull request and issue comments mentioned @copilot (~23,000 invocations), explicitly requested work from the Copilot coding agent but were not acted upon.Creating, viewing, and replying to pull request comments was unaffected, and other Copilotfunctionality continued to operate normally. The impact was limited to @copilot mentions on pull request comments not triggering Copilot coding agent runs, and to some downstream systems not receiving new pull request comment events during the impact window.The cause was a serialization error that prevented pull request comment events from being published to downstream consumers, including the Copilot coding agent. This was related to the same class of issue as incident #4295 on April 20, affecting a another event type.We mitigated the incident by deploying a fix that restored event publishing, after which the Copilot coding agent and other downstream consumers resumed processing pull request comment events normally.We are working to complete our audit of related event schemas, migrate remaining consumers to usethe updated identifier fields, and improve monitoring to detect drops in publishing on critical event topics, to reduce our time to detection and mitigation of issues like this one in the future.
On April 22, 2026, between 15:16 UTC and 19:18 UTC, users experienced errors when interacting with Copilot Chat on github.com and Copilot Cloud Agent. During this time, users were unable to use Copilot Chat or Copilot Cloud Agent. Copilot Memory (in preview) was not available to Copilot agent sessions during this time. The issue was caused by an infrastructure configuration change that resulted in connectivity issues with our databases. The team identified the cause and restored connectivity to the database. Copilot Chat and Cloud Agent for github.com were restored by 18:16 UTC. Remaining regional deployments were restored incrementally, with full resolution at 19:18 UTC. We have taken steps to prevent similar infrastructure changes from causing these kinds of database operations in the future.
On April 21, 2026, between 13:35 UTC and 01:24 UTC the following day the projects service was degraded. During this time period, projects may have been out of sync and users may have experienced delays in changes to projects and their items. Delays in reflected changes peaked at approximately 45 minutes. The delays were caused by serialization errors that failed events and triggered a flood of resyncs, overloading our event processing layers.We mitigated the incident by speeding up processing time for incoming changes and otherwise waiting for all changes to be processed.We are working to increase our capacity for processing updates to projects to reduce our time to mitigation of issues like this one in the future.
On April 20, 2026 between 10:28 UTC and 15:04 UTC GitHub experienced degraded service for code scanning default setup, code quality, and project boards. Repair of affected project boards additionally lasted until April 21, 05:04 UTC During this time, code scanning default setup and code quality analyses were not triggered on newly opened pull requests. Additionally, newly created issues were not appearing on project boards. The cause was a serialization error that prevented proper triggering of code scanning, code quality analyses, and project board updates. We mitigated the issue by deploying a fix, restoring event publishing for code scanning and code quality. For project boards, an additional code change was deployed to update event consumers, followed by a reindex of affected project items. We are working to prevent recurrence by strengthening our schema validations and improving monitoring for drops in publishing on critical hydro topics.
On April 17, 2026, between 14:46 UTC and 15:12 UTC, users experienced a degraded web experience on GitHub.com. During this time, approximately 1.5% of web requests resulted in errors, with some users encountering slow page loads or failed requests. The issue was caused by capacity saturation of a caching component in one of our data center regions. We mitigated the issue by redirecting traffic to an unaffected region and rolling back a recent deployment. The incident was fully resolved at 15:18 UTC. We are taking steps to provide appropriate capacity for this caching path to prevent recurrence.
On April 16, 2026 between 09:30 UTC and 17:15 UTC, users experienced failures when attempting to connect to GitHub Codespaces via the VS Code editor. During this time, approximately 40% of codespace start operations failed. Users connecting via SSH were not impacted. The issue was caused by a failure in an upstream download service that prevented the VS Code Server from being retrieved during codespace startup. The impact was mitigated by implementing a workaround to use an alternative download path when the primary endpoint is degraded. We are working with the upstream dependency to address the root cause of the download service failure, and we are improving our fallback mechanisms to reduce the impact of similar upstream failures in the future.
On April 14, between 00:58 UTC and 06:08 UTC, GitHub Enterprise Cloud customers experienced 500 errors when attempting to access Copilot Insights pages which was caused by an authentication failure in our metrics pipeline. We fully mitigated the issue and validated the fix in production. Approximately 709 users were impacted. The total impact duration was approximately 5 hours and 10 minutes. Our investigation determined the incident was caused by a change in a tenant credential which caused authentication errors to retrieve the required data needed on our Copilot Insights pages. We understand this disruption impacted customers' ability to access the Copilot Insights page. To prevent similar issues and reduce resolution time in the future, we are investing in improved diagnostics tooling to quickly identify the root cause of failures, enhanced monitoring, and alerting to detect issues at a more granular level. GitHub is a critical infrastructure for your work, your teams, and your businesses. We are focused on these remediations and continued reliability improvements for Copilot Insights and related metrics experiences.
On Sunday April 13th, 2026, between 18:53 UTC and 20:30 UTC, the GitHub Pages service experienced elevated error rates. On average, the error rate was 10.58% and peaked at 12.77% of requests to the service, resulting in approximately 17.5 million failed requests returning HTTP 500 errors. This was due to an automated DNS management tool (octodns) erroneously deleting a DNS record for a Pages backend storage host after its upstream data source intermittently failed to return the record, causing the tool to treat it as stale and remove it.We mitigated the incident by re-creating the deleted DNS record. To prevent future incidents, we are implementing availability-zone-tolerant routing in the Pages frontend so that an unresolvable backend host triggers failover to healthy hosts rather than returning errors, adding safeguards to prevent automated deletion of DNS records owned by other systems, and improving logging and alerting for DNS resolution failures in the Pages serving path.
On April 13, 2026, between 14:41 UTC and 17:29 UTC, the Copilot service experienced degraded performance. All Copilot users were impacted by increased latency, and approximately 20% experienced request failures when interacting with Copilot Cloud Agent (CCA). On average, request latency increased to approximately 950ms. The GitHub User Dashboard also displayed intermittent errors loading Copilot quota information. CCA and the User Dashboard were impacted for approximately 2 hours and 56 minutes. This was due to an infrastructure change that reduced the available compute capacity for a backend service responsible for Copilot rate limiting and quota management. The reduced capacity caused resource exhaustion under normal traffic load, leading to cascading failures in downstream request processing. We mitigated the incident by increasing compute resources allocated to the affected service and scaling out the number of service instances to distribute load more effectively. We are working to improve proactive capacity monitoring to detect resource degradation before it impacts users, reviewing retry and timeout configurations across dependent services to reduce amplification during degraded states, and evaluating connection management strategies to improve resilience under constrained resources.
On April 9, 2026, between 22:59 UTC and April 10, 2026, 13:24 UTC, the Copilot Mission Control service was degraded and did not display Claude and Codex Cloud Agent sessions in the agents tab dashboard. Customers were unable to see, list, or manage their third party agent sessions during this period. The underlying agent sessions continued to function normally. This was a visibility and management issue only, and no HTTP errors were generated. The API returned successful responses with incomplete results, with an average error rate of 0% and a maximum error rate of 0%. This was due to a code change that introduced a filter which inadvertently excluded third party agent sessions.We mitigated the incident by reverting the problematic code change and deploying the fix to production.We are working to add automated monitoring for dashboard content visibility and improve integration test coverage for third party agent session listing to reduce our time to detection and mitigation of issues like this one in the future.