Incident with Copilot


Incident resolved in 19h8m40s

Resolved

On July 13, 2024, between 00:01 and 19:27 UTC, the Copilot service was degraded. During this period, the Copilot code completions error rate peaked at 1.16% and the Copilot Chat error rate peaked at 63%. Between 01:00 and 02:00 UTC we were able to reroute Chat traffic to bring error rates below 6%. During the time of impact, customers would have seen delayed responses, errors, or timeouts during requests. GitHub code scanning autofix jobs were also delayed during this incident.

A resource cleanup job was scheduled by the Azure OpenAI (AOAI) service early on July 13, targeting a resource group thought to contain only unused resources. This resource group unintentionally contained critical, still-in-use resources, which were then removed. The cleanup job was halted before it removed all resources in the resource group. Enough resources remained that GitHub was able to mitigate while the removed resources were reconstructed.

We are working with AOAI to ensure mitigation is in place to prevent future impact. In addition, we will improve traffic rerouting processes to reduce time to mitigate in the future.

1720898823

Investigating

Copilot is operating normally.

1720898814

Investigating

Our upstream provider continues to recover and we expect services to return to normal as more progress is made. We will provide another update by 20:00 UTC.

1720893705

Investigating

Our upstream provider is making good progress recovering and we are validating that services are nearing normal operations. We will provide another update by 18:00 UTC.

1720886980

Investigating

Our upstream provider is gradually recovering the service. We will provide another update at 23:00 UTC.

1720869529

Investigating

We are continuing to wait on our upstream provider to see full recovery. We will provide another update at 11:00 UTC.

1720842603

Investigating

The error rate for Copilot chat requests remains steady at less than 10%. We are continuing to investigate with our upstream provider.

1720840837

Investigating

Copilot is experiencing degraded performance. We are continuing to investigate.

1720837210

Investigating

We have applied several mitigations to Copilot chat, reducing errors to less than 10% of all chat requests. We are continuing to investigate the issue with our upstream provider.

1720837180

Investigating

Copilot chat is experiencing degraded performance, impacting up to 60% of all chat requests. We are continuing to investigate the issue with our upstream provider.

1720834320

Investigating

Copilot chat is currently experiencing degraded performance, impacting up to 60% of all chat requests. We are investigating the issue.

1720831791

Investigating

Copilot is experiencing degraded availability. We are continuing to investigate.

1720830570

Investigating

Copilot API chat is experiencing significant failures to backend services.

1720829909

Investigating

We are investigating reports of degraded performance for Copilot.

1720829903
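
The bare numbers under each update appear to be Unix epoch timestamps (seconds since 1970-01-01 UTC). That reading is an assumption, but it is consistent with the resolution time stated at the top of the report. A minimal sketch that converts the first and last timestamps to UTC and recomputes the incident duration (the helper names `to_utc` and `format_duration` are illustrative, not part of any GitHub tooling):

```python
from datetime import datetime, timezone

# Timestamps copied from the first update and the "Resolved" entry above.
# Assumption: they are Unix epoch seconds.
FIRST_UPDATE = 1720829903
RESOLVED = 1720898823


def to_utc(ts: int) -> str:
    """Render an epoch timestamp as a UTC date and time string."""
    return datetime.fromtimestamp(ts, tz=timezone.utc).strftime("%Y-%m-%d %H:%M:%S UTC")


def format_duration(seconds: int) -> str:
    """Format a span of seconds in the same style as the report header."""
    hours, remainder = divmod(seconds, 3600)
    minutes, secs = divmod(remainder, 60)
    return f"{hours}h{minutes}m{secs}s"


print(to_utc(FIRST_UPDATE))                      # 2024-07-13 00:18:23 UTC
print(to_utc(RESOLVED))                          # 2024-07-13 19:27:03 UTC
print(format_duration(RESOLVED - FIRST_UPDATE))  # 19h8m40s
```

Under that assumption, the stated duration measures from the first posted update to the resolution, not from the 00:01 UTC start of impact given in the summary.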