Processing delays to some Issues, Pull Requests and Webhooks


Incident resolved in 1h30m43s

Resolved

On Sep 13, 2024, between 05:03 UTC and 07:13 UTC, the Webhooks and Actions services were degraded, resulting in some customers experiencing delayed processing of Webhooks and Actions Runs. 0.5% of Webhook deliveries were delayed by more than 2 minutes during the incident. 15% of Actions Runs started between 05:03 and 05:24 UTC saw run start delays or failures. At 05:24 UTC, we implemented a mitigation to shift traffic to healthy infrastructure, and new Actions Runs resumed normal operations. During the rest of the incident window, Actions Runs started before 05:24 UTC continued to see delays publishing logs or job results. No Actions Runs or Webhook deliveries were lost, only delayed.

We mitigated the incident by immediately shifting traffic to a healthy cluster while investigating. The incident was caused by an erroneous configuration change on our eventing platform. A permanent fix was deployed at 06:22 UTC, after which services began to recover and burn down their backed-up queues, with full recovery by 07:13 UTC.

We are working to reduce our time to detection and develop test automation to prevent issues like this one in the future.

1726211604

Investigating

We are seeing improvements in telemetry and are monitoring the delivery of delayed Webhooks and Actions job statuses.

1726210173

Investigating

We've applied a mitigation for the delays affecting some webhook deliveries and the delayed reporting of outcomes for some running Actions jobs. We are monitoring for full recovery.

1726208637

Investigating

Actions is experiencing degraded performance. We are continuing to investigate.

1726207156

Investigating

We are investigating reports of degraded performance for Issues, Pull Requests and Webhooks.

1726206161
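
The bare numbers under each update appear to be Unix epoch timestamps in seconds. That is an assumption, but it is consistent with the stated resolution time of 1h30m43s. A minimal Python sketch for decoding them and checking the elapsed time between the first and last update (the helper names `to_utc` and `format_duration` are illustrative, not part of any status-page API):

```python
from datetime import datetime, timezone

# Timestamps copied from the updates above, oldest first.
# Assumption: they are Unix epoch seconds.
UPDATES = [1726206161, 1726207156, 1726208637, 1726210173, 1726211604]


def to_utc(ts: int) -> str:
    """Render an epoch timestamp as a UTC date and time."""
    return datetime.fromtimestamp(ts, tz=timezone.utc).strftime(
        "%Y-%m-%d %H:%M:%S UTC"
    )


def format_duration(seconds: int) -> str:
    """Format a number of seconds in the 1h30m43s style used above."""
    hours, rem = divmod(seconds, 3600)
    minutes, secs = divmod(rem, 60)
    return f"{hours}h{minutes}m{secs}s"


for ts in UPDATES:
    print(to_utc(ts))

# Elapsed time from the first "Investigating" update to "Resolved".
print(format_duration(UPDATES[-1] - UPDATES[0]))  # prints 1h30m43s
```

Under this reading, the resolution time counts from the first public update to the final one, not from the 05:03 UTC start of the degradation described in the summary.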