Incident History

App Platform and Container Registry in NYC

Our Engineering team has confirmed the full resolution of the issue with the DigitalOcean App Platform and Container Registry in our NYC regions.

Users should no longer experience any issues while pushing to Container Registries and working with App Platform builds.

If you continue to experience problems, please open a ticket with our support team. We apologize for any inconvenience.

Resolved (August 5, 2024, 03:21 to 05:48 UTC)

Spaces Access Key

From 23:47 UTC until 01:11 UTC, users may have experienced errors when attempting to create Spaces Access Keys in the Cloud Control Panel.

Our Engineering team has identified and resolved the issue, and users should now be able to create Spaces Access Keys.

We apologize for any inconvenience this may have caused. If you have any questions or continue to experience issues, please reach out via a Support ticket on your account.

Resolved (July 31, 2024, 01:29 UTC)

Snapshots and Backups in TOR1

As of 05:05 UTC, our Engineering team has confirmed the full resolution of the issue impacting Snapshot and Backup Images in the TOR1 region. We have verified that the Snapshot and Backup events in the region are processing without any failures.

Users should also be able to create Droplets from Snapshot and Backup images in this region without any issues.

Thank you for your patience and understanding. If you encounter any further issues, please open a ticket with our Support team.

Resolved (July 29, 2024, 04:27 to 05:52 UTC)

Networking in Multiple Regions

Incident Summary

On July 24, 2024, DigitalOcean experienced downtime from near-simultaneous crashes affecting multiple hypervisors (ref: https://docs.digitalocean.com/glossary/hypervisor/) in several regions. In total, fourteen hypervisors crashed, the majority of them in the FRA1 and AMS3 regions, with the remainder in LON1, SGP1, and NYC1. A routine kernel fix to improve platform stability was being deployed to a subset of hypervisors across the fleet, and that kernel fix had an unexpected conflict with a separate automated maintenance routine, causing those hypervisors to experience kernel panics and become unresponsive. This interrupted service for customer Droplets and other Droplet-based services until the affected hypervisors were rebooted and restored to a functional state.

Incident Details

Timeline of Events (UTC)

July 24 22:55 - Rollout of the kernel fix begins. 

July 24 23:10 - First hypervisor crash occurs and the Operations team begins investigating.

July 24 23:55 - Rollout of the kernel fix ends. 

July 25 00:14 - Internal incident response begins, following further crash alerts firing. 

July 25 00:35 - Diagnostic tests are run on impacted hypervisors to gather information.

July 25 00:47 - Kernel panic messages are observed on impacted hypervisors. Additional Engineering teams are paged for investigation.

July 25 01:42 - Operations team begins coordinated effort to reboot all impacted hypervisors to restore customer services.

July 25 01:50 - Root cause for the crashes is determined to be the conflict between the kernel fix and the automated maintenance routine.

July 25 03:22 - Reboots of all impacted hypervisors complete; all services are restored to normal operation.

Remediation Actions

Resolved (July 25, 2024, 00:33 to 05:48 UTC)

Cloud Control Panel and API

From 17:22 UTC to 17:27 UTC, we experienced an issue with requests to the Cloud Control Panel and API.

During that timeframe, users may have experienced an increase in 5xx errors for Cloud Control Panel and API requests. The issue self-resolved quickly, and our Engineering team is continuing to investigate the root cause to ensure it does not occur again.

Thank you for your patience, and we apologize for any inconvenience. If you continue to experience any issues, please open a Support ticket for further analysis.

Resolved (July 23, 2024, 19:20 UTC)

App Deployments in SFO3

From 19:33 to 21:02 UTC on July 18, App Platform users may have experienced delays when deploying new Apps or deploying updates to existing Apps in SFO3.

Our Engineering team has deployed a fix for this issue. The impact has been resolved, and users should no longer see any issues with the affected services.

If you continue to experience problems, please open a ticket with our support team from your Cloud Control Panel. Thank you for your patience, and we apologize for any inconvenience.

Resolved (July 18, 2024, 21:34 UTC)

Degraded Functions Service in TOR1

Our Engineering team has confirmed the full resolution of the issue impacting the ability to create and manage Functions through the Cloud Control Panel and API in our TOR1 region. We appreciate your patience throughout the process.

If you continue to see errors, please open a ticket with our Support team and we will be glad to assist you further.

Resolved (July 17, 2024, 17:24 to 18:49 UTC)

Payments via PayPal

Our Engineering team has resolved the issue with processing payments via PayPal on our platform. Services should now be operating normally. If you continue to experience problems, please open a ticket with our support team. We apologize for any inconvenience.

Resolved (July 16, 2024, 10:08 to 11:48 UTC)

Creation of Load Balancers and Mongo Managed Databases

Our Engineering team has completed remediation of the previously stalled Mongo clusters and those clusters are now online.

This incident is now fully resolved. If you have any questions or continue to experience issues, please reach out to Support from within your account.

Thank you for your patience throughout this incident.

Resolved (July 10, 2024, 18:02 to 20:26 UTC)

2FA SMS Code Deliverability Issues

Our Engineering team has confirmed that the issue causing delayed SMS delivery reports has been fully resolved.

We appreciate your patience throughout this process. If you continue to experience problems, please open a ticket with our support team for further review.

Resolved (July 9, 2024, 22:34 UTC to July 11, 2024, 19:44 UTC)