From 01:23 UTC to 03:04 UTC, users may have experienced stuck or delayed events, such as powering on/off and resizing Droplets, in the NYC3, AMS2, BLR1, SGP1, and SYD1 regions. Additionally, Managed Database creates were delayed in all regions.
Our Engineering team has confirmed full resolution of the issue. Delayed events and new events should now complete as normal.
Thank you for your patience through this issue. If you continue to experience any issues, please open a support ticket from within your account.
From 13:25 to 14:45 UTC, our Engineering team observed a Networking issue in our NYC3 region. During this time, users may have experienced Droplet and VPC connectivity issues. Users should no longer be experiencing these issues.
We apologize for the inconvenience. If you have any questions or continue to experience issues, please reach out via a Support ticket on your account.
Our Engineering team has confirmed that the issues with Spaces CDN functionality have been fully resolved. Users should now be able to use CDN functionality normally.
If you continue to experience problems, please open a ticket with our support team. We apologize for any inconvenience.
Our Engineering team has confirmed that the issues with Authoritative DNS resolution in NYC1 have been fully resolved. DNS queries should now be resolving normally.
If you continue to experience problems, please open a ticket with our support team. We apologize for any inconvenience.
Our Engineering team has confirmed the full resolution of the issue impacting Managed Database Operations, and all systems are now operating normally. Users may safely resume operations, including upgrades, resizes, forking, and ad-hoc maintenance patches.
If you continue to experience problems, please open a ticket with our support team. We apologize for any inconvenience.
From 16:10 UTC to 22:46 UTC, users may have experienced issues while executing Managed Database CRUD Operations.
Our Engineering team has confirmed the full resolution of the issue impacting Managed Database CRUD Operations, and all systems are now operating normally.
Users may safely resume operations, including upgrades, resizes, forking, and ad-hoc maintenance patches.
If you continue to experience problems, please open a ticket with our support team. We apologize for any inconvenience.
Our Engineering team identified an issue affecting App Platform deployments across all regions. The deploy-on-push functionality may not have triggered new builds between 15:45 and 18:15 UTC.
A fix has been implemented, and as of 18:15 UTC, deployments have been functioning successfully. If you continue to experience problems, please open a ticket with our support team. We apologize for any inconvenience.
On Thursday, November 28, 2024, our domain registrar, Network Solutions, made an update to our digitaloceanspaces.com domain. The change, made at 22:38 UTC (5:38:40 PM ET), added an Extensible Provisioning Protocol (EPP) clientHold status code to our domain.
The code prevents DNS resolution until the registrar's customer (here, DigitalOcean) contacts Network Solutions to clear the hold. As a result, DigitalOcean customers were unable to sign up or log in via the Cloud Control Panel at the beginning of the incident, and experienced errors with Spaces buckets and dependent services, such as DigitalOcean Container Registry, App Platform, Functions, Load Balancers, and Mongo Managed Database Backups, for the duration of the incident.
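To illustrate how a hold of this kind shows up in WHOIS data, here is a minimal sketch that scans the text of a WHOIS response for EPP status codes and reports any that stop a domain from resolving. It assumes the common "Domain Status: <code> <url>" line format used for .com domains. The function names and the sample response are hypothetical and are not DigitalOcean's tooling.

```python
import re

# EPP status codes that take a domain out of the zone, so it stops resolving.
HOLD_STATUSES = {"clientHold", "serverHold"}

# Assumed line format: "Domain Status: clientHold https://icann.org/epp#clientHold"
_STATUS_LINE = re.compile(r"^[ \t]*Domain Status:[ \t]*(\S+)", re.IGNORECASE | re.MULTILINE)


def epp_statuses(whois_text: str) -> set[str]:
    """Return every EPP status code listed in a WHOIS response."""
    return set(_STATUS_LINE.findall(whois_text))


def blocking_statuses(whois_text: str) -> set[str]:
    """Return only the status codes that prevent DNS resolution."""
    return epp_statuses(whois_text) & HOLD_STATUSES


# Hypothetical WHOIS excerpt for illustration only.
SAMPLE = """\
Domain Name: EXAMPLE.COM
Domain Status: clientTransferProhibited https://icann.org/epp#clientTransferProhibited
Domain Status: clientHold https://icann.org/epp#clientHold
"""

if __name__ == "__main__":
    print(sorted(blocking_statuses(SAMPLE)))
```

A non-empty result from blocking_statuses means the domain is on hold and will stop resolving as cached records expire.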
Incident Summary
Root Cause: Domain registrar erroneously placed a clientHold on the digitaloceanspaces.com domain, impacting DNS resolution and causing an outage.
Impact: Traffic to DigitalOcean products (e.g. Cloud Control Panel, Spaces, App Platform, etc.) began to see intermittent failures between 22:38 UTC on November 28 and 04:03 UTC on November 29, with the worst failures happening between 03:00 and 04:03 UTC as DNS caches and TTLs began expiring.
Response: DigitalOcean escalated to the domain registrar, as well as Verisign (as the authoritative domain registry for .com domains), to have the clientHold removed.
Timeline of Events
November 28, 2024 (UTC)
22:38 - Incident declared based on monitoring and customer reports of our Cloud Control Panel not loading.
22:44 - Issue detected by an internal alert.
22:57 - First customer contact regarding Cloud Control Panel impact.
23:57 - We noticed that our whois information for the domain had been updated on November 28 22:38 UTC.
November 29, 2024 (UTC)
01:06 - We identified the clientHold status in the EPP section of our whois for digitaloceanspaces.com and confirmed in our Network Solutions account that the domain had been marked inactive.
01:08 - We initiated contact with Network Solutions to have the clientHold removed.
01:30 - Reached out to Verisign executives for assistance.
01:42 - Cloud Control Panel functionality restored.
02:40 - Conference call with a Verisign executive to brief them on the issues we encountered while attempting to resolve the clientHold status.
02:44 - Email chain established with Verisign executives. A recap of all work done, problems, and a clear issue resolution request were sent.
02:44 - 04:03 - Multiple escalations between Verisign executives leading to clientHold being removed.
03:00 - As DNS caches and recursive DNS server TTLs began to expire, we saw the most severe service disruptions to our customers until resolution.
04:03 - clientHold removed from domain.
04:03 - 5:38:40 - Monitoring all infrastructure to ensure healthy and full recovery. DigitalOcean functionality fully restored.
Remediation Actions
DigitalOcean teams are working on multiple types of remediation to prevent a similar incident from happening.
DigitalOcean is working with Network Solutions to understand what happened on their end that resulted in the clientHold being applied to our domain incorrectly.
In addition, we are reviewing other domain registrars as possible new homes for our domains.
Teams are also reviewing our monitoring and alerting to reduce our time to detect incidents related to registrar-imposed changes and/or DNS resolution.
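One way to picture detection of registrar-imposed changes is a check that compares a domain's EPP status codes between two polls and flags any newly added hold. The following is a minimal sketch under that assumption. The names StatusChange and diff_statuses are hypothetical, and how the status sets are collected (WHOIS, RDAP, or a registrar API) is left out.

```python
from dataclasses import dataclass
from typing import Iterable

# EPP status codes that stop a domain from resolving.
HOLD_STATUSES = frozenset({"clientHold", "serverHold"})


@dataclass(frozen=True)
class StatusChange:
    """Difference between two snapshots of a domain's EPP status codes."""
    added: frozenset
    removed: frozenset

    @property
    def is_critical(self) -> bool:
        # A newly added hold status should page someone immediately.
        return bool(self.added & HOLD_STATUSES)

    @property
    def changed(self) -> bool:
        return bool(self.added or self.removed)


def diff_statuses(previous: Iterable[str], current: Iterable[str]) -> StatusChange:
    """Compare the status codes seen on the last poll with those seen now."""
    prev, cur = frozenset(previous), frozenset(current)
    return StatusChange(added=cur - prev, removed=prev - cur)


if __name__ == "__main__":
    before = {"clientTransferProhibited"}
    after = {"clientTransferProhibited", "clientHold"}
    change = diff_statuses(before, after)
    if change.is_critical:
        print("ALERT: hold status added:", sorted(change.added & HOLD_STATUSES))
```

Comparing snapshots rather than only testing DNS resolution matters here because cached records can keep a held domain resolving for hours after the status changes.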