Incident History

Spaces, Spaces CDN, DOCR, LBaaS, App Platform, Functions and Mongo Backups.

Incident Summary

On Thursday, November 28, 2024, our domain registrar, Network Solutions, made an update to our digitaloceanspaces.com domain. The change, made at 22:38 UTC (5:38:40 PM ET), added an Extensible Provisioning Protocol (EPP) clientHold status code to our domain.

The code prevents DNS resolution until a customer contacts Network Solutions to clear the hold. As a result, DigitalOcean customers were unable to sign up or log in via the Cloud Control Panel at the beginning of the incident, and experienced errors with Spaces buckets and dependent services (DigitalOcean Container Registry, App Platform, Functions, Load Balancers, and Mongo Managed Database Backups) for the duration of the incident.

Incident Details

Root Cause: Domain registrar erroneously placed a clientHold on the digitaloceanspaces.com domain, impacting DNS resolution and causing an outage.

Impact: Traffic to DigitalOcean products (e.g. Cloud Control Panel, Spaces, App Platform) saw intermittent failures between 22:38 UTC on 11/28 and 04:03 UTC on 11/29, with the worst failures occurring between 03:00 and 04:03 UTC as cached DNS records reached their TTLs and expired.

Response: DigitalOcean escalated to the domain registrar, as well as Verisign (as the authoritative domain registry for .com domains), to have the clientHold removed. 
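The Impact note above attributes the late peak in failures to cached DNS records expiring. The following is a minimal Python sketch of that effect, not a description of DigitalOcean's tooling. The function names, resolver refresh times, and TTL values are all hypothetical; the actual TTLs involved are not stated in this report.

```python
from datetime import datetime, timedelta, timezone

def cache_expires_at(cached_at: datetime, ttl_seconds: int) -> datetime:
    """Time at which a resolver's cached record stops being usable."""
    return cached_at + timedelta(seconds=ttl_seconds)

def still_resolves(cached_at: datetime, ttl_seconds: int, now: datetime) -> bool:
    """True while the resolver can still answer from its cache.

    Assumes cached_at is the resolver's last successful refresh before
    the hold was applied, so no fresh answer is available after expiry.
    """
    return now < cache_expires_at(cached_at, ttl_seconds)

UTC = timezone.utc

# Hypothetical resolver A: refreshed at 22:00 UTC with a 1-hour TTL.
resolver_a = (datetime(2024, 11, 28, 22, 0, tzinfo=UTC), 3600)
# Hypothetical resolver B: refreshed at 21:30 UTC with a 6-hour TTL.
resolver_b = (datetime(2024, 11, 28, 21, 30, tzinfo=UTC), 21600)

early = datetime(2024, 11, 28, 23, 30, tzinfo=UTC)
late = datetime(2024, 11, 29, 3, 45, tzinfo=UTC)

a_early = still_resolves(*resolver_a, now=early)
b_early = still_resolves(*resolver_b, now=early)
a_late = still_resolves(*resolver_a, now=late)
b_late = still_resolves(*resolver_b, now=late)
```

Under these assumed values, only resolver A fails at the earlier time, while both fail at the later time. This is consistent with failures starting intermittently and worsening as more caches expire.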

Timeline of Events

November 28, 2024 (UTC)

November 29, 2024 (UTC)

Remediation Actions

DigitalOcean teams are working on multiple types of remediation to prevent a similar incident from happening.

DigitalOcean is working with Network Solutions to understand what happened on their end that resulted in the clientHold being applied to our domain incorrectly.

In addition, we are reviewing other domain registrars as possible new homes for our domains.

Teams are also reviewing our monitoring and alerting to reduce our time to detect incidents related to registrar-imposed changes and/or DNS resolution.
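One possible form for this kind of monitoring is a periodic check of the domain's published status codes. The sketch below is illustrative only and does not describe DigitalOcean's actual alerting. It assumes a list of status strings has already been fetched from a WHOIS or RDAP lookup (the fetch is omitted), and the function names are hypothetical. Status strings are normalized so that both the EPP spelling ("clientHold") and the spaced spelling used by RDAP ("client hold") are recognized.

```python
from typing import Iterable, List

# EPP statuses that take a domain out of the zone so it stops resolving.
HOLD_STATUSES = {"clienthold", "serverhold"}

def normalize(status: str) -> str:
    """Lowercase and drop spaces so 'clientHold' and 'client hold' match."""
    return status.replace(" ", "").lower()

def find_hold_statuses(statuses: Iterable[str]) -> List[str]:
    """Return the statuses, as given, that would stop DNS resolution."""
    return [s for s in statuses if normalize(s) in HOLD_STATUSES]

def should_alert(statuses: Iterable[str]) -> bool:
    """True if any hold status is present on the domain."""
    return bool(find_hold_statuses(statuses))

# Hypothetical status lists for illustration.
healthy = ["clientTransferProhibited", "clientDeleteProhibited"]
held = ["clientTransferProhibited", "clientHold"]

healthy_alert = should_alert(healthy)
held_alert = should_alert(held)
```

A check like this would flag the hold as soon as the status is published, rather than waiting for resolution failures to surface as caches expire.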

Nov 29, 00:21 - Nov 29, 05:35 Resolved

Network Connectivity in SFO3 Region

Our Engineering team identified and resolved the networking issue in our SFO3 region.

From 10:02 UTC to 11:44 UTC, users may have experienced connectivity issues, latency, and timeout errors while interacting with Droplet-based services and App Platform.

The impact has been mitigated and services should be working normally at this time.

If you continue to experience problems, please open a ticket with our support team. Thank you for your patience and we apologize for any inconvenience.

Nov 27, 11:03 - Nov 27, 12:03 Resolved

Block Storage in SFO3

Our Engineering team is investigating an issue related to our ongoing SFO3 maintenance here: https://status.digitalocean.com/incidents/4kj7krrpyg3k

From 20:23 - 20:25 UTC, some services were impacted by a drop in networking. During that time, some Managed Kubernetes clusters experienced errors from the Kubernetes API and/or an increase in 5xx errors. Communication between other services and Block Storage Volumes may have been impacted as well.

The impact has been mitigated and services should be working normally at this time.

If you continue to experience problems, please open a ticket with our support team. We apologize for any inconvenience.

Nov 25, 22:05 - Nov 25, 22:05 Resolved

Droplet creates in NYC3

From 09:05 to 09:50 UTC (November 21), our Engineering team observed an issue with Droplet creation.

During this time, users may have experienced intermittent errors while creating Droplets via the Cloud Control Panel and API. Users should no longer be experiencing these issues.

We apologize for the inconvenience. If you have any questions or continue to experience issues, please reach out via a Support ticket on your account.

Nov 22, 17:44 - Nov 22, 17:44 Resolved

Customer Support Ticket Portal

Given the absence of outages for the Support Portal, we will now resolve this incident. Our Engineering team will continue to work with our vendor to ensure continued stability of this service. If we observe further outages, we will communicate those to our users via new updates on our status page.

We sincerely apologize and thank you for your patience as we worked through this issue. In case of any questions or concerns, please open a ticket with our Support team.

Nov 21, 15:40 - Nov 25, 19:20 Resolved

User Creation for PostgreSQL Managed Databases via Cloud Control Panel

Our Engineering team has confirmed that the issue with displaying users for PostgreSQL Managed Databases across all regions has been fully resolved.

If you continue to experience problems, please open a ticket with our support team. We apologize for any inconvenience.

Nov 20, 00:50 - Nov 20, 01:54 Resolved

Spaces Object Storage, DigitalOcean Container Registry & App Platform

Our Engineering team has confirmed complete resolution of the issue that was impacting Spaces Object Storage, DigitalOcean Container Registry, and App Platform, across all regions.

From 16:52 UTC - 18:41 UTC, users may have experienced increased error rates when accessing Spaces objects, interacting with the DigitalOcean Container Registry and while creating/deploying Apps with App Platform. Functionality is completely restored and all operations are succeeding normally.

If you continue to experience any issues with these services, please submit a ticket to our customer support team for assistance. Thank you for your patience.

Nov 18, 18:32 - Nov 18, 19:33 Resolved

DNS Resolution in Multiple Regions

Our Engineering team has confirmed that the issues with Authoritative DNS resolution across multiple regions have been fully resolved. DNS queries should now be resolving normally.

If you continue to experience problems, please open a ticket with our support team. We apologize for any inconvenience.

Nov 15, 04:03 - Nov 18, 21:42 Resolved

Authoritative DNS Resolution

From 19:06 - 19:14 UTC, our Engineering team observed an issue impacting Authoritative DNS Resolution globally. During this time, users might have experienced latency or resolution issues while querying DNS records hosted on our authoritative DNS infrastructure.

The Engineering team swiftly identified and resolved the issue, and as of 19:14 UTC, all DNS queries should now be resolving normally.

We apologize for the inconvenience. If you are still experiencing issues or have any additional questions, please open a support ticket from within your account.

Nov 13, 20:44 - Nov 13, 20:44 Resolved

Multiple Services in NYC3

From 20:03 - 20:08 UTC, our Engineering team observed an issue with internal DigitalOcean network connectivity in the NYC3 region. During this time, users might have experienced errors for control plane operations for services in NYC3, such as creating and deleting services, or issuing updates to existing services for products like App Platform Apps, DOKS & Managed Databases.

The incident did not impact public or private network connectivity for customer services.

We apologize for the inconvenience. If you are still experiencing issues or have any additional questions, please open a support ticket from within your account.

Nov 12, 22:06 - Nov 12, 22:06 Resolved