Incident History

Networking in NYC Region

From 07:45 to 08:55 UTC, users may have experienced network connectivity issues in our NYC region.

Our Engineering team has confirmed the full resolution of the issue. Users should be able to access all resources as normal.

If you continue to experience problems, please open a ticket with our support team. We apologize for any inconvenience.

Sep 28, 2024 08:29 - 11:09 UTC Resolved

New Customer Sign-Ups

As of 17:05 UTC, our Engineering team has resolved the issue affecting new account sign-ups. Users should no longer experience errors and are now able to complete the sign-up process successfully.

If you continue to experience problems, please open a ticket with our support team. We apologize for any inconvenience.

Sep 27, 2024 16:41 - 18:16 UTC Resolved

Control Plane in SYD1

From 18:22 to 19:13 UTC, users may have experienced issues or errors when attempting to create or modify DigitalOcean services deployed in the SYD1 region, as well as when attempting to create or manage Volumes globally.

Our Engineering team has confirmed the full resolution of this issue. If you continue to experience problems, please open a ticket with our support team. Thank you for your patience, and we apologize for any inconvenience.

Sep 26, 2024 19:08 - 20:43 UTC Resolved

Network Connectivity in SFO3 Region

Incident Summary

On September 25, 2024 at 22:25 UTC, DigitalOcean experienced a reduction of datacenter capacity in SFO3 that impacted the availability of select DigitalOcean services. A majority of the line cards on one of our core routers in SFO3 rebooted at the same time, interrupting inter-regional traffic and dropping traffic to the network backbone. This issue impacted users of any DigitalOcean services in the SFO3 region, with a longer impact on select Managed Kubernetes (DOKS) clusters.

Incident Details

Timeline of Events

Sep 25 22:21 - A large majority of line cards on the core router rebooted.

Sep 25 22:24 - Line cards came back online.

Sep 25 22:25 - Network protocols began re-establishing sessions.

Sep 25 22:30 - Traffic on the affected core router was restored.

Sep 25 22:50 - All SFO3 control plane systems reconnected and recovered.

Sep 25 23:07 - DOKS API servers became degraded.

Sep 25 23:59 - Some DOKS clusters in the SFO3 region could not be scraped by monitoring. Several nodes were discovered to be in a “not ready” state.

Sep 26 01:40 - All impacted DOKS nodes were recycled and clusters were operational again.

Remediation Actions

DigitalOcean teams are working on multiple remediation measures to help prevent a similar incident from happening in the future.

DigitalOcean is working with the device vendor's support team to determine the root cause of the line card crash, and is upgrading the software on the core routers in the SFO3 region.

During the incident, engineers had to manually remediate affected nodes across the entire SFO3 DOKS fleet to restore service. Teams are exploring ways to reduce the need for manual action in the future, such as increasing the thresholds for automated remediation actions, so that service is restored as quickly as possible.
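For illustration only, here is a minimal sketch of the detection step such automation relies on: listing nodes that report a “not ready” condition. This is not DigitalOcean's internal tooling; it assumes a standard Kubernetes cluster reachable via a local kubeconfig and uses the open-source client-go library.

```go
// List Kubernetes nodes whose Ready condition is not True — the same
// "not ready" signal referenced in the timeline above. Illustrative
// sketch only, not DigitalOcean's internal remediation tooling.
package main

import (
	"context"
	"fmt"
	"log"
	"path/filepath"

	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/clientcmd"
	"k8s.io/client-go/util/homedir"
)

func main() {
	// Build a client from the default kubeconfig location.
	kubeconfig := filepath.Join(homedir.HomeDir(), ".kube", "config")
	config, err := clientcmd.BuildConfigFromFlags("", kubeconfig)
	if err != nil {
		log.Fatal(err)
	}
	clientset, err := kubernetes.NewForConfig(config)
	if err != nil {
		log.Fatal(err)
	}

	nodes, err := clientset.CoreV1().Nodes().List(context.Background(), metav1.ListOptions{})
	if err != nil {
		log.Fatal(err)
	}
	for _, node := range nodes.Items {
		for _, cond := range node.Status.Conditions {
			// A healthy node reports the Ready condition with status True;
			// anything else marks the node as a candidate for remediation.
			if cond.Type == corev1.NodeReady && cond.Status != corev1.ConditionTrue {
				fmt.Printf("node %s is not ready (reason: %s)\n", node.Name, cond.Reason)
			}
		}
	}
}
```

On DOKS, recycling a node replaces its underlying Droplet, so automation built on a check like this would also need safeguards (for example, rate limits and sustained-failure thresholds) to avoid recycling healthy nodes during a brief network interruption.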

Sep 25, 2024 23:34 UTC - Sep 26, 2024 02:45 UTC Resolved

Droplet Event Processing and API Availability

Our Engineering team has identified and resolved an issue that impacted the ability to resize Droplets via both the API and UI from 18:15 until 21:55 UTC. During this time, users might have experienced errors when attempting to resize their Droplets through the API or the UI.

Additionally, remediation work on the resize issue caused a secondary issue that affected all event processing and some API calls for Droplets and related services from 21:50 until 22:00 UTC.

Our Engineering team took swift action to restore full functionality, and everything is now operating normally.

We apologize for any inconvenience this may have caused. If you have any questions or continue to experience issues, please reach out via a Support ticket on your account.

Sep 25, 2024 22:59 UTC Resolved

Resizing MongoDB Managed Databases

From 00:01 to 15:15 UTC, users may have experienced errors when attempting to resize their MongoDB Managed Database clusters.

Our Engineering team has confirmed the full resolution of the issue. Users should now be able to resize their MongoDB clusters normally.

If you continue to experience problems, please open a ticket with our support team from within your Cloud Control Panel.

Sep 25, 2024 15:14 - 16:10 UTC Resolved

App Platform Connectivity

Our Engineering team has confirmed the full resolution of this incident.

From 20:14 to 22:03 UTC, users may have experienced errors when connecting to their Apps or with their Apps connecting to other services, as well as delays in deploying Apps. Users should now be able to connect to their Apps normally, and any delayed deployments should complete successfully.

If you continue to experience problems, please open a ticket with our support team from within your Cloud Control Panel.

Sep 19, 2024 21:07 - 22:48 UTC Resolved

Network Connectivity in SGP1 Region

From 21:35 to 00:45 UTC, users may have experienced networking connectivity issues in our SGP1 region, which impacted a subset of DigitalOcean services.

Our Engineering team has confirmed the full resolution of the issue. Users should be able to access all resources as normal.

If you continue to experience problems, please open a ticket with our support team. We apologize for any inconvenience.

Sep 18, 2024 23:10 UTC - Sep 19, 2024 01:35 UTC Resolved

App Platform Deployments & Spaces

From 14:15 to 17:53 UTC, users may have experienced delays in App Platform deployments, as well as latency and errors when fetching Spaces endpoint information and toggling the CDN for a Spaces Bucket, due to an upstream provider issue.

Our Engineering team has confirmed the full resolution of the issue with the upstream provider. Users should now be able to deploy to App Platform and manage their Spaces Buckets as normal.

If you continue to experience problems, please open a ticket with our support team. We apologize for any inconvenience.

Sep 18, 2024 15:02 - 20:27 UTC Resolved

Connectivity Issues to Cloud Panel, API, and Other Endpoints

Our Engineering team identified an issue affecting access to the Cloud Panel and application deployments, caused by an elevated rate of HTTP 499 errors (logged when a client closes the connection before the server responds) at an upstream provider.

From 04:21 to 06:06 UTC, users may have experienced issues while accessing the Cloud Panel and application deployments.

Our Engineering team worked closely with the upstream provider to resolve the issue.

The impact has completely subsided, and users should no longer see any issues with the impacted services.

If you continue to experience problems, please open a ticket with our support team from your Cloud Control Panel. Thank you for your patience, and we apologize for any inconvenience.

Sep 16, 2024 07:00 UTC Resolved