Multiple products impacted with data delays


Incident resolved in 49h54m23s

Resolved

Backfills for Metrics and Log Management data have completed. All systems are back to normal.

1761144011

Update

We are making progress on outstanding backfills. Metrics and Logs backfills are still in progress. For products still undergoing backfilling, queries that include data from the backfilled windows may appear incomplete for the affected subset of customers. We will provide the next update no later than Oct 22, 16:00 UTC.

1761124211

Update

We are making progress on outstanding backfills. The Cloud Cost Monitoring backfill is complete. Metrics and Logs backfills are still in progress. For products still undergoing backfilling, queries that include data from the backfilled windows may appear incomplete for the affected subset of customers. We will provide the next update no later than Oct 22, 10:00 UTC.

1761082530

Update

We are continuing to work on the outstanding backfills that are not yet complete. During this process, queries that include data from the backfilled windows may appear incomplete for the affected subset of customers and products. We will resolve the incident when the backfills are complete or before Oct 21, 22:00 UTC.

1761069896

Update

All products have been stable since the last update. We are continuing to work on outstanding backfills. During this process, queries that include data from the backfilled windows may appear incomplete for the affected subset of customers and products. We will resolve the incident when the backfills are complete or before Oct 21, 16:00 UTC.

1761042029

Update

We are seeing recovery across all of our products, and live data and monitor evaluations have resumed for all affected products. Most historical data in Logs has been backfilled, and a small number of backfills are still ongoing in Metrics and other products. We will continue to monitor the situation overnight, and our next update will be at 09:00 UTC.

1761010337

Update

We are seeing recovery for APM. We continue to see delays in processing that impact the following products: Distribution Metrics, RUM, CCM, and Product Analytics. As a result of this issue, some users may see only a subset of their data when querying those products or viewing pages that rely on telemetry from those products.

1761006347

Update

Logs data have been backfilled, and users should no longer see gaps in their historical logs. Log Archives and Log Forwarding were paused between 15:00 and 18:30 UTC, and we are working to re-forward any logs from that time period.

We continue to see delays in processing that impact the following products: Distribution Metrics, APM, RUM, CCM, and Product Analytics. As a result of this issue, some users may see only a subset of their data when querying those products or viewing pages that rely on telemetry from those products.

1761000101

Update

We are seeing recovery in Profiling.

Logs data submitted after 21:30 UTC should be processed normally. Users may see gaps in historical logs prior to 21:30 UTC while our backfill is in progress.

In addition to Log Management, we continue to see delays in processing that impact the following products: Distribution Metrics, APM, RUM, CCM, and Product Analytics. As a result of this issue, some users may see only a subset of their data when querying those products or viewing pages that rely on telemetry from those products.

1761000004

Update

We are seeing recovery in AWS Metrics. Logs data submitted after 21:30 UTC should be processed normally. Users may see gaps in historical logs prior to 21:30 UTC while our backfill is in progress. In addition to Log Management, we continue to see delays in processing that impact the following products: Distribution Metrics, APM, RUM, Profiling, CCM, and Product Analytics. As a result of this issue, some users may see only a subset of their data when querying those products or viewing pages that rely on telemetry from those products.

1760996821

Update

We are seeing progress in telemetry data coming from AWS into Datadog. Our capacity requests are being fulfilled, but more slowly than usual. App Builder and Workflow Automation are seeing recovery. Processing is still delayed for multiple products: Distribution Metrics, APM, RUM, Log Management, Profiling, CCM, and Product Analytics. As a result of this issue, some users may see only a subset of their data when querying those products; other product pages that use the same underlying data will be impacted as well.

1760991246

Update

We are seeing progress in telemetry data coming from AWS into Datadog, and we are starting to see our capacity requests being fulfilled. Processing is still delayed for multiple products: Distribution Metrics, APM, RUM, Log Management, Profiling, CCM, and Product Analytics. As a result of this issue, some users may see only a subset of their data when querying those products; other product pages that use the same underlying data will be impacted as well. App Builder and Workflow Automation are also experiencing elevated errors. As a result, customers might not be able to query applications, and workflows might take longer to execute.

1760986880

Update

APM, RUM, Log Management, Profiling, CCM, and Product Analytics data is still delayed. As a result of this issue, some users may see only a subset of their data when querying those products; other product pages that use the same underlying data will be impacted as well. We are working on bringing new capacity online. For all products except RUM, we expect the data to be backfilled once the service is fully operational again. App Builder and Workflow Automation are also experiencing elevated errors. As a result, customers might not be able to query applications, and workflows might take longer to execute. Due to upstream provider issues, telemetry data coming from AWS into Datadog also remains unavailable.

1760983497

Update

APM, RUM, Log Management, Profiling, CCM, and Product Analytics data is still delayed. As a result of this issue, some users may see only a subset of their data when querying those products; other product pages that use the same underlying data will be impacted as well. We are working on bringing new capacity online. For all products except RUM, we expect the data to be backfilled once the service is fully operational again. App Builder and Workflow Automation are also experiencing elevated errors. As a result, customers might not be able to query applications, and workflows might take longer to execute. Due to upstream provider issues, telemetry data coming from AWS into Datadog also remains unavailable.

1760979904

Update

We are still seeing increased processing latency for those products, and the associated monitors are delayed. We are continuing to work on bringing new capacity online and will continue to provide updates on this issue.

1760973511

Update

We are investigating increased latency in processing APM, RUM, Log Management, and Profiling data. As a result of this issue, some users may see only a subset of their data when querying those products; other product pages that use the same underlying data will be impacted as well. Monitors using the impacted data are delayed. We are working on bringing new capacity online and will provide an update once the service is fully operational again.

1760969242

Update

We are investigating increased latency in processing APM, RUM, Log Management, and Profiling data. As a result of this issue, some users may see only a subset of their data when querying those products; other product pages that use the same underlying data will be impacted as well. We are working on bringing new capacity online, and the data will be backfilled once the service is fully operational again.

1760964348
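
The numbers under each update appear to be Unix epoch timestamps in seconds. That is an assumption based on their magnitude, not something the log states. Under that assumption, the minimal Python sketch below converts a timestamp to UTC and computes the time elapsed between the first update and the resolution. The helper names to_utc and format_duration are illustrative only.

```python
from datetime import datetime, timezone

# Values copied from this log; assumed to be Unix epoch seconds.
FIRST_UPDATE = 1760964348  # earliest update in the log
RESOLVED = 1761144011      # "Resolved" entry


def to_utc(epoch_seconds: int) -> str:
    """Render an epoch timestamp as a human-readable UTC string."""
    moment = datetime.fromtimestamp(epoch_seconds, tz=timezone.utc)
    return moment.strftime("%Y-%m-%d %H:%M:%S UTC")


def format_duration(seconds: int) -> str:
    """Format a span of seconds in the log's own style, e.g. 49h54m23s."""
    hours, remainder = divmod(seconds, 3600)
    minutes, secs = divmod(remainder, 60)
    return f"{hours}h{minutes:02d}m{secs:02d}s"


if __name__ == "__main__":
    print("First update:", to_utc(FIRST_UPDATE))
    print("Resolved:    ", to_utc(RESOLVED))
    # Prints 49h54m23s, matching the duration stated at the top of the log.
    print("Duration:    ", format_duration(RESOLVED - FIRST_UPDATE))
```

The same to_utc call can be applied to any of the other timestamps in the log to place an update on the timeline.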