Increased API failures

Incident resolved in 8h16m16s

Resolved

This incident has been resolved.

1729650155

Update

Our internal state is fully re-synchronized, and our metrics are returning to normal. We are continuing to monitor for potential ongoing issues.

1729647032

Update

Restoration of our state propagation system is complete. The system is now processing updates to re-synchronize to the latest state. Services and APIs should start to recover once this process completes.

1729642032

Update

Our state propagation system is significantly delayed. To speed up recovery, we will restore the system from a snapshot to clear the backlog. Your machines may be missing from fly m list and some other APIs, but all of your started machines are still running. The state will re-synchronize to the latest once the restoration is complete.

1729638483

Update

We are continuing to work on a fix for this issue.

1729636033

Update

Some of our APIs should have resumed normal operation. We are still applying the fix to the remaining APIs.

1729631712

Update

We are continuing to apply the fix to all hosts in the fleet. Some hosts continue to see elevated API errors at this time.

1729628880

Update

We are currently in the process of rolling out a fix across our fleet.

1729625114

Update

We are continuing to work on a fix for this issue. Apps with autostart/autostop configured might also see an increased number of request errors.

1729621145

Update

We have identified the cause of an increase in API errors across the platform and are working on a fix.

1729620379
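
The bare numbers under each update appear to be Unix epoch timestamps in seconds; that reading is an assumption, not something the log states. Under that assumption, the following minimal sketch converts the first and last timestamps to UTC and checks the gap between them against the stated resolution time. The helper names `format_duration` and `to_utc` are illustrative only.

```python
from datetime import datetime, timezone

# Assumption: the bare numbers in the log are Unix epoch seconds (UTC).
FIRST_UPDATE = 1729620379  # "We have identified the cause ..."
RESOLVED = 1729650155      # "This incident has been resolved."


def format_duration(seconds: int) -> str:
    """Render a span of seconds in the log's own 'XhYmZs' style."""
    hours, remainder = divmod(seconds, 3600)
    minutes, secs = divmod(remainder, 60)
    return f"{hours}h{minutes}m{secs}s"


def to_utc(timestamp: int) -> str:
    """Convert epoch seconds to an ISO 8601 string in UTC."""
    return datetime.fromtimestamp(timestamp, tz=timezone.utc).isoformat()


if __name__ == "__main__":
    print("first update:", to_utc(FIRST_UPDATE))
    print("resolved:    ", to_utc(RESOLVED))
    print("duration:    ", format_duration(RESOLVED - FIRST_UPDATE))
```

The computed gap between the first update and the resolution is 8h16m16s, which matches the duration given at the top of the log.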