Increased API failures
Resolved
This incident has been resolved.
Update
Our internal state is fully re-synchronized, and our metrics are returning to normal. We are continuing to monitor for potential ongoing issues.
Update
Restoration of our state propagation system is complete. The system is now processing updates to re-synchronize back to the latest state. Services and APIs should start to recover once this process is completed.
Update
Our state propagation system is significantly delayed. To speed up recovery, we will restore the system from a snapshot to clear the backlog. Your machines may be missing from fly m list and some other APIs, but all of your started machines will still be running. The state will re-synchronize to the latest once restoration is complete.
Update
We are continuing to work on a fix for this issue.
Update
Some of our APIs should have resumed normal operation. We are still applying the fix to the remaining APIs.
Update
We are continuing to apply the fix to all hosts in the fleet. Some hosts continue to see elevated API errors at this time.
Update
We are rolling out a fix across our fleet.
Update
We are continuing to work on a fix for this issue. Apps with autostart/autostop configured might also see an increased number of request errors.
Update
We have identified the cause of an increase in API errors across the platform and are working on a fix.