Degraded API Performance
Resolved
This incident has been resolved.
Update
We've scaled up our systems and applied fixes to our API. Everything should be operational now.
Update
We are scaling up our systems to handle the increased traffic.
Update
All hosts have completed the restoration process and we are seeing our overall Corrosion cluster health and performance return to normal.
Machines API and GraphQL API error rates are improving, but some users may still see elevated rates of request timeouts or 504 errors when using the Machines API or flyctl commands. We are continuing to monitor these services as they recover.
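For clients that call the API directly while error rates remain elevated, retrying timeouts and 504 responses with exponential backoff may help. The following is only a minimal sketch, not an official client: `call_with_backoff`, `send`, and the retry limits are hypothetical names and values chosen for illustration, and `send` stands in for whatever function performs your actual request.

```python
import random
import time
from typing import Callable, Tuple

# Assumption: these gateway-style statuses are worth retrying; adjust as needed.
RETRYABLE_STATUSES = {502, 503, 504}


def call_with_backoff(
    send: Callable[[], Tuple[int, str]],
    max_attempts: int = 5,
    base_delay: float = 0.5,
    max_delay: float = 8.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Tuple[int, str]:
    """Call send() until it returns a non-retryable status.

    send() is a hypothetical callable that performs one request and returns
    (status_code, body), raising TimeoutError if the request times out.
    """
    last_error: Exception = RuntimeError("no attempts were made")
    for attempt in range(max_attempts):
        try:
            status, body = send()
            if status not in RETRYABLE_STATUSES:
                return status, body
            last_error = RuntimeError(f"retryable status {status}")
        except TimeoutError as exc:
            last_error = exc
        if attempt < max_attempts - 1:
            # Exponential backoff with full jitter, capped at max_delay.
            delay = min(max_delay, base_delay * 2 ** attempt)
            sleep(random.uniform(0, delay))
    raise last_error
```

Only retry requests that are safe to repeat. A create request that timed out may still have succeeded, so check for the resource before sending it again.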
Update
The restore process has completed on the majority of hosts in our fleet and we are seeing overall Corrosion cluster health and performance return to normal.
A small number of hosts are still being worked on; we aim to have them restored shortly.
Update
We are running a restoration and reseed process to bring the Corrosion cluster back to a healthy, current state. During this restoration process, you may see elevated error rates on machines or apps that have been recently updated.
Update
The updates have been applied; however, we are still not seeing recovery on all Corrosion nodes. We are continuing to work on a fix.
Machines API and proxy performance remain degraded, especially for newly created and updated Machines.
Update
The Machines API issues stem from a propagation delay in our global state store, Corrosion.
We have completed deploying a configuration change to our Corrosion cluster and will be applying these changes to each node shortly. We expect improvement once the changes are applied.
In the meantime, users may still see degraded Machines API and proxy performance, especially with newly created Machines.
Update
The issue has been identified and a fix is being implemented.
Investigating
We are investigating degraded API performance.