Machines failing to start in DFW
This incident has been resolved. Machine creates in DFW continue to work normally.
This incident has been resolved.
Organizations with numerical prefixes might experience failing sprite operations (such as creating or listing sprites) due to 401 errors.
This incident has been resolved.
This incident has been resolved.
This incident has been resolved. Due to a BGP issue, we saw some North American traffic routed to edges in Singapore (sin). Users in North America would have seen additional request latency during this period.
This incident was caused by a failed Redis node that powers our GraphQL API. We were able to recreate the Redis node and restore service.
We are still investigating the root cause of the failure. In the meantime, all API endpoints now appear to be stable and errors have dropped to baseline levels.
This incident has been resolved.
Between 19:54 and 20:06 UTC, our Vault cluster serving app certificates was unavailable. This caused various API requests to fail, mainly operations on certificates but also app creates and IP assignments.
The failure mode was Vault requests hanging rather than failing immediately. As a result, TLS requests through fly-proxy for domains whose certificate was not cached on the local node remained open for a long time while the proxy attempted to fetch the certificate. Some connections then failed because too many connection slots were taken up by requests waiting on Vault.
The root cause of this incident was a partially completed update to the Vault cluster. We will be implementing safeguards in the proxy for this failure mode, as well as improving certificate storage longer-term.
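To illustrate the kind of safeguard described above, here is a minimal sketch, not fly-proxy's actual implementation. It assumes a hypothetical async `fetch` callable standing in for the Vault lookup, and the names `CertFetcher` and `CertUnavailable` and the default limits are invented for illustration. The idea is to bound both how long a certificate fetch may hang and how many fetches may be in flight, so that a stalled backend produces fast failures instead of occupying connection slots.

```python
import asyncio


class CertUnavailable(Exception):
    """Raised when a certificate cannot be fetched quickly enough."""


class CertFetcher:
    """Hypothetical wrapper around a slow or hanging certificate backend."""

    def __init__(self, fetch, timeout=2.0, max_in_flight=32):
        self._fetch = fetch              # assumed: async callable, domain -> bytes
        self._timeout = timeout          # seconds to wait before giving up
        self._max_in_flight = max_in_flight
        self._in_flight = 0              # fetches currently waiting on the backend
        self._cache = {}                 # domain -> certificate bytes

    async def get(self, domain):
        # Cached certificates never touch the backend.
        if domain in self._cache:
            return self._cache[domain]

        # Fail fast instead of queueing behind a backend that may be hanging.
        if self._in_flight >= self._max_in_flight:
            raise CertUnavailable(f"too many pending fetches for {domain}")

        self._in_flight += 1
        try:
            # Bound the wait so a hung request releases its slot.
            cert = await asyncio.wait_for(self._fetch(domain), timeout=self._timeout)
        except asyncio.TimeoutError:
            raise CertUnavailable(f"timed out fetching {domain}") from None
        finally:
            self._in_flight -= 1

        self._cache[domain] = cert
        return cert
```

With a wrapper like this, a caller can treat `CertUnavailable` as a signal to close the connection promptly rather than holding it open. The timeout and concurrency limit shown are placeholders that would need tuning against real traffic.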
This incident has been resolved.