Incident History

Machines failing to start in DFW

This incident has been resolved. Machine creates in DFW continue to work normally.

1773827890 - 1773859995 Resolved

Sprite Operations: 401 errors for certain organizations

This incident has been resolved.

1773496550 - 1773497121 Resolved

Sprite Operations: 401 errors for certain organizations

Organizations with numerical prefixes might experience failing sprite operations (such as creating a sprite or listing sprites) due to 401 errors.

1773495191 Ongoing

Setting secrets and creating apps is degraded

This incident has been resolved.

1773220783 - 1773229030 Resolved

Private networking issues in SYD region

This incident has been resolved.

1772894550 - 1772898998 Resolved

Routing issues in NA regions

This incident has been resolved. Due to a BGP issue, we saw some North American traffic routed to edges in Singapore (sin). Users in North America would have seen additional request latency during this period.

1772738669 - 1772740259 Resolved

Elevated GraphQL API errors

This incident was caused by a failed Redis node that powers our GraphQL API. We were able to recreate the Redis node and restore service.

We are still investigating the root cause of the failure. In the meantime, all API endpoints now appear to be stable and errors have dropped to baseline levels.

1772569104 - 1772572548 Resolved

Cost Explorer fails to load

This incident has been resolved.

1772535017 - 1772539818 Resolved

Certificate issues affecting API and proxy

Between 19:54 and 20:06 UTC, our Vault cluster serving app certificates was unavailable. This caused various API requests to fail, mainly operations on certificates but also app creates and IP assignments.

Because the failure mode was Vault requests hanging rather than failing immediately, TLS requests through fly-proxy for domains whose certificate was not cached on the local node remained open for a long time while the proxy attempted to fetch the certificate. Some connections failed as a result, because too many connection slots were taken up by requests waiting on Vault.

The root cause of this incident was a partially completed update to the Vault cluster. We will be implementing safeguards in the proxy for this failure mode, as well as improving certificate storage longer-term.
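
As an illustration of the kind of safeguard this failure mode calls for, here is a minimal Python sketch. It is not fly-proxy's implementation. It assumes a hypothetical `fetch` coroutine that asks the certificate backend for a certificate; `BoundedCertFetcher`, `CertFetchFailed`, `timeout_s`, and `max_pending` are names invented for this example. The idea is to put a deadline on every fetch and cap how many fetches may wait at once, so a hanging backend fails fast instead of holding connection slots.

```python
import asyncio


class CertFetchFailed(Exception):
    """The certificate could not be fetched within the allowed budget."""


class BoundedCertFetcher:
    """Wraps a hypothetical `fetch(domain)` coroutine with two limits."""

    def __init__(self, fetch, timeout_s=2.0, max_pending=8):
        self._fetch = fetch              # assumed: async callable returning a certificate
        self._timeout_s = timeout_s      # deadline for a single fetch
        self._max_pending = max_pending  # cap on fetches waiting at the same time
        self._pending = 0

    async def get(self, domain):
        # Reject immediately when too many fetches are already waiting,
        # so waiting requests cannot use up every connection slot.
        if self._pending >= self._max_pending:
            raise CertFetchFailed(f"too many pending fetches, rejecting {domain}")
        self._pending += 1
        try:
            # Turn a hanging backend call into a fast, explicit failure.
            return await asyncio.wait_for(self._fetch(domain), self._timeout_s)
        except asyncio.TimeoutError:
            raise CertFetchFailed(f"timed out fetching certificate for {domain}") from None
        finally:
            self._pending -= 1
```

With these limits, a request for an uncached certificate either gets an answer within `timeout_s` or is rejected, and at most `max_pending` requests are ever waiting on the backend. The actual values would depend on the proxy's connection budget.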

1772503515 - 1772503515 Resolved

Issues with the Machines API

This incident has been resolved.

1772486378 - 1772488219 Resolved