Delayed Machine Registration + Token Errors


Incident resolved in 4h10m2s

Resolved

This incident has been resolved.

1770324787

Update

A fix has been deployed across all impacted hosts. Token errors have dropped sharply since 20:00 UTC, and other metrics are recovering as well. We are continuing to monitor closely.

1770322307

Update

We saw some improvement from the previous fix; however, errors remained elevated on some hosts. We have identified the root cause of the remaining errors as a communication issue between the hosts and our Token database. We are preparing a fix that should resolve them.

1770318203

Update

We have rolled out an initial fix for the token issues and are monitoring for improvements.

1770315549

Investigating

While machine registration error rates have improved, we are now seeing elevated error rates when verifying user tokens during some actions.

Users may see errors like "failed to launch VM: permission_denied: bolt token: failed to verify service token: no verified tokens" when deploying or creating machines.

We are investigating.

1770313328

Update

A fix has been rolled out and most hosts are registering machines as normal. A few hosts still show elevated error rates, and we are continuing to fix these. Users who experience an error creating or deploying a new machine should retry the operation.
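
For users who script machine creation or deploys, the retry advice above can be automated. The following is a minimal sketch of retrying with exponential backoff and jitter, not an official client. The `retry_with_backoff` helper, its parameters, and the `create_machine` call mentioned in the comment are all hypothetical names chosen for illustration.

```python
import random
import time


def retry_with_backoff(operation, max_attempts=5, base_delay=1.0,
                       max_delay=30.0, retryable=(Exception,),
                       sleep=time.sleep):
    """Call operation() until it succeeds or max_attempts is exhausted.

    All names here are illustrative assumptions. In real use, narrow
    `retryable` to the errors that are actually transient.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except retryable:
            if attempt == max_attempts:
                # Out of attempts: surface the last error to the caller.
                raise
            # Exponential backoff capped at max_delay, with full jitter
            # so many clients do not retry at the same instant.
            delay = min(max_delay, base_delay * 2 ** (attempt - 1))
            sleep(random.uniform(0, delay))


# Hypothetical usage, assuming create_machine() is your own wrapper
# around whatever API or CLI call creates the machine:
#   machine = retry_with_backoff(lambda: create_machine(config))
```

Capping the number of attempts matters during an incident like this one: unbounded retries from many clients can add load to hosts that are still recovering.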

1770310754

Update

We are seeing elevated error rates when registering new machines with our global state tracking service on some hosts. We have identified the cause and are deploying a fix.

Users may have seen elevated machine create, start, or deployment failures over the past ~20 minutes.

1770309785