The connections have been stable since 14:50 UTC, and all 4 probes have been operating normally during that time. We are transitioning this incident from monitoring to resolved.
Today from 20:15 to 21:30 UTC, a small subset of customers in the prod-us-east-0 region may have seen missing recording rule samples for that time period. This incident is now resolved.
The incident has been resolved. The cause was one database server being under heavy CPU load, which caused database queries to either take a long time to complete or fail altogether for Grafana Cloud instances using that specific database server. As a result, some instances were unavailable, and new instance startups failed.