Between 09:40 and 10:55 UTC, the Cloud Logs service briefly experienced an issue providing data for query rate metrics only. You may see gaps in the query rates panel of the billing dashboards for this period. The situation has been mitigated; we apologize for the inconvenience.
From 18:15 to 18:25 UTC, our prod-us-west-0 Loki cell experienced a period of degraded write performance. The issue resolved quickly without requiring manual intervention, and the system has remained stable since.
The Tempo service in the EU West cluster experienced a traffic increase over the weekend, which caused an elevated error rate in Tempo's write path (ingestion). Our engineering team identified the root cause and implemented measures to mitigate and resolve the problem.
Trace ingestion problems may have been experienced from 15:30 UTC on the 13th until 19:30 UTC on the 15th.
At approximately 12:00 UTC, a feature toggle was rolled out that negatively impacted instances on the slow release channel. Users on this release channel began to receive an "AlertStatesDataLayer" error. A workaround was quickly identified and applied for the users who reported the issue. The feature toggle in question was fully reverted by 18:00 UTC.
Due to scheduled maintenance (https://status.grafana.com/incidents/rz7nt6cs4prb), some users were unable to log in to their Grafana Cloud stacks. The issue affected only users who:
had no session already open in their Grafana Cloud stack;
or were located geographically closer to Europe while their stack was hosted closer to the US (or vice versa).
The issue was caused by an incorrect configuration introduced during the maintenance, which was fixed shortly after it was discovered. Login is now fully operational and stable.