Grafana
Last checked: Mar 10, 2026 08:27
Incident History (Last 30 Days)
IRM Scheduled Maintenance
THIS IS A SCHEDULED EVENT Mar 12, 23:00 UTC - Mar 13, 00:00 UTC
Mar 9, 23:35 UTC
Scheduled - Within this window there is expected to be 1-2 minutes of database connectivity loss within the prod-us-central-0 and prod-eu-west-0 regions.
During that time we will be buffering alerts into IRM, which will be processed after the DB switchover. During those few minutes, the UI and API will be unavailable.
Metrics write path outage in prod-us-central-0 and prod-us-central-5
Mar 9, 18:03 UTC
Monitoring - From 15:30 to 15:45 UTC and from 16:53 to 17:03 UTC, the prod-us-central-0 and prod-us-central-5 regions saw elevated latency and error rates on the write path.
We're monitoring now.
Maintenance - Auth API Database Restart
Mar 9, 17:15 UTC
Completed - The scheduled maintenance has been completed.
Mar 9, 17:00 UTC
In progress - Scheduled maintenance is currently in progress. We will provide updates as necessary.
Mar 3, 20:32 UTC
Scheduled - We will be performing a restart of all Auth API databases in AWS as part of planned maintenance.
Because the Auth API is a dependency for all Grafana Cloud services, this maintenance has the potential to impact all Grafana Cloud environments within the regions being restarted. However, the only expected user-facing impact during each restart window is for customers attempting to manage Grafana Cloud access policies.
Each database …
Fleet Management Elevated Rate of Errors
Mar 9, 14:20 UTC
Investigating - Some users in prod-us-central-0 may be seeing an elevated rate of errors when fetching configurations. Our engineers are currently investigating this issue.
Outage for prod-eu-central-0 due to AWS S3 outage.
Mar 9, 08:59 UTC
Resolved - This incident has been resolved.
Mar 8, 11:30 UTC
Monitoring - Since about 20:03 UTC we have seen AWS S3 recover and also our services are recovering, we are monitoring.
Mar 7, 20:10 UTC
Update - Since about 20:03 UTC we have seen AWS S3 recover and also our services are recovering, we are monitoring.
Mar 7, 20:10 UTC
Update - We are continuing to investigate this issue.
Mar 7, 20:07 UTC
Investigating - We are seeing elevated error rates and outages across many of our services in prod-eu-central-0 due to an ongoing AWS S3 outage in that region.
Service degradation on Logs Read path in AWS US West (us-west-0)
Mar 8, 20:31 UTC
Resolved - We have observed a sustained period of stability since 19:40 UTC. At this time, we are considering this issue resolved.
Mar 8, 18:29 UTC
Monitoring - Since 16:35 UTC services have been stable and are recovering. We are actively monitoring and working to fully stabilize them.
Mar 8, 14:17 UTC
Investigating - Our engineering team is investigating issues on the read path of Loki services in AWS US West since around ~13:25 UTC today.
These issues can cause timeouts and 5xx errors when querying logs for customers on the cluster. The team is currently working to restore the service.
Grafana Cloud Metrics - Intermittent Write Latency in prod-us-central, prod-us-central-5, and prod-eu-west-0
Mar 6, 21:44 UTC
Update - We are rolling out a mitigation across the environments in these regions, and preemptively where possible to ensure it doesn’t spread elsewhere.
Mar 6, 20:53 UTC
Update - We have seen an increase in latency in our cloud provider's services, and are rolling out a change to mitigate the issue. We are monitoring.
Mar 5, 22:22 UTC
Update - We are continuing to investigate this issue alongside the CSP, and have taken steps to escalate through the appropriate channels. The mitigation in place continues to work as expected, and any notable updates will continue to be shared here for …
Some Grafana Instances Unavailable
Mar 6, 16:31 UTC
Resolved - This incident has been resolved.
Mar 6, 15:03 UTC
Identified - We have identified an issue which is causing some instances to become unavailable. Our engineering team is actively working on mitigating the issue.
We will continue to share updates as they become available.
Write failures in prod-eu-west-0
Mar 5, 23:36 UTC
Resolved - We have observed a sustained period of recovery. At this time, we are considering this issue resolved. No further updates.
Mar 5, 22:41 UTC
Monitoring - Engineering has released a fix and as of 22:00 UTC, customers should no longer experience write failures and delays in rule evaluation.
We will continue to monitor for recurrence and provide updates accordingly.
Mar 5, 22:27 UTC
Investigating - A recent incident affecting the data read path and rule execution within prod-eu-west-0 began at ~21:05 UTC on March 5, 2026. Customers with instances in this region may experience write failures and delays …
Grafana Cloud Logs - Write degradation in Azure Netherlands (eu-west-3)
Mar 5, 18:31 UTC
Resolved - This incident has been resolved.
Mar 4, 22:41 UTC
Update - We continue to monitor mitigation efforts and work with our CSP.
Mar 3, 22:19 UTC
Identified - The impact has been reduced to slight intermittency. We continue to work with our CSP toward a complete resolution.
Mar 3, 14:15 UTC
Update - Since 11:55 UTC today we are seeing issues on the write path for Loki in cluster Azure Netherlands (eu-west-3). Impact will appear as degraded log ingestion on that cluster. We are also reporting impact to Faro performance in the same region. Our engineering team …
Complete outage in prod-me-central-1
Mar 4, 22:22 UTC
Update - We are actively monitoring the situation, but at this time there are no new updates to share. The next update will be provided once we have more information to share. Please reach out to our Support team if you have any questions.
Mar 4, 10:28 UTC
Update - We are continuing to investigate this issue.
Mar 2, 22:18 UTC
Update - Please continue to refer to the AWS status page for more detailed updates specific to AWS.
https://health.aws.amazon.com/health/status
AWS are recommending that affected customers move workloads to alternate regions, and we are recommending the same.
Customers who are impacted and who cannot …
Elevated rate of errors for Fleet Management in prod-us-central-0
Mar 4, 09:29 UTC
Resolved - This incident has been resolved.
Mar 4, 08:46 UTC
Monitoring - A fix has been implemented and we are monitoring the results.
Mar 4, 07:47 UTC
Investigating - We are currently experiencing an issue with Fleet Management in prod-us-central-0. Users in prod-us-central-0 may observe an elevated rate of errors when fetching configurations.
Maintenance - Auth API Database Restart
Mar 3, 19:00 UTC
Completed - The scheduled maintenance has been completed.
Mar 3, 17:00 UTC
In progress - Scheduled maintenance is currently in progress. We will provide updates as necessary.
Feb 24, 16:01 UTC
Scheduled - Upon successful completion of the first two regions restarted as part of our planned maintenance, we will proceed with restarting the remaining regions in AWS on March 3rd, 2026.
Because the Auth API is a dependency for all Grafana Cloud services, this maintenance has the potential to impact Grafana Cloud environments within the regions being restarted. However, the only expected user-facing impact during each restart window is for customers …
Test Run Browser Screenshot Upload Failing
Mar 3, 13:00 UTC
Resolved - Test run browser screenshot upload experienced failures from 13:12 to 14:51 UTC.
The issue has been resolved.
Maintenance - Auth API Database Restart
Mar 2, 17:15 UTC
Completed - The scheduled maintenance has been completed.
Mar 2, 17:00 UTC
In progress - Scheduled maintenance is currently in progress. We will provide updates as necessary.
Feb 23, 18:49 UTC
Scheduled - We will be performing a restart of all Auth API databases in AWS as part of planned maintenance. To minimize risk, we will begin by restarting two regions and once successful, proceed with the remaining databases on March 3rd, 2026.
Because the Auth API is a dependency for all Grafana Cloud services, this maintenance has the potential to impact all Grafana Cloud environments within the regions being restarted. However, …
Write outage for logs in prod-eu-west-3
Mar 2, 15:48 UTC
Resolved - This incident has been resolved.
Mar 2, 08:08 UTC
Update - We are now experiencing a write outage for logs in prod-eu-west-3. Our Engineering team is aware and currently investigating this. We will provide further updates accordingly.
Mar 2, 07:37 UTC
Investigating - We are experiencing increased write latency for logs in prod-eu-west-3. Our Engineering team is aware and currently investigating this. We will provide further updates accordingly.
Trace querying issue in all Tempo clusters
Feb 27, 23:38 UTC
Resolved - This incident has been resolved.
Feb 27, 19:27 UTC
Identified - Our team has identified the issue, and are in the process of testing a fix.
Feb 27, 13:46 UTC
Investigating - We're currently working on an issue where portions of data may be temporarily unretrievable, affecting a small percentage of tenants in all Tempo clusters.
Increased Latency for Small Subset of Customers
Feb 27, 16:25 UTC
Resolved - A recent rollout caused the AuthZ (RBAC) service to perform many redundant folder-tree fetches for each authorization check for a small number of tenants in the prod-us-east-0 and prod-eu-west-2 regions with very large folder trees. This added a few milliseconds to every check, which increased request latency.
The approximate timeframe of the impact is:
2026-02-26 17:24:43 UTC to 2026-02-27 14:33:53 UTC.
This has now been resolved.
Incorrect pipeline assignment after custom attributes are assigned
Feb 27, 15:24 UTC
Resolved - This incident has been resolved.
Feb 27, 13:39 UTC
Identified - The issue has been identified and we are working on a fix.
Feb 27, 12:57 UTC
Investigating - We are investigating issues with incorrect pipeline assignment after custom attributes are assigned.
Grafana Cloud Faro slowness when listing and uploading sourcemaps in all regions
Feb 27, 02:49 UTC
Resolved - This incident has been resolved.
Feb 26, 14:43 UTC
Update - Uploads should work without an issue now. However, listing might still result in occasional timeouts - we're actively addressing this problem.
Feb 26, 13:00 UTC
Identified - We're experiencing an issue in all Grafana Cloud regions, which manifests as slowness when uploading and listing sourcemaps. The issue most significantly affects users with large sourcemap files.
We've identified the issue and our team is currently working on a fix.
Issues Loading Dashboards and Alert Folders in Hosted Grafana
Feb 25, 19:51 UTC
Resolved - This incident has been resolved.
Feb 25, 18:46 UTC
Monitoring - A fix has been implemented, and we are observing recovery across all impacted regions. We will continue to monitor progress.
Feb 25, 18:31 UTC
Identified - The issue has been identified, and we are in the process of rolling out a fix.
Feb 25, 17:49 UTC
Update - While we work on narrowing down the scope, we can confirm that deployments in the prod-us-east-0 region are impacted.
Feb 25, 17:44 UTC
Investigating - Some users may be experiencing issues loading dashboards and alert folders in Hosted Grafana. We will provide more information …
Partial Write & Rule Evaluation Outage in prod-eu-west-3
Feb 25, 17:20 UTC
Resolved - This incident has been resolved.
Feb 25, 15:55 UTC
Monitoring - A fix has been implemented and we are monitoring the results.
Feb 25, 15:05 UTC
Investigating - We are currently investigating an issue which is causing a partial write and rule evaluation outage in the specified region. We will continue to provide updates as they are available.
Grafana Cloud Traces prod-eu-west-6 region (AWS Ireland) wrong URL endpoint shown for traces ingestion.
Feb 25, 15:05 UTC
Resolved - This incident has been resolved.
Feb 25, 12:53 UTC
Monitoring - The fix was deployed to all affected existing tenants, and newly created tenants will not encounter the issue. We're monitoring the incident, but it should now be resolved.
Feb 25, 12:41 UTC
Identified - We identified an issue with an incorrect URL endpoint being shown for traces ingestion in the prod-eu-west-6 region (AWS Ireland). Using the displayed URL will prevent traces from being ingested. AWS PrivateLink ingestion should work without issues, though.
The issue affects all tenants in this region …
Some Alert Rule Evaluations Failing
Feb 24, 17:09 UTC
Resolved - This incident has been resolved.
Feb 24, 16:27 UTC
Monitoring - A fix has been implemented, and we are monitoring results.
Feb 24, 14:31 UTC
Investigating - We are currently investigating an issue impacting a subset of users in the prod-us-east-0 region. Impacted customers will receive a "failed to execute query" error when evaluating alert rules.
Maintenance task for Synthetic Monitoring ProbeFailedExecutionsTooHigh alert rule
Feb 20, 14:17 UTC
Completed - The scheduled maintenance has been completed.
Feb 20, 13:30 UTC
In progress - Scheduled maintenance is currently in progress. We will provide updates as necessary.
Feb 20, 10:34 UTC
Scheduled - Alert instances for Synthetic Monitoring ProbeFailedExecutionsTooHigh provisioned alert rule that are firing during the maintenance might resolve and fire again in the next evaluation.
Maintenance task for Synthetic Monitoring ProbeFailedExecutionsTooHigh alert rule
Feb 19, 13:34 UTC
Completed - The scheduled maintenance has been completed.
Feb 19, 13:00 UTC
In progress - Scheduled maintenance is currently in progress. We will provide updates as necessary.
Feb 19, 09:46 UTC
Scheduled - Possible user impact:
Alert instances for Synthetic Monitoring ProbeFailedExecutionsTooHigh provisioned alert rule that are firing during the maintenance might resolve and fire again in the next evaluation.
Degraded performance of Grafana Cloud k6 test runs
Feb 18, 21:17 UTC
Resolved - This incident has been resolved.
Feb 18, 12:57 UTC
Monitoring - A fix has been implemented and we are monitoring the results.
Feb 18, 10:20 UTC
Update - We are continuing to investigate this issue.
Feb 18, 08:27 UTC
Investigating - We see intermittent failures and slow start-up of test-runs. We are currently investigating this issue.
Brief Disruption in Azure prod-us-central-7
Feb 18, 14:00 UTC
Resolved - We experienced an issue impacting a cell within the Azure prod-us-central-7 region between 14:26 and 14:36 UTC. Affected users may have noticed increased errors with rule evaluations, as well as some read/write errors. We have resolved this issue, and will continue to monitor.
Maintenance task for Synthetic Monitoring ProbeFailedExecutionsTooHigh alert rule
Feb 18, 13:33 UTC
Completed - The scheduled maintenance has been completed.
Feb 18, 13:00 UTC
In progress - Scheduled maintenance is currently in progress. We will provide updates as necessary.
Feb 18, 10:04 UTC
Scheduled - Alert instances for Synthetic Monitoring ProbeFailedExecutionsTooHigh provisioned alert rule that are firing during the maintenance might resolve and fire again in the next evaluation. Only API is affected.
Estimated time window is 13:00–14:00 UTC
Impacted clusters are:
prod-me-central-1
prod-us-east-1
prod-ap-northeast-0
prod-gb-south-0
prod-us-east-3
prod-eu-central-0
prod-ap-south-1
prod-sa-east-1
Grafana Cloud metrics degradation
Feb 18, 05:31 UTC
Resolved - This incident has been resolved.
Feb 18, 03:47 UTC
Update - We are continuing to investigate this issue.
Feb 18, 03:43 UTC
Investigating - We've been alerted to issues with querying and are investigating.
Maintenance task for Synthetic Monitoring ProbeFailedExecutionsTooHigh alert rule
Feb 17, 16:27 UTC
Resolved - This incident has been resolved.
Feb 17, 14:53 UTC
Monitoring - Alert instances for Synthetic Monitoring ProbeFailedExecutionsTooHigh provisioned alert rule that are firing during this maintenance might resolve and fire again in the next evaluation. Only the API is affected.
Estimated time window is 15:00–16:00 UTC
Impacted clusters are:
prod-eu-west-5
prod-us-east-4
prod-eu-west-6
prod-sa-east-0
prod-ap-south-0
prod-ap-southeast-0
prod-me-central-0
prod-au-southeast-0
prod-ap-southeast-2
Degradation of service on Synthetic Monitoring Public Probe AWS Canada (Calgary)
Feb 17, 12:47 UTC
Resolved - There was a service degradation today from ~12:09 UTC until ~12:35 UTC on the Calgary Public Probe for Synthetic Monitoring. Impact may include SM check failures where the probe was used.
Self-Serve Users Unable to Sign Up
Feb 13, 19:08 UTC
Resolved - This incident has been resolved.
Feb 13, 18:40 UTC
Investigating - We are currently investigating an issue that is preventing users from signing up for self-serve Grafana. We will continue to update with more information as our investigation progresses.
Loki Delete Endpoint Bug
Feb 13, 17:03 UTC
Resolved - This incident has been resolved.
Feb 13, 07:56 UTC
Update - We are continuing to work on a fix for this issue.
Feb 13, 07:56 UTC
Update - A fix is being made to mitigate the issue. We will provide further updates accordingly.
Feb 12, 23:16 UTC
Identified - As of 22:45 UTC, we have identified a serious bug affecting the delete endpoint for all Loki regions. As a precaution, the endpoint has been temporarily disabled.
Engineering is actively engaged and assessing the issue. We will provide updates accordingly.
Loki writes outage in prod-ca-east-0
Feb 13, 07:29 UTC
Resolved - We have observed a sustained period of recovery. At this time, we are considering this issue resolved.
Feb 13, 07:09 UTC
Monitoring - We have scaled up to handle the increased traffic and are seeing marked improvement. We will continue to monitor and provide updates.
Feb 13, 06:59 UTC
Investigating - We have been alerted to an ongoing Loki writes outage in the prod-ca-east-0 region. Our Engineering team is actively investigating this.
Essential Maintenance for Faro Services
Feb 12, 19:07 UTC
Resolved - This incident has been resolved.
Feb 12, 16:09 UTC
Monitoring - We are undergoing essential maintenance for Faro services. Users may experience a short service outage of <1 minute during this time. We expect this to be finished within an hour.
Grafana Cloud Metrics elevated write and rule evaluation latency in prod-eu-west-2 region
Feb 12, 14:30 UTC
Resolved - We no longer observe any problems with our services; this incident has been resolved.
Feb 12, 12:50 UTC
Monitoring - The fix has been implemented and services are back to normal. We're currently monitoring health of the services before resolving this incident.
Feb 12, 12:40 UTC
Identified - The issue has been identified and our team is currently working on a fix.
Feb 12, 12:33 UTC
Investigating - Since 12:17 UTC, we're observing an increased latency for data ingestion and rule evaluation in Grafana Cloud Metrics, prod-eu-west-2 region. We're currently investigating the issue.
Unable to Install Slack Integration
Feb 11, 21:47 UTC
Resolved - This incident has been resolved.
Feb 11, 18:20 UTC
Monitoring - We are in the process of rolling out the fix.
Feb 11, 16:22 UTC
Identified - We have identified the issue, and are working on a fix.
Feb 11, 14:21 UTC
Investigating - We are aware of an issue that is preventing the installation of the Slack integration. We are currently investigating this, and will provide updates as they become available.
Loki error response rate spike on prod-ap-southeast-1
Feb 11, 07:25 UTC
Resolved - This incident has been resolved.
Feb 11, 06:54 UTC
Monitoring - We have deployed temporary measures to mitigate the issue, but there was a writes outage from 06:26 to 06:37 UTC.
Feb 11, 06:51 UTC
Investigating - Cloud logging is facing write issues in this region; our team is looking into this.
Write failures in prod-us-central-0
Feb 10, 01:45 UTC
Resolved - We have observed a sustained period of recovery. At this time, we are considering this issue resolved.
Feb 10, 00:39 UTC
Investigating - As of 00:10 UTC, we are currently experiencing write failures in a single cell affecting customers in prod-us-central-0. Impacted customers may see failed or dropped writes.
Engineering is actively engaged and assessing the issue. We will provide updates accordingly.
Athena Queries Broken
Feb 9, 19:07 UTC
Resolved - This incident has been resolved.
Feb 9, 17:01 UTC
Monitoring - We are seeing recovery in impacted environments. We will continue to monitor the progress.
Feb 9, 16:23 UTC
Update - Our engineering team is still investigating this issue.
Feb 9, 15:35 UTC
Investigating - We are currently investigating an issue resulting in broken queries for the Athena data source.
Grafana Cloud Logs – Write Ingestion Degradation
Feb 9, 11:21 UTC
Resolved - This incident has been resolved.
Feb 9, 10:36 UTC
Update - We are continuing to monitor for any further issues.
Feb 9, 10:32 UTC
Monitoring - Between 09:47 and 10:14 UTC, Grafana Cloud Logs within a single cell residing in the prod-ap-southeast-1 region experienced an issue affecting write ingestion only. During this time, some log writes may have failed or been delayed. Log reads were not impacted and remained fully available throughout the incident.
Our engineering team quickly identified the cause of the issue and is monitoring the service. The service has been operating normally since 10:14 UTC.