We've completed mitigation work and item usage metrics appear to be updating in real time. We will continue to closely monitor performance to ensure the issue is fully resolved.
Service(s) Affected: 1Password.com website, Sign in, Access to passwords and other items
Impact Duration: 13 hours, 29 minutes
Summary
On October 20, 2025 at 07:26:00 UTC, 1Password.com faced intermittent latency, authentication failures, and degraded service availability due to a major outage at AWS in the us-east-1 region. This was not a security incident and no customer data was affected.
As a result, the 1Password server-side application experienced degradation or intermittent failures, affecting up to 50% of traffic in the US region. Complete service restoration occurred in conjunction with AWS’s final mitigations around 18:30 UTC.
Impact on Customers
All US customers accessing 1Password cloud services experienced intermittent latency, authentication failures, and degraded availability on 1Password.com.
File Share: Sharing of passwords via links could intermittently fail
Login: Users logging into vaults experienced timeout errors and slow responses
Web Access: Users accessing their vault through the web interface experienced timeout errors and slow responses
API Access: CLI users and API requests received timeout errors and slow responses
What Happened?
At 07:11:00 UTC, AWS began experiencing DNS resolution failures in the us-east-1 region, initially affecting DynamoDB and rapidly cascading to multiple AWS services. 1Password detected the impact at 07:26:00 UTC, when monitoring alerts fired for an inability to scale up clusters, and an incident was declared.
1Password immediately deployed mitigations within our infrastructure to preserve adequate compute capacity for our US-based users, including pausing deployments and scaling down services not critical to key functionality.
Timeline of Events (UTC):
06:55:05 - 1Password monitoring triggers warning for unavailable Pods in Deployment (caused by inability to obtain AWS IAM credentials)
07:03:06 - 1Password monitoring alerts for 5xx errors on auth start endpoint (caused by inability to obtain AWS IAM credentials) - pages authentication team, but alert recovers within minutes
07:26:00 - 1Password monitoring alerts for inability to scale clusters, engineers begin investigating, Incident declared
07:26:41 - AWS confirms elevated error rates across multiple services
07:49:06 - 1Password monitoring alerts for 5xx errors on auth start endpoint (caused by inability to obtain AWS IAM credentials)
07:51:09 - AWS identifies DNS as the root cause, begins mitigation
08:02:13 - 1Password suspends auto-scaling tooling to retain existing capacity
09:27:33 - AWS reports significant recovery signs
10:35:37 - AWS declares DNS issue fully mitigated, services recovering
14:14:00-15:43:00 - AWS announces full recovery across all services; throttles EC2 launches
16:42:49 - 1Password tooling and users start reporting 503s and an inability to log in due to the volume of traffic
16:50:00 - 1Password services restarted to reset and flush connections, prioritizing post-recovery traffic.
20:53:00 - AWS resolves their incident
20:55:00 - 1Password engineers overscale deployments for stability and overnight observation
Oct 21, 2025 - Incident resolved after confirmation of complete upstream recovery
How Was It Resolved?
Mitigation Steps: 1Password paused deployments and the auto-management of cluster capacity to ensure enough capacity was available to serve users through peak access times. As demand outstripped available capacity, 1Password engineering reset the circuit breaker to allow additional connections to the service (a simplified sketch of this mechanism follows below).
Resolution Steps: AWS announced system restoration and a reduction in the throttling of EC2 API calls. To ensure sufficient capacity for peak traffic, 1Password engineers updated the required number of pods for core services the following business day and resumed the auto-management of cluster capacity tooling. Engineers then verified the health of the systems, deployments, and auto-scaling of the services.
Verification of Resolution: Engineers observed monitoring systems and cluster management tooling logs to ensure system health.
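For illustration, here is a minimal circuit-breaker sketch in TypeScript. It is not 1Password's implementation; the class, threshold, and endpoint are hypothetical. It shows the general mechanism referenced above: after repeated failures the breaker opens and rejects new connections, and an operator can reset it once upstream capacity recovers.

```typescript
// Minimal circuit-breaker sketch (hypothetical, illustrative only). After
// repeated failures the breaker "opens" and rejects new connections; an
// operator can reset it manually once upstream capacity recovers.
class CircuitBreaker {
  private failures = 0;
  private isOpen = false;

  constructor(private readonly failureThreshold: number) {}

  async call<T>(fn: () => Promise<T>): Promise<T> {
    if (this.isOpen) {
      throw new Error("circuit open: rejecting new connections");
    }
    try {
      const result = await fn();
      this.failures = 0; // a success clears the failure count
      return result;
    } catch (err) {
      this.failures += 1;
      if (this.failures >= this.failureThreshold) {
        this.isOpen = true; // stop sending traffic to a struggling dependency
      }
      throw err;
    }
  }

  // Manual reset: allow new connections again once capacity is restored.
  reset(): void {
    this.failures = 0;
    this.isOpen = false;
  }
}

// Usage sketch: wrap calls to a backend, then reset once recovery is confirmed.
const breaker = new CircuitBreaker(5); // hypothetical threshold
// await breaker.call(() => fetch("https://example.invalid/api")); // hypothetical backend call
breaker.reset();
```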
Root Cause Analysis
The failures in AWS's internal network affected multiple AWS product APIs. This disruption directly impacted 1Password’s ability to scale up infrastructure, deploy applications, and retrieve configuration data.
Contributing Factors:
Third-party incident response services and paging services were affected by the AWS incident, which complicated communications.
Upstream customer identity providers (IdPs) were affected by the AWS outage and returned errors that resulted in authentication failures.
What We Are Doing to Prevent Future Incidents
Improve Incident Response: Create additional backup protocols for when our incident response tooling is unavailable.
Improve multi-service outage response: Create strong break-glass runbooks in the event of a multi-service cloud provider outage.
Next Steps and Communication
No action is required from our customers at this time.
We are committed to providing a reliable and stable service, and we are taking the necessary steps to learn from this event and prevent it from happening again. Thank you for your understanding.
Incident Postmortem - Degraded performance when accessing 1Password
Date of Incident: 2025-09-26
Time of Incident: 4:20pm UTC - 5:39pm UTC
Service(s) Affected: SSO, Web Sign In, Sign Up, Web Interface, CLI
Impact Duration: ~60 minutes
Summary
On September 26, 2025 at 4:20pm UTC, 1Password’s web interface and APIs experienced degraded performance for all customers in the US region. This was not the result of a security incident, and customer data was not affected.
Impact on Customers
During the duration of the incident:
Web interface, Administration: Customers experienced delays when accessing the 1Password web interface.
Single Sign-on (SSO), Multi-factor Authentication (MFA): Users with SSO or MFA enabled experienced delays and, in some cases, failures to log in.
Command Line Interface (CLI): CLI users faced increased latency and timeouts when attempting to access our web APIs.
Browser Extension: Users requiring web interface authentication experienced delays or failures.
Number of Affected Customers (approximate): ~30%
Geographic Regions Affected: 1password.com (US/Global)
What Happened?
At 4:20pm UTC and again at 5:00pm UTC, bursts of customer traffic placed extra load on one of our caches. The cache was under-provisioned for that spike in activity and exhausted its available CPU, causing cascading errors and latency that manifested as slow and failed requests.
Timeline of Events (UTC):
2025-09-26 4:20pm: Spike in customer traffic began
2025-09-26 4:29pm: Automated monitoring detects increased errors and latency
2025-09-26 4:35pm: The team activates our incident protocol and begins investigation
2025-09-26 4:58pm: The team decides to restart application servers
2025-09-26 5:00pm: The servers have been restarted; service is still degraded as a second traffic burst begins
2025-09-26 5:18pm: Service starts to improve
2025-09-26 5:25pm: The team detects increased load for the second time
2025-09-26 5:33pm: The team restarts application servers again
2025-09-26 5:39pm: Service is back to normal, team continues to investigate
2025-09-26 7:26pm: The team identifies the issue and proceeds to upgrade the cache instance size
2025-09-26 7:50pm: Team continues to monitor, performance has returned to nominal levels
2025-09-26 8:24pm: Incident is marked as resolved
Root Cause Analysis:
A cache library version installed in July introduced latency issues for cache connections. Authentication operations were not properly rate-limited, allowing large influxes of traffic, and during peak periods the cache infrastructure was already operating near maximum CPU capacity. The incident occurred when a burst of authentication traffic pushed cache CPU utilization to 100%; the increased latency and CPU exhaustion together caused the slow and failed requests described above.
Contributing Factors:
Latency increase due to cache library version upgrade
Inadequate rate limiting allowed traffic bursts to go unchecked
The cache instance size was under-provisioned
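As a concrete illustration of the rate-limiting gap noted above, here is a minimal token-bucket sketch in TypeScript. It is not 1Password's implementation; the class, capacity, and refill rate are hypothetical. The idea is that bounding the rate of authentication attempts per client keeps a sudden burst from saturating a shared cache.

```typescript
// Token-bucket rate limiter sketch (hypothetical, illustrative only). Bounding
// the rate of authentication attempts per client keeps a sudden burst from
// saturating a shared downstream resource such as a cache.
class TokenBucket {
  private tokens: number;
  private lastRefill = Date.now();

  constructor(
    private readonly capacity: number,
    private readonly refillPerSecond: number,
  ) {
    this.tokens = capacity;
  }

  allow(): boolean {
    const now = Date.now();
    const elapsedSeconds = (now - this.lastRefill) / 1000;
    this.tokens = Math.min(this.capacity, this.tokens + elapsedSeconds * this.refillPerSecond);
    this.lastRefill = now;
    if (this.tokens >= 1) {
      this.tokens -= 1; // spend one token for this attempt
      return true;
    }
    return false; // bucket empty: reject or queue the attempt
  }
}

// Usage sketch: one bucket per client, checked before the request reaches the cache.
const buckets = new Map<string, TokenBucket>();

function allowAuthAttempt(clientId: string): boolean {
  let bucket = buckets.get(clientId);
  if (!bucket) {
    bucket = new TokenBucket(10, 2); // hypothetical: burst of 10, refill 2 per second
    buckets.set(clientId, bucket);
  }
  return bucket.allow();
}

console.log(allowAuthAttempt("client-a")); // true until the hypothetical burst allowance is spent
```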
How Was It Resolved?
Mitigation Steps: Restarting application servers temporarily mitigated the latency and errors, but the problems returned when traffic spiked again.
Resolution Steps: Increasing the instance size for the cache resolved the issue.
Verification of Resolution: The incident team tested the upgrade in a staging deployment before executing it in production. They then monitored metrics to confirm the system returned to normal levels.
What We Are Doing to Prevent Future Incidents
Improve capacity planning for cache: We will ensure our internal infrastructure is properly sized to handle current traffic volumes and accommodate future growth. We'll implement regular resource evaluations to maintain adequate capacity as our traffic increases. We will also implement proactive alerting systems that notify our teams when resource utilization approaches critical thresholds.
Update library to a more performant version: We will upgrade our caching library to the latest stable version to eliminate the current latency issues.
Improve rate limiting for operations that triggered the traffic burst: Enhancing our rate limiting system will significantly improve our ability to handle future traffic bursts.
Timeline for Implementation: Observability improvements have already been implemented, and we will complete the remaining work by the end of Q1, 2026.
Next Steps and Communication
No action is required from our customers at this time.
We are committed to providing a reliable and stable service, and we are taking the necessary steps to learn from this event and prevent it from happening again. Thank you for your understanding.
Incident Postmortem - Some customers are unable to interact with the admin console
Date of Incident: 2025-09-24
Time of Incident (UTC): 02:27 - 17:16
Service(s) Affected: Admin console, Sign in
Impact Duration: 36 hours, 49 minutes
Summary
Some customers with certain account configurations were placed on a blocklist and presented with a 403 error page after accessing the admin console.
Impact on Customers
Admin console: Affected customers were presented with a 403 error page whenever they tried to interact with any of the admin console pages.
Log in: Affected customers were also unable to log in to the application.
Number of Affected Customers (approximate): 515
Geographic Regions Affected (if applicable): All regions
What Happened?
Timeline of Events (UTC):
Sep 24th 2:27am: Application monitoring alerted engineers to a spike in the rate of IP blocking
Sep 24th 3:00am: Cause identified as a change to requests in the application, which had been partially rolled out via a feature flag.
Sep 24th 4:03am: The feature flag was enabled for all customers, which reduced the spike, but IP blocks continued throughout the day.
Sep 24th 10:03pm: An application change was merged to revert the problematic change and prevent the issue from recurring.
Sep 25th 5:03pm: The change was deployed with the scheduled application release, and the error rate dropped off shortly after.
Root Cause Analysis: The issue was caused by GET requests to the Users API exceeding the URL length limit, due to a recent change that appended a list of UUIDs to the request parameters to resolve customer-reported performance issues.
Contributing Factors:
Requests were switched from GET to POST to prevent them from exceeding the URL length limit; however, an issue with the feature flag configuration caused UUIDs to still be sent with the GET endpoint (see the sketch below).
An underlying issue with the feature flag not resolving as expected in the application.
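The following TypeScript sketch illustrates the mechanism described above. The endpoint, parameter name, and batch size are hypothetical, not 1Password's actual values; the point is that a GET request carries the UUID list in the URL, so its length grows with the list and can exceed URL-length limits or trigger blocking rules, while a POST carries the same list in the body and keeps the URL short.

```typescript
// Hypothetical illustration: GET puts the UUID list in the query string,
// POST puts the same list in the request body.
const userIds: string[] = Array.from(
  { length: 500 }, // hypothetical batch size
  (_, i) => `00000000-0000-0000-0000-${String(i).padStart(12, "0")}`,
);

// GET: every UUID ends up in the URL, well beyond typical URL-length limits.
const getUrl = `https://example.invalid/api/users?ids=${userIds.join(",")}`;
console.log(`GET URL length: ${getUrl.length} characters`);

// POST: the UUID list travels in the body; the URL stays short.
const postRequest = new Request("https://example.invalid/api/users/lookup", {
  method: "POST",
  headers: { "Content-Type": "application/json" },
  body: JSON.stringify({ ids: userIds }),
});
console.log(`POST URL length: ${postRequest.url.length} characters`);
```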
How Was It Resolved?
Mitigation Steps: Customers were manually removed from the blocklist at multiple points in time while we evaluated the root cause and worked on a fix.
Resolution Steps: The issue was mitigated by removing UUIDs at the API level if a GET request is used. Additional logging has been added to identify the root cause of the feature flag configuration issue.
Verification of Resolution: We monitored our server logs to ensure that we did not observe any additional GET requests to the affected URL.
What We Are Doing to Prevent Future Incidents
Audit additional admin console API requests: We’re performing a sweep of admin console API requests to ensure that POST requests are used where URLs would otherwise be highly parameterized.
Remove the feature flag misconfiguration: We’re correcting the way the feature flag is configured to ensure consistent outcomes.
Next Steps and Communication
No action is required from our customers at this time.
We are committed to providing a reliable and stable service, and we are taking the necessary steps to learn from this event and prevent it from happening again. Thank you for your understanding.
Date of Incident: 2025-09-23
Time of Incident (UTC): 17:18 - 00:46
Service(s) Affected: Sign Up
Impact Duration: 7h 28m
Summary
For 7 hours and 28 minutes, 1Password Provisioning invites could not be accepted; to the user, the invite appeared to have expired. The failure was caused by a web client routing defect that was not caught during development, review, or release. The issue was first identified through customer reports approximately two and a half hours after release, was escalated to the development teams, and an incident was immediately called. The root cause was identified as a defect introduced by a web client modification, and a fix was created, tested, and released. By 00:46 UTC, the fix was deployed to all environments and service was fully restored.
Impact on Customers
Sign-up: Provisioning invites could not be accepted.
Number of Affected Customers (approximate): 100% of provisioning invites could not be accepted
Customer-facing impact: Users clicking their invite links encountered a misleading ‘Invite Expired’ message.
Geographic Regions Affected: 1Password USA/Canada/EU/Enterprise
What Happened?
A change to the web client contained a router defect that incorrectly rendered provisioning invites as expired, so users were presented with an error message erroneously stating that their invite had expired. The change that introduced the defect was released because it was not captured by automatic change notification rules, lacked automated test coverage, and was not included in the set of manual tests.
Timeline of Events (UTC):
17:18: 1Password release containing the defect is deployed
19:53: (2 hours, 35 minutes later) First customer report
20:49: (56 minutes later) Escalation to developer teams
21:01: (12 minutes later) Incident called
22:02: (1 hour, 1 minute later) Root cause identified
22:29: (27 minutes later) Fix created and testing initiated
23:49: (1 hour, 20 minutes later) Fix merged
00:46: (57 minutes later) Fix released and service fully restored
Root Cause Analysis: A change modified the order in which key provisioning web routes were rendered. As a result, the route handling provisioning invitations failed to use the correct query parameters and the invite rendered as expired.
Contributing Factors: Automated tests for this endpoint did not exist. Manual testing did not cover the Provisioning routes. The modified code was not covered by the automatic change notification rules that would have notified the Provisioning team. An existing bug that can cause invite resending to fail was an initial red herring during the investigation.
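To illustrate the route-ordering failure described in the root cause, here is a minimal first-match-wins router sketch in TypeScript. It is not the 1Password web client's code; the paths, parameter name, and matching logic are hypothetical. It shows how a broad route listed ahead of a more specific one can shadow it, so the query parameters carrying the invite token are never read and the page falls back to an expired state.

```typescript
// Hypothetical first-match-wins router: reordering routes can shadow the one
// that reads the invite token, so a valid invite renders as expired.
type Route = { pattern: RegExp; render: (url: URL) => string };

const routes: Route[] = [
  // A broad sign-up route listed first...
  { pattern: /^\/signup/, render: () => "Invite Expired" },
  // ...shadows the more specific invite route below, so its query parameters
  // (including the invite token) are never read. Listing the specific route
  // first, or tightening the broad pattern, restores the correct behavior.
  {
    pattern: /^\/signup\/invite$/,
    render: (url) => `Accepting invite with token ${url.searchParams.get("token")}`,
  },
];

function render(rawUrl: string): string {
  const url = new URL(rawUrl);
  const match = routes.find((route) => route.pattern.test(url.pathname));
  return match ? match.render(url) : "Not found";
}

// With the ordering above, a valid invite link still renders as expired:
console.log(render("https://example.invalid/signup/invite?token=abc123")); // "Invite Expired"
```

Automated tests that assert which route handles an invite URL, as planned in the prevention items below, would catch this kind of reordering before release.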
How Was It Resolved?
Resolution Steps: The defect in the 1Password web client was corrected so provisioning invites would render correctly.
Verification of Resolution: 1Password engineering tested the changes and validated that the functionality was restored, as well as verifying that requests for the affected endpoints were successful after the fix was deployed.
What We Are Doing to Prevent Future Incidents
Improve automated tests: We are enhancing our automated tests for the Provisioning Invite routes.
Expand automatic change notifications: Expanding coverage of automatic change notification rules for areas of code owned by the Provisioning team.
Next Steps and Communication
No action is required from our customers at this time. Existing invites do not need to be resent and may be accepted.
If you are still experiencing issues, please contact our support team at support@1password.com.
We are committed to providing a reliable and stable service, and we are taking the necessary steps to learn from this event and prevent it from happening again. Thank you for your understanding.