Date of Incident: 2025-09-03
Time of Incident (UTC): 11:06 - 12:07
Service(s) Affected: All APIs
Impact Duration: 61 minutes
Summary
For 61 minutes on the morning of September 3rd, 2025, all 1Password APIs in the US/Global environment returned errors or responded slowly for approximately 20% of requests. 92% of the impact was mitigated within 13 minutes, at 11:19, when automation scaled up infrastructure. By 12:07, a manual restart of the remaining infrastructure completed mitigation. A permanent fix was implemented and deployed to prevent the issue from recurring.
Impact on Customers
APIs: High latency or 500 Internal Server Error responses.
Number of Affected Customers: 20% of all API requests returned errors for the first 13 minutes, and 1% thereafter.
Geographic Regions Affected (if applicable): 1Password USA/Global
What Happened?
Timeline of Events (UTC):
11:05: A customer began sending an unusually high volume of requests to an API with sub-optimal performance.
11:06: Some servers started consuming abnormally high memory, causing slow response times and high error rates.
11:19: Automation scaled up infrastructure to service the additional load.
11:51: Engineers declared an incident and alerted response teams.
12:02: The response team began restarting affected servers.
12:07: All servers completed restarts, and error rates returned to normal levels.
Root Cause Analysis: A poorly performing cache operation was triggered repeatedly in a short period of time across multiple servers, consuming excessive memory and greatly delaying responses.
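The specific operation and fix are internal to 1Password and are not reproduced here. As a purely illustrative sketch of the failure mode, assuming a Go service and a hypothetical expensiveLookup function, the example below shows how a burst of identical requests can each trigger the same slow cache-fill work, and how coalescing those calls with the singleflight package limits the work to one in-flight operation per key.

```go
// Hypothetical sketch only: coalescing duplicate expensive cache fills so a
// burst of identical requests does not run the slow operation once per
// request. Names are illustrative, not 1Password's code.
package main

import (
	"fmt"
	"sync"
	"time"

	"golang.org/x/sync/singleflight"
)

var group singleflight.Group

// expensiveLookup stands in for the slow, memory-hungry cache operation.
func expensiveLookup(key string) (string, error) {
	time.Sleep(200 * time.Millisecond) // simulate a slow backend query
	return "value-for-" + key, nil
}

// getCached lets concurrent callers asking for the same key share a single
// in-flight lookup instead of each triggering their own.
func getCached(key string) (string, error) {
	v, err, _ := group.Do(key, func() (interface{}, error) {
		return expensiveLookup(key)
	})
	if err != nil {
		return "", err
	}
	return v.(string), nil
}

func main() {
	var wg sync.WaitGroup
	for i := 0; i < 10; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			v, _ := getCached("hot-key")
			fmt.Println(v)
		}()
	}
	wg.Wait()
}
```

Whether coalescing, caching, or a faster query is the right remedy depends on the operation; in this incident the query itself was refactored, as described below.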
How Was It Resolved?
Mitigation Steps: Automatic instance scaling restored over 98% of operational capacity within 13 minutes. Full capacity was restored through a manual restart of the remaining affected servers.
Resolution Steps: We refactored the poorly performing query.
Verification of Resolution: We tested the affected API to confirm the refactored query produced the desired performance improvement. We deployed the fix and monitored it for 24 hours to confirm the issue was resolved.
What We Are Doing to Prevent Future Incidents
We are auditing services for sub-optimal query performance.
Next Steps and Communication
No action is required from our customers at this time.
We are committed to providing a reliable and stable service, and we are taking the necessary steps to learn from this event and prevent it from happening again. Thank you for your understanding.
Date of Incident: 2025-08-05
Service(s) Affected: Sign-in, Web Application, Command Line Interface (CLI), Single Sign-On (SSO), APIs
Impact Duration: 1 hour
Summary
On August 5, 2025, 1Password experienced a service degradation that impacted customers' ability to sign in and access the web application. The incident was triggered during a planned architectural improvement, when a misconfigured rollback attempt sent a high volume of traffic through a new code path and caused a database connection bottleneck. The issue was resolved by correcting the misconfiguration and restarting the web application servers, fully restoring service.
Impact on Customers
During the service disruption, some customers experienced degraded performance when accessing their 1Password vaults and signing in.
Sign-in Issues: Customers may have experienced sign-in slowness or timeouts.
Error Messages: Customers may have seen error messages when attempting to sign in, such as "Can't sign in", "Failed to determine sign in methods for email", or "Upstream connect error".
Vault Access: Some customers experienced degraded performance when accessing their 1Password vaults.
Geographic Regions Affected: USA/Global
What Happened?
The incident was part of ongoing improvements to the 1Password infrastructure and was not the result of a security incident. Customer data was not affected.
Timeline of Events (UTC):
17:12: A planned, phased rollout of an architectural improvement to authentication systems begins.
18:20: Engineers monitoring the rollout begin to observe higher latency during sign-in for a small subset of accounts.
20:17: A rollback of the change is initiated. An error in the rollback configuration sends a high volume of traffic to the new code path, causing a database connection bottleneck. Engineers monitoring the deployment immediately observe service impact.
20:21: A corrective action is deployed to revert the system to its previous state before the misconfigured rollback.
20:39: While impact is still being observed, a failover from the primary database to a secondary database is initiated. This action has no effect.
20:58: A restart of the service that manages incoming traffic is initiated to reset connections.
21:13: A rolling restart of the web application servers is initiated.
21:20: Service is fully restored for all customers.
Root Cause Analysis: The root cause was an error in the configuration of an attempted rollback. This misconfiguration incorrectly routed a high volume of sign-in traffic through a new, slower code path, which created a bottleneck of connections to our primary database and made the web application unresponsive.
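As an illustrative sketch only (not 1Password's configuration or code), the Go example below shows a common safeguard against this failure mode: capping the number of database connections an application tier may open and putting a timeout on each query, so that a surge on a slow code path queues or fails fast in the application rather than exhausting the shared primary database. The driver, connection string, and limits are assumptions.

```go
// Hypothetical sketch: capping an application's connections to a shared
// database so a traffic surge on one code path cannot exhaust the database.
// Driver, DSN, and limits are illustrative, not 1Password's configuration.
package main

import (
	"context"
	"database/sql"
	"log"
	"time"

	_ "github.com/lib/pq" // any SQL driver works; Postgres is an assumption
)

func main() {
	db, err := sql.Open("postgres", "postgres://user:pass@db.internal/app?sslmode=disable")
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()

	// Bound the pool: excess requests wait for a free connection instead of
	// opening new ones and overwhelming the primary database.
	db.SetMaxOpenConns(50)
	db.SetMaxIdleConns(10)
	db.SetConnMaxLifetime(5 * time.Minute)

	// A per-request timeout keeps a slow code path from holding connections
	// indefinitely and turning a latency problem into an outage.
	ctx, cancel := context.WithTimeout(context.Background(), 2*time.Second)
	defer cancel()

	var n int
	if err := db.QueryRowContext(ctx, "SELECT 1").Scan(&n); err != nil {
		log.Printf("query failed: %v", err)
	}
}
```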
How Was It Resolved?
Resolution Steps: The issue was fully resolved through two key actions:
The rollback misconfiguration was identified and corrected, which stopped traffic from flowing to the problematic new code.
A rolling restart of the web application servers was performed to clear the backlog of stuck database connections.
Verification of Resolution: Monitoring systems were closely observed for 30 minutes to ensure error rates returned to normal.
What We Are Doing to Prevent Future Incidents
We are working to implement the following improvements:
Improve configuration testing: We will improve testing procedures for configuration updates and their rollbacks before they are pushed to production.
Improve our deployment tooling: We will add additional validation to our traffic management tools to prevent similar configuration errors (an illustrative sketch of such a check follows this list).
Review our incident response procedures: We have updated the runbook used to respond to this type of incident with guidance that will enable faster recovery.
Enhance our monitoring: We will add more specific alerts to help us more quickly distinguish between issues in different application tiers, allowing for faster diagnosis and resolution.
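As a sketch of the kind of pre-deployment validation mentioned above, the example below checks a hypothetical traffic-split configuration before it is applied: every weight must be in range, the weights must sum to 100, and every target must be known. The structure and field names are assumptions, not 1Password's actual tooling.

```go
// Hypothetical sketch of a pre-deployment check on a traffic-split
// configuration: reject rollout/rollback configs whose weights do not add up
// or that reference unknown targets. Field names are illustrative only.
package main

import (
	"errors"
	"fmt"
)

type Route struct {
	Target string // e.g. "legacy-auth" or "new-auth" (placeholder names)
	Weight int    // percentage of traffic, 0-100
}

func validateRoutes(routes []Route, knownTargets map[string]bool) error {
	total := 0
	for _, r := range routes {
		if !knownTargets[r.Target] {
			return fmt.Errorf("unknown target %q", r.Target)
		}
		if r.Weight < 0 || r.Weight > 100 {
			return fmt.Errorf("weight %d for %q out of range", r.Weight, r.Target)
		}
		total += r.Weight
	}
	if total != 100 {
		return errors.New("route weights must sum to 100")
	}
	return nil
}

func main() {
	known := map[string]bool{"legacy-auth": true, "new-auth": true}

	// A malformed config whose weights do not sum to 100 is rejected here,
	// before it can be pushed to production.
	bad := []Route{{Target: "new-auth", Weight: 100}, {Target: "legacy-auth", Weight: 100}}
	if err := validateRoutes(bad, known); err != nil {
		fmt.Println("rejected:", err)
	}
}
```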
Next Steps and Communication
No action is required from our customers. 1Password applications are designed to be resilient, with local copies of vault data always available on customer devices, even without a connection to the 1Password service.
If you are still experiencing issues, please contact our support team.
We are committed to providing a reliable and stable service, and we are taking the necessary steps to learn from this event and prevent it from happening again. Thank you for your understanding.
Date of Incident: 2025-07-22
Time of Incident (UTC): 10:40 - 06:30
Services Affected: Admin Console Activity Log and User Details page
Impact Duration: 43 hours
Summary
A change to our internal data model that removed an unused type definition led to multiple failures in the Reports system.
Impact on Customers
Events API and Reporting: The activity log widget failed to display activities.
Admin console: User details page failed to display user details.
What Happened?
A software update introduced an API change that caused a mismatch between client and server. The Audit log page in the admin console was still collecting data but was not able to render it.
Timeline of Events (UTC):
10:40: Issue detected by 1Password personnel
10:42: Root cause identified
13:38: Fix created and testing initiated
06:30 (July 24): Fix released, service fully restored
Root Cause Analysis: A user state was removed from the 1Password.com server but not from the client. This was a breaking API change.
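The precise code paths involved are internal, but the general pattern behind this kind of breaking change is common: the client and server each hard-code a shared enumeration, and removing a value from one side breaks requests that still reference it from the other. The Go sketch below is purely illustrative; the state names and handler are hypothetical.

```go
// Hypothetical sketch of the failure pattern: client and server each hard-code
// a shared enumeration; when one side drops a value the other still uses,
// requests that reference it are rejected. Names are illustrative only.
package main

import "fmt"

// Server-side set after the update: one state has been removed.
var serverUserStates = map[string]bool{"active": true, "pending": true}

// Client-side set, not updated in the same release.
var clientUserStates = []string{"active", "pending", "suspended"}

// handleListByState stands in for a server endpoint that validates the
// requested state against its own definition.
func handleListByState(state string) error {
	if !serverUserStates[state] {
		return fmt.Errorf("unknown user state %q", state)
	}
	return nil
}

func main() {
	// The client still iterates over its full set, so one of its requests now
	// fails, and the page that depends on the combined result cannot render.
	for _, s := range clientUserStates {
		if err := handleListByState(s); err != nil {
			fmt.Println("request failed:", err)
		}
	}
}
```

The sketch only shows the shape of the mismatch; in this incident the affected surfaces were the activity log widget and the User Details page, as described above.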
How Was It Resolved?
Resolution Steps: The user state in the 1Password client was removed to get the server and client back into parity.
Verification of Resolution: 1Password engineering tested the changes and validated that full system functionality was restored.
What We Are Doing to Prevent Future Incidents
User State change process: 1Password is investigating how to catch breaking API changes by implementing additional end-to-end tests in the CI pipeline (an illustrative sketch of such a check follows this list).
Audit log architecture change: Engineering is investigating a rework of the audit log page to change how it aggregates data so that it is not reliant on a single endpoint. This would mitigate future occurrences of this type of issue.
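Building on the illustrative user-state sets from the earlier sketch, the test below shows what such an automated check might look like: the states the client hard-codes must remain known to the server, so a one-sided removal fails the build instead of failing in production. This is an assumed approach shown for illustration, not 1Password's actual pipeline.

```go
// Hypothetical CI contract test: fails if the client references a user state
// the server no longer defines. All names and values are illustrative.
package contract

import "testing"

// In a real pipeline these would be loaded from the server's published schema
// and the client's generated code, not hard-coded.
var serverUserStates = map[string]bool{"active": true, "pending": true}
var clientUserStates = []string{"active", "pending", "suspended"}

func TestClientStatesKnownToServer(t *testing.T) {
	for _, s := range clientUserStates {
		if !serverUserStates[s] {
			t.Errorf("client uses user state %q that the server no longer defines", s)
		}
	}
}
```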
Next Steps and Communication
No action is required from our customers at this time.
If you are still experiencing issues, please contact our support team at support@1password.com.
We are committed to providing a reliable and stable service, and we are taking the necessary steps to learn from this event and prevent it from happening again. Thank you for your understanding.
Date of Incident: 2025-07-14
Time of Incident (UTC): 22:00 - 23:23
Service(s) Affected: SSO
Impact Duration: 1 hour and 23 minutes
Summary
On July 14, 2025, 1Password’s SSO services were intermittently slow or unavailable for approximately 1 hour and 23 minutes. This was due to an outage at an upstream provider that 1Password backend services rely on to perform DNS lookups and verification for SSO authentication. You can read the provider’s incident report here, which outlines in detail what occurred and how they remediated the issue.
For the duration of the incident, 1Password customers would have experienced failed or slow SSO login attempts. We apologize for any inconvenience this issue may have caused our users.
Impact on Customers
SSO Login: Users logging in via SSO experienced failures, timeout errors, or slow responses.
Scope
Number of Affected Customers (approximate): Most users attempting to authenticate using SSO
Geographic Regions Affected: All
What Happened?
At 22:00 UTC on July 14, 2025, 1Password engineers were alerted to customer timeouts and failures when trying to use SSO as an authentication method. After diagnosing the issue, engineers discovered the source was an outage at an upstream provider used to look up and verify OIDC provider hostnames.
Timeline of Events (UTC):
22:00: Incident initially detected by 1Password
22:13: Upstream provider identifies issue
22:59: Upstream provider implements a fix
23:30: Incident resolved
Root Cause Analysis: The third-party provider 1Password uses for DNS lookup and verification experienced an outage.
How Was It Resolved?
Resolution Steps: Our third-party provider mitigated the issue.
Verification of Resolution: We closely monitored systems to ensure SSO logins were successful.
What We Are Doing to Prevent Future Incidents
We are investigating adding a second DNS provider to provide redundancy.
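As a sketch of what resolver redundancy can look like in application code, assuming nothing about 1Password's actual architecture or providers, the Go example below queries a primary DNS server and falls back to a secondary when the first lookup fails. The server addresses and hostname are placeholders.

```go
// Hypothetical sketch of resolver redundancy: try a primary DNS server and
// fall back to a secondary if the lookup fails. Server addresses and the
// hostname are placeholders, not 1Password's actual providers.
package main

import (
	"context"
	"fmt"
	"net"
	"time"
)

// resolverFor returns a net.Resolver pinned to a specific DNS server.
func resolverFor(server string) *net.Resolver {
	return &net.Resolver{
		PreferGo: true,
		Dial: func(ctx context.Context, network, _ string) (net.Conn, error) {
			d := net.Dialer{Timeout: 2 * time.Second}
			return d.DialContext(ctx, network, server)
		},
	}
}

// lookupWithFallback asks each resolver in order and returns the first success.
func lookupWithFallback(ctx context.Context, host string, servers []string) ([]net.IP, error) {
	var lastErr error
	for _, s := range servers {
		ips, err := resolverFor(s).LookupIP(ctx, "ip4", host)
		if err == nil {
			return ips, nil
		}
		lastErr = err
	}
	return nil, lastErr
}

func main() {
	ctx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
	defer cancel()

	ips, err := lookupWithFallback(ctx, "idp.example.com", []string{"198.51.100.1:53", "203.0.113.1:53"})
	if err != nil {
		fmt.Println("all resolvers failed:", err)
		return
	}
	fmt.Println("resolved:", ips)
}
```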