Date of Incident: 2025-07-14 Time of Incident (UTC): 22:00 UTC - 23:23 UTC Service(s) Affected: SSO Impact Duration: 1 hour and 23 minutes
Summary
On July 14, 2025, 1Password’s SSO services were intermittently slow or unavailable for approximately 1 hour and 23 minutes. The cause was an outage at an upstream provider that 1Password backend services rely on for DNS lookup and verification during SSO authentication. You can read the provider’s incident report here, which outlines in detail what occurred and how they remediated the issue.
For the duration of the incident, 1Password customers would have experienced failed or slow SSO login attempts. We apologize for any inconvenience this issue may have caused our users.
Impact on Customers
SSO Login: Users logging in via SSO experienced failures, timeout errors, or slow responses.
Scope
Number of Affected Customers (approximate): Most users attempting to authenticate using SSO
Geographic Regions Affected: All
What Happened?
At 22:00 UTC on July 14, 2025, 1Password engineers were alerted to customer timeouts and failures when using SSO as an authentication method. After diagnosing the problem, engineers traced it to an outage at the upstream provider used to look up and verify OIDC provider hostnames.
Timeline of Events (UTC):
22:00 - Incident initially detected by 1Password
22:13 - Upstream provider identifies issue
22:59 - Upstream provider implements a fix
23:30 - Incident resolved
Root Cause Analysis: The third-party provider 1Password uses for DNS lookup and verification experienced an outage.
How Was It Resolved?
Resolution Steps: Our third-party provider identified and implemented a fix for the outage.
Verification of Resolution: We closely monitored systems to ensure SSO logins were successful.
What We Are Doing to Prevent Future Incidents
We are evaluating the addition of a secondary DNS provider for redundancy.
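The redundancy idea above amounts to trying each configured DNS provider in turn and falling back to the next on failure. Here is a minimal sketch of that pattern; the resolver names, hostname, and addresses are hypothetical stand-ins, not 1Password's actual implementation:

```python
from typing import Callable, Sequence


class ResolutionError(Exception):
    """Raised when every configured DNS provider fails."""


def resolve_with_fallback(
    hostname: str,
    resolvers: Sequence[Callable[[str], list[str]]],
) -> list[str]:
    """Try each resolver in order; return the first successful answer.

    Each resolver is a callable mapping a hostname to a list of IP
    addresses, raising on failure (timeout, provider outage, etc.).
    """
    errors: list[Exception] = []
    for resolve in resolvers:
        try:
            return resolve(hostname)
        except Exception as exc:  # provider outage, timeout, ...
            errors.append(exc)
    raise ResolutionError(f"all {len(resolvers)} providers failed: {errors!r}")


# Illustrative scenario: the primary provider is down, the secondary answers.
def primary(hostname: str) -> list[str]:
    raise TimeoutError("primary DNS provider outage")


def secondary(hostname: str) -> list[str]:
    return ["192.0.2.10"]  # address from the documentation range (RFC 5737)


addrs = resolve_with_fallback("idp.example.com", [primary, secondary])
```

With a single provider, the same outage would have failed every lookup; with the fallback list, logins degrade to a slower but successful path.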
Incident Postmortem - Intermittent 500 error failures for US customers.
Date of Incident: 2025-06-16 Time of Incident (UTC): 14:39 UTC - 17:47 UTC Service(s) Affected: Web Interface, Sign in, Sign up, Admin Console, Item Sync, SSO (Single Sign On), Command Line Interface (CLI) for US-based users.
Impact Duration: 68 minutes
Summary
On June 16, 2025, at 14:39 UTC, users in the US region began experiencing intermittent errors while accessing 1Password. The issue stemmed from resource constraints within our infrastructure, specifically in our internal networking services. It was resolved by scaling up the affected services.
Impact on Customers
For the duration of the incident:
Web Interface, Admin Console: Customers were able to log in but saw intermittent 500 errors, including “Failed to get Integrations” on the web interface.
SSO (Single Sign On), Command Line Interface (CLI), Item Sync: There was degraded performance for authentication and API requests.
Sign in, Sign up: There were intermittent failures on sign in and sign up for some customers during the incident.
Number of Affected Customers (approximate): All users accessing the service in the US region were affected.
Geographic Regions Affected (if applicable): US
What Happened?
The incident began when our internal services started returning errors after a deployment of the latest version of the 1Password service. As part of the initial investigation, we restarted a supporting networking service within our infrastructure, which produced an initial recovery of the affected services.
Timeline of Events (UTC):
2025-06-16 14:39 UTC: Intermittent 500 errors detected; incident begins
2025-06-16 15:21 UTC: Networking updates are rolled out
2025-06-16 15:23 UTC: Initial service recovery observed
2025-06-16 15:38 UTC: Root cause identified: Networking applications ran out of allocated resources.
2025-06-16 15:42 UTC: Additional capacity added to networking applications
2025-06-16 17:47 UTC: The spike in server errors stopped, and internal monitoring showed that system health had returned to normal.
2025-06-16 17:53 UTC: Incident resolved
Root Cause Analysis: An internal service that directs network traffic became resource-constrained, which degraded its performance. We first stabilized the system by adding more capacity and have since deployed a permanent fix by increasing the resources allocated to the service to prevent a recurrence.
How Was It Resolved?
Mitigation Steps: As an immediate mitigation, the number of replicas for the deployment was scaled up.
Resolution Steps: A more permanent fix was later applied by increasing the allocated resources for the networking applications.
Verification of Resolution: Around 15:25 UTC, we observed that the spike in 500 errors from the server had completely stopped. The team continued monitoring and confirmed at 17:53 UTC that allocated resource consumption had remained stable.
What We Are Doing to Prevent Future Incidents
Scale existing resources: We have scaled resources and resource limits to absorb additional load and will implement monitoring to ensure we do not approach critical limits.
Review and expand existing monitors: We will review our critical service monitors to improve alerting and catch future incidents earlier, before they have customer impact.
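The monitoring improvements above can be illustrated with a simple headroom check that warns before a service reaches its allocated limits, rather than alerting only once errors appear. This is a hedged sketch; the resource names, threshold, and alert format are illustrative assumptions, not our actual monitors:

```python
def headroom_alerts(
    usage: dict[str, float],
    limits: dict[str, float],
    warn_ratio: float = 0.8,
) -> list[str]:
    """Return one warning per resource whose usage meets or exceeds
    warn_ratio of its allocated limit."""
    alerts = []
    for name, used in usage.items():
        limit = limits[name]
        ratio = used / limit
        if ratio >= warn_ratio:
            alerts.append(f"{name} at {ratio:.0%} of limit ({used}/{limit})")
    return alerts


# Hypothetical snapshot: CPU is at 90% of its limit, memory at about 44%.
alerts = headroom_alerts(
    usage={"cpu_millicores": 1800, "memory_mib": 900},
    limits={"cpu_millicores": 2000, "memory_mib": 2048},
)
```

In this sketch only the CPU entry trips the 80% threshold, so the alert fires while the service is still healthy, giving time to add capacity before requests start failing.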
Next Steps and Communication
No action is needed from customers.
We are committed to providing a reliable and stable service, and we are taking the necessary steps to learn from this event and prevent it from happening again. Thank you for your understanding.