Update: AWS Experienced an Outage in Its US-East 2 Availability Zone

Originally published: Dec. 5, 2022
Last updated: Jan. 25, 2023

Fresh off the firm’s re:Invent conference, which hosted 50,000 of the industry’s top pros, Amazon Web Services (AWS) experienced an outage in its US-East 2 region. The outage lasted 40 minutes, from 12:26 p.m. to 1:06 p.m. PST, and affected customers using AWS’ site-to-site VPN and internet connectivity through the US-East 2 availability zone.

Representatives at AWS declined to comment on the cause of the outage and pointed us to their official statement online.

Although US-East 2 is just one of AWS’ 96 zones, the outage still underlines how vulnerable cloud connectivity is for enterprises. If the world’s largest cloud service provider experiences downtime, even for a short period, it affects millions of customers and many more of those customers’ customers.

The knock-on effect could mean the difference between struggle and survival in these tenuous economic times.

What caused the outage at AWS’ US-East 2 Availability Zone?
Have there been other outages of the AWS US-East 2 Availability Zone?
What are ways firms can mitigate the effects of cloud outages?

What caused the outage at AWS’ US-East 2 Availability Zone?

As of press time, AWS has not issued what some call a ‘post-mortem’ or a post-incident summary of the outage. When asked when or if AWS would provide a post-mortem, an official with the firm had this to say:

“We do not publish Post-Event Summaries … for every service event,” wrote an AWS spokesperson to Data Center Knowledge in an email response to our query this week.

“When an issue has broad and significant customer impact that results in the failure of a significant percentage of control plane API calls, impacts a significant percentage of a service’s infrastructure, resources or APIs or is the result of total power failure or significant network failure, AWS is committed to providing a public Post-Event Summary (PES) following the closure of the issue.”

AWS customers whose businesses rely on the firm’s US-East-2 availability zone are left scratching their heads, wondering how to mitigate an issue with no reported cause.

Have there been other outages of the AWS US-East 2 Availability Zone?

Yes. The incident on Dec. 5 is the second one this year in the US-East-2 availability zone for AWS. The first, on July 28, was further-reaching, and one could say it met the criteria the AWS representative shared.

For 2.8 hours, AWS customers had no access to 38 of the leading cloud service provider’s services in the US-East-2 availability zone. Those 38 services included API Gateway, CloudWatch, DynamoDB, and the firm’s flagship offering, Elastic Compute Cloud (commonly referred to as EC2).

The AWS Health Dashboard listed EC2 as experiencing degraded service during the “loss of power” incident.

To date, AWS has not published a post-incident summary detailing the cause and mitigation of that issue either.

What are ways firms can mitigate the effects of cloud outages?

This news leads Data Center Knowledge to wonder: How would you handle it if your AWS instance went down? Here are some cloud downtime mitigation techniques from Michael Gibbs, CEO of Go Cloud Careers.

[Chart: Data Center Outage Math]

A single cloud is a single point of failure. Spread workloads across cloud service providers (CSPs) to remain connected during an outage.
Focus on providers with non-proprietary systems. This makes it easier to shift workloads between CSPs. Gibbs gives the example of AWS’s proprietary NoSQL database, DynamoDB. A frictionless option, says Gibbs, would be an open NoSQL database such as Apache Cassandra or MongoDB. These can be used on every cloud at the same time.
Leverage multi-cloud despite the noise about complexity from CSPs. Running systems in parallel on two cloud systems helps protect uptime and profitability for your enterprise. Outages at major CSPs have become too common not to insulate your workloads from downtime.
Plan around cloud failures and the CSPs themselves. Enterprises design cloud architecture to solve customer challenges rather than around brands, but for business continuity, considering brands can’t be avoided, says Gibbs.
Business continuity remains in the C-suite, says Gibbs, and isn’t the domain of technology alone. Architects would do well to consider every threat and mitigate it when designing systems for the cloud.
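
To make the multi-cloud advice above concrete, here is a minimal sketch of priority-ordered failover between two providers. It is an illustration only, not something Gibbs or AWS describes: the provider names, URLs, and the `pick_endpoint` and `simulated_probe` helpers are all hypothetical, and the health check is simulated so the example runs without any network access.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Optional, Set


@dataclass(frozen=True)
class Endpoint:
    """One deployment of the same workload on a given provider (hypothetical)."""
    provider: str
    url: str


def pick_endpoint(
    endpoints: Iterable[Endpoint],
    is_healthy: Callable[[Endpoint], bool],
) -> Optional[Endpoint]:
    """Return the first endpoint, in priority order, that passes the health check."""
    for endpoint in endpoints:
        try:
            if is_healthy(endpoint):
                return endpoint
        except Exception:
            # A probe that raises (timeout, DNS error) is treated as unhealthy.
            continue
    return None


# Hypothetical deployments: the same service running in parallel on two providers.
ENDPOINTS = [
    Endpoint("provider-a", "https://app.provider-a.example"),
    Endpoint("provider-b", "https://app.provider-b.example"),
]


def simulated_probe(down: Set[str]) -> Callable[[Endpoint], bool]:
    """Build a fake health check that reports the named providers as down."""
    return lambda endpoint: endpoint.provider not in down


if __name__ == "__main__":
    # Simulate an outage at the primary provider.
    chosen = pick_endpoint(ENDPOINTS, simulated_probe({"provider-a"}))
    print(chosen.provider if chosen else "no healthy endpoint")
```

With the primary provider marked as down, the script prints `provider-b`. In a real deployment the simulated probe would be replaced by an actual health check, and the workload's data layer would also need to be available on both providers, which is where the non-proprietary database advice comes in.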

Gibbs’ final advice is to focus on understanding the cloud itself and not the individual brands, whether large or lesser-known. Keeping this mindset gives firms the freedom to negotiate and push back against price increases and other decisions by CSPs that don’t serve the enterprise’s interests.

These insights were excerpted from the article “Reliance on One Cloud Provider Recipe for Disaster,” first published on the AFCOM site.

AFCOM is a sister organization of Data Center Knowledge and does not influence the editorial direction of the publication.

[Update, Jan. 25, 2023]: Added information about the reason for the outage and a chart on the costs of downtime.
