It must be Friday the 13th, because the Amazon Web Services cloud computing platform is having trouble in its US-EAST-1 Region … again. Between 7:04 a.m. and 8:54 a.m. Pacific time, a portion of the instances in a single availability zone experienced connectivity issues.
AWS also experienced increased error rates and latencies for the Elastic Block Store (EBS) APIs, along with increased error rates for EBS-backed instance launches. Impacted instances were unreachable via their public IP addresses but could still communicate with other instances in the same availability zone using private IP addresses.
The connectivity issues once again hit major AWS customers, including Heroku, parts of […], and GitHub, to name but a few. It's the usual group of sites: widely used, heavily dependent on AWS US-EAST-1, and prone to downtime whenever the Amazon cloud has an outage.
US-EAST-1 is the company's most popular region but also its oldest, housed in some of its largest and oldest facilities in northern Virginia. Amazon has been carrying out infrastructure renovations there of late. US-EAST has faced a number of uptime problems over the past several years, ranging from Elastic Block Store issues to a generator failure, a Christmas Eve outage that took down Netflix, and the massive general outage of summer 2012.
This outage should serve, first and foremost, as a reminder: if you rely heavily on AWS for your infrastructure, have a failover plan. Customers who run on AWS with multi-zone deployments, load balancing, and other stopgaps in place usually come through these outages unscathed. Getting that right, however, is not as simple as it sounds.