Amazon Cloud Outage KOs Reddit, Foursquare & Others

Amazon Web Services is experiencing another outage, and as was the case with earlier incidents, the cloud computing giant's performance woes are rippling across the Internet. Amazon is attributing the problems to a "small number" of storage volumes in a single availability zone in its outage-plagued US-East-1 region. But the impact is being felt widely by many prominent sites, including the leading social news site Reddit, location-based social network Foursquare and cloud infrastructure provider Heroku. Other AWS customers reporting problems included

The outage is the fifth significant downtime in the last 18 months for US-East-1, which is Amazon’s oldest region. US-East-1 had outages in April 2011, March 2012, and on June 15 and June 30 of 2012. Amazon’s U.S. East region also was hit by a series of four outages in a single week in 2010.

UPDATE: As of 4:25 p.m., it appears some sites that had been down are recovering, including Reddit and Heroku.

Today's issues began at about 10:30 a.m. Pacific time, according to Amazon.

"We are currently experiencing degraded performance for EBS volumes in a single Availability Zone in the US-EAST-1 Region," Amazon reports in its status dashboard. "New launches for EBS backed instances are failing and instances using affected EBS volumes will experience degraded performance." The AWS Relational Database Service (RDS) is also being affected.

Despite Amazon's official status update that the problems are limited to a single availability zone, multiple AWS users are reporting on Hacker News that their sites are offline even though they have been architected to use multiple availability zones.

Amazon said some users attempting to shift workloads to unaffected zones may have been unable to do so. "Customers can launch replacement instances in the unaffected availability zones but may experience elevated launch latencies or receive ResourceLimitExceeded errors on their API calls, which are being issued to manage load on the system during recovery," the dashboard said. "Customers receiving this error can retry failed requests."
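
For readers wondering what "retry failed requests" can look like in practice, here is a minimal sketch in Python. It is not Amazon's code or the AWS SDK: the ResourceLimitExceeded class is a hypothetical stand-in for the error named in the status update, and retry_with_backoff and its parameters are illustrative assumptions. The idea is to wait a randomized, growing interval between attempts so that retries do not add to the load on a recovering system.

```python
import random
import time


class ResourceLimitExceeded(Exception):
    """Hypothetical stand-in for the throttling error named in Amazon's notice."""


def retry_with_backoff(call, max_attempts=5, base_delay=1.0, max_delay=30.0,
                       sleep=time.sleep):
    """Run call(), retrying on ResourceLimitExceeded with capped, jittered backoff."""
    for attempt in range(1, max_attempts + 1):
        try:
            return call()
        except ResourceLimitExceeded:
            if attempt == max_attempts:
                raise  # out of attempts: surface the error to the caller
            # The delay ceiling doubles on each attempt and is capped at max_delay;
            # the actual wait is a random value between 0 and that ceiling.
            ceiling = min(max_delay, base_delay * 2 ** (attempt - 1))
            sleep(random.uniform(0, ceiling))
```

A caller would pass in whatever function issues the launch request, for example retry_with_backoff(launch_replacement), where launch_replacement is a placeholder name for the customer's own code. Any other exception is passed through unchanged rather than retried.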
