Amazon Cloud Outage KOs Reddit, Foursquare & Others

Amazon Web Services is experiencing another outage, and as was the case with earlier incidents, the cloud computing giant’s performance woes are rippling across the Internet. Amazon is attributing the problems to a “small number” of storage volumes in a single availability zone located in its outage-plagued US-East-1 region. But the impact is being felt widely by many prominent sites, including the leading social news site Reddit, location-based social network Foursquare and cloud infrastructure provider Heroku. Other AWS customers reporting problems included

The outage was the fifth significant downtime in the last 18 months for US-East-1, which is Amazon’s oldest region. The US-East-1 region had outages in April 2011, March 2012, and June 15 and June 30 of 2012. Amazon’s U.S. East region also was hit by a series of four outages in a single week in 2010.

UPDATE: As of 4:25 p.m., it appears some sites that had been down are recovering, including Reddit and Heroku.

Today’s issues began at about 10:30 a.m. Pacific time, according to Amazon.

“We are currently experiencing degraded performance for EBS volumes in a single Availability Zone in the US-EAST-1 Region,” Amazon reports in its status dashboard. “New launches for EBS backed instances are failing and instances using affected EBS volumes will experience degraded performance.” The AWS Relational Database Service (RDS) is also being affected.

Despite Amazon’s official status update that the problems are limited to a single availability zone, multiple AWS users are reporting on Hacker News that their sites are offline even though they have been architected to use multiple availability zones.

Amazon said some users attempting to shift workloads to unaffected zones may have been unable to do so. “Customers can launch replacement instances in the unaffected availability zones but may experience elevated launch latencies or receive ResourceLimitExceeded errors on their API calls, which are being issued to manage load on the system during recovery,” the dashboard said. “Customers receiving this error can retry failed requests.”
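For readers wondering what "retry failed requests" might look like in practice, here is a minimal sketch of one common approach: retrying with exponential backoff and random jitter, so that repeated attempts do not pile more load onto a recovering service. It is an illustration only, not Amazon's recommended code. The `RequestThrottledError` class and the `retry_with_backoff` helper are hypothetical names invented for this example, and the delay values are arbitrary assumptions.

```python
import random
import time


class RequestThrottledError(Exception):
    """Hypothetical stand-in for a throttling error returned by an API."""


def retry_with_backoff(call, max_attempts=5, base_delay=1.0, max_delay=30.0,
                       sleep=time.sleep):
    """Run call() until it succeeds or max_attempts is exhausted.

    After each throttled attempt, wait a random time between 0 and
    base_delay * 2**attempt seconds (capped at max_delay) before retrying.
    """
    for attempt in range(max_attempts):
        try:
            return call()
        except RequestThrottledError:
            if attempt == max_attempts - 1:
                raise  # out of attempts; surface the error to the caller
            delay = min(max_delay, base_delay * 2 ** attempt)
            sleep(random.uniform(0, delay))  # jitter spreads out retries
```

A caller would wrap whatever function issues the launch request and pass it in as `call`. The `sleep` parameter exists only so the wait can be swapped out, for example in tests.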

About the Author

Rich Miller is the founder and editor-in-chief of Data Center Knowledge, and has been reporting on the data center sector since 2000. He has tracked the growing impact of high-density computing on the power and cooling of data centers, and the resulting push for improved energy efficiency in these facilities.

12 Comments

  1. Jim

    Stinks. I have to work now.

  2. This kind of downtime is totally unwarranted since there are plenty of technologies that allow anyone utilizing AWS to deal with outages with "Failover" by not putting all your servers in one facility and being able to switch to other datacenters in real time. Also this is the second major outage in the N. Virginia Datacenter this year yet all these companies are stacking most of their services there with no high availability? No failover? Come on. http://www.brandonholtsclaw.com/blog/2012/how-not-to-fail-at-the-cloud/ http://benjaminkerensa.com/2012/06/30/reflecting-on-netflix-instagram-pinterest-downtime

  3. m

    Ben, if you had read the article you would have known that "sites are offline even though they have been architected to use multiple availability zones". This also included the AWS console and other AWS services. Even though many had "failover", many did not work as intended. The API broke so your "failover" and "high availability" won't work.

  4. Here is a helpful tool, free to the public, for real-time health status of Amazon Elastic Compute Cloud (EC2) in all regions: http://www.systemswatch.com Good site to know quickly if it's you or AWS.

  5. Although I never wish downtime for any competitor, I would however love to offer Joyent Cloud as either a secondary or replacement Cloud partner. "Easy button" migration services available. Scott Herson Joyent Cloud

  6. Our app (Break the Ice!) is down which means ZERO new downloads. Really hurts the little guys when this happens.

  7. Great point, Benjamin. There is no reason for these outages if AWS would manage its failover capabilities and not oversubscribe its services with little or no customer service response times. I still can't believe companies of the magnitude of Foursquare and Reddit are on Amazon Cloud. Smaller companies, I understand, due mainly to cost, but once you "graduate" to the next level, the service should have guaranteed 100% uptime.

  8. The article didn't mention how long the systems were down. With respect to cooling, we have a solution for uninterrupted cooling in the event of a power failure which allows ride through cooling for a duration of time directly to the rack level, especially for highly critical racks extending redundancy and safety during an outage.

  9. Once again, clients need to be diversified and not "put all their eggs in one basket"!!!

  10. AJ

    The cloud will evaporate as did the mainframe

  11. @Don! I concur. There is absolutely no reason for any outages for a company as big as AWS. United Layer in San Francisco, CA runs a Tier 3 raised-floor 40,000 sq. ft. facility, and we have never experienced downtime whatsoever! 24/7/365 high-touch managed services come in handy for the end user when they aren't in close proximity to the DC. We take our backup power and redundancy extremely seriously!