The Aftermath of Amazon’s Cloud Outage


More than five days after its outage began, Amazon Web Services has finally restored virtually all of its services, though it is still mopping up a small number of customer accounts with “stuck” data in its Elastic Block Store (EBS) service. “EBS is now operating normally for all APIs and recovered EBS volumes,” Amazon reports on its status dashboard. “The vast majority of affected volumes have now been recovered. We’re in the process of contacting a limited number of customers who have EBS volumes that have not yet recovered and will continue to work hard on restoring these remaining volumes.” The company promises that a detailed incident report will follow.

What are the lessons and implications of the outage? Discussion continued over the weekend. Here’s a look at some notable links with analysis and commentary:

  • How SmugMug survived the Amazonpocalypse – SmugMug’s Don MacAskill: “Despite using a lot of Amazon services, SmugMug didn’t go down because we spread across availability zones and designed for failure to begin with, among other things.” SmugMug also didn’t use Elastic Block Store. “We’ve never felt comfortable with the unpredictable performance and sketchy durability that EBS provides, so we’ve never taken the plunge,” MacAskill writes.
  • Seven lessons to learn from Amazon’s outage – Phil Wainewright at ZDNet urges close scrutiny of SLAs. “Since it has been the EBS and RDS services rather than EC2 itself that has failed (and all the failures have been restricted to Availability Zones within a single Region), the SLA has not been breached, legally speaking.”
  • Amazon’s Trouble Raises Cloud Computing Doubts – The New York Times examines the potential impact on cloud adoption. “Industry analysts said the troubles would prompt many companies to reconsider relying on remote computers beyond their control.”
  • The AWS Outage: The Cloud’s Shining Moment – At O’Reilly, George Reese argues that the cloud is better than ever, but its users are not: “In short, if your systems failed in the Amazon cloud this week, it wasn’t Amazon’s fault. You either deemed an outage of this nature an acceptable risk or you failed to design for Amazon’s cloud computing model.”
  • Magical Block Store: When Abstractions Fail Us – Joyent takes a closer look at EBS: “I certainly don’t claim to know how EBS works, but of course people go to bars and have beers and talk. It’s commonly believed that EBS is built on DRBD with a dose of S3-derived replication logic. … Maybe (Amazon) did what a dozen billion dollar companies before them tried to do and never pulled off. Or maybe EBS is indeed bandaids and chicken wire. I have no idea. Which is a problem, as a user of EBS.”
  • AWS Developer Forums: Life of our patients is at stake – Not all apps are appropriate for cloud computing. Case in point: an Amazon user who was apparently using EC2 to run a service monitoring cardiac patients.
  • Bye, Bye, My Clustered AMIs – Christofer Hoff memorializes the outage with updated lyrics for Don McLean’s classic “American Pie.”

About the Author

Rich Miller is the founder and editor-in-chief of Data Center Knowledge, and has been reporting on the data center sector since 2000. He has tracked the growing impact of high-density computing on the power and cooling of data centers, and the resulting push for improved energy efficiency in these facilities.
