Amazon's utility computing service, EC2, suffered an outage Saturday morning that wiped out some customer application data. The incident is sparking lively debate among EC2 users, with some angered at the loss of their data and others noting that EC2 remains a "beta" service. The outage has highlighted the need for EC2 developers to keep current backups, and they are sharing tools and techniques to protect their data.
EC2 (Elastic Compute Cloud) is a virtual grid that runs atop the Xen hypervisor. Developers can create applications in virtual machines, store them in Amazon's S3 storage system and run instances of the app on EC2 on a pay-as-you-go basis. This approach is highly scalable and flexible. But more than a year after its launch, EC2 remains in beta, and Amazon (AMZN) offers no service level agreement (SLA) or uptime guarantees. There's always the chance that the service will stop running and eat one of your instances, with no backup from Amazon to fall back on.
That's what happened Saturday morning. "A software deployment caused our management software to erroneously terminate a small number of user's instances," the EC2 team said on its user forum. "When our monitoring detected this issue, the EC2 management software and APIs were disabled to prevent further terminations. Once we corrected the problem, we restored the management software. We will contact users that lost instances directly by email."
Some customer instances were terminated and could not be recovered. Many developers had backups and were able to restart their applications. But others had no recent backups and were disappointed that Amazon couldn't assist them.
"We lost almost all our data due to the erroneous instance termination from Amazon, and we are in a very bad shape as a company," wrote one unhappy EC2 user, who wanted compensation from Amazon. Other developers were unsympathetic, saying EC2 beta users accept the risk of data loss and should have their own backup strategy.
Still others were ready to help. Reuven Cohen of Enomaly posted a review of backup options for EC2 users, noting that "the best bet is to save your information in more than one place and plan for the worst."
Paul Bissett of WeoGeo, a mapping service that runs on S3 and EC2, also offered backup tools, and said the incident illustrated the growing pains of new services like EC2.
"We believe in the future of scalable utility computing," Bissett wrote. "Dealing with events such as these is just a part of the issues with these types of systems that we'll all have to overcome to make this future work. Our goal is that we can share what we are creating for WeoGeo in a way that helps other overcome such problems. Please be aware of the limitation of utility computing, as well as the promise. Planning for these outages will be a requirement for safely outsourcing your metal resources."