Posted By Jason Verge On October 23, 2012 @ 4:16 pm In Amazon, Cloud Computing
Will Monday’s downtime for some customers of Amazon Web Services have a lasting impact for AWS, the largest player in the cloud computing arena? The incident is unlikely to cause any kind of mass exodus, and doesn’t discredit cloud as a business model, as some pundits may claim. Some Amazon customers have been through five outages in 18 months, and continue to remain on the AWS cloud.
But the recurring outages may hamper Amazon’s enterprise ambitions, and have prompted a slew of its competitors to leap into action, attempting to lure away the massive AWS customer base.
Amazon attributed Monday’s incident to a “small number” of storage volumes in a single availability zone located in its outage-plagued US-East-1 region. The outage affected Minecraft, Reddit, Imgur, and (to some extent) Pinterest, to name a few of the more public ones. While the outage has largely been resolved, there are lingering issues. Volumes affected during this event continued to re-mirror on day two, leading to increased volume I/O latency.
US-East-1 has faced continued problems. Some users assert that it is overcrowded. However, it is possible to use the service without being confined to a single Availability Zone, which allows for proper failover. Netflix is one example.
Netflix learned from a serious AWS outage in 2011. It changed its architecture to avoid using Amazon Elastic Block Store (EBS) as its main data storage service. The company released a thorough explanation that is a worthwhile read for those interested in reliability on Amazon’s platform.
However, many sites with massive amounts of traffic don’t make Netflix-style money. There are cost considerations, and the complex architecture used by Netflix isn’t automatically built into AWS. High-traffic consumer sites are often particularly sensitive to the costs of proper failover, or at least are the ones that decide to roll the dice with a lower level of redundancy.
They’re also among the most publicly visible AWS customers. Yesterday’s downtime affected a lot of consumer web properties that receive massive amounts of traffic, but generally like to keep costs low. Unfortunately, outages at widely used consumer web properties – like Reddit, Imgur and Minecraft – are what often get the most attention.
Adding to the problem, Amazon said some users attempting to shift workloads to unaffected zones may have been unable to do so. “Customers can launch replacement instances in the unaffected availability zones but may experience elevated launch latencies or receive ResourceLimitExceeded errors on their API calls, which are being issued to manage load on the system during recovery,” the dashboard said. “Customers receiving this error can retry failed requests.”
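For readers wondering what “retry failed requests” might look like in practice, here is a minimal sketch in Python. It is not Amazon’s recommended code and does not use any AWS SDK: the `ResourceLimitExceeded` exception class, the `launch` callable, and `launch_with_retry` are all hypothetical names invented for illustration. The idea is simply to rotate through the zones you are willing to use and to back off exponentially, with random jitter, between attempts so that retries do not add to the load Amazon is trying to manage.

```python
import random
import time


class ResourceLimitExceeded(Exception):
    """Hypothetical stand-in for the throttling error quoted above."""


def launch_with_retry(launch, zones, max_attempts=5, base_delay=1.0,
                      sleep=time.sleep):
    """Try to launch a replacement instance, rotating through zones.

    `launch` is a hypothetical callable that takes a zone name and either
    returns an instance identifier or raises ResourceLimitExceeded. It is
    an assumption for this sketch, not a real AWS SDK function.
    """
    for attempt in range(max_attempts):
        # Cycle through the candidate zones on successive attempts.
        zone = zones[attempt % len(zones)]
        try:
            return launch(zone)
        except ResourceLimitExceeded:
            if attempt == max_attempts - 1:
                raise  # Out of attempts: surface the error to the caller.
            # Exponential backoff with jitter: wait a random time of up to
            # base_delay * 2^attempt seconds before trying again.
            sleep(random.uniform(0, base_delay * 2 ** attempt))
```

A caller would pass its own launch function and a list of zones, for example `launch_with_retry(my_launch, ["us-east-1b", "us-east-1c"])`, where `my_launch` is whatever wrapper the site uses around its provisioning calls. The zone names here are illustrative only.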
The crux of the issue is this: Is this more of a warning of the dangers of remaining on the AWS cloud, or a warning that sites need to be architected better and not confined to a single availability zone?
Arguably, this outage is problematic for AWS’ efforts to win over the enterprise. The company has highlighted high-profile enterprise wins, such as Nasdaq for its FinQloud, to counter the belief that AWS isn’t suitable for mission-critical systems or enterprise usage. These outages undermine those efforts.
The cycle occurs once again: AWS goes down, the discussion about the importance of redundant availability zones picks up, and competitors pounce on a perceived opportunity to poach customers. One example: Joyent posted a plea to Reddit on its blog.
This strategy often works – particularly with enterprises that get spooked. It’s another reminder that traditional hosting service providers with IaaS offerings can provide levels of service and customer care that AWS cannot, because hands-on, individual customer support isn’t economically feasible for Amazon given the volume of customers on its cloud. Still, this outage is one in a string of outages over the years, and AWS continues to grow and perform well. This won’t cause a mass exodus, but it does provide an opportunity for other service providers to differentiate and highlight their own clouds.
Article printed from Data Center Knowledge: http://www.datacenterknowledge.com
URL to article: http://www.datacenterknowledge.com/archives/2012/10/23/aws-outage-rivals-seek-to-capitalize/
URLs in this post:
 BCP: http://www.flickr.com/photos/biggaypat/
 Flickr: http://www.flickr.com/photos/biggaypat/254511616/
 Monday’s incident: http://www.datacenterknowledge.com/archives/2012/10/22/amazon-cloud-outage-affecting-many-sites/
 thorough explanation: http://techblog.netflix.com/2011/04/lessons-netflix-learned-from-aws-outage.html
 plea to Reddit on its blog: http://joyent.com/blog/if-i-was-your-cloud-provider-i-d-never-let-you-down
 Jason Verge: http://www.datacenterknowledge.com/archives/author/jasonv/
Copyright © 2012 Data Center Knowledge. All rights reserved.