Posted by Rich Miller on December 25, 2012 @ 1:53 pm in Amazon, Downtime | 5 Comments
Special thanks to our awesome members for being patient. We’re back to normal streaming levels. We hope everyone has a great holiday.
Netflix US (@netflix), December 25, 2012

The ELB service is important because it is widely used to manage reliability. It allows customers to shift capacity between different availability zones, an important strategy for preserving uptime when a single data center experiences problems.

During a June 29 outage, Amazon said a bug in its Elastic Load Balancing system prevented customers from quickly shifting workloads to other availability zones. This had the effect of magnifying the impact of the outage, as customers who normally use more than one availability zone to improve their reliability (such as Netflix) were unable to shift capacity.

In a July 2 incident report from that event, Amazon outlined steps it would pursue to avoid a repeat of these issues:

“As a result of these impacts and our learning from them, we are breaking ELB processing into multiple queues to improve overall throughput and to allow more rapid processing of time-sensitive actions such as traffic shifts. We are also going to immediately develop a backup DNS re-weighting that can very quickly shift all ELB traffic away from an impacted Availability Zone without contacting the control plane.”

It will be interesting to see whether Amazon’s load balancing problems were related to any of the issues identified in July, and what new solutions are devised to address them. We’ll likely see information on that front soon, as the Amazon team has been scrupulous about publishing detailed incident reports.
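
For readers unfamiliar with the DNS re-weighting idea mentioned in Amazon's July statement, the general technique can be illustrated with a minimal sketch. This is not Amazon's implementation, which has not been published in that detail. The zone names, the weights, and the functions `shift_away` and `traffic_shares` are hypothetical, chosen only to show how setting one zone's weight to zero redistributes traffic across the remaining zones.

```python
from typing import Dict


def traffic_shares(weights: Dict[str, int]) -> Dict[str, float]:
    """Return the fraction of DNS answers each zone receives for its weight."""
    total = sum(weights.values())
    if total == 0:
        raise ValueError("at least one zone must have a non-zero weight")
    return {zone: weight / total for zone, weight in weights.items()}


def shift_away(weights: Dict[str, int], impacted_zone: str) -> Dict[str, int]:
    """Return a copy of the weights with the impacted zone set to zero."""
    if impacted_zone not in weights:
        raise KeyError(impacted_zone)
    new_weights = dict(weights)  # leave the caller's mapping untouched
    new_weights[impacted_zone] = 0
    return new_weights


# Hypothetical zone names and weights, for illustration only.
weights = {"zone-a": 1, "zone-b": 1, "zone-c": 2}
after = shift_away(weights, "zone-a")
shares = traffic_shares(after)
```

In this sketch, the impacted zone stops receiving new traffic as soon as its weight is zero, and the surviving zones split traffic in proportion to their existing weights. In practice, how quickly clients follow the change would also depend on DNS caching and record TTLs.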
Article printed from Data Center Knowledge: http://www.datacenterknowledge.com
URL to article: http://www.datacenterknowledge.com/archives/2012/12/25/major-christmas-outage-for-amazons-cloud/
URLs in this post:
 BCP: http://www.flickr.com/photos/biggaypat/
 Flickr: http://www.flickr.com/photos/biggaypat/254511616/
 December 25, 2012: https://twitter.com/netflix/status/283614473397342208
 June 29 outage: http://www.datacenterknowledge.com/archives/2012/07/03/multiple-generator-failures-caused-amazon-outage/
 July 2 incident report: http://aws.amazon.com/message/67457/
 Rich Miller: http://www.datacenterknowledge.com/archives/author/richm/
Copyright © 2011 Data Center Knowledge. All rights reserved.