Power Outage Affects Amazon Customers
June 15th, 2012 By: Rich Miller
A power outage at an Amazon Web Services data center in northern Virginia last night knocked some customers offline. Among the sites affected were Heroku, Pinterest, Quora and HootSuite, along with a host of smaller sites. Amazon confirmed the power outage on its Service Health Dashboard, but did not offer details on the root cause.
The outage affected only one availability zone in the US-East-1 region. The downtime led to the usual Twitter trash-talking about how major sites should spread their infrastructure across multiple Amazon availability zones, rather than relying on a single zone. Heroku indicated that its recovery efforts included shifting workloads to other availability zones.
But the outage was the third significant downtime in the last 14 months for the US-East-1 region, which is Amazon’s oldest region and resides in a data center in Ashburn, Virginia. The US-East-1 region had a major outage in April 2011 and another less serious incident in March. Amazon’s U.S. East region also was hit by a series of four outages in a single week in 2010.
While Amazon has multiple availability zones, IP address research by Huan Li suggests that the majority of Amazon Web Services customers are concentrated in the US East region. Li estimates that Amazon has 5,030 racks in northern Virginia, or about 70 percent of the estimated total of 7,100 racks for AWS. By contrast, Li estimates that the newer Amazon US West (Oregon) region has just 41 racks, which are reportedly deployed in containers.
VJ Posted June 15th, 2012
System reachability check failed at 2012-06-15 09:51 GMT+0530 (10 hours, 59 minutes ago)
Amazon servers DOWN, yet the Amazon AWS status page claims service is fully restored. What a shame!
Is this an Amazon-owned facility or colo?
amazon owned of course – amazon doesn’t co-lo often though I know of one place they do co-lo at near Seattle.
just goes to show how poor the service is, and of course given their SLA the power outage doesn’t count as an outage since it only impacted one data center. It took them over an hour to admit to the power outage on their site as well.
Any real service provider would get reamed, and be providing credits, and real time updates to their customers. Not amazon though they get a free ride, instead supporters ream the customers for not building their applications right. Funny how that works.
(yes I’ve been using Ec2 for the better part of 2 years, and yes it has been the most frustrating experience in my career)
Mortimer Aston Posted June 15th, 2012
Nate – Amazon is one of Digital Realty’s biggest customers, leasing more than 400,000 SF from them in 6 different facilities (per DLR’s 2011 10K filing)
Five 9’s of availability still means an average of 5 mins of downtime a year, and that’s apparently all it takes to make negative headlines – nobody reports on the other 525,595 minutes a year….
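For anyone who wants to check the arithmetic behind the "five 9's" figure, here is a minimal sketch. It assumes a 365-day year, and the function name is just an illustrative choice, not anything from Amazon's SLA.

```python
# Minutes in a non-leap year: 365 days * 24 hours * 60 minutes = 525,600.
MINUTES_PER_YEAR = 365 * 24 * 60


def downtime_minutes_per_year(availability_percent):
    """Return the downtime budget, in minutes per year, for a given availability percentage."""
    return MINUTES_PER_YEAR * (1 - availability_percent / 100)


# Compare a few common availability targets.
for label, pct in [("three 9's", 99.9), ("four 9's", 99.99), ("five 9's", 99.999)]:
    print(f"{label} ({pct}%): {downtime_minutes_per_year(pct):.2f} minutes/year")
```

At 99.999 percent the budget works out to roughly 5.26 minutes a year, which matches the "5 mins" quoted above once rounded.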
Public cloud outages remind us of the limitations of public cloud. Public clouds are not always secure, elastic or cheap over the long term, and these very public ‘glitches’ underscore the fact that private cloud is best – in terms of cost, security, scalability and innovation – every time. We just wrote a blog about this too if anyone is interested you can check it out at http://www.pistoncloud.com/2012/06/aws-outage-public-vs-private-cloud/
Bee Kay Posted June 15th, 2012
5 minutes of downtime, on their end, perhaps.
How long does your 4TB database take to rebuild into your memcache? More than 5 minutes.
If anything, they’re downplaying the outage, by not defining any of the details, such as how long without power, what percentage of the facility was down, how many companies have not recovered yet…
Forget about 5 9’s, ZeroNines is the now and the future.
Power outages in data centers can happen, and although data center owners in the United States invest heavily in industry-leading commissioning and testing services, a challenge may still exist in operation and management (O&M) in the years after commissioning. Clients have come to expect improving uptime and higher reliability. Outages become less tolerable, and news of outages travels globally. Re-commissioning of existing facilities for training and O&M purposes will become a future necessity. The benefits are improved response time, improved reliability, higher uptime and client happiness. Re-commissioning also creates an opportunity to add and verify some LEED improvements and validate new energy savings initiatives to improve PUE.
John Wallerich Posted June 16th, 2012
And the Cloud “experiment” continues… I wonder how many years it will be before it’s ready for prime time, and when it is, what technology will be in the wings to replace it. This is not the end state.