A lightning strike has caused power outages at the major cloud computing data hubs for Amazon and Microsoft in Dublin, Ireland. The incident has caused downtime for many sites using Amazon’s EC2 cloud computing platform, as well as users of Microsoft’s BPOS (Business Productivity Online Suite).
Amazon said that lightning struck a transformer near its data center, causing an explosion and fire that knocked out utility service and left it unable to start its generators, resulting in a total power outage. While many sites were restored, Amazon said some sites that rely on one of its storage services may take 24-48 hours to fully recover. The company is bringing additional hardware online to try to speed the recovery process, but is advising customers whose sites are still offline to re-launch them in a different zone of its infrastructure.
UPDATE: On Wednesday, Aug. 10, local utility ESB Networks said that lightning was not the cause of the transformer failure that cut utility power to both major data centers. The power company said the actual cause of the transformer failure remains under investigation.
Amazon said the event affected one of the EC2 Availability Zones in its Dublin data center, which is the company’s primary European hub for its cloud computing platform.
Generator Systems Disrupted
“Normally, upon dropping the utility power provided by the transformer, electrical load would be seamlessly picked up by backup generators,” Amazon said in an update on its status dashboard. “The transient electric deviation caused by the explosion was large enough that it propagated to a portion of the phase control system that synchronizes the backup generator plant, disabling some of them.”
“Power sources must be phase-synchronized before they can be brought online to load. Bringing these generators online required manual synchronization. We’ve now restored power to the Availability Zone and are bringing EC2 instances up.”
Amazon said the power outage began at 10:41 a.m. Pacific time, with instances beginning to recover about three hours later at 1:47 p.m. Pacific time. Recovery is taking longer for some user instances, including those using Amazon Elastic Block Store (EBS), the company said.
UPDATE: As of 10:45 p.m. Eastern, Amazon reported that 60% of the impacted instances had recovered and were available. “Stopping and starting impaired instances will not help you recover your instance,” AWS said. “For those looking for what you can do to recover more quickly, we recommend re-launching your instance in another Availability Zone.”
UPDATE 2: Early Monday, Amazon said that problems with EBS were slowing the recovery. “Restoring these volumes requires that we make an extra copy of all data, which has consumed most spare capacity and slowed our recovery process,” Amazon said in a status update. “We are in the process of installing additional capacity in order to support this process both by adding available capacity currently onsite and by moving capacity from other availability zones to the affected zone. While many volumes will be restored over the next several hours, we anticipate that it will take 24-48 hours until the process is completed.”
The Twitter feed for Microsoft online services reported that a European data center power issue had affected access to its BPOS services. Microsoft reported that services were starting to come back online as of 7:30 p.m. Eastern/4:30 p.m. Pacific. A Microsoft statement said that a “widespread power outage in Dublin caused connectivity issues for European BPOS customers. Throughout the incident, we updated our customers regularly on the issue via our normal communication channels.”
“We were informed the incident was the result of a lightning strike that caused an explosion on one of the substation’s transformers,” said a Microsoft spokeswoman. “The lightning strike created an electrical surge large enough that it affected a portion of our backup power systems. Our team is highly trained to respond to various types of issues, so they then began to manually transfer to generator power.”
Microsoft said that the Dublin data center’s utility power was restored at 3:10 a.m. PST Monday and the data center “has returned to normal operating conditions.”
Key European Cloud Computing Hub
Dublin has become a key cloud computing gateway to Europe and beyond for U.S. companies due to several factors, including the city’s location, connectivity, climate and ready supply of IT workers. Dublin’s temperature is ideal for data center cooling, allowing companies to use fresh air to cool servers instead of using huge, power-hungry chillers to refrigerate cooling water.
This allowed Microsoft to design and build one of the world’s most efficient data centers, a huge facility that hosts the company’s cloud services for Europe and operates entirely without chillers. At 550,000 square feet, it is also one of the world’s largest data centers.
Amazon opened a data center in Dublin in December 2008 to house the European availability zones for its EC2 cloud computing services. The company recently acquired a 240,000-square-foot building in Dublin, which will be converted into an expansion data center.
The company’s property moves reflect the rapid growth of its European cloud computing operation, which was chronicled by Netcraft in December.