About 3,000 servers at Montreal web host iWeb experienced an outage last night after a fire near the iWeb-CL data center prompted the company to shift the facility to generator power. All three generators started properly, but one of the transfer switches failed. Once the uninterruptible power supply (UPS) batteries were exhausted, a third of the data center lost power.
Power was restored in about an hour, but at least 450 dedicated servers failed to restart properly and needed manual attention, according to the account of the incident on the iWeb blog. As of Thursday afternoon, the last of the affected servers were being brought back online.
The iWeb event was the fourth significant data center power outage this year in which an automatic transfer switch (ATS) failure was cited. When operating correctly, an ATS switches a facility's electric power source from the utility grid to backup power, usually supplied by a diesel generator. Here's a recap of the three earlier incidents this year:
- On Jan. 28 a NaviSite data center in Santa Clara, Calif. lost power when a transfer switch failed during a utility outage due to a thunderstorm. The company tied the problems to a surge suppression system that failed to protect relay fuses within the ATS.
- On March 16 dedicated hosting provider Codero suffered a major power outage in its Phoenix data center. The backup generators started properly, but an ATS failed to transfer the load to generator power.
- On May 12 some customers in an Amazon Web Services data center in Virginia lost service after a transfer switch failed to properly manage the shift from utility power to the facility’s generators. In that instance, the utility outage was triggered when a vehicle crashed into a utility pole near the data center.
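The pattern common to these incidents (generators start, but the load never reaches them) can be sketched as a toy model. This is only an illustration, not a description of any operator's actual electrical design: the `Feed` class, the `minutes_without_power` function, the one-switch-per-feed layout and every runtime figure below are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Feed:
    """One power feed: a UPS-backed load behind its own transfer switch."""
    name: str
    ats_works: bool      # hypothetical flag: does the transfer switch operate?
    ups_minutes: float   # hypothetical UPS battery runtime, in minutes


def minutes_without_power(feed: Feed,
                          generator_start_minutes: float,
                          outage_minutes: float) -> float:
    """Return how long the load on this feed is dark during a utility outage."""
    if feed.ats_works:
        # The UPS only has to bridge the gap until the generator is online
        # and the switch has transferred the load to it.
        gap = generator_start_minutes
    else:
        # The generator may be running, but the load never reaches it.
        # The UPS is the only source until utility power returns.
        gap = outage_minutes
    return max(0.0, gap - feed.ups_minutes)


# Three feeds, one with a failed transfer switch (illustrative figures only).
feeds = [
    Feed("A", ats_works=True, ups_minutes=10.0),
    Feed("B", ats_works=True, ups_minutes=10.0),
    Feed("C", ats_works=False, ups_minutes=10.0),
]
dark = {f.name: minutes_without_power(f, 1.0, 70.0) for f in feeds}
```

In this sketch only feed `C` goes dark, and only after its batteries run out, which mirrors how a single failed switch can take down part of a facility even when every generator starts.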
"We suspect the transfer switch control logic has been disrupted by the kind of power outage (the fact that it was related to a fire nearby)," iWeb said on its blog, adding that it was still investigating. "Generator tests were completed successfully yesterday (Tuesday 2/11/2010) without load and successfully with full load last week (26/10/2010) with no indications of any potential problem. We run generator tests each week in order to prevent unplanned issues."