Transfer Switch Cited in NaviSite Outage
NaviSite says it will overhaul the surge suppression system at its San Jose, Calif. data center in the wake of last week’s power outage at the facility. In an incident report to customers, NaviSite (NAVI) said the facility’s surge suppression system didn’t adequately protect relay fuses within an automated transfer switch (ATS) from a power surge at the onset of a utility outage on Jan. 19.
The damage to the relay fuses left the ATS unable to start the facility’s diesel backup generators, as it normally would in a utility outage. With the generators offline, the data center switched over to battery power from the uninterruptible power supply (UPS).
“During this time, the UPS batteries were drained, and once the batteries were drained, power was lost to the data center floor at approximately 4:56 am PST,” NaviSite reported. “Power was restored to the data center by manually starting the generators and transferring load to the generators. Power was restored to the data center at 5:35 am PST.”
In the wake of the incident, NaviSite is taking steps to add resiliency to its power infrastructure. “We have an electrician coming in to re-architect our surge suppression environment,” said Allen Allison, Vice President of Managed Services for NaviSite. “We are looking at the surge suppression and switch infrastructure for all of our facilities.”
Allison said the key components of the power infrastructure at the San Jose facility are tested regularly. “We perform complete load testing on a monthly basis,” he said. “We also perform a test for the ability of the transfer switch to throw on the generators.” The power surge, however, is hard to simulate. “There’s no way to inject such a heavy surge (into a test),” Allison said.
During the outage, NaviSite kept customers informed via e-mail, a company blog and Twitter. Chief Marketing Officer Claudine Bianchi said this type of multi-faceted communications effort is critical during outages.
“We found that being transparent was very beneficial,” said Bianchi. “Usually in circumstances like this, lack of information is the biggest issue for customers.”
While Twitter is an increasingly important venue for customer updates during hosting outages, Bianchi said e-mail remained the primary communications tool for NaviSite’s many enterprise customers, many of whom receive notifications via their BlackBerry devices.
ATSguy Posted January 29th, 2010
I’m confused by this release. If the NaviSite data center mentioned is located in the US, then a fuse location “within the ATS” violates the UL 1778 standard. An ATS listed to the UL 1778 standard shall not have any overcurrent tripping devices.
We need clarification on this item from NaviSite.
ATSguy: It could be that something got lost in translation in how I phrased this. The incident report states that “multiple relays within the transfer panel” were malfunctioning.