A “catastrophic” UPS failure caused a power outage Thursday at a Santa Clara data center operated by Quality Technology Services, triggering days of performance problems for the social network Friendster. Quality Tech said the outage occurred during planned maintenance when the facility was switched from utility power to backup diesel generators.
“Regrettably, the maintenance did not go as planned and we suffered a catastrophic UPS failure at 8:22 am Pacific Standard Time,” said Mark Waddington, President of Quality Technology Services, in an incident report for customers. “The UPS failed to stand in and smoothly transfer power from the utility to the temporary generators due to a voltage regulator problem with the temporary generators. The failure resulted in the triggering of the FM200 (fire suppression) system in the enclosed battery room and the subsequent EPO as part of our life safety system.”
FM200 is a popular fire suppression system that uses a chemical “clean agent” rather than water. The EPO (Emergency Power Off) button instantly cuts power in the data center and is intended for situations that present a risk to worker safety or equipment.
The Santa Clara facility was back on generator power within two hours, but Friendster remained offline for more than 23 hours over three days. While it has been eclipsed in the U.S. by MySpace and Facebook, Friendster has seen strong growth in international markets (particularly the Philippines) and says it has 85 million users.
When Friendster came back online, many of its users found large chunks of their friend lists missing, triggering rumors that the site had been hacked. Friendster addressed the issue Saturday in its customer forum.
“We’re aware of the problem that some users are having with missing friends,” a Friendster rep posted. “We experienced a major power outage the other day, that we’re still recovering from. We are actively working on resolving the problem with missing friends. Rest assured that no friends have actually been lost – even though it may appear so on the website! The problem should be fixed within the next 24 hours.”
The generator test was part of Quality Tech’s deployment of 22,000 square feet of new data center space at the Santa Clara facility. On Friday the company replaced strings of batteries while engineers thoroughly checked the UPS systems. The facility switched back to utility power Sunday night at 7:55 pm Pacific time.
Waddington said Quality Tech had spent five months preparing for the maintenance, and promised additional information about the incident. “In the coming hours, we will complete an exhaustive study of what failed and why and issue a formal after action report with detailed root cause analysis,” Waddington wrote.