UPS Failure Triggered Friendster Outage
November 17th, 2008 By: Rich Miller
A “catastrophic” UPS failure caused a power outage Thursday at a Santa Clara data center operated by Quality Technology Services, triggering days of performance problems for the social network Friendster. Quality Tech said the outage occurred during planned maintenance when the facility was switched from utility power to backup diesel generators.
“Regrettably, the maintenance did not go as planned and we suffered a catastrophic UPS failure at 8:22 am Pacific Standard Time,” said Mark Waddington, President of Quality Technology Services, in an incident report for customers. “The UPS failed to stand in and smoothly transfer power from the utility to the temporary generators due to a voltage regulator problem with the temporary generators. The failure resulted in the triggering of the FM200 (fire suppression) system in the enclosed battery room and the subsequent EPO as part of our life safety system.”
FM200 is a popular fire suppression system that uses a chemical “clean agent” rather than water. The EPO (Emergency Power Off) button instantly cuts power in the data center when a situation presents a risk to worker safety or equipment.
The Santa Clara facility was back on generator power within two hours, but Friendster remained offline for more than 23 hours over three days. While it has been eclipsed in the U.S. by MySpace and Facebook, Friendster has seen strong growth in international markets (particularly the Philippines) and says it has 85 million users.
When Friendster came back online, many of its users found large chunks of their friend lists missing, triggering rumors that the site had been hacked. Friendster addressed the issue Saturday in its customer forum.
“We’re aware of the problem that some users are having with missing friends,” a Friendster rep posted. “We experienced a major power outage the other day, that we’re still recovering from. We are actively working on resolving the problem with missing friends. Rest assured that no friends have actually been lost – even though it may appear so on the website! The problem should be fixed within the next 24 hours.”
The generator test was part of Quality Tech’s deployment of 22,000 square feet of new data center space at the Santa Clara data center. On Friday the company replaced strings of batteries while engineers thoroughly checked the UPS systems. The facility switched back to utility power Sunday night at 7:55 pm Pacific time.
Waddington said Quality Tech had spent five months preparing for the maintenance, and promised additional information about the incident. “In the coming hours, we will complete an exhaustive study of what failed and why and issue a formal after action report with detailed root cause analysis,” Waddington wrote.
[...] that brought down Friendster didn’t just leave them powerless. It also left their data center soaked in fire suppressant. “Regrettably, the maintenance did not go as planned and we suffered a [...]
Notes to Friendster: make data cache less volatile, find a better data center, and keep customers informed during a crisis through a corporate blog in a separate data center.
[...] For those interested, Rich Miller at Data Center Knowledge has posted in-depth information about the data center outage that knocked out Friendster. [...]
This was posted by Friendster staff as a comment on another post on our site, but I believe it is relevant to this post:
Dear Friendster Community and those reading this blog,
As you may have seen, Friendster.com has undergone maintenance at various times over the past few days. Friendster’s unscheduled downtime was due to a power outage at our outsourced data center in Santa Clara, California, where Friendster’s servers are co-located alongside approximately 50 other companies. As a result, Friendster, as well as a number of other online companies, experienced unscheduled and unavoidable downtime. At this time, Friendster is back online and our team is working quickly to restore everything back to normal.
Additionally, you may have experienced some inconsistencies with your friend count on Friendster. All of your friend connections and data stored on Friendster are intact and will be corrected shortly. It’s simply taking some time for us to restore each user account as Friendster has over 85 million user accounts globally.
Lastly, if you’ve received any on-site messages or SMS text alerts regarding our downtime or your friend count, please disregard them as they were initiated by members of our community and do not contain accurate information about what happened.
Thank you for your patience and we apologize for any inconvenience this may have caused.
-The Friendster Team
For more information, please visit our blog and our Help Center.
[...] I checked the net about the validity of the news and found that it was, indeed, true. You can read the entire stuff right here and here. [...]