Averting Disaster With the EPO Button
There are lots of stories about the Emergency Power Off (EPO) button in data centers, and few of them have happy endings. The EPO, also often known as the “big red button,” shuts off power for the entire data center in the event of an emergency. In non-emergency situations, it can be an accident waiting to happen.
On April 15, a disgruntled technician hit the EPO button at the data center that controls the electrical grid for the state of California, with the FBI calling the incident an act of deliberate sabotage. Officials said that the outage could have disrupted the power grid for the Western U.S. if it had occurred during normal business hours instead of late Sunday night. While sabotage is difficult to predict and prevent, the EPO button provided a mechanism for doing maximum damage in an instant.
“A large number of EPO activations are acts of deliberate sabotage,” said Richard Sawyer, a principal at EYP Mission-Critical Facilities, during a presentation at AFCOM’s Data Center World in Las Vegas in March. The talk was titled “EPO: A Data Center Heart Attack Waiting to Happen,” and Sawyer shared some horror stories about EPO mishaps that have knocked data centers offline. But he also emphasized that effective data center management can minimize the risk of EPO-related outages.
“There isn’t much good about EPO today,” said Sawyer, citing the many instances in which the buttons have figured in outages. “The EPO represents a single point of failure. We are getting more dependent as a culture on data centers. People can get killed and lives ruined by data center failures today.”
Sawyer shared data compiled by UPS vendors that helps quantify the extent of data center failures connected with the emergency power off button. One vendor reported 20 EPO-related incidents between January 2002 and June 2003, representing 13% of all UPS failures during that time. Another UPS vendor’s study found that 26% of all human error failures were caused by the EPO, often in scenarios involving vendors, delivery persons or cleaning crews.
Why are accidents involving the emergency power off button so common? In many cases it’s because data center operators have made it too accessible and too easy. Sawyer posted one slide of an EPO button covered by a Post-it note saying “do not touch.”
One incident involved a major financial institution with $3 trillion under management. A delivery person wanted to open a door. “There were two buttons there,” said Sawyer. “The gray button for the door, and a red button for the EPO.” You can guess which one was pushed, followed by hours of expensive downtime.
The emergency power off button allows firemen to rapidly disconnect power controlled by the EPO in the event of a fire, and quickly shut down the HVAC systems serving a data center.
The history of the emergency power off switch dates back to 1959, when a fire in the Air Force’s statistical division in the Pentagon caused $6.9 million in property damage and destroyed three IBM mainframe computers. “Nothing gets the government’s attention like something that happens to government,” said Sawyer. The National Fire Protection Association (NFPA) was tasked with developing rules to address fire risks in IT environments.
Understanding the requirements is essential to minimizing the risk of mishaps with the EPO button. While emergency power off functions are included in the NFPA standard, local fire codes don’t always mirror the standard. “You are only legally obligated to follow code,” said Sawyer. “A lot of inspectors don’t understand the difference between the code and the standard. You can educate them. You don’t confront them, but you work with them.”
Sawyer said there are some instances in which a data center isn’t required to have an EPO button. This is true for facilities that don’t use NFPA 75 as their standard, and don’t use non-rated plenum cable beneath the raised floor, Sawyer said.
The standard calls for a mechanism to disconnect power to all electronic equipment, and disconnect power to all dedicated HVAC systems. These functions don’t have to be governed by a single mechanism, Sawyer noted, leaving the option of separate buttons to terminate the equipment power and HVAC systems. The standard says shut-off systems must be “grouped, identified and readily accessible” and located at the principal exit doors.
“Make it accessible, but don’t make it easy,” said Sawyer, who recommends using three EPO buttons: one for Power Distribution Units (PDUs), another for the HVAC units, and a third to control the UPS feed to the data center. If the UPS units are in an equipment room that is separate from the data center, they do not need to be terminated by the EPO button.
“Cover (the EPO) and make activation a positive, intentional multi-step action,” said Sawyer, who advocates monitoring the area with video cameras to deter sabotage. “Make sure signs are posted in the language spoken by the cleaning crew. Put an alarm on the cover. You need to make them stop and ask themselves: do I really like my job? Is this a good career move?”
Johnny Mnemonic, posted May 7th, 2007
None of the above advice addressed the issue encountered with the lead-in article: deliberate sabotage. Can such a thing be stopped, while allowing safety personnel access in time of emergency?
Johnny, the only way to make sabotage via EPO impossible is to eliminate the EPO.
I’d wager a sushi lunch that for every one justified use of an EPO, there are over 19 incorrect/accidental/sabotage EPO hits. For all the attention we datacenter designer/operators give to absurdly remote possibilities such as “terrorist attack” or “major earthquake,” the reality is that our number one risk is just plain old, completely un-sexy human error. Go figure.