Generator Failure Slows The Planet's Recovery
June 3rd, 2008 By: Rich Miller
A backup generator at The Planet failed last night as the company sought to restore power to its Houston data center, which was damaged by an explosion and fire Saturday night. The generator problems left about half of the 3,000 servers on the first floor of the Houston “H1” data center offline. The company is seeking a replacement generator while it continues to work to repair the existing unit. UPDATE: The company has now offered to physically migrate servers to a second Houston data center it operates.
“Around 2:20 AM CDT, the backup generator being used to power H1 Phase I (the first floor) experienced an electrical issue resulting in service loss for Phase I,” The Planet’s Kevin Hazard wrote in a customer update early Tuesday. “The staff successfully tested the 2 megawatt generator without load, so they began powering up the CRAC (computer room air conditioner) units and PDUs to restore service to Phase I. While working through this power restoration, the generator’s breakers were tripped by their internal electronics. The generator is rated to handle more than the load required to power the phase, and the generator itself is fully functional, but the breaker system must be replaced to guarantee stable power distribution.
“We have attempted to locate a replacement generator and are evaluating the time necessary to repair the breakers on the current generator so we can restore power as quickly as possible,” Hazard said, adding that the company doesn’t yet have an estimated recovery time for the repairs.
UPDATE: At noon Central time, The Planet provided this update: “Fixing the faulty breaker on the generator powering H1 Phase 1 was not successful. We have located a second generator that is currently being delivered to the facility. It is expected to arrive this afternoon and we will provide additional information regarding the new generator at that time.”
The generator issues presented a setback after a day of major progress in restoring service to the 7,500 customers with servers hosted in the damaged H1 data center.
On Monday morning, power was restored to the second floor (Phase 2) of the Houston data center, allowing The Planet to bring 6,000 servers back online. At about 6 pm, CEO Doug Erwin said that power was being restored to the first floor, which suffered more extensive damage. The explosion blew three walls of the electrical equipment room several feet from their original position and destroyed the underground cabling that powers the first floor.
The electrical room will need to be almost completely rebuilt, a process that could take several weeks. Erwin and The Planet’s operations staff developed a workaround to get the first-floor customers back online.
“We decided not to wait for equipment for the electrical room completely, opting instead for a temporary solution to get power to the 3,000 servers,” said Erwin. “That solution involves using generator power for the next 10 to 12 days until all the new equipment arrives to rebuild the electrical room for Phase 1.”
The generator failure has now interrupted the recovery timeline. The generator that failed was a temporary unit obtained as part of the plan to power the Phase I servers. The second floor is being powered by the permanent generator that was already on site. Last June The Planet spent $3 million to purchase six new Cummins 2-megawatt generators to support its data centers.
Generators are not simple to obtain on short notice. In recent years there has been a backlog in generator orders, with lead times ranging from nine months to more than a year for the 2-megawatt units favored by top-tier data centers. Smaller generators are more readily available, but The Planet would presumably have based its recovery plan upon the capacity of the existing 2-megawatt unit.