Rackspace CEO Napier on Dallas Outages
July 10th, 2009 By: Rich Miller
Rackspace has updated its blog with a message from President and CEO Lanham Napier on the recent outages at the company’s Dallas area data center, and posted an incident report (PDF) encompassing the June 29 and July 7 incidents. Rackspace has also posted a video in which Napier provides additional information about the recent problems at the DFW facility and how Rackspace plans to address them. This video runs about 5 minutes.
Mitch Posted July 12th, 2009
I sure hope they treat their customers better than they do potential employees.
I mailed my CV/resume to the UK division last year, and they called my mobile on a Sunday evening. I could not answer because I was dealing with a customer at the time (customer first and foremost, always), so I rang back the next day. The person who left the message was not available, so I left a detailed message indicating when I would be at home and able to talk.
I’m still waiting for the callback from June last year.
Would I work for them now? No!
Would I host anything with them now? Never!
Jeff Posted July 13th, 2009
Can anyone explain why the description of the July 7 outage includes “there was a failure in the bus between the UPS and PDU so we switched to generator power”? The video also explains that “all generator power is fed through the UPS”. These two statements contradict each other unless there is an unmentioned dedicated busway that lets the generator feed power to the PDUs. While possible, this is uncommon, and it is not represented in the whiteboard drawing in the video. I suspect that Rackspace still isn’t being completely honest about the root causes and true extent of the failures.
Going Down??? Why! Posted July 13th, 2009
Six years I’ve run my facility, and never a power or mechanical outage.
- Top-of-the-line, highly trained Critical Environment staff and vendors
- Data center management that is serious about uptime
- Solid PM budgets
- A quality 24×7 maintenance program that exceeds the industry standard
Ernie Posted July 16th, 2009
The question of what happened was explained in detail, but the question of how to fix it was cloudy at best. The fundamental cause of the outage dates back to when the datacenter was built. This datacenter needs to start with a power and design assessment. If RS were asked how it planned to expand its datacenter, could it answer with the proper documentation?
Datacenters that were constructed five years ago were never meant to handle the amount of data that flows through them today. Most of those datacenters never went through rigorous acceptance testing during construction and should do so now. UPS and generator arrays constructed 60 months ago were not designed to run at 80% load continuously, but that is the way they are operated daily.
I received a call from a customer that wanted to purchase some additional cooling. The customer told me their datacenter had an 800-amp breaker that kept tripping, so the plan was to place a spot cooler in front of the circuit breaker to keep it cool until a replacement breaker was installed. When I inquired about the amp draw on the breaker, I was informed that it was 550 amps. I asked for the circuit breaker coordination study and the one-line diagram for the facility and was told none existed. My customer is willing to fork out $15,000 for a circuit breaker because “They do not have any idea why it keeps tripping, so they are going to replace it!” NFPA 70E discusses the need for arc flash analysis and a circuit breaker coordination study.
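To put the numbers from that call in perspective, here is a rough sketch of the load arithmetic. The 80% continuous-load threshold is an assumption based on the common rule of thumb for standard (non-100%-rated) breakers, and the function name is made up for illustration. It is not a substitute for a coordination study.

```python
def breaker_load_fraction(measured_amps: float, rated_amps: float) -> float:
    """Return the measured draw as a fraction of the breaker's rating."""
    if rated_amps <= 0:
        raise ValueError("rated_amps must be positive")
    return measured_amps / rated_amps


# Assumption: common 80% continuous-load rule of thumb for standard breakers.
CONTINUOUS_LIMIT = 0.80

# Figures quoted in the comment: 550 A measured on an 800 A breaker.
load = breaker_load_fraction(550, 800)

print(f"Load is {load:.1%} of the breaker rating")
if load < CONTINUOUS_LIMIT:
    print("Below the assumed continuous limit: overload alone may not explain the trips")
else:
    print("At or above the assumed continuous limit")
```

On those figures the draw is well under the assumed limit, which is why swapping the breaker without a coordination study and a one-line diagram looks like guesswork.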
Acceptance testing and an arc flash analysis with a circuit breaker coordination study could have prevented the RS outage. It’s never too late to go back and take a look at the design of a datacenter’s electrical infrastructure.
It is time for all datacenters to perform design audits to figure out where they stand and how much more they can expand. Customers want to know if they can add equipment and need to be assured that their systems can handle the new load. When was the last time a cooling audit was performed on this site? A complex cooling audit can generate thousands of dollars in energy savings each quarter if the findings are implemented.
Too much of the old telecom mentality has bled over to the datacenter. Expanding a datacenter is not just a matter of adding more rectifiers and some copper bus. It’s much more complicated than that. So pull out all of your PM reports, make sure you are up to date, and pray nothing happens that a simple audit would have caught.