Another Tough Day at Rackspace
July 8th, 2009 By: Rich Miller
Rackspace Hosting’s Dallas data center experienced another power outage Tuesday. The downtime was shorter and appears to have affected fewer customers than last week’s outage at the same facility, but gained media coverage in TechCrunch and the investing blogs Tech Trader Daily and Silicon Alley Insider.
Tuesday’s incident was caused by a failed bus duct in the data center’s power infrastructure, which affected customers on one UPS cluster. Rackspace was able to bypass the UPS and restore power using its backup generators. The power problems also caused some network problems. As of 8 p.m. Central time, the affected cluster was still being supported by generator power.
“We are still in the process of determining why the bus duct failed and why customers experienced downtime as a result of this issue,” Rackspace reported in an update on its blog. “Customers supported by UPS cluster A are currently being powered by generators, which are running reliably and predictably. The bus duct replacement is underway and, when complete, will allow us to switch back to utility power.”
Rackspace expects to issue between $2.5 million and $3.5 million in customer service credits for the June 29 data center outage, the company said in an SEC filing. An unusual series of equipment failures contributed to that outage, in which several parts of the Dallas facility lost power.
The difference in reliability between Rackspace and another decent provider (with a much lower monthly cost) is not as great as one might think.
This further proves that 100% (or 99.999%) uptime is impossible, no matter what the provider says, advertises or promises you.
B James, posted July 8th, 2009
As has been said in many places, SLAs are designed to protect the provider, not the client. An SLA is a limit of responsibility in the event issues do happen, usually capped at the current monthly fees paid for your specific service, or for the portion of your service that was directly affected. So if you have 3 separate dedicated servers and only 1 of the 3 is affected, your SLA coverage for "uptime" would only cover that server's portion of your monthly fees, i.e., protection for the provider, not for you as a client. SLAs on shared and VPS hosting are even funnier, as you will get back at most $10-50 for your "month" of credit even if your site is down for days.
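To make the cap described in this comment concrete, here is a minimal Python sketch. The function name, the $300 per-server fee and the credit fraction are hypothetical and illustrative only; they are not terms from Rackspace's or any other provider's SLA. The sketch simply assumes, as the comment says, that the credit is limited to the affected servers' share of the monthly bill.

```python
# Hypothetical sketch of an SLA credit capped at the affected servers' fees.
# Assumption: fees are billed per server and the credit never exceeds the
# monthly fees of the servers that were actually affected.

def sla_credit(fee_per_server: float, total_servers: int,
               affected_servers: int, credit_fraction: float) -> float:
    """Return the credit owed for one month under the assumed cap."""
    if not 0 <= affected_servers <= total_servers:
        raise ValueError("affected_servers must be between 0 and total_servers")
    if credit_fraction < 0:
        raise ValueError("credit_fraction must be non-negative")
    # Only the affected servers' share of the bill is eligible for credit.
    affected_fees = fee_per_server * affected_servers
    # Even a "generous" credit (fraction > 1) is capped at those fees.
    return min(affected_fees * credit_fraction, affected_fees)


if __name__ == "__main__":
    total_bill = 300.0 * 3
    credit = sla_credit(300.0, 3, 1, 5.0)
    print(f"Monthly bill: ${total_bill:.2f}, maximum credit: ${credit:.2f}")
```

With three servers at an assumed $300 each and one server down, the credit tops out at $300, a third of the $900 bill, no matter how long that server was offline.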
While I agree that it is extremely difficult to achieve 100% uptime or even 99.999%, providers who offer SLAs for anything less are blatantly stating that they do not trust their infrastructure. I've seen SLAs stating 99.95% uptime, and to me that screams downtime is forthcoming. At least if the provider offers five nines or better, it shows a level of confidence, regardless of whether or not they achieve it.
A Allen, posted July 9th, 2009
While I agree that 100% or "five nines" is tough to achieve, I don't think that stating four nines (99.99%) or 99.95% means the provider doesn't trust its infrastructure. It is exceedingly difficult to achieve 99.999% uptime in anything but telco, and even in telco it is only achievable because so little changes from day to day. The difference with other systems is that we ASK for a lot of change in them, and each change has the potential to introduce instability. I, for one, prefer to have a realistic view of uptime rather than to plan for 99.999% or 100% and not have backup manual processes in place.
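To put the percentages debated in these comments in concrete terms, this minimal Python sketch converts an availability target into the downtime it permits. It assumes a 365-day year and a 30-day month; real SLAs define their own measurement windows, and the function name is illustrative.

```python
# Minimal sketch: convert an availability percentage into allowed downtime.
# Assumptions: a 365-day (non-leap) year and a 30-day month.

MINUTES_PER_YEAR = 365 * 24 * 60   # 525,600 minutes
MINUTES_PER_MONTH = 30 * 24 * 60   # 43,200 minutes


def allowed_downtime_minutes(availability_pct: float, period_minutes: int) -> float:
    """Return the minutes of downtime an availability target allows over a period."""
    if not 0.0 <= availability_pct <= 100.0:
        raise ValueError("availability must be between 0 and 100 percent")
    return period_minutes * (100.0 - availability_pct) / 100.0


if __name__ == "__main__":
    for pct in (99.95, 99.99, 99.999, 100.0):
        per_year = allowed_downtime_minutes(pct, MINUTES_PER_YEAR)
        per_month = allowed_downtime_minutes(pct, MINUTES_PER_MONTH)
        print(f"{pct}%: {per_year:.1f} min/year, {per_month:.1f} min/month")
```

Under these assumptions, 99.95% allows about 263 minutes (roughly 4.4 hours) of downtime a year, 99.99% about 53 minutes, and 99.999% only about 5 minutes, which is why the commenters disagree on how realistic five nines is outside telco.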
A persuasive CEO who comes across as believable is a great asset. After multiple outages, however, you reach a point where personal assurances MIGHT not be enough. Why not get certified by The Uptime Institute rather than asking customers to keep taking your word for it?
Encourage all commercial data centers to publish their certification results; then, and only then, can consumers decide what their applications require.
[...] providers, because it plans to differentiate itself on customer service and support. However, some recent service outages may be interfering with the company’s image-building [...]