More on Carbonite's Data Loss
March 25th, 2009 By: Rich Miller
Carbonite CEO Dave Friend e-mailed us with additional information about the company’s recent lawsuit against a vendor, in which the company disclosed that it had lost data belonging to 7,500 customers. Friend didn’t deny that the lawsuit states that Carbonite “lost the backups of over 7,500 customers,” but says the number of customers who actually lost data – rather than having a snapshot of their data disappear – was much smaller. In the interest of thoroughness, here is Friend’s account of the incident:
The data loss event discussed in the lawsuit happened over a year ago. We do not say this to minimize the matter, but it’s important for your readers to know that we stopped buying the servers that caused the problem a long time ago. This is not a current problem.
The total number of Carbonite customers who were unable to retrieve their data was 54, not 7,500. Here is what happened: The Promise servers that we were purchasing in 2006 and 2007 use RAID technology to spread data redundantly across 15 disk drives so that if any one disk drive fails, you don’t lose any data. The RAID software that makes all this work is embedded as “firmware” in the storage servers. In this case, we believe that the firmware on the servers had bugs that caused the servers to crash. Carbonite automatically restarted all 7,500 backups and more than 99% of these were completely restored without incident.
Statistically, about 2 out of every 1,000 consumer hard drives will crash every week, so 54 of these customers had their PCs crash before their re-started backups were complete. Since they weren’t completely backed up when their PCs crashed, these customers were unable to restore all of their files from Carbonite. Most of the 54 got some or most of their data back. We took full responsibility for what happened and I did my best to call each of these customers personally to apologize.
As a result of our problems with the Promise servers, a couple of years ago we switched to a popular Dell server that uses RAID6 – an improved RAID that allows for the loss of 3 of the 15 drives simultaneously before you lose any data. This configuration is in theory 36 million times more reliable than a single disk drive – the chances of 3 out of 15 drives failing at the same time are almost nil.
So far, Promise has refused to accept responsibility for their equipment’s failures, so now we are suing them to get our money back. The Dell RAID servers have been flawless and we’re extremely happy with them.
An executive at Promise told the Boston Globe that Carbonite’s allegations were not accurate. “We stand by our product,” said Chi Chen Wu, senior vice president of Promise. “We looked into the claims and found there was no merit to the allegations.” Wu said the company was continuing to investigate the matter.
It seems to me that Carbonite wants to have it both ways. In its lawsuit, it invokes the loss of 7,500 customer backups and says the problem caused it “substantial damage.” When the event makes headlines, Carbonite insists that few customers suffered data loss. Which brings me back to my original point: that Carbonite’s disclosure was problematic for a firm in its line of business.
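For readers who want to sanity-check the figures in Friend's account, here is a rough back-of-the-envelope sketch in Python. It uses only the numbers he supplied (7,500 restarted backups, 54 unrecovered customers, 2 crashes per 1,000 consumer drives per week, 15 drives per array). The function names are illustrative. The multi-drive calculation rests on two assumptions of mine, not Carbonite's: that drive failures are independent, and that the consumer-drive weekly rate applies to drives in a storage array.

```python
from math import comb

# Figures taken from Friend's account
CUSTOMERS = 7_500               # backups that had to be restarted
UNRECOVERED = 54                # customers unable to retrieve all their data
WEEKLY_CRASH_RATE = 2 / 1_000   # consumer hard drive crashes per drive per week
DRIVES_PER_ARRAY = 15


def expected_crashes_per_week(customers: int, rate: float) -> float:
    """Expected number of customer PC crashes in one week."""
    return customers * rate


def implied_exposure_weeks(losses: int, customers: int, rate: float) -> float:
    """Weeks of incomplete backup needed to produce `losses` crashes at `rate`."""
    return losses / expected_crashes_per_week(customers, rate)


def prob_at_least_k_failures(n: int, k: int, p: float) -> float:
    """Binomial probability that at least k of n drives fail in one window.

    Assumes independent failures, which real arrays do not guarantee.
    """
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


if __name__ == "__main__":
    per_week = expected_crashes_per_week(CUSTOMERS, WEEKLY_CRASH_RATE)
    weeks = implied_exposure_weeks(UNRECOVERED, CUSTOMERS, WEEKLY_CRASH_RATE)
    print(f"Expected PC crashes per week: {per_week:.1f}")
    print(f"Implied weeks of exposure: {weeks:.1f}")
    print(f"Share of customers affected: {UNRECOVERED / CUSTOMERS:.2%}")
    p3 = prob_at_least_k_failures(DRIVES_PER_ARRAY, 3, WEEKLY_CRASH_RATE)
    print(f"P(3+ of 15 drives fail in the same week): {p3:.2e}")
```

At Friend's stated rate, 7,500 customers would see about 15 PC crashes a week, so 54 losses implies the restarted backups took somewhere around three and a half weeks to complete. The 54 customers are 0.72% of the 7,500, which is consistent with his "more than 99%" figure.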
Ky Posted March 26th, 2009
RAID firmware failure is certainly not unheard of. I once had a top-of-the-line HP StorageWorks controller lose its brains and lost over 5TB of data… that was nearly 100% recovered (over 24 hours) only via TAPE BACKUP. My advice to those who CANNOT lose data: back up your data via multiple technologies in multiple locations, and test your RESTORE process with some frequency. And always remember that nothing is 100%… but you can limit the risk with diversity.
Marc Posted March 30th, 2009
Nor is a dual drive failure on a RAID set unheard of. It does not have to be a simultaneous failure; in my case the second drive failed while the set was rebuilding after the first failure. It is also important to note that the probability of a multiple drive failure increases as the number of drives in the set increases, so that move to RAID 6 was a good idea. But another near- or off-line backup should be used.
In the case of our failure, the set was recovered only at large expense. I imagine that recovering a 15 drive set (if not corrupted by a twitchy backplane) would range into $50K-80K. Expensive, but I wonder if Carbonite thought to pursue that route to save face?
vijay Posted March 31st, 2009
Carbonite customers’ data loss is not Promise’s fault. For some more context on this case, see Promise’s response in a letter sent to customers this week at http://www.promise.com/support/Announcements.asp.
First, Promise doesn’t make servers. They make storage controllers, so I’m confused from the start. Second, every piece of equipment in your infrastructure can crash — especially storage. That’s why you have backups.
So this company that makes its money off of backups didn’t have any backups? They were storing all of their customers’ backups on arguably the least expensive RAID 5 solution on the market and they didn’t make a second copy anywhere?
I can see them trying to blame them for down time or something like that, but to blame them for the actual data loss? I don’t see that.