Media Temple Issues 1-Year Outage Credit
May 7th, 2009 By: Rich Miller
Media Temple is issuing a one year service credit to its grid hosting customers whose sites have been knocked offline by a lengthy outage this week, the company said yesterday. The problems occurred on the same cluster of MT’s Grid-Service that was affected by a previous outage in early March.
Media Temple extended the service credit as it continued to migrate customers from a troubled storage section of its Grid-Service Cluster.02. Sites hosted on Cluster.02 went down Monday afternoon at about 4:15 p.m., with the Los Angeles hosting company citing file corruption issues in its storage system. The problems were similar to those that crashed Cluster.02 for nearly two days in early March.
In an update at 4 a.m. Pacific today (Thursday), Media Temple said all but 200 customer sites were back online. The problems are only affecting customers of its grid hosting service, and not those using MT for shared hosting or dedicated servers (such as TechCrunch). But Media Temple CEO Demian Sellfors said the company recognized that it needed to address the impact on Grid-Service customers who have suffered through two bouts of extended downtime.
“We’re offering a one-year service credit to any customer on Cluster.02 who has been meaningfully affected by the outage,” said Sellfors. “If you were burned by the Grid-Service platform, we want you to be compensated and feel good about it.”
Nearly 16,000 sites on Cluster.02 were affected by this week’s outage, with about 14,000 of those coming back online early Tuesday after about 10 hours of downtime. The remaining 2,000 sites are housed on Storage Section 01, which will require a complete file check by vendor BlueArc that will take up to 48 hours.
Media Temple opted to immediately begin restoring sites from the most recent disaster recovery backup, which would return the sites to service, in some cases without the most recent content. Any missing content will be restored as soon as the file system check is completed on Storage Section 01. The sites were gradually restored throughout Tuesday and Wednesday. UPDATE: The decision to restore from DR backup proved to be a wise move. As of Thursday morning, BlueArc had not yet begun the file check, apparently due to issues with a required firmware upgrade.
Sellfors had no estimate for what the service credits would cost the company in dollar terms, but it clearly will not be trivial. Grid-Service accounts start at $20 a month, so a one-year credit would equate to $240 in hosting fees. It’s not clear how many accounts are connected with those 16,000 sites, or how many of those customers would fill out support tickets seeking a credit.
The offer seems to have made an impression on at least one high-profile critic. “I’ve got to say this feels like generous and just compensation,” wrote Brennan Novak, a web developer who set up the Media Temple Customers blog to air his unhappiness with the March downtime. “Whatever the reasons for the hardware failure, whatever the reasons for not being migrated sooner, this compensation is testament to customer care and loyalty I’ve yet to see with any large company in any industry. Perhaps there are others who have had dire consequences from the downtime to which even this is not enough, but in my case it is.”
Other customers felt differently, and vented on Twitter. “Still seems ragefully inadequate to me,” wrote one customer. “Fifty hours without web or email and no end in sight. 1 yr credit not going to make this right,” wrote another.
At the time of the March outage, Media Temple blamed the problems on its legacy “first generation” storage system from BlueArc and announced plans to transition its grid hosting service to a new “second generation” system. Two months later, many customers remain on the older storage system, and it will be at least another month before all the sites are moved.
“We are still in the process of moving onto the newer (storage) devices,” said Media Temple CTO Josh Barratt. “We are moving people off Generation One as fast as we can. The process has been expedited. There’s a very good chance we could be done before the end of June.”
The biggest difference between the two storage systems is the recovery process. “The failure we’re dealing with now would be resolved in 10 minutes on Generation Two,” said Sellfors.
Until the process is completed, a key priority will be to maintain the stability of the older system. “If we do have another failure like this (while customers are still on Generation One), we’ll have to recover in a similar way,” said Barratt.
B James, posted May 12th, 2009
Of course, a big question no one is asking is this: when Media Temple decided to restore clients from “disaster recovery” backups, why were people seeing data from one to three months before the date of the issue? If you had issues with that exact storage segment in March, would it not have made sense to keep at least weekly archives of sites off the storage system?
At $20 an account, I guess you can’t expect a ton of data redundancy. But many places seem to manage daily or even weekly off-storage-system backups in case of a storage system failure (remember, boys and girls, RAID and redundant storage systems are NOT backups), so why were people looking at monthly or quarterly backups? It was seemingly glossed over, but sad nonetheless.
Seems they have managed to create the world’s largest “single server” hosting system: frequent downtime that affects web, email, and every other service on your account at the same time. And while it’s a grid, it sure has a lot of weak points…
WHocares, posted May 12th, 2009
I too wouldn’t be happy with a year of service. The whole point of their grid service is that this isn’t supposed to happen, lol.
Their marketing terms are BS.
In a further botched attempt to make things better, they transferred my site off Cluster 2 to Cluster 5 at 2pm… in the middle of the day. My site and email both went offline. I logged on to a notice saying my account was listed as suspended. Turns out that was the way they decided to mark the account during the transfer. Eventually the transfer was completed and things seem to be working as normal.
It’s not the first outage in a long line of challenges with Media Temple on Cluster 2, but this one appears to have been planned for 2pm! I am once again amazed by Media Temple’s incompetence.