Another Lengthy Outage for Media Temple
May 5th, 2009 By: Rich Miller
Some grid hosting customers of Media Temple have been offline for more than a day due to a recurrence of problems with its storage systems that caused a 38-hour outage earlier this year. Sites hosted on Cluster.02 of Media Temple’s Grid-Service platform went down Monday afternoon at about 4:15 p.m., which the Los Angeles hosting company attributed to file corruption issues in a storage system.
The problems are affecting only customers of its grid hosting service, and not those using MT for shared hosting. On Tuesday afternoon, Media Temple said 1,800 customers remained offline, and the company is apparently at odds with its storage vendor over the best restoration options and the prospect of additional downtime.
Media Temple said the problems were similar to those that crashed Cluster.02 for nearly two days in early March. At the time, Media Temple blamed the problems on its legacy “first generation” storage system from BlueArc and announced plans to transition its grid hosting service to a new “second generation” system. Two months later, the migration is still not complete.
“This System Incident is taking place on a cluster using our 1st-generation storage architecture,” Media Temple said on its status page. “While we have made significant progress transitioning this cluster to gen 2 storage … there are still components that rely heavily on the gen 1 technology.”
Media Temple initially promised a new internally-developed storage system for Grid-Service following an outage in December 2007.
Media Temple said that some customer sites were returned to service early Tuesday morning, but others remained offline pending a lengthy file check of a storage subsystem. As of 11 a.m. Pacific time Tuesday, Media Temple said the process was nearly complete, but it still had no firm ETA for when service would be restored.
UPDATE: As of 4 p.m. Pacific on Tuesday, Media Temple says 1,800 sites remain offline. “Our storage vendor has recommended a full file system check on Segment.01, amounting to an additional estimated 48 hours of downtime for those 1800 users,” MT writes in its latest update. “This is not acceptable to us. Our engineers are already in the process of restoring all Segment.01 sites from our backup servers. … The vendor-recommended repair action will be completed in tandem to prevent data loss. When it is complete we will then merge the repaired data to bring all data current.”
Alex Posted May 6th, 2009
The funny thing is their billing system still seems to work.
A company with good marketing; wish they’d invest the same amount of effort in product development.
John Posted May 6th, 2009
An outage a quarter can’t be good for business. One would venture to guess that any savings they may have received from going this route have long since been burned in lost customers and bad press.
How about taking all that development $$ and investing in a Brocade / EMC solution?
Mark Posted May 7th, 2009
I have a grid server account with MT on this cluster and my sites have been down since the initial crash a few days ago. They finally got my sites back online today (May 7, 2009), but all the files are rolled back from 2+ months ago…
My site just went down tonight (Saturday night), and it’s been out for several hours now. Anyone know what’s going on?