Errant Code Change Crashes 10 Million Blogs
The blog hosting service WordPress.com suffered a major outage last night, affecting as many as 10.2 million hosted blogs, including high-profile news sites like TechCrunch and GigaOm. “The cause of the outage was a very unfortunate code change that overwrote some key options in the options table for a number of blogs,” WordPress founder Matt Mullenweg reported in the comments on TechCrunch. Most sites appear to have been offline for about an hour, but it was about six hours before the WordPress.com team posted that operations were “back to full speed.”
The downtime was the second major outage of the year for WordPress.com, which also had availability problems for two hours in February. That outage was attributed to an unscheduled change to a core router by a data center provider.
The incident had Mullenweg reflecting on the experience of hosting TechCrunch, which posts articles whenever one of its hosting providers suffers an outage. “Mike and team at TC: you guys have jinxed us, but we still love you,” Mullenweg wrote. “These past two rapid-fire incidents have been cringe-worthy and painful, and I’m sorry they both happened shortly after your switch.”
I am imagining that the guys at TechCrunch are just about due to switch hosting providers in an angry huff. Again. They’ve been with Mediatemple what, 5-6 months? A year for Rackspace before that, and what, a year on Mediatemple before that? I forget who they switched to from Mediatemple, as I didn’t follow the blog back then.
Maybe someday somebody will sit down with the semi-tech CEOs of the semi-tech companies who repeat variations on the same bad IT outsourcing mistakes. They would need to teach them that even if they outsource they need to be capable of understanding what it is they’re outsourcing. Perhaps some sobering worst-case scenarios to let them understand the capabilities and limitations of the
Or this could be a good opportunity to build “wordpressbackup.com,” which would be a parallel service. The backend would either sync data from wordpress.com to wordpressbackup.com via snapshots or, for the paying customers, run as a database slave of the wordpress.com data store. In response to a systemic failure on wordpress.com, it would promote itself to master and initiate a global DNS-based failover (keeping TTLs low).
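The failover idea in this comment can be sketched roughly as follows. This is only an illustrative sketch, not how WordPress.com or any real backup service works: the `FailoverMonitor` class, the three-failure threshold, the 60-second TTL, and the `probe_primary`, `promote_replica`, and `repoint_dns` callables are all hypothetical stand-ins for a real health check, a real database promotion step, and a real DNS provider API.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class FailoverMonitor:
    # All three callables are hypothetical hooks supplied by the caller.
    probe_primary: Callable[[], bool]        # True if the primary site looks healthy
    promote_replica: Callable[[], None]      # promote the database slave to master
    repoint_dns: Callable[[str, int], None]  # point DNS at a host with a given TTL
    backup_host: str = "wordpressbackup.com"
    failure_threshold: int = 3               # assumed: consecutive failures before failover
    dns_ttl_seconds: int = 60                # assumed: low TTL so clients re-resolve quickly
    consecutive_failures: int = 0
    failed_over: bool = False

    def tick(self) -> bool:
        """Run one health check; return True once failover has happened."""
        if self.failed_over:
            return True
        if self.probe_primary():
            # A healthy probe resets the counter, so brief blips do not trigger failover.
            self.consecutive_failures = 0
            return False
        self.consecutive_failures += 1
        if self.consecutive_failures >= self.failure_threshold:
            # Promote first so the backup can accept writes before traffic arrives.
            self.promote_replica()
            self.repoint_dns(self.backup_host, self.dns_ttl_seconds)
            self.failed_over = True
        return self.failed_over


def simulate(probe_results):
    """Feed a scripted list of probe outcomes through the monitor and record actions."""
    events = []
    results = iter(probe_results)
    monitor = FailoverMonitor(
        probe_primary=lambda: next(results),
        promote_replica=lambda: events.append("promote"),
        repoint_dns=lambda host, ttl: events.append(f"dns:{host}:{ttl}"),
    )
    for _ in probe_results:
        monitor.tick()
    return events
```

Requiring several consecutive failed probes is one plausible way to tell a "systemic failure" from a transient hiccup. A real deployment would also have to deal with replication lag and with failing back to the primary, neither of which this sketch attempts.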
Sounds like WordPress might want to get their stuff together. With so many people relying on the site, it causes chaos when it’s down. Hopefully it doesn’t happen again for quite some time.