Digg Downtime Debacle Debated
Does the recent downtime at social media hub Digg reflect challenges in deploying NoSQL databases like Cassandra? Or is it simply a case of a company launching a new site architecture before it was ready for prime time?
Digg has been embroiled in controversy since it unveiled its “Version 4” retooling last month, which prompted a revolt from power users unhappy with what they saw – assuming they could see it at all. The rollout of the new site has led to significant availability problems for the Digg.com web site.
As a result, users are getting plenty familiar with Digg’s new downtime placeholder graphic, which depicts a covered wagon with a broken axle (see above) and is already being compared with the Twitter Fail Whale as an icon of underperformance.
This week Digg co-founder Kevin Rose addressed the site performance problems in an episode of the Diggnation podcast, saying the site had moved to a new architecture and not ironed out performance problems when it launched Digg v4. “Our service was falling over and crashing non-stop,” Rose said in the podcast. “It’s still crashing.”
Rose said Digg’s version 3 had reached the capacity of what the LAMP (Linux-Apache-MySQL-PHP) stack could handle, and that the company planned to shift from MySQL to an architecture based on the “NoSQL” Cassandra data store. “We couldn’t take this architecture any further,” said Rose. “We hit the wall.”
But Rose’s comments also suggest the new Digg architecture may not have been stable. “Even up until days before the launch there were bugs with our datastore,” Rose said. “The plan was to get this live. We knew there would probably be bugs under load. So we launch the site, and it falls over.” While acknowledging that problems were not unexpected, Rose said the issues were “Cassandra problems.”
A Controversy for Cassandra?
What’s the fallout? There are reports that Digg’s VP of Engineering, who had championed Cassandra, has left the company. That in turn prompted a discussion thread at Hacker News about Cassandra deployments and Rose’s comments.
GigaOm spoke with Riptano, a company that specializes in Cassandra deployments and had worked with Digg; Riptano said the issue isn’t the load. “We know Cassandra can scale to levels that are equal to or greater than a Digg is putting on it and I have full faith in Cassandra, but there are these little knobs that need to be tuned and you have to know where they are,” said Riptano CEO Matt Pfeil.
What about users? A significant number of disgruntled Digg users have shifted their activities to Reddit, a rival social media site. The irony? As noted by Kevin Burton, the Reddit site is powered by Cassandra.
john Posted September 8th, 2010
Bob Dole Posted September 8th, 2010
It’s misleading to say Reddit is powered by Cassandra — they use it as a persistent cache.
Jonathan Ellis Posted September 8th, 2010
… and for a more recent update showing Reddit got past their growing pains, see this one: http://blog.reddit.com/2010/08/everything-went-better-than-expected.html
Thanks for the pointer, Jonathan. Gotta love a chart titled “n00bs by date.”
Jason T. Posted September 9th, 2010
So I wonder why put your site at risk and piss off your advertising customers when you could have rolled out a development environment and hammered on it with load simulators to find these bugs?
I understand the reason for keeping open source and not rolling out an Oracle cluster, but man… Impaling yourself on the bleeding edge is just suicide.