Digg Downtime Debacle Debated

No, this isn't Digg's new home page. Not most of the time, anyway.

Does the recent downtime at social media hub Digg reflect challenges in deploying NoSQL databases like Cassandra? Or is it simply a case of a company launching a new site architecture before it was ready for prime time?

Digg has been embroiled in controversy since it unveiled its “Version 4” retooling last month, which prompted a revolt from power users unhappy with what they saw – assuming they could see it at all. The rollout of the new site has led to significant availability problems for the Digg.com web site.

As a result, users are getting plenty familiar with Digg’s new downtime placeholder graphic, which depicts a covered wagon with a broken axle (see above). The image is already being compared with the Twitter Fail Whale as an icon of underperformance.

“Crashing Non-Stop”
This week Digg co-founder Kevin Rose addressed the site performance problems in an episode of the Diggnation podcast, saying the site had moved to a new architecture and had not ironed out performance problems before it launched Digg v4. “Our service was falling over and crashing non-stop,” Rose said in the podcast. “It’s still crashing.”

Rose said Digg’s version 3 had reached the limit of what the LAMP (Linux-Apache-MySQL-PHP) stack could handle, and that the company planned to shift from MySQL to an architecture based on the “NoSQL” Cassandra data store. “We couldn’t take this architecture any further,” said Rose. “We hit the wall.”

But Rose’s comments also suggest the new Digg architecture may not have been stable. “Even up until days before the launch there were bugs with our datastore,” Rose said. “The plan was to get this live. We knew there would probably be bugs under load. So we launch the site, and it falls over.” While acknowledging that problems were not unexpected, Rose said the issues were “Cassandra problems.”

A Controversy for Cassandra?
What’s the fallout? There are reports that Digg’s VP of Engineering, who had championed Cassandra, has left the company. That in turn prompted a discussion thread at Hacker News about Cassandra deployments and Rose’s comments.

GigaOm spoke with Riptano, a company that specializes in Cassandra deployments and had worked with Digg. Riptano said the issue isn’t the load. “We know Cassandra can scale to levels that are equal to or greater than a Digg is putting on it and I have full faith in Cassandra, but there are these little knobs that need to be tuned and you have to know where they are,” said Riptano CEO Matt Pfeil.

What about users? A significant number of disgruntled Digg users have shifted their activities to Reddit, a rival social media site. The irony? As noted by Kevin Burton, the Reddit site is powered by Cassandra.

About the Author

Rich Miller is the founder and editor at large of Data Center Knowledge, and has been reporting on the data center sector since 2000. He has tracked the growing impact of high-density computing on the power and cooling of data centers, and the resulting push for improved energy efficiency in these facilities.

Add Your Comments


  1. john

    hah. interesting..

  2. Bob Dole

    It's misleading to say Reddit is powered by Cassandra -- they use it as a persistent cache.

  3. Thanks, Bob Dole! For more on Reddit's use of (and challenges with) Cassandra, see this item on the Reddit blog.

  4. Jonathan Ellis

    ... and for a more recent update showing Reddit got past their growing pains, see this one: http://blog.reddit.com/2010/08/everything-went-better-than-expected.html

  5. Thanks for the pointer, Jonathan. Gotta love a chart titled "n00bs by date."

  6. Jason T.

    So I wonder why put your site at risk and piss off your advertising customers when you could have rolled out a development environment and hammered on it with load simulators to find these bugs? I understand the reason for keeping open source and not rolling out an oracle cluster but man... Impaling yourself on the bleeding edge is just suicide.