Google Sorts 1 Petabyte of Data in 6 Hours

    November 24th, 2008 : Rich Miller

    Google has rewritten the record book and perhaps extended the benchmark for sorting massive volumes of data. The company said Friday that it had sorted 1 terabyte of data in just 68 seconds, eclipsing the previous mark of 209 seconds established in July by Yahoo. Google’s effort included 1,000 computers using MapReduce, while Yahoo’s effort featured a 910-node Hadoop cluster.




    Then, just for giggles, they expanded the challenge: “Sometimes you need to sort more than a terabyte, so we were curious to find out what happens when you sort more and gave one petabyte (PB) a try,” wrote Grzegorz Czajkowski of the Google Systems Infrastructure Team. “It took six hours and two minutes to sort 1PB (10 trillion 100-byte records) on 4,000 computers. We’re not aware of any other sorting experiment at this scale and are obviously very excited to be able to process so much data so quickly.”

    Read more on the Official Google Blog.

Dave

Posted December 1st, 2008

Can you tell us if Google uses SQL Server or Oracle for the petabyte sort?

Thanks and congrats, Google!

Peter van Dam

Posted December 1st, 2008

That sounds awesome. However, I don’t think Greenpeace would be happy to read about how many computers were running to complete the task :P

Axel

Posted December 1st, 2008

Google uses its own proprietary Bigtable database system.
