• Google Sorts 1 Petabyte of Data in 6 Hours

    Google has rewritten the record book and perhaps extended the benchmark for sorting massive volumes of data. The company said Friday that it had sorted 1 terabyte of data in just 68 seconds, eclipsing the previous mark of 209 seconds established in July by Yahoo. Google’s effort included 1,000 computers using MapReduce, while Yahoo’s effort featured a 910-node Hadoop cluster.

    Then, just for giggles, they expanded the challenge: “Sometimes you need to sort more than a terabyte, so we were curious to find out what happens when you sort more and gave one petabyte (PB) a try,” wrote Grzegorz Czajkowski of the Google Systems Infrastructure Team. “It took six hours and two minutes to sort 1PB (10 trillion 100-byte records) on 4,000 computers. We’re not aware of any other sorting experiment at this scale and are obviously very excited to be able to process so much data so quickly.”
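For a rough sense of scale, the figures quoted above can be turned into throughput numbers. The sketch below is an illustration only: it assumes decimal units (1 TB = 10^12 bytes, 1 PB = 10^15 bytes, which matches the "10 trillion 100-byte records" in the quote), and the helper name `sort_throughput` is made up for this example. It counts each byte once, so it says nothing about the extra reads, shuffles and writes a real sort performs.

```python
# Back-of-the-envelope throughput from the figures quoted in the article.
# Assumption: decimal units (1 TB = 10**12 bytes, 1 PB = 10**15 bytes).

TB = 10**12
PB = 10**15


def sort_throughput(total_bytes, seconds, machines):
    """Return (aggregate bytes/s, bytes/s per machine) for one sort run.

    Hypothetical helper for illustration; each byte is counted once.
    """
    aggregate = total_bytes / seconds
    return aggregate, aggregate / machines


# Sanity check on the quote: 10 trillion records of 100 bytes each is 1 PB.
assert 10 * 10**12 * 100 == PB

# (data size, elapsed seconds, machine count) as reported in the article.
runs = {
    "Google, 1 TB": (TB, 68, 1000),
    "Yahoo, 1 TB": (TB, 209, 910),
    "Google, 1 PB": (PB, 6 * 3600 + 2 * 60, 4000),
}

for name, (size, secs, machines) in runs.items():
    aggregate, per_machine = sort_throughput(size, secs, machines)
    print(f"{name}: {aggregate / 1e9:.1f} GB/s aggregate, "
          f"{per_machine / 1e6:.1f} MB/s per machine")
```

Under these assumptions the petabyte run moves more data per second in aggregate than the terabyte run, but less per machine, which is consistent with the larger job paying more coordination and I/O overhead.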

    Read more on the Official Google Blog.

    About

    Rich Miller is the founder and editor-in-chief of Data Center Knowledge, and has been reporting on the data center sector since 2000. He has tracked the growing impact of high-density computing on the power and cooling of data centers, and the resulting push for improved energy efficiency in these facilities.


    Dave

    Posted December 1st, 2008

    Can you tell us if Google uses SQL Server or Oracle for the petabyte sort?

    Thanks and congrats, Google!

    Peter van Dam

    Posted December 1st, 2008

    That sounds awesome. However, I don’t think Greenpeace would be happy to read about that many computers running to complete the task :P

    Axel

    Posted December 1st, 2008

    Google uses their own proprietary Bigtable database system.
