Google Sorts 1 Petabyte of Data in 6 Hours
Google has rewritten the record book and perhaps extended the benchmark for sorting massive volumes of data. The company said Friday that it had sorted 1 terabyte of data in just 68 seconds, eclipsing the previous mark of 209 seconds established in July by Yahoo. Google’s effort included 1,000 computers using MapReduce, while Yahoo’s effort featured a 910-node Hadoop cluster.
Then, just for giggles, they expanded the challenge: “Sometimes you need to sort more than a terabyte, so we were curious to find out what happens when you sort more and gave one petabyte (PB) a try,” wrote Grzegorz Czajkowski of the Google Systems Infrastructure Team. “It took six hours and two minutes to sort 1PB (10 trillion 100-byte records) on 4,000 computers. We’re not aware of any other sorting experiment at this scale and are obviously very excited to be able to process so much data so quickly.”
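For a rough sense of scale, the figures quoted above can be turned into sort rates. The sketch below is back-of-the-envelope arithmetic only. It assumes decimal units (1 TB = 10^12 bytes, 1 PB = 10^15 bytes), and the helper name `throughput` is illustrative, not anything from Google's post.

```python
# Figures are those quoted in the article; decimal units are an assumption.
TB = 10**12
PB = 10**15

def throughput(total_bytes, seconds, machines):
    """Return (aggregate bytes/s, per-machine bytes/s) for a sort run."""
    aggregate = total_bytes / seconds
    return aggregate, aggregate / machines

# Sanity check: 10 trillion 100-byte records is exactly 1 PB.
records = 10 * 10**12
assert records * 100 == PB

tb_run = throughput(TB, 68, 1000)                   # 1 TB in 68 seconds
pb_run = throughput(PB, 6 * 3600 + 2 * 60, 4000)    # 1 PB in 6 h 2 min

for name, (agg, per) in [("1 TB", tb_run), ("1 PB", pb_run)]:
    print(f"{name}: {agg / 1e9:.1f} GB/s aggregate, {per / 1e6:.1f} MB/s per machine")
```

Under these assumptions the terabyte run works out to roughly 14.7 GB/s across the cluster and the petabyte run to roughly 46 GB/s, or about 14.7 MB/s and 11.5 MB/s per machine. These numbers are input bytes divided by elapsed time, not measured disk or network throughput.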
Read more on the Official Google Blog.
Can you tell us if Google uses SQL Server or Oracle for the petabyte sort?
Thanks and congrats, Google!
That sounds awesome. However, I don’t think Greenpeace would be happy to read about the many computers running to complete the task :P
Axel Posted December 1st, 2008
Google uses its own proprietary Bigtable database system.