The “big data” community will get a global ranking system for data applications. The BigData Top 100 will create a counterpart to the Top500, the supercomputing rankings that have generated enormous interest in high performance computing. Charter members of the group include Facebook and Google, illustrating the importance of massive data-crunching to the largest players in Internet infrastructure.
The project’s objective is to develop an end-to-end application-layer benchmark for big data applications to enable ranking of big data systems, using metrics for performance and efficiency that are developed through a collaboration of academic and industry experts.
The initiative was announced this week at the O’Reilly Strata Conference in Santa Clara, California. The San Diego Supercomputer Center will serve as the lead academic sponsor of the BigData Top 100, while EMC Greenplum will lead the industry sponsors. Other launch participants include Facebook, Google, Mellanox, Seagate, Brocade, Oracle, NetApp and the University of Toronto.
Need for Benchmarks
“Big data is now part of every sector and function of the global economy, and the tremendous growth in data has created the need for benchmarks to quantify system performance and price/performance on big data tasks and applications,” said Chaitan Baru of the San Diego Supercomputer Center. “The existence of such benchmarks enables healthy competition among technology and solution providers, resulting eventually in product improvements and evolution of new technologies.”
That “healthy competition” can raise the profile of specialized computing. Just look at the Top500, which now serves as the arbiter of supercomputing bragging rights for nations, vendors and universities. The list made national headlines when a supercomputer from China took the top spot in 2010. Major vendors and universities all promote their performance in the twice-yearly list.
But there’s more than bragging rights at stake. “The goal of this activity is to provide clear objective information to help characterize and understand hardware and system performance and price/performance of big data platforms,” the group said. “The new big data benchmark should characterize the new feature sets, large data sizes, large-scale and evolving system configurations, shifting loads, and heterogeneous technologies of big data platforms.”
The effort has been spearheaded by the San Diego Supercomputer Center, which has organized several workshops on big data benchmarking. For more info, see the BigData Top 100 web site.