Has the performance metric used to rank the world’s top supercomputers become dated? With so much emphasis and funding invested in the Top500 rankings, the 20-year-old Linpack benchmark has come under scrutiny, with some in the community suggesting it needs to evolve. Now even University of Tennessee professor Jack Dongarra, who helped found the Top500 list, believes it is time for a change.
Dongarra and his colleague Michael Heroux of Sandia National Laboratories are developing a new benchmark that is expected to be released in time for the next Top500 list in November. The proposed benchmark, called the High Performance Conjugate Gradient (HPCG), should correlate better with the computation and data access patterns found in many of today’s applications. HPCG won’t replace Linpack; both metrics will be used to evaluate contenders in the November Top500.
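For readers unfamiliar with the method behind the name, the following is a minimal sketch of a plain conjugate gradient solver. It is not the HPCG benchmark itself, whose problem setup is not described here. The toy problem (a one-dimensional Laplacian stencil) and the function names `laplacian_1d_matvec` and `conjugate_gradient` are assumptions chosen for illustration. The point it makes is that each iteration touches only a few neighboring values per unknown, so the work is dominated by data access rather than dense arithmetic.

```python
import math


def laplacian_1d_matvec(x):
    """Apply the sparse stencil (-1, 2, -1) to x without storing a matrix."""
    n = len(x)
    y = [0.0] * n
    for i in range(n):
        y[i] = 2.0 * x[i]
        if i > 0:
            y[i] -= x[i - 1]
        if i < n - 1:
            y[i] -= x[i + 1]
    return y


def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def conjugate_gradient(matvec, b, tol=1e-10, max_iter=1000):
    """Solve A x = b for a symmetric positive definite operator `matvec`.

    Returns the approximate solution and the number of iterations used.
    """
    x = [0.0] * len(b)
    r = list(b)          # residual b - A x, with the initial guess x = 0
    p = list(r)          # first search direction
    rs_old = dot(r, r)
    if math.sqrt(rs_old) < tol:
        return x, 0
    for iteration in range(1, max_iter + 1):
        ap = matvec(p)
        alpha = rs_old / dot(p, ap)                       # step length
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, ap)]
        rs_new = dot(r, r)
        if math.sqrt(rs_new) < tol:
            return x, iteration
        beta = rs_new / rs_old                            # direction update
        p = [ri + beta * pi for ri, pi in zip(r, p)]
        rs_old = rs_new
    return x, max_iter


if __name__ == "__main__":
    n = 50
    b = laplacian_1d_matvec([1.0] * n)   # right-hand side with known solution
    x, iterations = conjugate_gradient(laplacian_1d_matvec, b)
    print("iterations:", iterations, "max error:", max(abs(v - 1.0) for v in x))
```

Unlike a dense solve, nothing here forms or factors a full matrix; the cost per iteration is one sparse matrix-vector product plus a handful of vector updates.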
The primary objective of the Top500 is to provide a ranked list of general-purpose systems that are in common use for high-end applications. The list has been released twice a year for the past twenty years, with Linpack serving as the standard yardstick of performance. The High Performance Linpack (HPL) benchmark was introduced by Dongarra and selected for the Top500 in 1993 because it was widely used and performance numbers were available for almost all relevant systems. It measures how quickly a system can solve a dense system of linear equations.
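To make "solving a dense system of linear equations" concrete, here is a small sketch, not the HPL code, that solves a random dense system by Gaussian elimination with partial pivoting and converts the elapsed time into a floating-point rate. The names `solve_dense` and `measure_flops` are hypothetical, and the operation count uses only the leading-order estimate of (2/3)n³ for the elimination, which is an approximation.

```python
import random
import time


def solve_dense(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    a = [row[:] for row in a]   # work on copies so the inputs are untouched
    b = b[:]
    for k in range(n):
        # Pick the largest pivot in column k to keep the elimination stable.
        pivot = max(range(k, n), key=lambda i: abs(a[i][k]))
        if a[pivot][k] == 0.0:
            raise ValueError("matrix is singular")
        a[k], a[pivot] = a[pivot], a[k]
        b[k], b[pivot] = b[pivot], b[k]
        for i in range(k + 1, n):
            factor = a[i][k] / a[k][k]
            for j in range(k, n):
                a[i][j] -= factor * a[k][j]
            b[i] -= factor * b[k]
    # Back substitution on the resulting upper-triangular system.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(a[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (b[i] - s) / a[i][i]
    return x


def measure_flops(n, seed=0):
    """Time one dense solve of size n and return (a, b, x, flops_per_second)."""
    rng = random.Random(seed)
    a = [[rng.uniform(-1.0, 1.0) for _ in range(n)] for _ in range(n)]
    b = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    start = time.perf_counter()
    x = solve_dense(a, b)
    elapsed = max(time.perf_counter() - start, 1e-12)  # guard against zero
    flop_count = (2.0 / 3.0) * n ** 3                  # leading-order estimate
    return a, b, x, flop_count / elapsed


if __name__ == "__main__":
    _, _, _, rate = measure_flops(100)
    print(f"approximate rate: {rate / 1e6:.2f} megaflops")
```

Every entry of the matrix is read and updated many times, which is why this kind of workload rewards raw arithmetic throughput.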
Designing for a Benchmark, or Applications?
Linpack performance is measured in FLOPS, short for floating-point operations per second. On the very first Top500 list, the Los Alamos National Laboratory CM-5 supercomputer ranked number one, posting 59.7 gigaflops. Twenty years later the top spot went to China’s Milky Way-2 (Tianhe-2), with 33.86 petaflops. Memory, storage, interconnects and the vendor landscape have all changed over that time, but it is the progression from gigaflops to teraflops and then petaflops that has led many to speculate on what it will take to reach exaflop levels of performance.
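The scale of that jump is easier to appreciate with a quick back-of-the-envelope calculation. The sketch below uses only the two figures quoted above. The constant-growth extrapolation to an exaflop is an assumption made for illustration, not a forecast from the Top500 organizers.

```python
import math

# Figures quoted in the article.
first_list_flops = 59.7e9       # CM-5 on the first list: 59.7 gigaflops
latest_list_flops = 33.86e15    # Milky Way-2 twenty years later: 33.86 petaflops
years = 20

# Overall growth and the equivalent steady year-over-year multiplier.
growth_factor = latest_list_flops / first_list_flops
annual_growth = growth_factor ** (1.0 / years)
doubling_time_years = math.log(2.0) / math.log(annual_growth)

# Assumption: the same average growth rate simply continues.
exaflop = 1e18
years_to_exaflop = math.log(exaflop / latest_list_flops) / math.log(annual_growth)

if __name__ == "__main__":
    print(f"growth factor over {years} years: {growth_factor:,.0f}x")
    print(f"average annual growth: {annual_growth:.2f}x")
    print(f"doubling time: {doubling_time_years:.2f} years")
    print(f"years to an exaflop at that pace: {years_to_exaflop:.1f}")
```

On these numbers the top system grew roughly 567,000-fold in two decades, a little under a doubling every year.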
The Sandia National Laboratories report that Dongarra and Heroux released gives an example of how Linpack has lost its relevance.
“The Titan system at Oak Ridge National Laboratory has 18,688 nodes, each with a 16-core, 32 GB AMD Opteron processor and a 6GB Nvidia K20 GPU,” the report notes. “Titan was the top ranked system in November 2012 using HPL. However, in obtaining the HPL result on Titan, the Opteron processors played only a supporting role in the result. All floating-point computation and all data were resident on the GPUs. In contrast, real applications, when initially ported to Titan, will typically run solely on the CPUs and selectively off-load computations to the GPU for acceleration.”
“We have reached a point where designing a system for good Linpack performance can actually lead to design choices that are wrong for the real application mix, or add unnecessary components or complexity to the system,” said Dongarra. “The hope is that this new rating system will drive computer system design and implementation in directions that will better impact performance improvement for real applications.”
Where Linpack could not adapt to measure more complex computations, Dongarra believes the new benchmark will adapt to emerging trends. The HPCG measurement will debut this November at the Supercomputing Conference (SC2013) in Denver, Colorado. Linpack will not be laid to rest, though: HPCG will serve as a companion ranking to the Top500 list, much as the Green500 re-ranks the Top500 according to energy efficiency. The HPCG metric will continue to be developed, and it will undergo verification and extensive validation testing against real applications on existing and emerging platforms.
Performance for Real-World Applications
The poster child in the debate over Linpack has been the NCSA supercomputer Blue Waters at the University of Illinois, which was brought online earlier this year. The Cray system posted an impressive 11.6 petaflops, but it has not been submitted to the Top500 for ranking consideration. Blue Waters Deputy Project Director William Kramer has been one of the critics of the Top500 Linpack benchmark.
Writing in November 2012, Kramer said that the Top500 list “does not provide comprehensive insight for the achievable sustained performance of real applications on any system.” Noting that Blue Waters would not be submitted for Top500 evaluation, with the blessing of its NSF funding source, Kramer outlined issues and opportunities for improvement with the Linpack benchmark and its measurements, as well as perceptual and usability issues with systems that are submitted to the Top500 list.
Kramer joins others who have questioned the worth of measuring HPC systems against Linpack. In a recent RFI, the Intelligence Advanced Research Projects Activity (IARPA) noted that benchmarks in general are necessary metrics, but that HPC benchmarks have “constrained the technology and architecture options for HPC system designers.”
MathWorks founder Cleve Moler compared Linpack to home runs in baseball. Home runs don’t always decide the result of a baseball game, or determine which team is the best over an entire season – but they are interesting to track over the years.