Dept. of Energy Awards $300M Deal for IBM Supercomputers

The systems will be powered by IBM's OpenPOWER CPUs and NVIDIA GPUs and be five times faster than the current-generation Titan supercomputer.

Jason Verge

November 14, 2014

Dept. of Energy’s Titan supercomputer, one of the HPC systems housed at one of the Oak Ridge National Lab’s data centers, took first place on the 2012 Top500 list (Photo: DOE)

IBM has won a $300 million supercomputing contract with the U.S. Department of Energy. Lawrence Livermore National Lab’s “Sierra” and Oak Ridge National Lab’s “Summit” supercomputers will leverage IBM's OpenPOWER processor architecture and are being hailed as a step toward exascale computing.

Peak performance of the IBM supercomputers will be in excess of 100 petaflops, balanced with more than 5 petabytes of dynamic and flash memory. The systems will be capable of moving data at more than 17 petabytes a second. That’s equivalent to moving over 100 billion photos on Facebook in a second. The systems are due to go live in 2017.
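As a rough sanity check on that comparison, the two quoted figures imply an average photo size. The sketch below is only back-of-the-envelope arithmetic: it assumes decimal units (1 petabyte = 10^15 bytes) and treats the quoted "more than" figures as exact, and the variable names are illustrative.

```python
# Back-of-the-envelope check of the "100 billion photos a second" comparison.
# Assumption: decimal units, so 1 petabyte = 10**15 bytes.
PETABYTE = 10**15

data_rate_bytes_per_s = 17 * PETABYTE   # "more than 17 petabytes a second"
photos_per_s = 100 * 10**9              # "over 100 billion photos ... in a second"

# Average size each photo would need to have for the two figures to match.
implied_photo_bytes = data_rate_bytes_per_s // photos_per_s
implied_photo_kb = implied_photo_bytes / 1000

print(f"Implied average photo size: {implied_photo_kb:.0f} KB")
```

Under these assumptions the comparison works out to roughly 170 KB per photo.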

Summit is expected to perform five to 10 times better than the current Titan supercomputer at Oak Ridge. It will also be about five times more energy efficient, while using roughly 10 percent more power than the current system.
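The performance and power figures can be related with simple arithmetic: if energy efficiency is taken to mean performance per watt, the efficiency gain is the performance ratio divided by the power ratio. The sketch below assumes that definition and uses the quoted ratios as round numbers; the function name is illustrative.

```python
def efficiency_gain(perf_ratio: float, power_ratio: float) -> float:
    """Gain in performance per watt, given performance and power ratios."""
    return perf_ratio / power_ratio

# Assumption: "roughly 10 percent more power" is modeled as a 1.10x power ratio.
POWER_RATIO = 1.10

# The quoted performance range is five to 10 times that of Titan.
for perf_ratio in (5.0, 10.0):
    gain = efficiency_gain(perf_ratio, POWER_RATIO)
    print(f"{perf_ratio:.0f}x performance -> {gain:.1f}x performance per watt")
```

On these assumptions, the low end of the performance range gives about a 4.5-fold efficiency gain and the high end about 9-fold, so the "five times" figure sits near the conservative end.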

A "Data Centric" Approach

A consortium of IBM, NVIDIA, and Mellanox is developing the Summit machine architecture for next-generation supercomputing. The architecture will enable a smaller number of nodes with a larger memory footprint, optimized for parallel codes. The contract involves three of the five founding members of the OpenPOWER Foundation, and both labs will use this architecture.

“In data centric computing, the value is not tied to only petaflops, but speed of insights,” said Tom Rosamilia, senior vice president, IBM Systems and Technology Group. The goal is to limit data movement within the latest IBM supercomputers.

The current model requires data to move repeatedly back and forth between storage and processor to drive insights. A design emphasis solely on microprocessors is becoming progressively untenable. For this reason IBM has been pioneering the “data centric” approach, which embeds compute power everywhere data resides in the system.

Oak Ridge’s Summit will be used to work with the nuclear energy industry to further optimize the reactor fleet and to perform climate modeling.

“Systems like Summit allow us to inject greater amounts and variety of data in new ways we’ve not been doing with Titan,” said Jeffrey Nichols, associate laboratory director of computing and computational sciences at the lab. “These are early steps towards exascale. We believe we have a good path going forward.”

New System to Monitor Nuclear Stockpile

Lawrence Livermore’s machine will be called Sierra. The lab runs some of the most complicated calculations on the planet, with codes running easily over a million lines. Key national security decisions are based on these calculations, including assessment of all stockpile systems and life extension of weapons.

“Simulation is the integrating element in our program that makes it possible for the country to not return to nuclear testing in Nevada,” said Mike McCoy, head of the advanced simulation and computing program at Lawrence Livermore National Laboratory.

“How do we assure they do the job for the country?” asked McCoy. “We are not buying off the shelf. We engage in long-term relationships… We share the risk in the development.”

IBM Research will work with Lawrence Livermore and Oak Ridge on scientific collaboration centered on these systems and help develop tools and technologies to optimize codes to achieve the best performance on the acquired systems.

NVIDIA GPUs to Supercharge the System

NVIDIA brings three technologies to the fold. The first is an upcoming GPU architecture called Volta. Volta incorporates the other two: a technology called NVLink and very high-bandwidth stacked memory.

Sumit Gupta, general manager of Tesla accelerated computing at NVIDIA, said the design can achieve up to 40 teraflops per node, or roughly 10 times more than a server today.

“In three years, we’ll create a server with 10 times performance,” Gupta said. “Summit will be five times as powerful as Titan but at a fifth the size.”

NVLink is an interconnect for GPUs that allows a point-to-point connection between GPUs or between a GPU and a POWER CPU. Co-developed with IBM, it increases data flow. It will first be introduced into products in 2016.

Mellanox is providing the inter-host communication network, network management, and InfiniBand. “We’re enhancing network capabilities to enable extreme-scale computer systems,” said Richard Graham, senior solutions architect at Mellanox. “Our goal is to reduce overall network power consumption by 50 percent, 80 percent in the longer term.”
