Dr. Robert Wisniewski is Chief Software Architect of Extreme Scale Computing, Intel.
The world of High Performance Computing (HPC) is evolving at a rapid rate. Currently there is a great deal of interest in achieving exascale computing. At the same time, we are looking further ahead to ensure that the architectures and implementations we design and build will extend well beyond exascale, and will support the new computing models of big data, cloud, machine learning, and deep learning. A key aspect of achieving this future is to have a solid and extensible infrastructure we can leverage as a base.
One of the challenges that the HPC community faced until recently was the duplication of time-consuming work. Vendors, OEMs, and customers were each building software stacks. Unfortunately, much of the effort in doing this was repeated across each of these organizations. A second challenge was that the HPC system ingredients were often designed and built separately. There were network providers, memory vendors, chip manufacturers, and a software stack that was not necessarily co-designed with any of them. Both of these challenges led to inefficiencies.
Tackling the HPC Challenges in an Open Source Community
To address the first issue, a cohesive and comprehensive system software stack was developed and made available as open source by a community called OpenHPC. To address the second issue, a system-centric methodology was adopted. Software engineers worked with network, memory, and core teams to ensure that the benefits of the resulting technology could be realized in the software stack and delivered to the application.
OpenHPC was announced in November 2015 and launched in May 2016 by the Linux Foundation. It is an open community focused on HPC, designed to allow different community members, including academic institutions, OEMs, ISVs, customers, and national labs, to contribute open source components and enhancements to them. The community, which is platform agnostic, includes 30 member organizations that have come together to make contributions to HPC software.
OpenHPC provides a continuous integration and testing environment for all the components that are needed to develop for, run on, deploy, and manage HPC machines. These components are freely available as open source and include provisioners, resource managers, I/O clients, development tools, and scientific libraries.
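To give a concrete sense of how these components are consumed, the sketch below shows how a site might enable the OpenHPC package repository and pull in a provisioner, a resource manager, and development tools on a CentOS-based master node. The release URL, version, and meta-package names are assumptions based on typical OpenHPC install-guide conventions, not a definitive recipe.

```shell
# Illustrative sketch; repository URL and package names are assumptions
# drawn from common OpenHPC conventions and may differ by release.

# Enable the OpenHPC repository.
yum -y install http://build.openhpc.community/OpenHPC:/1.3/CentOS_7/x86_64/ohpc-release-1.3-1.el7.x86_64.rpm

# Meta-packages group related components: base tools, the Warewulf
# provisioner, and the Slurm resource manager server.
yum -y install ohpc-base ohpc-warewulf ohpc-slurm-server

# Development tools and scientific-library toolchains are packaged the
# same way and install alongside the system components.
yum -y install ohpc-autotools lmod-ohpc
```

Because each ingredient is delivered as a tested package from one repository, a site assembles a working stack from vetted pieces instead of rebuilding and integrating each component itself.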
The desire to have a supported version of the OpenHPC software stack (analogous to the relationship between CentOS and Red Hat Enterprise Linux) led to the development of a cohesive and comprehensive supported software stack designed to run machines from a cluster up to a supercomputer. By removing the duplicated work, OEMs, ISVs, and customers can focus on providing value based on their core competencies, leading to a more efficient HPC ecosystem. In the past, HPC researchers and developers shared new ideas through papers and online posts, but for many components it was challenging for different groups to cross-pollinate code; OpenHPC makes that option viable.
Provisioning HPC in the Cloud
It is clear that the face of HPC is changing. Many want the capability to bring together cloud and HPC, to comprehend big data and analytics, as well as machine learning and deep learning. Because OpenHPC allows the HPC community to develop together, it provides a solid and extensible base on top of which to explore the melding of the technologies referenced above. At the end of 2016, we developed a proof-of-concept (POC) that demonstrates how HPC can be used in conjunction with a cloud environment. For this POC, we set up a head node managing a cloud environment using OpenStack. We used the OpenStack components Nova and Ironic to carve off a set of nodes, and then Glance to provision them, producing an HPC sub-cluster in the cloud environment.
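The POC flow described above can be sketched with the standard OpenStack command-line client. Node names, image names, flavors, and the IPMI driver choice here are hypothetical; the point is the division of labor between Ironic (bare-metal registration), Glance (image storage), and Nova (scheduling the instance).

```shell
# Hypothetical names throughout; this is a sketch of the POC flow,
# not the exact commands used.

# Register a physical node with Ironic so Nova can schedule onto bare metal.
openstack baremetal node create --driver ipmi --name hpc-node-01
openstack baremetal node manage hpc-node-01
openstack baremetal node provide hpc-node-01

# Upload an HPC-tuned system image to Glance.
openstack image create --disk-format qcow2 --container-format bare \
    --file ohpc-compute.qcow2 ohpc-compute

# Ask Nova for a bare-metal instance running that image. Repeating this
# across a set of registered nodes carves out an HPC sub-cluster
# inside the cloud environment.
openstack server create --image ohpc-compute --flavor baremetal \
    --key-name hpc-key hpc-compute-01
```

The result is cloud-managed hardware that boots directly into an HPC-optimized image, rather than running the HPC stack inside virtual machines.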
We extended the base POC by combining the newly created HPC nodes with an HPC cluster and allowed a resource manager to schedule across both sets of nodes. We eventually ended up with a set of cloud nodes that were bare metal provisioned running a software stack optimized for an HPC workload. We are working on a similar POC for creating a machine learning stack based on OpenHPC.
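The article does not name the resource manager used to schedule across both sets of nodes; assuming Slurm (the default resource manager shipped with OpenHPC), a configuration fragment that places the existing HPC nodes and the cloud-provisioned nodes under one scheduler might look like the following. Hostnames, counts, and hardware parameters are illustrative.

```
# Hypothetical slurm.conf fragment; names and sizes are illustrative.

# Existing HPC cluster nodes
NodeName=hpc[01-16]   Sockets=2 CoresPerSocket=12 State=UNKNOWN
# Bare-metal nodes provisioned out of the cloud environment
NodeName=cloud[01-08] Sockets=2 CoresPerSocket=12 State=UNKNOWN

# A single partition spanning both sets lets the scheduler place jobs
# on either the traditional cluster or the cloud-provisioned nodes.
PartitionName=normal Nodes=hpc[01-16],cloud[01-08] Default=YES State=UP
```

Because both node sets run the same OpenHPC-based software stack, jobs see a uniform environment regardless of which side of the combined cluster they land on.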
As for what the future holds for us in exascale computing, the boundaries are limitless. I use the acronym PEZ since Petascale is established, Exascale is what we are currently focusing on achieving, and Zetascale is where we are heading. In fact, I walk around with a small PEZ dispenser to remind us of this important focus. HPC and HPC research are opening a tremendous set of new avenues that allow better, deeper, and more insightful science.
Opinions expressed in the article above do not necessarily reflect the opinions of Data Center Knowledge and Penton.