Understanding the Economics of HPC in the Cloud

The same improvements virtualization and cloud have brought to traditional data centers are coming to the world of high-performance computing.

HPC in the cloud was a major discussion topic at last month's SC15 conference on supercomputing in Austin. Diane Bryant, senior VP and general manager of Intel’s Data Center Group, discussed new types of products ranging from high-end co-processors for supercomputers to Big Data analytics solutions and high-density systems for the cloud. An SC15 paper discusses the rising popularity of cloud for scientific applications.

HPC users have approached cloud computing cautiously due to performance overhead associated with virtualization and interference caused by multiple VMs sharing physical hosts. Recent developments in virtualization and containerization have alleviated some of these concerns. However, the applicability of such technologies to HPC has not yet been thoroughly explored. Furthermore, scalability of scientific applications in the context of virtualized or containerized environments has not been well studied.

Cloud Computing Distribution and Scale

Scale-out architecture has become a hot topic in the HPC community. High-end workloads running on supercomputer technology require very careful resource management, and Big Data is a large part of that challenge. Many organizations are virtualizing HPC clusters so they can extend their environment into a hybrid cloud platform. Doing this without powerful automation and connectivity technologies can be cumbersome. Technologies like VMware vCloud Automation Center, when coupled with the VMware vCAC API, allow organizations to scale their platforms from a private cloud to outside public cloud resources. With such optimization, replication and resiliency become much easier to control for a vHPC platform.

Now, let’s look at a few use cases where HPC workloads can live in a virtual and cloud-ready environment.

Server platforms. There are a lot of new types of server platforms and systems being developed specifically for HPC and parallel workload processing. Most of all, these systems are now virtualization and cloud ready. Ever hear of HP’s Moonshot platform? Here’s a chassis that shares power, cooling, management, and fabric for 45 individually serviceable hot-plug server cartridges. What’s it perfect for? Running cloud-based applications capable of handling large numbers of parallel task-oriented workloads. Now imagine deploying a virtual platform on top of this type of server architecture. Imagine being able to better control resources and migrate your data. These new types of server platforms are lending themselves to more optimization and better utilization.
There are a number of new types of workloads being run on top of a vHPC platform. Everything from big data to life science applications can be found on some type of HPC system. Whether you’re doing a geological study, design automation, or quantifying large data sets, optimization and data resiliency are critical. Through it all, virtualization introduces a new paradigm to consider.
Traditional HPC clusters run a single standard OS and software stack across all nodes. This uniformity makes it easy to schedule jobs, but it limits the flexibility of these environments, especially in cases where multiple user populations need to be served on a single shared resource. There are many situations in which individual researchers or engineers require specific software stacks to run their applications. For example, a researcher may be part of a scientific community that has standardized its software environments to facilitate easy sharing of programs and data. To prevent islands of compute, HPC virtualization allows researchers to “bring their own software” onto a virtualized HPC cluster. Basically, vHPC enables the creation of shared compute resources while also maintaining the ability for individual teams to fully customize their OS and software stack.

We are beginning to see new applications and deployment methods for HPC applications and workloads. However, before everyone begins to migrate their HPC environment to a virtual ecosystem, there are a few things to be aware of.

It’s important to understand where HPC and even cloud-ready virtual HPC environments have limitations and cost concerns.

Resource utilization and scale. Islands of compute can become a real problem for HPC environments. Organizations – often academic institutions – have many islands of HPC due to either departmental boundaries (commercial) or the mechanics of grant funding, which gives researchers money to support hardware for their research. This is an inefficient use of resources and one that virtualization and cloud can sometimes fall victim to as well. Although you can consolidate your resources, it’s critical to know where HPC resources are being used and how. “Resource sprawl” can negatively impact vHPC economics and prevent proper scale. This happens when the same control islands and policies are transferred to a virtual environment.
Given the critical nature of data, the HPC workloads running these data sets must be agile and capable of scale. Here’s an example: without virtualization and cloud expansion capabilities, quickly flexing the amount of resources available to individual researchers can be a real challenge. With virtualization, resources can be very rapidly provisioned to a new (virtual) HPC cluster for a researcher rather than having to order gear. However, not every workload is designed to be virtualized or delivered via cloud. You can negatively impact agility by virtualizing an HPC workload that needs dedicated on-premises resources. Remember, the principles of traditional server virtualization often don’t carry over to HPC virtualization.
Density and consolidation. Consolidation is very common in enterprise IT environments because those applications are generally not that resource-intensive. This consolidation allows customers to reduce their hardware footprint while still meeting their QoS requirements. By contrast, HPC administrators generally never want to reduce the amount of hardware their workloads are utilizing. In fact, they are always looking for more hardware to allow them to run bigger problems or to generate better answers for existing challenges. And, because this is High Performance Computing, these workloads will almost never be over-subscribed onto hardware. They will not run more vCPUs than there are physical cores in the system. This means that some of your systems will simply require non-virtualized parallel-processing capabilities. Administrators must know when to virtualize and which workloads require additional levels of scale rather than processing power.
Money (and budgets). Organizations are still concerned about how much money is being spent on physical systems and how best to spend the money to maximize value to end-users. Often HPC sites -- especially academic ones -- buy as much gear as they can and then rely on cheap labor to handle the operational complexities. But is this the best approach? Could putting in place a “better” software infrastructure be a more optimal use of funds? The answer is "yes and no." Not every HPC workload is meant to be virtualized. This means that if an organization sets out to place an HPC application into a vHPC ecosystem, it needs to make sure that this will actually optimize the entire process. Otherwise, performance could be negatively impacted, processing could take longer, and resources wouldn’t be utilized properly, all of it leading to more cost.
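
The no-oversubscription rule described under density and consolidation (never more vCPUs than physical cores) is easy to check programmatically. The following is a minimal Python sketch under stated assumptions: the `Host` class, the host names, and the core and vCPU counts are all hypothetical and used for illustration only. A real deployment would pull these numbers from its own hypervisor or scheduler inventory.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Host:
    """Hypothetical description of one physical host in a vHPC cluster."""
    name: str
    physical_cores: int
    vm_vcpus: List[int]  # vCPU count of each VM placed on this host


def vcpu_ratio(host: Host) -> float:
    """Return allocated vCPUs divided by physical cores (1.0 means fully used)."""
    return sum(host.vm_vcpus) / host.physical_cores


def is_oversubscribed(host: Host) -> bool:
    """True when the VMs on a host request more vCPUs than it has physical cores."""
    return sum(host.vm_vcpus) > host.physical_cores


# Illustrative inventory (made-up values, not taken from any real cluster).
hosts = [
    Host("node01", physical_cores=32, vm_vcpus=[16, 16]),     # exactly 1:1
    Host("node02", physical_cores=32, vm_vcpus=[16, 16, 8]),  # 40 vCPUs on 32 cores
]

oversubscribed = [h.name for h in hosts if is_oversubscribed(h)]

for h in hosts:
    status = "OVERSUBSCRIBED" if is_oversubscribed(h) else "ok"
    print(f"{h.name}: ratio {vcpu_ratio(h):.2f} ({status})")
```

A ratio at or below 1.0 matches the HPC practice described above. Enterprise consolidation, by contrast, typically tolerates ratios well above 1.0, which is why the economics of the two differ.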

A recent 451 Research study showed that the average cost of running an AWS multi-service cloud application is $2.56 per hour, or just $1,865 per month, which includes bandwidth, storage, databases, compute, support, and load balancing in a non-geographically resilient configuration. At this hourly price for an application that could potentially deliver in excess of 100,000 page views per month, it's easy to see how cloud is a compelling proposition. However, when we look at HPC, the conversation shifts to different concerns. The HPC report from SC15 highlights two of them. First, virtualization comes with an associated performance overhead. Second, virtual instances in the cloud are co-located to improve the overall utilization of the cluster, and co-location leads to prohibitive performance variability.
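
The hourly and monthly figures quoted from the 451 Research study can be cross-checked with simple arithmetic. The Python sketch below assumes an average month of 730 hours (8,760 hours per year divided by 12). That assumption is ours, not the study's, and it probably explains the small gap between the estimate and the quoted $1,865. The cost per page view is likewise only an illustration derived from the article's 100,000 page views per month.

```python
HOURS_PER_YEAR = 365 * 24              # 8,760 hours in a non-leap year
HOURS_PER_MONTH = HOURS_PER_YEAR / 12  # 730 hours in an average month (assumption)


def monthly_cost(hourly_rate: float, hours: float = HOURS_PER_MONTH) -> float:
    """Convert an hourly rate into a monthly cost for a service that runs 24x7."""
    return hourly_rate * hours


def cost_per_page_view(monthly_total: float, page_views: int) -> float:
    """Spread a monthly bill evenly across the page views served in that month."""
    return monthly_total / page_views


# Figures quoted in the article.
hourly_rate = 2.56        # USD per hour
quoted_monthly = 1865.0   # USD per month
page_views = 100_000      # page views per month

estimate = monthly_cost(hourly_rate)           # monthly cost at 730 hours
implied_hours = quoted_monthly / hourly_rate   # hours implied by the quoted figure
per_view = cost_per_page_view(quoted_monthly, page_views)

print(f"Estimated monthly cost: ${estimate:,.2f}")
print(f"Hours implied by the quoted monthly figure: {implied_hours:.1f}")
print(f"Cost per page view: {per_view * 100:.3f} cents")
```

The same conversion works for an HPC cluster in the cloud, but the result only means something once the performance overhead and variability discussed above are factored into how many hours a job actually takes.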

In spite of these concerns, cloud computing is gaining popularity in HPC due to high availability, lower queue wait times, and flexibility to support different types of applications (including legacy application support).
