
Optimizing Data Center Efficiency for Peta-Scale Applications

Many IT leaders presume that the most cost-effective storage option for them is solid state drives versus hard disk drives. But is that really true?

Peta-scale applications such as AI/ML, financial analytics and edge computing architectures are everywhere today, and not just in large organizations. They are driving data volumes to unprecedented levels and, as a result, require enormous amounts of storage, along with a plan for continually assuring performance without consuming the entire IT budget.

Many IT leaders presume that the most cost-effective storage option for them is solid state drives (SSDs) versus hard disk drives (HDDs).

But is that really true? As any data center operator knows, power, cooling, real estate and other total cost of ownership (TCO) factors must be weighed alongside the upfront software and storage media costs.
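To make that concrete, here is a minimal back-of-envelope sketch of a five-year media-plus-power comparison per petabyte. Every figure in it, drive prices, watts per terabyte, electricity cost and PUE, is an illustrative assumption rather than vendor data; the point is only that power and cooling shift the equation, not what the "right" numbers are.

```python
# Back-of-envelope 5-year TCO per usable petabyte (media + power only).
# All figures below are illustrative assumptions, not vendor pricing.

def five_year_tco_per_pb(price_per_tb, watts_per_tb, kwh_cost=0.12, pue=1.5, years=5):
    """Rough media + power cost for 1 PB; ignores servers, racks, and software."""
    capex = price_per_tb * 1000                            # 1 PB = 1,000 TB of media
    kwh = watts_per_tb * 1000 * 24 * 365 * years / 1000    # drive energy over the period
    opex = kwh * pue * kwh_cost                            # cooling folded in via PUE
    return capex + opex

hdd = five_year_tco_per_pb(price_per_tb=15, watts_per_tb=0.4)    # assumed enterprise HDD
qlc = five_year_tco_per_pb(price_per_tb=45, watts_per_tb=0.25)   # assumed QLC SSD

print(f"HDD: ~${hdd:,.0f} per PB, QLC SSD: ~${qlc:,.0f} per PB over 5 years")
```

Plugging in different assumptions for your own facility (electricity rates, drive generations, density per rack) is the real exercise; the formula just makes explicit that the comparison is never media price alone.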

Analyses from a variety of drive and storage device vendors, as well as analysts and consultants, show that SSDs do not always provide an edge even with the boost from innovations such as quad-level cell (QLC) flash technologies and improved storage density.

Let’s look at three categories of applications and the considerations for using HDD vs. SSD for each of them.

Application #1: Latency-Sensitive Analytics

Flash Is Best When:

Flash SSDs are often a great fit for applications requiring random access to small data payloads. For example, transactional systems that run arbitrary queries against a product ordering system, looking up customer records by a key such as name or phone number before performing the next step in the chain, can benefit greatly from SSDs' lower latency. So can edge applications where IoT or device sensor event streams carry a few kilobytes or less of data per record, especially when those workloads run at scale.

Flash SSDs can be more advantageous than HDDs in these situations because of their lower latency and higher input/output operations per second (IOPS). In these cases, QLC flash, which stores four bits per cell versus three for today's triple-level cell (TLC) flash and therefore offers higher density, can deliver that performance benefit at a more affordable cost per terabyte.
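As a rough illustration of why per-operation latency matters here, the sketch below models a transaction that chains a number of dependent small random reads. The latency figures and lookup count are assumptions chosen only to show the shape of the effect.

```python
# Why per-operation latency matters for chained small random reads.
# Latency figures are rough, illustrative assumptions (not measured values).

HDD_RANDOM_READ_S = 0.005   # ~5 ms seek + rotation for a small random read
SSD_RANDOM_READ_S = 0.0001  # ~100 microseconds for a small flash read

def transaction_storage_wait(dependent_lookups, media_latency_s):
    """Time spent waiting on storage when each lookup depends on the previous result."""
    return dependent_lookups * media_latency_s

lookups = 20  # e.g., key lookups per customer order in a transactional chain
print(f"HDD: {transaction_storage_wait(lookups, HDD_RANDOM_READ_S) * 1000:.1f} ms of storage wait")
print(f"SSD: {transaction_storage_wait(lookups, SSD_RANDOM_READ_S) * 1000:.1f} ms of storage wait")
```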

When HDD May Be a Better Option:

IT leaders cannot presume that every latency-sensitive application requires SSDs. Enterprises often find they can get more than adequate performance by running high-IOPS, latency-sensitive workloads on HDD-based systems, particularly those built with enterprise-grade HDDs optimized for server and data center workloads. Data archiving use cases are a good example.

In applications such as archiving, backup and media asset management, the difference between millisecond- and microsecond-level latency often has no perceptible impact on end users. In these situations, HDDs can be the better choice because they satisfy the constant IT balancing act of meeting or exceeding performance expectations while still adhering to a budget.
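A quick worked example helps show why. For a large sequential read, transfer time dominates and the first-byte latency of the medium all but disappears; the throughput and latency figures below are illustrative assumptions.

```python
# For large sequential reads (archive retrieval, media assets), transfer time
# dominates and the millisecond-vs-microsecond difference becomes noise.
# Throughput and latency figures are illustrative assumptions.

FILE_GB = 10                 # a single large archive or media object
HDD_SEQ_GBPS = 0.25          # ~250 MB/s sustained from one enterprise HDD
FIRST_BYTE_HDD_S = 0.005     # ~5 ms to first byte on HDD
FIRST_BYTE_SSD_S = 0.0001    # ~100 microseconds to first byte on flash

transfer_s = FILE_GB / HDD_SEQ_GBPS
print(f"Transfer time: {transfer_s:.1f} s")
print(f"First-byte latency share on HDD: {FIRST_BYTE_HDD_S / (transfer_s + FIRST_BYTE_HDD_S):.4%}")
print(f"First-byte latency share on SSD: {FIRST_BYTE_SSD_S / (transfer_s + FIRST_BYTE_SSD_S):.6%}")
```

In this assumed case the HDD's millisecond of latency is a hundredth of a percent of the total retrieval time, which is why users never notice it.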

Application #2: TCO Considerations With Unstructured Data at Petabyte Scale

Industry debate has emerged recently over whether high-density flash SSDs or HDDs are the better choice for unstructured data such as rich media files or sensor data. SSD evangelists have suggested that the most recent generations of flash are practically "on par" with HDDs in cost per capacity. Some voices even forecast that high-density flash SSDs will soon replace HDDs on the grounds that they handle every task better.

Yet today, high-density SSDs can't completely replace HDDs on price/performance, particularly for petabyte-scale unstructured data storage across the full gamut of application workloads. Combining flash and HDD often delivers the ideal mix of performance, long-term durability, capacity and affordability: the point is to match each workload to the storage media whose strengths fit it best.
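One simple way to picture that mix is a placement policy that routes objects to flash or HDD based on their profile. The sketch below is a hypothetical illustration only; the thresholds, tier names and object attributes are assumptions, not the behavior of any particular product.

```python
# Minimal sketch of a placement policy that mixes media types.
# Thresholds and tier names are illustrative assumptions, not product behavior.

from dataclasses import dataclass

@dataclass
class ObjectProfile:
    size_bytes: int
    reads_per_day: float
    latency_sensitive: bool

def choose_tier(obj: ObjectProfile) -> str:
    """Small, hot, latency-sensitive objects land on flash; bulk data lands on HDD."""
    if obj.latency_sensitive and obj.size_bytes < 1_000_000:
        return "flash"
    if obj.reads_per_day > 100:
        return "flash"
    return "hdd"

print(choose_tier(ObjectProfile(64_000, 500, True)))             # -> flash
print(choose_tier(ObjectProfile(50_000_000_000, 0.01, False)))   # -> hdd
```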

HDDs can be ideal when it comes to providing reliability at massive scale, as underscored by the fact that 90% of storage capacity in cloud data centers is still HDD-based today.

Application #3: Secondary Storage or Backup

In general, backup applications read and write large file payloads to storage. In terms of the demands they place on the storage system, these workloads are almost the exact opposite of random-I/O, latency-sensitive workloads.

Backup applications perform best when they have rapid sequential access to huge backup files, with throughput measured in gigabytes per second (or terabytes per hour). And since most organizations today run hundreds of mission-critical applications, the storage must handle many backup and restore jobs in parallel. Using shared storage systems to prevent the spread of traditional storage silos makes financial sense.

For this type of sequential I/O workload, the difference between a QLC-flash-based and an HDD-based solution is negligible. HDD-based object storage solutions can achieve dozens of gigabytes per second (tens of terabytes per hour), enough throughput to saturate the network. This is crucial because the performance limiter here isn't the storage system but the network.
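A rough calculation illustrates where the bottleneck sits. Assuming illustrative per-node throughput and network speeds (neither taken from any specific product), the network link becomes the ceiling well before the HDD-based storage does.

```python
# Rough check of where the bottleneck sits for sequential backup traffic.
# Per-node throughput and network speeds are illustrative assumptions.

HDD_NODE_GBPS = 2.0            # assumed sequential GB/s per HDD-based object storage node
NODES = 16
NETWORK_GBPS = 2 * 100 / 8     # two 100 GbE uplinks ~= 25 GB/s

storage_gbps = HDD_NODE_GBPS * NODES
bottleneck = min(storage_gbps, NETWORK_GBPS)
print(f"Aggregate storage: {storage_gbps:.0f} GB/s, network: {NETWORK_GBPS:.0f} GB/s")
print(f"Effective throughput ~{bottleneck:.0f} GB/s "
      f"(~{bottleneck * 3600 / 1000:.0f} TB/hour), limited by the "
      f"{'network' if bottleneck == NETWORK_GBPS else 'storage'}")
```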

Additionally, as application processing, deduplication/compression and data reassembly emerge as the key time considerations for backup and restore, the backup application itself may be the limiting factor in overall solution performance. As a result, the incremental throughput difference between HDD and flash SSD can usually be considered insignificant, particularly once cost is factored into the analysis.

Bottom Line: The Choice to Use HDD or SSD Depends on a Balance of Performance vs. Cost

Optimizing the performance of peta-scale applications can hinge on subtle but important considerations in matching the right storage to application demands. QLC flash isn't a one-size-fits-all option. For latency-sensitive, read-intensive workloads, its higher cost can deliver meaningful end-user benefit. However, it's not always the right fit for other types of workloads, including backups, which are at the core of modern data protection and ransomware security strategies.

By matching the performance pattern to the advantages and costs of the medium, data center teams can strike the perfect balance between performance and cost.


About the Author:

Wally MacDermid is vice president of strategic alliances at Scality, a hardware-agnostic storage software firm whose solutions help organizations build reliable, secure and sustainable data storage architectures. An executive leader who has held diverse technical, strategic alliance and business development roles in both startups and large organizations, Wally has spent 20 years building solutions with partners worldwide that maximize customer results. His depth of expertise in cloud, storage, and networking technologies helps organizations solve their biggest data center challenges — growth, security and cost.
