Len Rosenthal is the vice president of marketing at Load DynamiX where he is responsible for corporate and product marketing.
Flash storage, or solid state drives (SSDs), is one of the most promising new technologies to affect data centers in decades. Like virtualization, flash storage will likely be deployed in every data center over the next decade; the performance, footprint, power, and reliability benefits are too compelling. However, flash arrays come at a price, literally. As the importance of storage infrastructure has increased, so has the budget necessary to meet performance and capacity requirements. Storage infrastructure can now consume up to 40% of an IT budget, and flash storage is not an inexpensive solution.
Despite vendor claims, flash arrays can cost as much as 3X-10X the price of spinning media (HDDs). Informed IT managers and architects know that the best way to meet both application performance and budget demands is a combination of the two technologies. The question remains: how do you know when and where to invest in each?
Below are two ways every storage architect can analyze current and future requirements to understand which workloads will benefit from flash storage and which will perform better with HDDs or a hybrid solution.
Characterize Your Application Workloads to Create a Workload Model
One of the smartest ways to understand your storage deployment requirements is to have an accurate model that represents your current storage I/O profiles or workloads. The goal here is to enable the development of a realistic-enough workload model to compare different technologies, devices, configurations, and even software/firmware versions that would be deployed in your infrastructure.
To effectively model these types of workloads, you'll need to know the key storage traffic characteristics that have the biggest potential performance impact. For any deployment, it is critical to understand the peak workloads, specialized workloads such as backups and end-of-month/year patterns, and impactful events such as login/logout storms.
There are some basic areas to consider when characterizing a workload.
- The description of the size, scope and configuration of the environment itself.
- The access patterns for how frequently and in what ways the data is accessed. Proper characterization of these access patterns will be different for file (NAS) and block (SAN) storage.
- The data types representative of the applications that use storage, to understand how well pattern recognition (the basis of compression and deduplication) operates in the environment.
- The load patterns over time, which determine how much demand and load can fluctuate. As with access patterns, file and block storage each have unique characteristics that must be captured to create an accurate workload model. To generate a real-world workload model, it is essential to understand how the following key metrics vary over time: IOPS per NIC/HBA, IOPS per application, read and write IOPS, metadata IOPS, read, write, and total bandwidth, data compressibility, and the number of open files.
- The basic command mix, whether data is accessed sequentially or randomly, the I/O sizes, any hotspots, and the compressibility and deduplicability of the stored data. This is critical for flash storage deployments as compression and inline deduplication facilities are essential to making flash storage affordable.
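The characteristics above can be collected into a simple workload profile. Here is a minimal sketch in Python; the structure, field names, and example figures are all hypothetical illustrations, not a real tool's schema:

```python
from dataclasses import dataclass

@dataclass
class WorkloadProfile:
    """One application's storage I/O profile (all fields hypothetical)."""
    name: str
    read_pct: float       # fraction of I/Os that are reads (0..1)
    random_pct: float     # fraction of I/Os that are random vs. sequential
    io_size_kb: int       # typical I/O size
    peak_iops: int        # observed peak IOPS
    metadata_iops: int    # metadata operations per second
    compress_ratio: float # e.g. 2.0 means data compresses 2:1
    dedup_ratio: float    # e.g. 1.5 means data dedupes 1.5:1

    def effective_capacity_factor(self) -> float:
        """Combined data-reduction factor: logical GB stored per physical GB."""
        return self.compress_ratio * self.dedup_ratio

    def peak_bandwidth_mbps(self) -> float:
        """Approximate peak throughput implied by IOPS x I/O size."""
        return self.peak_iops * self.io_size_kb / 1024.0

# Example profile with illustrative numbers for an OLTP database
oltp = WorkloadProfile("OLTP DB", read_pct=0.7, random_pct=0.9, io_size_kb=8,
                       peak_iops=50_000, metadata_iops=500,
                       compress_ratio=2.0, dedup_ratio=1.5)
print(oltp.effective_capacity_factor())  # 3.0
print(oltp.peak_bandwidth_mbps())        # 390.625
```

The data-reduction factor matters directly to flash economics: a workload that reduces 3:1 needs only a third of the raw flash capacity its logical footprint suggests.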
A number of products and vendor-supplied tools can extract this information from storage devices or by observing network traffic. This data forms the foundation of a model that accurately characterizes your workloads, and is then input into a storage workload modeling solution.
Running & Analyzing the Workload Models
Once you have created an accurate representation of the workload, the next step is to define the various scenarios to be evaluated. You can start by directly comparing identical workloads run against different vendors or different configurations. For example, most hybrid storage systems allow you to trade off the amount of installed flash versus HDDs. Running the same workload via a load-generating appliance and comparing latencies and throughput between a 5% flash / 95% HDD configuration and a 20% flash / 80% HDD configuration usually produces surprising results.
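To see why such comparisons surprise people, consider a toy latency model for a hybrid array. The service times and the hit-rate curve below are assumptions chosen purely for illustration, not measurements from any real product:

```python
# Toy model: average latency of a hybrid flash/HDD array.
# Latency figures and the hit-rate curve are assumptions, not vendor data.

FLASH_LAT_MS = 0.2   # assumed flash service time
HDD_LAT_MS = 8.0     # assumed HDD service time

def est_hit_rate(flash_fraction: float, skew: float = 0.8) -> float:
    """Toy cache-hit model: a 'skew' fraction of I/O targets hot data,
    and hot data is captured in proportion to installed flash."""
    hot_captured = min(1.0, flash_fraction / (1.0 - skew))
    return skew * hot_captured + (1.0 - skew) * flash_fraction

def avg_latency_ms(flash_fraction: float) -> float:
    hit = est_hit_rate(flash_fraction)
    return hit * FLASH_LAT_MS + (1.0 - hit) * HDD_LAT_MS

for frac in (0.05, 0.20):
    print(f"{frac:.0%} flash -> ~{avg_latency_ms(frac):.2f} ms average latency")
```

Under these assumptions, going from 5% to 20% flash cuts average latency by more than 4x, far more than the linear capacity change suggests, because the extra flash captures the rest of the hot working set. Real arrays behave nonlinearly in their own ways, which is exactly why measured runs on a load-generating appliance beat back-of-envelope math.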
After you have determined which products and configurations to evaluate, you can then vary the access patterns, load patterns, and environment characteristics. For example:
- What happens to performance during login/boot storms?
- During end of day/end of month situations?
- What if the file size distribution changes?
- What if the typical block size were changed from 4KB to 8KB?
- What if the command mix shifts to be more metadata intensive?
- What is the impact of a cache miss?
- What is the impact of compression and inline deduplication?
All of these factors can be modeled and simulated in an automated fashion, allowing direct comparisons of IOPS, throughput, and latency for each workload. With that information, you will know the breaking point of any variation that could potentially impact response times.
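An automated what-if sweep is conceptually simple. The sketch below varies block size against a single array with hypothetical IOPS and bandwidth ceilings (both numbers invented for illustration), showing how the binding constraint shifts as I/O size grows:

```python
# Sketch of an automated what-if sweep over workload parameters.
# The device limits below are hypothetical, for illustration only.

IOPS_LIMIT = 100_000    # assumed array IOPS ceiling
BW_LIMIT_MBPS = 1_000   # assumed array bandwidth ceiling

def achievable_iops(io_size_kb: int) -> float:
    """IOPS is capped by whichever ceiling binds first at this I/O size."""
    bw_bound = BW_LIMIT_MBPS * 1024 / io_size_kb  # IOPS allowed by bandwidth
    return min(IOPS_LIMIT, bw_bound)

for bs in (4, 8, 16, 32):
    iops = achievable_iops(bs)
    mbps = iops * bs / 1024
    print(f"{bs:>2} KB blocks: {iops:>9,.0f} IOPS, {mbps:,.0f} MB/s")
```

In this toy model, moving from 4KB to 8KB blocks doubles delivered bandwidth at the same IOPS, but by 16KB the array is bandwidth-bound and IOPS starts to fall. Real what-if runs replay the actual workload model against real hardware, but the logic of sweeping a parameter and recording the resulting IOPS, throughput, and latency is the same.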
Before deploying any flash or hybrid storage system, storage architects need a way to proactively identify when performance ceilings will be breached and to evaluate which technology and product options best meet application workload requirements. Vendor-provided benchmarks are usually irrelevant, as they can't show how flash storage will benefit your specific applications.
Workload modeling, combined with load-generating appliances, is the most cost-effective way to make intelligent flash storage decisions and to align deployment decisions with specific performance requirements. A new breed of solutions on the market provides workload modeling, load generation, and decision management in a single 2U chassis. These technologies easily replace older in-house approaches that required purchasing dozens or even hundreds of servers, and countless man-hours, to reproduce workloads under various network conditions.
Industry Perspectives is a content channel at Data Center Knowledge highlighting thought leadership in the data center arena. See our guidelines and submission process for information on participating. View previously published Industry Perspectives in our Knowledge Library.