Why Working Sets May Be Working Against You

Pete Koehler is an Engineer for PernixData.

Lack of visibility into how information is being used can be extremely problematic in any data center, resulting in poor application performance, excessive operational costs, and over-investment in infrastructure hardware and software.

One of the biggest mysteries in modern day data centers is the “working set,” which refers to the amount of data that a process or workflow uses in a given time period. Many administrators find it hard to define, let alone understand and measure how working sets impact data center operations.

Virtualization helps by providing an ideal control plane for visibility into working set behavior, but hypervisors tend to present data in ways that are easily misinterpreted, which can create more problems than it solves.

So how can data center administrators get the working set information they need in a manner that is most useful for proper planning, design, operations, and management?

What Is It?

For all practical purposes, a working set is the data most frequently accessed from persistent storage over a recent period of time. But that simple explanation leaves a handful of terms that are difficult to qualify and quantify. What counts as recent? Does “amount” mean reads, writes, or both? What happens when the same data is written over and over again?

Determining a working set’s size helps administrators understand the behaviors of workloads for better design, operations, and optimization. For the same reason administrators pay attention to compute and memory demands, it is also important to understand storage characteristics like working sets. Understanding and accurately calculating working sets can have a profound effect on the consistency of a data center. Have you ever heard of a real workload performing poorly, or inconsistently, on a tiered storage array, hybrid array, or hyper-converged environment? These architectures are extremely sensitive to right-sizing of the caching layer, and failing to accurately account for the working set sizes of production workloads is a common reason for such issues.

To explore this more, let’s review a few traits associated with working sets:

  • Working sets are driven by the workload, the applications driving the workload, and the virtual machines (VMs) on which they run. Whether the persistent storage is local, shared, or distributed doesn’t really matter from the perspective of how the VMs see it; the size will be largely the same.
  • Working sets always relate to a time period. However, it’s a continuum, with cycles in the data activity over time.
  • A working set is made up of reads and writes. The amount of each is important to know, because reads and writes have different characteristics and demand different things from your storage system.
  • Working set size refers to an amount, or capacity, of data. But how many I/Os it takes to make up that capacity will vary due to ever-changing block sizes (the sketch after this list illustrates the difference).
  • Data access types may be different. Is one block read a thousand times, or are a thousand blocks read one at a time? Are the writes mostly overwriting existing data, or is it new data? This is part of what makes workloads so unique.
  • Working set sizes evolve and change as your workloads and data center change. Like everything else, they are not static.
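
To make these traits concrete, here is a minimal sketch, in Python, of what a working set estimate actually measures. The trace format and field names are invented purely for illustration (they are not the output of any particular hypervisor or tool): the estimate counts each unique block only once within a time window, keeps reads and writes separate, and reports capacity rather than I/O counts.

```python
from collections import namedtuple

# Hypothetical I/O trace record: timestamp in seconds, op ("read"/"write"),
# byte offset, and I/O size in bytes. This format is invented for
# illustration only; it is not the output of any particular tool.
IO = namedtuple("IO", "ts op offset size")

def working_set_bytes(trace, window_start, window_end, block=4096):
    """Estimate read and write working set capacity over one time window.

    Each unique block is counted once, no matter how many times it is
    touched -- that is what separates a working set from raw IOPS or
    throughput figures.
    """
    read_blocks, write_blocks = set(), set()
    for io in trace:
        if not (window_start <= io.ts < window_end):
            continue
        # An I/O may span several blocks, and its size is not fixed.
        first = io.offset // block
        last = (io.offset + max(io.size, 1) - 1) // block
        target = read_blocks if io.op == "read" else write_blocks
        target.update(range(first, last + 1))
    return len(read_blocks) * block, len(write_blocks) * block

# Tiny example: the same 4 KB block written five times adds only 4 KB to
# the write working set, not 20 KB.
trace = [IO(1.0, "write", 0, 4096)] * 5 + [IO(2.0, "read", 8192, 65536)]
reads, writes = working_set_bytes(trace, 0, 10)
print(f"read working set: {reads} bytes, write working set: {writes} bytes")
```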

A simplified, visual interpretation of data activity that would define a working set might look like the figure below.

If a working set is always related to a period of time, then how can it ever be defined? A workload often has a period of activity followed by a period of rest. This is sometimes referred to as the “duty cycle.” A duty cycle might be the pattern that shows up after a day of activity on a mailbox server, an hour of batch processing on a SQL server, or 30 minutes of compiling code. Looking over a larger period of time, the duty cycles of a VM might look something like below.

Working sets can be defined at whatever time increment desired, but the goal in calculating a working set will be to capture one or more duty cycles of each individual workload at a minimum.
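
Building on the hypothetical trace format from the sketch above, one rough way to honor duty cycles is to split the trace wherever a long idle gap appears, measure each burst of activity on its own, and then size for the largest cycle rather than for an arbitrary fixed interval. The five-minute idle threshold below is an arbitrary assumption.

```python
def split_into_duty_cycles(trace, idle_gap_s=300):
    """Group a trace into bursts of activity ("duty cycles") separated by
    idle gaps of at least idle_gap_s seconds. The threshold is arbitrary
    and would need tuning per workload."""
    cycles, current, last_ts = [], [], None
    for io in sorted(trace, key=lambda rec: rec.ts):
        if last_ts is not None and io.ts - last_ts >= idle_gap_s:
            cycles.append(current)
            current = []
        current.append(io)
        last_ts = io.ts
    if current:
        cycles.append(current)
    return cycles

# Measure each cycle with working_set_bytes() from the earlier sketch and
# keep the worst case, so the measurement covers at least one full cycle:
#   per_cycle = [working_set_bytes(c, c[0].ts, c[-1].ts + 1)
#                for c in split_into_duty_cycles(trace)]
#   peak_read, peak_write = map(max, zip(*per_cycle))
```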

Classic Methods for Calculating Working Sets

There are various ways that administrators have attempted to measure working sets, all of which are ineffective for various reasons. These include:

  • Calculate working sets using known (but not very helpful) factors, such as IOPS over the course of a given time period. This is flawed, however, as it assumes one knows all of the various block sizes for that given workload, and that block sizes for a workload are consistent over time. It also assumes all reads and writes use the same block size, which is also not true. (The sketch after this list shows how far off this method can be.)
  • Measure working sets at the array, as a feature of the array’s caching layer. This attempt often fails because it sits at the wrong location. It may know what blocks of data are commonly accessed, but there is no context to the VM or workload imparting the demand. Most of that intelligence about the data is lost the moment the data exits the host. Lack of VM awareness can even make an accurately guessed cache size on an array insufficient at times due to cache pollution from noisy neighbor VMs.
  • Take an incremental backup, and look at the amount of changed data. It seems logical, but this can be misleading because it will not account for data that is written over and over, nor does it account for reads. The incremental time period of the backup may also not be representative of the duty cycle of the workload.
  • Guesswork. You might see “recommendations” that say a certain percentage of your total storage capacity used is hot data, but this is just a more formal way to admit that it’s nearly impossible to determine. Guess large enough and the impact of being wrong will be less, but this has a number of technical and financial implications for data center design.
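
To illustrate the first pitfall above, the sketch below runs the classic “average IOPS x assumed block size x duration” arithmetic next to a hypothetical figure for the data actually touched. Every number is invented; the point is only how far apart the two answers can be when block sizes vary and the same blocks are rewritten.

```python
# Illustrative only: invented numbers showing why "IOPS x assumed block
# size x duration" is not a working set.

avg_iops = 2000                 # measured average IOPS over the period
assumed_block = 8 * 1024        # assumed uniform 8 KB block size (often wrong)
duration_s = 3600               # one hour

naive_estimate = avg_iops * assumed_block * duration_s
print(f"naive estimate: {naive_estimate / 2**30:.1f} GiB")  # ~54.9 GiB

# The same hour of I/O might really be a small set of blocks hit over and
# over at a mix of block sizes, so the unique data touched could be a tiny
# fraction of the naive figure -- or, for a sequential scan of new data,
# close to it. Without per-VM visibility there is no way to tell which.
hypothetical_unique = 6 * 2**30  # pretend only 6 GiB of unique data was touched
print(f"hypothetical actual working set: {hypothetical_unique / 2**30:.1f} GiB")
```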

As you can see, these old strategies do not hold up well, and they still leave the administrator without a real answer. A data center architect deserves better when factoring this element into the design or optimization of an environment.

A New Approach

The hypervisor is the ideal control plane for measuring a lot of things. Take storage I/O latency as an example: what matters is not the latency a storage array advertises, but the latency the VM actually sees. So why not extend the functionality of the hypervisor kernel so that it provides insight into working set data on a per-VM basis?

By understanding and presenting storage characteristics such as block sizes in a way never previously possible, you can see, on a per-VM basis, the key elements necessary to calculate working set sizes. Furthermore, you can estimate working sets for each individual VM in a vSphere cluster, and/or estimate them for VMs on a per-host basis.

Once working set sizes have been established, it opens a lot of doors for better design and optimization of an environment. Here are some examples of what can be achieved:

  • Properly size persistent storage in a storage array.
  • If using server side storage acceleration, you can size the flash and/or RAM on a per host basis correctly to maximize the offload of I/O from an array.
  • If replicating data to another data center, look at the write portion of the working set estimate to gauge how much bandwidth you might need between sites (see the worked example after this list).
  • Learn how much of a caching layer might be needed for hyper-converged environments.
  • Chargeback/showback. This is one more way of identifying the heavy consumers of your environment, and would fit nicely into a chargeback/showback arrangement.
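
As a rough, back-of-the-envelope illustration of the sizing and replication points above, here is a small worked example. The per-VM working set figures, the headroom factor, and the assumed 24-hour duty cycle are all invented for the example and are not recommendations.

```python
# Back-of-the-envelope sizing from per-VM working set estimates.
# Every figure below is invented for illustration.

# Hypothetical per-VM estimates on one host, in GiB: (read WS, write WS)
per_vm_ws_gib = {"sql01": (40, 12), "mail01": (25, 8), "web01": (6, 2)}

headroom = 1.3  # arbitrary margin for growth and estimation error

# Host-side acceleration tier: cover the combined read + write working set.
cache_gib = headroom * sum(r + w for r, w in per_vm_ws_gib.values())
print(f"suggested host cache size: ~{cache_gib:.0f} GiB")  # ~121 GiB

# Replication bandwidth: spread the write working set of one duty cycle
# (assumed here to be 24 hours) across the replication window.
write_bytes_per_cycle = sum(w for _, w in per_vm_ws_gib.values()) * 2**30
avg_mbit_s = write_bytes_per_cycle * 8 / (24 * 3600) / 1e6
print(f"average replication bandwidth: ~{avg_mbit_s:.1f} Mbit/s")  # ~2.2 Mbit/s
```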

Summary

Understanding and accurately accounting for working set sizes can make the difference between a successful design, implementation, and operation of the data center, and an environment that leaves you with erratic performance and dissatisfied application owners and users. Accommodating working set sizes correctly will not only help with predictable application delivery, but may also yield significant cost savings by avoiding overprovisioning of data center resources.

