Measuring Available Redundant Capacity

Jay Hartley is Chief Scientist at Modius, where he oversees product planning, development, sales engineering, installation, and customer support.

JAY HARTLEY
Modius

One of the key power usage metrics that I often find our customers requesting is Available Redundant Capacity (ARC). They don’t always ask for it using this name. More simply, they want to know “Where can I safely add new IT equipment without overloading and potentially bringing down my facility?”

When viewed from the rack, row, room, or building level (or even across a network of data centers at the enterprise level), ARC provides a simple way to answer the question.

Most data centers don’t calculate ARC explicitly. Instead, operators set a simple alarm threshold on the Actual Load of each device. For example, if the power load on a device reaches 50% of its capacity (or, more often, 40% when de-rating), the device or the monitoring system will throw an alarm.

However, this simple approach to thresholding based on device power usage doesn’t effectively capture all the conditions of the broader power distribution system. There can be hidden capacity that allows for safe failover, even though simple device-level thresholding suggests otherwise.

Where Can I Add Load?

The goal of system ARC is to identify where you can handle additional load without sacrificing system redundancy. For a device in a dual-feed configuration, the power ARC is simply:

ARC = {Device Capacity}/2 – {Actual Load}

In most cases, the Device Capacity will be de-rated to allow for some margin. For power capacity, it is common to de-rate apparent power (kVA) capacity to 80% of the rated value. ARC can also be expressed in real power (kW) if you know or can estimate the power factor of the load. De-rating is even more important for kW measurements, to allow for potential load problems that could degrade power factor.
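
As a minimal sketch, the dual-feed formula with de-rating can be written as a small function. The function name, the default de-rating factor of 0.8, and the example figures are illustrative assumptions, not values taken from any particular monitoring product.

```python
def arc_dual_feed(capacity_kva, actual_load_kva, derate=0.8):
    """ARC for one device in a dual-feed pair.

    capacity_kva:    rated capacity of the device
    actual_load_kva: load currently measured on the device
    derate:          fraction of rated capacity treated as usable
                     (0.8 is an assumed default)
    """
    usable_kva = capacity_kva * derate
    # Each device may carry at most half of its usable capacity,
    # so that it can absorb its partner's load on failover.
    return usable_kva / 2 - actual_load_kva


# Hypothetical 100 kVA UPS carrying 30 kVA:
# usable = 80 kVA, half = 40 kVA, so 10 kVA of ARC remains.
print(round(arc_dual_feed(100, 30), 2))
```

A negative return value means the device is carrying more than half of its de-rated capacity.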

For operational alarming, calculate ARC continuously and alarm if it goes negative for any subsystem. For reporting, and determining where to install new equipment, be sure to use the minimum ARC measured over time, or equivalently, calculate ARC above with “Actual Load” replaced by “Maximum Actual Load.”
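
The two uses of ARC described above can be sketched as follows. The helper names, the 0.8 de-rating factor, and the load samples are hypothetical; a real monitoring system would supply its own time series.

```python
def arc_history(capacity_kva, load_samples_kva, derate=0.8):
    """ARC for each load sample of a dual-feed device."""
    half_usable = capacity_kva * derate / 2
    return [half_usable - load for load in load_samples_kva]


def should_alarm(latest_arc_kva):
    """Operational alarm: ARC has gone negative."""
    return latest_arc_kva < 0


def planning_arc(capacity_kva, load_samples_kva, derate=0.8):
    """ARC for placement decisions, based on the maximum observed load."""
    return capacity_kva * derate / 2 - max(load_samples_kva)


# Hypothetical load samples (kVA) for a 100 kVA device
samples = [28.0, 35.5, 41.0, 33.0]
history = arc_history(100, samples)

# Alarm check on the most recent sample only
print(should_alarm(history[-1]))
# Minimum ARC over time equals ARC computed from the maximum load
print(min(history) == planning_arc(100, samples))
```

In this made-up series the latest sample is healthy, but the planning value is negative because of an earlier peak, which is why reporting should use the minimum rather than the current ARC.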

Below is an ARC-based dashboard in action:

An ARC-based dashboard.

Here, the top panel shows ARC calculated for 6 different data centers, along with a measure of cooling overhead in the same units, where available. The lower panel shows the drill down for one of the sites.

When calculating the overall ARC for devices in parallel, you can add the ARCs of the individual units. For instance:

UPS A has 10 kVA ARC
UPS B has 8 kVA ARC

Together, they have 18 kVA ARC. If we have another UPS pair supplying a different load, their ARC can be added to this set in order to get the system ARC. The rule of summing ARC for parallel systems doesn’t depend on the systems being redundant.
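
The summing rule is simple enough to sketch in a few lines. The first pair uses the figures above; the second pair and the function name are hypothetical additions for illustration.

```python
def parallel_arc(device_arcs_kva):
    """ARC of a set of parallel devices: the sum of the device ARCs."""
    return sum(device_arcs_kva)


ups_pair_1 = [10, 8]   # UPS A and UPS B from the example above
ups_pair_2 = [5, 4]    # hypothetical second pair feeding a different load

print(parallel_arc(ups_pair_1))
# ARC across both pairs, added together as the text describes
print(parallel_arc(ups_pair_1) + parallel_arc(ups_pair_2))
```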

Calculating system ARC from the individual device ARCs in this way assumes that the capacities of both parallel components are the same. This is most often the case, but in the rare instance that it is not, then you have to total the actual load across the devices, and compare it to the (de-rated) capacity of the smaller device. This ensures that the most-limited device can handle the entire load.
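
For the unequal-capacity case, a rough sketch of the comparison might look like this. The function name, the 0.8 de-rating factor, and the capacities and loads are all assumed for illustration.

```python
def pair_arc_unequal(capacities_kva, loads_kva, derate=0.8):
    """ARC of a redundant pair whose devices differ in capacity.

    The total load must fit on the smaller de-rated device, since that
    device may have to carry everything after a failure.
    """
    limiting_kva = min(capacities_kva) * derate
    return limiting_kva - sum(loads_kva)


# Hypothetical pair: a 100 kVA and an 80 kVA UPS carrying 25 and 30 kVA.
# Smaller de-rated capacity = 64 kVA, total load = 55 kVA.
print(round(pair_arc_unequal([100, 80], [25, 30]), 2))
```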

Interestingly, it is possible to have a safely redundant system even though one of the individual devices has a negative ARC.

For example:

UPS A has 3 kVA ARC
UPS B has −2 kVA ARC

The net ARC of the system is a small but safely positive 1 kVA. In this case, even though one UPS is nominally overloaded according to the simple one-device threshold, either UPS can fail without dropping any load. Note that a simple single-device alarm threshold would show UPS B in alarm, and trigger a potentially costly load rebalancing.
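
To see why a positive net ARC is enough, it helps to put numbers behind the example. The sketch below assumes both UPS units share a de-rated capacity of 20 kVA and carry 7 kVA and 12 kVA; those figures are hypothetical, chosen only so that the device ARCs come out to 3 kVA and −2 kVA.

```python
DERATED_CAPACITY_KVA = 20              # assumed, identical for both units
loads_kva = {"UPS A": 7, "UPS B": 12}  # assumed loads

# Device-level ARC: half the de-rated capacity minus the actual load
arcs_kva = {name: DERATED_CAPACITY_KVA / 2 - load
            for name, load in loads_kva.items()}

system_arc_kva = sum(arcs_kva.values())
total_load_kva = sum(loads_kva.values())

# If either unit fails, the survivor must carry the whole load.
survives_single_failure = total_load_kva <= DERATED_CAPACITY_KVA

print(arcs_kva)
print(system_arc_kva, survives_single_failure)
```

With equal capacities, the sum of the two device ARCs is the de-rated capacity minus the total load, so a non-negative sum means the total load fits on one unit.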

Exact Redundancy Configuration Needed

In order to evaluate a negative component ARC, one does need to know the exact redundancy configuration. Thus, it is important to evaluate ARC for each subsystem and then roll it up to the data center as a whole.

When looking at the entire power chain, as with any capacity measurement, the system is limited by the weakest link in the chain. The component, or more accurately the collection of parallel components, with the smallest ARC will be the limit of the entire system. In the scenario above, if the PDUs downstream of the UPS have a collective ARC of 20 kVA, the load will still be limited to the 1 kVA of the UPS.
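
The weakest-link rule can be sketched as a minimum over the stages of the chain, with each stage's ARC being the sum over its parallel components. The function name and the split of the 20 kVA PDU figure into two units are assumptions made for illustration.

```python
def chain_arc(stages):
    """Return (limiting stage name, system ARC) for a power chain.

    stages maps a stage name to the ARCs (kVA) of its parallel components.
    """
    stage_arcs = {name: sum(arcs) for name, arcs in stages.items()}
    limiting = min(stage_arcs, key=stage_arcs.get)
    return limiting, stage_arcs[limiting]


power_chain = {
    "UPS": [3, -2],   # the UPS pair from the example above
    "PDU": [12, 8],   # hypothetical split of the 20 kVA collective ARC
}

print(chain_arc(power_chain))
```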

Some questions may arise when the load is imbalanced, as in the UPS examples above. Such imbalances may arise because some of the load is not configured redundantly. A more benign possibility is that some loads are switched and are simply not balancing themselves between the two power paths. The ARC calculation doesn’t depend on knowing such details. Of course, any non-redundant load will be dropped if it loses its power source; however, as long as the system ARC is positive, you know that any redundant load will be protected regardless of which power source is lost.

In summary, the goal of system ARC is to identify where you can handle additional load without sacrificing system redundancy. With parallel equipment, you can total the ARC of all components if they have the same capacity rating. When looking at ARC along the power chain, the correct system value will be the minimum ARC of any one set of components.

Industry Perspectives is a content channel at Data Center Knowledge highlighting thought leadership in the data center arena. See our guidelines and submission process for information on participating. View previously published Industry Perspectives in our Knowledge Library.
