An Overlooked Problem: Dynamic Power Variations
June 21st, 2013 By: Industry Perspectives
Patrick Donovan is a Senior Research Analyst with Schneider Electric’s Data Center Science Center. He has over 18 years of experience developing and supporting critical power and cooling systems for Schneider Electric’s IT Business unit, including several award-winning power protection, efficiency and availability solutions.
Historically, the total electrical power consumed by IT equipment in data centers and network rooms has varied only slightly depending on computational load or mode of operation. However, once notebook processors were redesigned to lengthen battery life, allowing their power consumption to fall by up to 90 percent when lightly loaded, server processor design soon followed suit. As a result, newer servers with energy management capabilities can experience dramatic fluctuations in power consumption as workload changes over time, causing a variety of new problems for the design and management of data centers and network rooms.
Once negligible (historically on the order of five percent), total power variation for a small business or enterprise server is now much greater. These fluctuations can lead to unplanned and undesirable consequences in the data center and network room environment, including tripped circuit breakers, overheating, and loss of redundancy.
Additionally, the growing popularity of cloud computing and virtualization has greatly increased the ability to utilize and scale compute power, while in turn heightening the risk of physical infrastructure issues. In a virtualized environment, the sudden creation and movement of virtual machines requires careful management and policies that take into account physical infrastructure status and capacity down to the individual rack level. Failure to do so could undermine the software's fault tolerance.
Data Center Virtualization and Magnitude of Dynamic Power Variation
Two decades ago, server power variation was largely independent of the computational load placed on processors and memory subsystems. Most often, significant fluctuations were caused only by disk drive spin-up and fans. During this time, typical power variation was approximately five percent. More modern processing equipment, however, uses new techniques to reach low power states, such as changing the clock frequency, moving virtual loads, and adjusting the voltage applied to the processors to better match the workload in the non-idle state. Depending on the server platform, power variation can be on the order of 45 to 106 percent, a significant increase from just twenty years ago. This type of dynamic power variation gives rise to the following five types of problems.
1. Branch Circuit Overload
Typically, servers operate at light computational loads, with actual power draw well below the server’s potential maximum. Because many data center and network managers are unaware of this discrepancy, they often plug more servers into a single branch circuit than it can support at full load. This creates the potential for circuit overloads, as the total maximum server power consumption can exceed the branch circuit rating. The servers will operate successfully at lower loads, but when they are simultaneously subject to heavy loading, overloads will occur. The most significant result of a branch circuit overload is a tripped breaker, which shuts off power to the computing equipment. These events are always undesirable, and since they occur during periods of high workload, they can be extremely detrimental to business continuity.
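The arithmetic behind this kind of overload can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the 208 V / 20 A circuit, the 80 percent continuous-load derating factor, and the per-server wattages are hypothetical example values rather than figures from this article, and the sketch treats watts as equal to volt-amperes.

```python
# Sketch only: the voltage, breaker size, derating factor, and server wattages
# are assumed example values, not figures from the article.

def usable_capacity_w(volts, breaker_amps, derate=0.8):
    """Usable branch-circuit capacity in watts (treats watts as equal to VA)."""
    return volts * breaker_amps * derate


def check_branch_circuit(servers, volts, breaker_amps, derate=0.8):
    """Compare summed typical and peak server draw against circuit capacity.

    `servers` is a list of (typical_watts, peak_watts) tuples.
    """
    capacity = usable_capacity_w(volts, breaker_amps, derate)
    typical_total = sum(typical for typical, _ in servers)
    peak_total = sum(peak for _, peak in servers)
    return {
        "capacity_w": capacity,
        "typical_w": typical_total,
        "peak_w": peak_total,
        "ok_at_typical": typical_total <= capacity,
        "ok_at_peak": peak_total <= capacity,
    }


# Ten hypothetical servers: 200 W at light load, 400 W at full load.
rack = [(200, 400)] * 10
report = check_branch_circuit(rack, volts=208, breaker_amps=20)
print(report)
```

With these assumed numbers the circuit looks comfortable at typical load (2,000 W) but is exceeded if every server reaches its peak at once (4,000 W), which is the situation described above.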
2. Overheating
In the data center or network room, most electrical power consumed by computing equipment is released as heat. When power consumption varies with load, heat output varies as well. As a result, sudden increases in power consumption can cause dangerous increases in heat production, creating hot spots. While data center cooling systems are put in place to regulate overall temperature, they may not be designed to handle specific, localized hot spots caused by increases in power consumption. As temperature rises, equipment is likely to shut down or behave abnormally. Furthermore, even if the equipment keeps functioning, heat spikes may affect it over time or void its warranties.
Hot spots can also occur in a virtualized environment, where servers are more often installed and grouped in ways that create localized high-density areas. While this problem may be surprising given virtualization’s inherent ability to dramatically decrease power consumption, grouping or clustering these high-density virtualized servers can result in cooling problems.
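Because nearly all of the electrical power drawn by IT equipment ends up as heat, a rack's heat load can be estimated directly from its power draw. The following Python sketch shows the idea; the per-server wattages, the server count, and the per-rack cooling capacity (`rack_cooling_w`) are hypothetical values chosen purely for illustration.

```python
# Sketch only: server wattages, server count, and cooling capacity are assumed
# example values. Heat output is approximated as equal to electrical input.

WATTS_TO_BTU_PER_HR = 3.412  # 1 watt is about 3.412 BTU/hr


def rack_heat_load(idle_w, peak_w, servers_per_rack, rack_cooling_w):
    """Estimate a rack's heat output at idle and at peak, in watts and BTU/hr."""
    idle_total = idle_w * servers_per_rack
    peak_total = peak_w * servers_per_rack
    return {
        "idle_heat_w": idle_total,
        "peak_heat_w": peak_total,
        "peak_heat_btu_hr": peak_total * WATTS_TO_BTU_PER_HR,
        "exceeds_cooling_at_peak": peak_total > rack_cooling_w,
    }


# Hypothetical rack: 20 servers drawing 150 W idle and 450 W at peak,
# served by cooling sized for 6 kW per rack.
load = rack_heat_load(idle_w=150, peak_w=450, servers_per_rack=20,
                      rack_cooling_w=6000)
print(load)
```

In this assumed case the rack stays well within its cooling allowance at idle (3 kW) but exceeds it at peak (9 kW), which is how a localized hot spot can appear even when the room as a whole is adequately cooled.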
3. Loss of Redundancy
To protect against potential power failure, many servers, data centers and network rooms use dual redundant power inputs that are designed to share the load equally between two paths. When one path fails, the load it supported is transferred to the remaining feed, doubling that feed's load so that it fully supports the server. To ensure that the remaining feed has the capacity to take over the complete load if necessary, the main AC branch circuits feeding the equipment must always be loaded to less than 50 percent of their ampacity. However, this can be difficult when the loads vary in power consumption: equipment that was measured at less than 50 percent during installation can, over time, begin to operate at much higher loads.
Should the inputs begin operating at greater than 50 percent of their rating, the system’s redundancy and protection capabilities are eliminated. In this case, should one feed fail, the second will overload, the breaker will trip and power will be lost, causing data loss or corruption.
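The 50 percent rule described above amounts to checking that the combined load of both feeds would still fit on a single circuit. A minimal Python sketch of that check follows; the amperage readings and the 16 A circuit rating are hypothetical, and the sketch ignores any additional derating a site might apply.

```python
# Sketch only: the current readings and circuit rating are assumed example values.

def redundancy_status(feed_a_amps, feed_b_amps, circuit_rating_amps):
    """Report whether either feed could carry the full load if the other failed."""
    failover_amps = feed_a_amps + feed_b_amps  # load on the surviving feed
    return {
        "feed_a_pct": 100 * feed_a_amps / circuit_rating_amps,
        "feed_b_pct": 100 * feed_b_amps / circuit_rating_amps,
        "failover_amps": failover_amps,
        "redundant": failover_amps <= circuit_rating_amps,
    }


# At installation: each feed carries 7 A on a 16 A circuit (under 50 percent).
print(redundancy_status(7, 7, 16))

# Later, under heavier workload: each feed carries 9 A (over 50 percent).
print(redundancy_status(9, 9, 16))
```

Running a check like this against live measurements, rather than the values recorded at installation, is what catches the gradual drift past 50 percent.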
Ram Nepal, posted June 22nd, 2013
Useful insight into the power consumption phenomenon on the DC floor.
Great article, but I wanted to point out that you only list four of the five problems with dynamic power variations. Regardless, I will be sharing this article with my customers.
Great summary. I could not agree more about the increased spread between idle and peak power of IT equipment and the need for ongoing measurements. We did some extensive work with UL on server energy efficiency, and we see idle power as low as 30% of peak power on new servers. While average power consumption is fairly linear between idle and peak power based on CPU utilization, you may want to remember that even at 10% utilization, the server draws peak power 10% of the time and is idle the other 90%, so it jumps up and down rapidly, which across a rack could lead to unexpected, random-looking breaker trips. I am happy to provide more details. (The rating is called PAR4 in case someone is interested.)