An Overlooked Problem: Dynamic Power Variations
June 21st, 2013 By: Industry Perspectives
4. Masking the Problem
Because the equipment that exhibits variations in power consumption may represent only a small portion of the total equipment in the data center or network room, the potential issues this equipment can cause are often overlooked. For instance, if just five percent of the equipment in a given server environment experiences a power variation of 2:1, and the remaining equipment draws constant power, the resulting bulk power measurements at the main feed or power distribution unit (PDU) may vary by only 2.5 percent. As such, an operator may be led to believe there is no real power consumption issue when, in fact, it is just hidden.
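The 2.5 percent figure can be reproduced with a few lines of arithmetic. The sketch below is one reading of the example, not the authors' own calculation: it assumes the varying equipment accounts for five percent of total power when at peak, and that a 2:1 variation means its draw halves at idle. The function name and parameters are illustrative.

```python
def bulk_power_variation(varying_share_at_peak: float, peak_to_idle_ratio: float) -> float:
    """Fractional drop in total power when only the varying equipment goes idle.

    Assumption: total power at peak is normalised to 1.0, and
    varying_share_at_peak is the part of it drawn by the varying equipment.
    """
    constant_share = 1.0 - varying_share_at_peak
    varying_at_idle = varying_share_at_peak / peak_to_idle_ratio
    total_at_idle = constant_share + varying_at_idle
    return 1.0 - total_at_idle


# Five percent of the load swinging 2:1, everything else constant.
variation = bulk_power_variation(0.05, 2.0)
print(f"Bulk variation seen at the main feed: {variation:.1%}")
```

Under these assumptions the bulk measurement moves by 2.5 percent even though the affected equipment's own draw changes by half, which is why the swing is easy to miss at the feed level.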
Managing Dynamic Power Variation: Solutions
To alleviate the aforementioned problems, data center and network room operators should become more aware of the potential for, and results of, dynamic power consumption. Below are a number of suggested ways to mitigate such issues.
1. Utilize Separate Branch Circuits for Each Server
Because every server operates from a dedicated branch circuit, overloads and loss of redundancy cannot occur. While effective, this solution can be expensive and complex to deploy for small servers, as it can require a large number of branch circuits per rack. For example, a rack of dual-corded 1U servers could require up to 84 individual branch circuits and two separate circuit breaker panelboards. When larger servers or blade servers are used, this technique is more practical. Note that this type of solution does not mitigate thermal problems, such as hot spots.
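A quick count shows where the figure of 84 comes from and why larger equipment changes the picture. This is a sketch under stated assumptions: the 42U rack height is inferred from the article's number rather than stated in it, and the blade chassis dimensions and cord count are purely hypothetical.

```python
def branch_circuits_per_rack(rack_units: int, device_height_u: int, cords_per_device: int) -> int:
    """Dedicated branch circuits needed if every power cord gets its own circuit."""
    devices = rack_units // device_height_u  # whole devices that fit in the rack
    return devices * cords_per_device


# Assumed 42U rack filled with dual-corded 1U servers.
print(branch_circuits_per_rack(rack_units=42, device_height_u=1, cords_per_device=2))

# Hypothetical 10U blade chassis with four power cords each.
print(branch_circuits_per_rack(rack_units=42, device_height_u=10, cords_per_device=4))
```

With the hypothetical chassis the same rack needs 16 circuits instead of 84, which illustrates why the approach becomes more practical as devices get larger.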
2. Establish Safety Margin Standards for Worst Case and Measure Compliance at Install or on an Ongoing Basis
Most data center and network room operators have standards for loading margins, which are typically expressed as a fraction of the full-load branch circuit rating. Most often, these values fall between 60 and 80 percent of the branch rating, with 75 percent considered a reasonable balance between power capacity, cost, and availability. To verify compliance with the standard, actual branch circuit loads must be measured. However, this approach runs into problems when systems exhibit dramatically varying power consumption, because a measurement is only meaningful if the computational load at the time of measurement is known, and that is difficult to confirm accurately. In an ideal situation, a heavy computational load would be placed on the protected equipment during measurement to ensure compliance in a worst-case scenario.
Additionally, keeping an extensive inventory of what equipment is connected to each branch circuit, and calculating the sum of the maximum load draws, can help ensure that branch circuits do not suffer from overload (the maximum load for each piece of equipment is available from its manufacturer). This type of inventory is commonplace in large data centers but is not practical in all installations, as it requires that operators know exactly what equipment is plugged into every branch circuit at all times. For small data centers and network rooms, where operators can more easily protect against accidental equipment movement, this approach isn’t necessary.
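The inventory approach amounts to summing manufacturer maximums per branch and comparing the result with the loading standard. The following is a minimal sketch, not a sizing tool: the `Device` class, the device names and wattages, and the 20 A / 120 V circuit are all hypothetical, and it treats watts as volts times amps (ignoring power factor) for simplicity.

```python
from dataclasses import dataclass


@dataclass
class Device:
    name: str
    max_watts: float  # manufacturer's stated maximum draw (hypothetical values below)


def worst_case_loading(devices, breaker_amps: float, volts: float) -> float:
    """Sum of maximum draws as a fraction of the branch circuit rating.

    Simplifying assumption: capacity in watts = volts * amps (power factor ignored).
    """
    capacity_watts = breaker_amps * volts
    return sum(d.max_watts for d in devices) / capacity_watts


def complies(devices, breaker_amps: float, volts: float, margin: float = 0.75) -> bool:
    """True if the worst-case loading stays at or below the loading standard."""
    return worst_case_loading(devices, breaker_amps, volts) <= margin


branch = [Device("web-01", 450.0), Device("web-02", 450.0), Device("db-01", 450.0)]
print(worst_case_loading(branch, breaker_amps=20, volts=120), complies(branch, 20, 120))

# Adding one more device pushes the same branch past the 75 percent standard.
branch.append(Device("cache-01", 600.0))
print(worst_case_loading(branch, breaker_amps=20, volts=120), complies(branch, 20, 120))
```

Because the check uses maximum rather than measured draw, it does not depend on what the servers happen to be doing at measurement time, at the cost of requiring an accurate inventory.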
Establishing safety margins and having an automatic monitoring system continuously monitor all branch circuits is another way to mitigate issues caused by dynamic power variations. In this case, operators are alerted when branch loading begins to enter the safety margin area. For example, when using a 60 percent branch loading standard, alerts should be sent when the loading passes 60 percent. This safety margin is established to provide operators with significant advance warning of a problem area, allowing them to take corrective action before an overcurrent condition occurs. This approach can also warn of impending loss of redundancy. The specific advantage of this method is that it applies even where users may, without the data center manager’s knowledge, install or move equipment, or plug it into a different outlet. This type of scenario usually occurs within a colocation facility or medium-security data center, where various personnel have access to the equipment. It is recommended that this method be used in conjunction with the aforementioned techniques.
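The alerting rule described above reduces to a threshold comparison on each branch reading. The sketch below assumes readings arrive as a mapping of branch name to measured amps; the branch names, ratings, and readings are hypothetical, and a real monitoring system would add polling, persistence, and notification.

```python
def branches_over_margin(readings_amps, ratings_amps, margin: float = 0.60):
    """Return (branch, loading) pairs for branches whose loading exceeds the margin.

    readings_amps: measured current per branch (hypothetical data source).
    ratings_amps:  branch circuit rating per branch.
    """
    alerts = []
    for branch, amps in readings_amps.items():
        loading = amps / ratings_amps[branch]
        if loading > margin:  # alert once loading passes the standard
            alerts.append((branch, loading))
    return alerts


ratings = {"A1": 20.0, "A2": 20.0, "B1": 20.0}
readings = {"A1": 9.0, "A2": 13.5, "B1": 11.0}

for branch, loading in branches_over_margin(readings, ratings):
    print(f"ALERT: branch {branch} at {loading:.1%} of rating")
```

Running the check on every polling interval, rather than once at install time, is what lets it catch equipment that was moved or added without the operator's knowledge.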
3. Integrate a Data Center Management Solution
An additional method for ensuring protection against the problems caused by power variations is the use of data center infrastructure management (DCIM) software, which can monitor and report on the health and capacity status of the power and cooling systems and keep track of the various relationships between the IT gear and the data center or network room’s physical infrastructure.
DCIM can provide insight into which servers, physical and virtual, are installed in a given rack and which power path and cooling system each is associated with. This software can also help reduce the risk of human error, a leading cause of downtime, which can take the form of IT load changes made without accounting for the status and availability of power and cooling at a given location. Automating both the monitoring of DCIM information (available rack space, power and cooling capacity, and health) and the implementation of suggested actions greatly reduces this risk.
Dynamic power variation in IT loads is an increasingly important issue, one that can give rise to a number of physical infrastructure problems that can be detrimental to the overall continuity of a business. To mitigate the risks of potential server downtime, data center and network room operators should consider the above suggested steps for proper planning and monitoring.
Industry Perspectives is a content channel at Data Center Knowledge highlighting thought leadership in the data center arena.
Ram Nepal Posted June 22nd, 2013
Useful insight of the power consumption phenomenon in DC floor.
Great article, but I wanted to point out that you only list four of the five problems with dynamic power variations. Regardless, I will be sharing this article with my customers.
Great summary. I could not agree more about the increased spread between idle and peak power of IT equipment and the need for ongoing measurements. We did some extensive work with UL on server energy efficiency, and we see idle power as low as 30% of peak power on new servers. While average power consumption is fairly linear between idle and peak power based on CPU utilization, you may want to remember that even at 10% utilization, the server draws peak power 10% of the time and idles the other 90%, so it jumps up and down rapidly, which across a rack could lead to unexpected, random-looking breaker trips. I am happy to provide more details. (The rating is called PAR4 in case someone is interested.)