Debunking Data Center Power Myths
June 26th, 2012 By: Industry Perspectives
Jeff Klaus is the director of Intel Data Center Manager (DCM) Solutions. Jeff leads a global team that designs, builds, sells and supports Intel DCM, a software SDK that plugs into Data Center Infrastructure Management (DCIM) software.
A data center manager recently raised a seemingly simple question in an online discussion thread: “How do I calculate per-rack power consumption in my data center?” The advice that followed showed that there are as many answers to this question as there are vendors providing data center solutions.
There is a lot at stake – energy is an expensive and increasingly limited resource in the data center. Let’s look at some of the pros and cons for the most commonly suggested power calculation methods, so we understand the facts, and debunk the myths, surrounding this topic.
De-rating Manufacturers’ Maximums
Could power calculation be as simple as adding up the maximum requirements specified by equipment vendors? These manufacturers are held liable for the accuracy of their data, and so they invest in careful measurements and analysis in order to publish power specifications. A variety of tools are available for combining vendor data to match your data center design.
However, the conservative nature of this method of data center power calculation is also its drawback. The manufacturers’ ratings are typically worst-case estimates and often lead to over-provisioning of an expensive resource unless they are accurately de-rated. Opinions vary widely about the appropriate de-rating factor. In the discussion thread mentioned above, data center managers cited de-rating of 20 to 50 percent as common practice. It was also noted that new equipment draws more power than “burned-in” equipment, meaning that the manufacturers’ specifications become less accurate as the equipment ages.
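To make the arithmetic concrete, here is a minimal sketch of the de-rating approach: sum the nameplate maximums for a rack, then reduce the total by a chosen de-rating fraction. The function name and the wattage figures are hypothetical and purely illustrative; only the 20 to 50 percent range comes from the discussion thread.

```python
def derated_rack_power(nameplate_watts, derate_fraction):
    """Sum nameplate maximums and reduce the total by derate_fraction.

    nameplate_watts: vendor-published maximum draw per device, in watts.
    derate_fraction: e.g. 0.20 for a 20 percent de-rating.
    """
    if not 0.0 <= derate_fraction < 1.0:
        raise ValueError("derate_fraction must be in [0, 1)")
    return sum(nameplate_watts) * (1.0 - derate_fraction)


# Hypothetical rack: four devices with made-up nameplate ratings.
rack = [750.0, 750.0, 500.0, 1200.0]

low_estimate = derated_rack_power(rack, 0.50)   # aggressive de-rating
high_estimate = derated_rack_power(rack, 0.20)  # conservative de-rating

print(f"Nameplate total: {sum(rack):.0f} W")
print(f"Budget at 50% de-rating: {low_estimate:.0f} W")
print(f"Budget at 20% de-rating: {high_estimate:.0f} W")
```

The gap between the two budgets for the same rack shows why the choice of de-rating factor matters so much: the method is only as good as the factor applied.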
Data Center Power Meters
Those data center managers who don’t want to trust vendor data, or who wish to determine how to accurately de-rate published specifications, recommend taking actual power measurements. Certainly this yields results that are specific to your equipment, configurations and workloads. However, the manual process is time-consuming and therefore expensive.
Data center managers also pointed out the challenge of knowing where to take measurements. To save time, some take measurements for each row of racks. To maximize accuracy and gain the best baseline information, others recommend taking measurements at each rack or even for each server. Measurements must also be repeated, to keep up with changes to equipment and facilities, but just how often to take measurements appears to be yet another nebulous detail. Manual measurements also leave data center managers guessing about future requirements, workload variability and scalability issues.
Intelligent Data Center Power Distribution Units
Intelligent power distribution units (PDUs) are being deployed in many data centers today, for a variety of reasons. For those data centers with the budget for purchasing and managing an additional hardware layer, PDUs provide a stream of power data that reflects the current data center layout and workloads. As an added benefit, PDUs introduce a scheduling function that helps avoid spikes during periods of peak activity.
In addition to the added costs, however, PDUs require time for data collection and analysis just like manual power meter measurements. The data provided constitute snapshots in time, and can miss power spikes.
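The point about snapshots can be illustrated with a small simulation. The sketch below builds a made-up one-reading-per-second power trace containing a three-second spike, then compares the true peak with the peak seen when the trace is read only once a minute. All numbers and names are hypothetical; real PDU polling intervals and spike shapes vary.

```python
def sampled_peak(trace_watts, interval_seconds):
    """Return the highest reading seen when polling every interval_seconds.

    trace_watts is assumed to hold one reading per second.
    """
    return max(trace_watts[::interval_seconds])


# Hypothetical 5-minute trace: steady 400 W with a 3-second spike to 900 W.
trace = [400.0] * 300
trace[125:128] = [900.0] * 3

true_peak = max(trace)
snapshot_peak = sampled_peak(trace, 60)  # polled at t = 0, 60, 120, 180, 240

print(f"True peak: {true_peak:.0f} W")
print(f"Peak seen with 60-second polling: {snapshot_peak:.0f} W")
```

In this contrived case the once-a-minute view never sees the spike at all, which is the risk the article describes.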
These methods are entirely focused on power, and do not take into account the additional data available from today’s servers. For example, most servers shipping after 2007 provide real-time power information and inlet temperature readings. Power will inevitably vary over time and must be managed as a dynamic resource as it relates to other environmental variables.
Power is not everything. Airflow parameters and temperatures throughout the data center and at the computer room air handler (CRAH) equipment can also contribute to a better overall picture of the data center energy requirements over time.
With current energy costs at historically high levels and some utility companies unable to provide enough power for the largest data centers, IT and facilities managers need a more holistic approach for managing power. An ideal solution would provide a dynamic view of the data center, in terms of power, temperature and airflow, and would be extensible to keep pace with changes to the infrastructure and workloads.
A Better Question
Instead of focusing on how to calculate power, data center managers should raise a different question. How can we manage power consumption, airflow and temperature levels to maximize equipment-friendly conditions in the data center? This question will lead to methods and practices that can optimize energy efficiency as part of the bigger goal that extends the life of data center infrastructure and avoids conditions that can damage equipment and potentially disrupt business processes.
Answering the bigger question will also lead to an evaluation of the technology advances relating to holistic power management in the data center. These include middleware platforms that can pay for themselves by increasing efficiencies, avoiding over-provisioning and outages, and managing disaster recovery conditions. Besides automating the collection and aggregation of all available energy-related data, the best power management platforms provide a single pane of glass for viewing actual, real-time data center conditions.
Is It The Right Time to Raise a New Question?
Over-provisioning is no longer an affordable option and data center staff resources have been slashed to the bone. No one has time for ongoing manual data collection. It would seem that there has never been a better time to ask this question, as its answer may lead you to take advantage of the latest technological advances.
Industry Perspectives is a content channel at Data Center Knowledge highlighting thought leadership in the data center arena.
Solid comments, Jeff, about the age-old challenge of meeting the cooling and electrical requirements of IT equipment. The question that remains is the headroom requirement for the electrical circuit to reliably and safely power the equipment based on code requirements. The industry should never go into the safety factor of the upstream infrastructure, or should it? Will intelligent servers and switches self-limit power consumption or use alternate power feeds to provide service within the safe margins of the connected electrical capacity?
You touch on the collaboration requirements between IT and facilities, and that is truer than ever these days. Think about the combination of customer demands for server resources and the push of software updates to the systems comprising the cloud. There is a future state soon to come that will allow windows of power consumption for grid supply reasons as well as managing the denominator portion of the PUE relationship.
Baldrick Posted June 27th, 2012
As this article states, a de-rating factor is useless because it does not relate to reality; in my experience this is correct. Power spikes, on the other hand, can be ignored, as they are all sub-second and just part of the noise. I wonder if you were talking about step loads and not spikes.
Your article also states that power measurement can be subject to workload-related errors, but this need not be the case. When all new platforms are commissioned, Stress & Volume Testing (SVT) is used to determine how well the platform will perform under worst-case conditions. The extent of these stress tests may vary, but most are taken well over their design capacity. So if devices are going to produce maximum workload power, it will occur during these tests. The solution is to use rack or server power monitoring (at 5 to 10 second intervals). As a large (multi-rack) platform is being commissioned, all you need to do is monitor and then graph the total power consumption for all of these racks. With this data you now have the worst-case power consumption for this platform. For a small DC you do the same, but on an individual device basis.
I know this works because I did it.
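The commissioning approach described in this comment can be sketched roughly as follows: collect per-rack readings at a fixed interval during the stress test, total them across racks at each interval, and take the highest total as the platform's worst case. The function name, rack names and readings are all hypothetical, and the sketch assumes the per-rack readings are already aligned in time.

```python
def platform_peak_watts(rack_samples):
    """Return (peak_total, totals) for time-aligned per-rack readings.

    rack_samples maps a rack name to a list of watt readings, one per
    monitoring interval (for example every 5 to 10 seconds).
    """
    lengths = {len(samples) for samples in rack_samples.values()}
    if len(lengths) != 1 or 0 in lengths:
        raise ValueError("need equal-length, non-empty sample lists")
    # Total draw across all racks at each interval.
    totals = [sum(readings) for readings in zip(*rack_samples.values())]
    return max(totals), totals


# Made-up readings captured during a stress test of a two-rack platform.
samples = {
    "rack_a": [3100.0, 4200.0, 5150.0, 4900.0],
    "rack_b": [2900.0, 3800.0, 4700.0, 5000.0],
}

peak, totals = platform_peak_watts(samples)
sum_of_rack_peaks = sum(max(s) for s in samples.values())

print(f"Worst-case platform draw: {peak:.0f} W")
print(f"Sum of individual rack peaks: {sum_of_rack_peaks:.0f} W")
```

With these made-up numbers the racks peak at different moments, so adding up the individual rack peaks overstates the platform's worst case; totalling per interval first avoids that.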
Good points, Jeff.
There are a lot of dimensions to this topic, of course. Measuring in a way that is beneficial to the way we manage the data center is important. Being able to tie energy costs to “useful work” is another.
Your comments cause me to reflect upon the fact that in our dialogue in the data center industry, when we talk about power densities and power budgets we’re most often viewing the issue from the perspective of power delivered to the IT kit. A more holistic view would include all those other elements of the power consumption pie chart (mechanical, electrical distribution, lighting, comms, et al.) as well as the power delivered to the kit. After all, whatever our motivation for wanting an accurate picture of power to the rack, that power is directly linked to the power for all the associated support infrastructure (and the corresponding efficiencies).
Underwriters’ Laboratories (UL) has come up with a benchmarking regime for understanding servers’ true energy performance, as opposed to the troublesome de-rated information Jeff mentions in the article. The UL 2640 standard comprises a series of standardized tests, including a power-on spike test, a boot cycle test, and a benchmark. The benchmark results determine the server’s power consumption under various loads. It also calculates transactions per watt in seconds, which is valuable for comparing legacy servers with newer models, as well as new models from different vendors. UL 2640 also helps data center operators use actual idle/peak power consumption for allocation of space and power, uncovering significant hidden capacity. There’s a lot more on 2640 at the UL website: