What Intel’s IT Team Has Learned about DCIM
December 16th, 2013 By: Industry Perspectives
Jeff Klaus is the general manager of Data Center Manager (DCM) Solutions at Intel Corporation. Jeff leads a global team that is pioneering power- and thermal-management middleware, which is sold through an ecosystem of data center infrastructure management (DCIM) software companies and OEMs.
Wasn’t 2013 supposed to be “the big year” for adoption of DCIM solutions? Despite analysts’ predictions, many legacy data center managers still have a “wait and see” attitude about data center power and management platforms and practices. In fact, whenever we initiate talks about our Intel Data Center Manager (DCM) solution, data center architects eventually raise the same question:
“What DCIM solution(s) does your (Intel) IT organization use, and what results have they obtained?”
A few years ago, IT@Intel published a white paper about the in-house evaluation of Intel Power Node Manager and Intel DCM as part of a company-wide DCIM planning process. Since then, our IT organization has continued to serve as a rigorous test bed for the latest power and management technologies and practices.
I recently checked in with the Intel IT team supporting our company’s data centers in the Europe, Middle East, and Africa (EMEA) region. Last year, the team carried out a local evaluation of the latest Intel DCM solutions for managing power and cooling, and this Q&A with Ofer Lior, Intel IT data centers manager, and Paul Vaccaro, Intel IT data center operations and planning, shares the lessons they learned. The team worked closely with the product team, and their findings helped drive enhancements in the latest versions that have been released since their proof of concept (POC).
Q: What steps did you take to evaluate Intel DCM?
Based on our typical POC guidelines, we started with a very small number of devices (10 servers) to check the capabilities and usability of the tool without putting our production environment at risk. Since then, we have approved a larger deployment in additional data centers in the EMEA region. As of today, Intel DCM is deployed in eight data centers in the EMEA region and is monitoring more than 3,500 devices. We are working to increase these numbers as we refresh our legacy servers that lack power-reporting capabilities.
Q: Which features did you find to be most useful?
We found that Intel DCM can give our data center operation managers some of the more important DCIM capabilities, with relatively short deployment times. In particular, the solution provides:
- Server power characteristics. To plan new server landings in the data centers, we used the Intel DCM “Server Power Characteristics” function to learn actual server power consumption for specific server models. Prior to this, power consumption was either estimated or set to the manufacturer’s specified maximum value. Either approach could overshoot our real requirements, because these numbers are typically conservative, worst-case figures. With actual data for a large sample of same-model servers, it was easy to analyze actual server power consumption, accurately plan future capacity, and see what the room would support. We found this to be one of the most powerful tools from Intel DCM in our environment.
- Cooling analysis. Intel DCM can monitor a large number of temperature sensors. We were able to gather cold aisle temperatures by reading server front-panel temperature sensors and then use the cooling analysis function to get a good idea of our cooling efficiency. We used this data to make improvements such as moving loads to underutilized areas of the room. In one case, it helped us tune the mechanical cooling, airflow, and set points. Using the “Energy Optimization-Cooling Analysis” function, we were also able to identify hotspots. In one instance, alerts from several servers in the same row pointed to a cooling issue in that area. Further investigation showed that the alerts were caused by malfunctioning temperature sensors in a specific server model, and we shared the data with the server vendor so the issue could be resolved.
- Ghost and underutilized servers discovery. We have been able to assess a large number of devices monitored by Intel DCM by using the “Server Utilization Analysis” function. This capability allows us to recognize ghost and underutilized servers (servers that are installed in the environment but run at very low utilization or sit idle). These physical devices have proven to be good candidates for migration to virtual servers, with potential for both power savings and reuse of non-utilized assets. We have used these inputs from the DCM tool to challenge whether those servers still need to exist.
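The three analyses above can be illustrated with a minimal sketch. It does not use or reproduce the Intel DCM API; it assumes the power, inlet-temperature, and utilization readings have already been exported into plain Python structures. All function names, the 95th-percentile planning rule, the 27 °C inlet limit, and the 5% utilization threshold are hypothetical choices made for illustration, not values from the Intel IT evaluation.

```python
from statistics import quantiles


def planning_power_watts(samples_watts, percentile=95):
    """Per-server planning figure taken from measured power samples.

    Assumption: a high percentile of observed draw is used in place of
    the manufacturer's maximum. Requires at least two samples.
    """
    cuts = quantiles(samples_watts, n=100, method="inclusive")
    return cuts[percentile - 1]


def servers_per_rack(rack_budget_watts, per_server_watts):
    """Number of servers that fit within a rack power budget."""
    return int(rack_budget_watts // per_server_watts)


def rows_with_hotspots(inlet_temps_by_row, limit_c=27.0, min_servers=2):
    """Rows where at least `min_servers` inlet readings exceed `limit_c`.

    Several hot readings in one row suggest an area-level cooling issue
    (or, as in the anecdote above, faulty sensors worth investigating).
    """
    return sorted(
        row
        for row, temps in inlet_temps_by_row.items()
        if sum(t > limit_c for t in temps) >= min_servers
    )


def find_underutilized(avg_utilization_by_server, threshold=0.05):
    """Names of servers whose average utilization is below `threshold`."""
    return sorted(
        name
        for name, util in avg_utilization_by_server.items()
        if util < threshold
    )


# Illustrative data only; none of these numbers come from the article.
measured = [200, 210, 220, 230, 240]           # watts, same server model
planned = planning_power_watts(measured)       # measured planning figure
by_measurement = servers_per_rack(5000, planned)
by_nameplate = servers_per_rack(5000, 400)     # hypothetical 400 W nameplate

hot_rows = rows_with_hotspots(
    {"row-A": [22.5, 23.0, 24.1], "row-B": [28.2, 29.0, 23.5]}
)
ghosts = find_underutilized({"srv-01": 0.02, "srv-02": 0.45, "srv-03": 0.01})
```

With these sample numbers, planning from measured draw fits noticeably more servers into the same rack budget than planning from the nameplate value, which is the overshoot effect described in the first bullet. The thresholds would need tuning against real sensor data before any operational use.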
Q: Did you discover any unexpected or unanticipated benefits? What were they?
One unexpected benefit we found stemmed from the fast deployment. When we first started the project, we were not sure how much effort would be involved. It took two weeks to gather the data (server identification, configuration details, locations, and IP addresses), but the data upload and installation of Intel DCM middleware took only a few hours.
When we started validating the results, we found that some systems were not reporting power and temperature. We also found that some servers lacked full communication with Intel DCM. An additional three days were spent scrubbing data inaccuracies and troubleshooting communication issues. The identified problems were then addressed and resolved within two days.
Overall, the technical side of the implementation took less than one month. Note: this did not include all the necessary approvals and documenting POC results.
Q: What advice would you offer your data center peers considering DCIM tools for their data centers?
Deploying any tool across multiple facilities requires a careful cost-benefit analysis. Consider the capabilities you require, the desired level of integration with existing platforms, the resources required to deploy the tools, and the costs to sustain them.
If you lack sufficient power and cooling monitoring, then we recommend starting small. Introduce any one of the available monitoring capabilities to increase awareness of the need for, and value of, monitoring. Once you have seen the value gained from the first capability, move on to the next phase.
Intel DCM offered us significant capabilities for managing data center power consumption with a minimal amount of investment and integration. Long term, we believe this capability will be an important part of our overall DCIM plans. We also believe that using the IT equipment as the data collectors and monitoring points will provide the most cost-effective and efficient means of instrumenting our data center facilities. We will continue to leverage advances in hardware Power and Thermal Aware Sensor (PTAS) and DCIM software capabilities as key components of our plans. We believe others can benefit from a similar approach and minimize their investment of precious OPEX dollars and resources.
Industry Perspectives is a content channel at Data Center Knowledge highlighting thought leadership in the data center arena.