Gary Bunyan is Global DCIM Solutions Specialist at iTRACS Corporation, a Data Center Infrastructure Management (DCIM) company. This is the 10th in a series of columns by Gary about “the user experience.” See Gary’s previous columns on Turning DCIM’s Big Data into Actionable Insight and Unlock Your Capacity By Unplugging Your Ghost Servers.
With ever more computing resources being deployed in ever denser data center spaces, it’s no wonder data center professionals are focused on cooling. The data center now sits at the heart of many of today’s fastest-growing businesses, essential to their competitive position and market success. But no matter how dense the data center gets, its servers, blades, and other IT assets must be kept cool.
Cooling is an expensive necessity in many data centers. Customers – like the one in the Middle East I was just visiting – are always looking to provide just the right amount of it.
What’s the “right” amount of cooling? It’s the perfect balance between under- and over-cooling, delivered consistently across all of your assets. Too little cooling and your servers are at risk. Too much and you’re spending more on energy than you need to. Uneven cooling creates thermal inconsistencies: areas that are either too hot or too cold. Get the amount right, and you’re running the most efficient infrastructure possible, minimizing both your cooling costs and your risk.
The trick is to use a DCIM solution that gives you live data about thermal conditions at the individual server level and sends you real-time alerts when a threshold is about to be reached. Instead of guessing at server temperatures, you see actual inlet and outlet readings for each device in your DCIM dashboards. Armed with this knowledge, you can run a “lean machine” that minimizes cooling while keeping your assets safe. You can only walk this tightrope safely if you have real-time information at your fingertips; guesswork is far too dangerous.
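To make the idea of an alert that fires before a threshold is actually crossed more concrete, here is a minimal Python sketch. It is not how iTRACS or any particular DCIM product implements alerting. The threshold values (an 18 to 27 °C inlet band modeled on the ASHRAE recommended range, plus a hypothetical 2 °C early-warning margin) and all names (`Thresholds`, `classify_inlet`, `alerts`) are assumptions for illustration. A reading below the low limit is flagged as overcooled (wasted energy), one inside the margin as a warning, and one at or above the high limit as critical.

```python
from dataclasses import dataclass
from enum import Enum


class Status(Enum):
    OVERCOOLED = "overcooled"  # colder than needed: energy is being wasted
    OK = "ok"
    WARNING = "warning"        # approaching the high threshold
    CRITICAL = "critical"      # at or above the high threshold


@dataclass(frozen=True)
class Thresholds:
    # Assumed values modeled on the ASHRAE recommended inlet range (18-27 C);
    # real limits would come from your own DCIM configuration.
    low_c: float = 18.0
    high_c: float = 27.0
    warning_margin_c: float = 2.0  # hypothetical early-warning band


def classify_inlet(temp_c: float, t: Thresholds = Thresholds()) -> Status:
    """Classify a single server inlet temperature reading."""
    if temp_c >= t.high_c:
        return Status.CRITICAL
    if temp_c >= t.high_c - t.warning_margin_c:
        return Status.WARNING
    if temp_c < t.low_c:
        return Status.OVERCOOLED
    return Status.OK


def alerts(readings: dict[str, float], t: Thresholds = Thresholds()) -> dict[str, Status]:
    """Return only the servers that need attention (warning or critical)."""
    result = {}
    for server, temp_c in readings.items():
        status = classify_inlet(temp_c, t)
        if status in (Status.WARNING, Status.CRITICAL):
            result[server] = status
    return result
```

For example, `alerts({"srv-a": 22.5, "srv-b": 25.4, "srv-c": 27.3, "srv-d": 16.0})` returns a warning for `srv-b` and a critical status for `srv-c`. `srv-d` does not raise an alert, but `classify_inlet` still marks it as overcooled, which is the energy-waste side of the balance described above.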
Beware of Thermal Hot Spots
A DCIM tool with visualization lets you identify, manage, and resolve potential hot spots on your floor before they become problems. The key is to be proactive, not reactive. Here’s how it works:
Forensics: Identifying the Source of the Hot Spot
(1) A thermal hot-spot alert goes off in your DCIM environment. If you’re using a visualization tool, you instantly see the problem area highlighted in red. Depending on your environment, the alert is fed by real-time data from any number of sources, such as Intel DCM, Power Assure, RF Code, or other data feeds. With a few clicks, you interrogate the alert and learn that temperatures in the top U positions of three racks are rising and about to go critical.
(2) You run forensics at the device level, looking at a live data feed of inlet and outlet temperatures from each affected server. You confirm that the servers’ inlet and outlet temperatures have exceeded their thresholds.
(3) With a few clicks, still within the DCIM system, you review maintenance schedules and reports. You learn that a CRAC (computer room air conditioning) unit is still offline, past its scheduled repair window.
(4) With a few more clicks, you interrogate the servers under threat and identify which business unit owns them. You confirm that they run revenue-generating applications with a direct impact on profitability, so you must take action immediately.
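Steps (2) through (4) boil down to filtering and joining three data sets: live device readings, maintenance records, and asset ownership. The Python sketch below illustrates that logic under stated assumptions. The record shapes (`Reading`, `CracUnit`, `Asset`), the function names, and the 27 °C inlet limit are all hypothetical; a real DCIM system would pull this data from its own live feeds, maintenance schedules, and asset database rather than from in-memory lists.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class Reading:       # step (2): one row of a live device-level feed (hypothetical shape)
    server: str
    rack: str
    u_position: int  # 1 = bottom of the rack
    inlet_c: float
    outlet_c: float


@dataclass(frozen=True)
class CracUnit:      # step (3): maintenance record (hypothetical shape)
    name: str
    online: bool
    repair_due: Optional[date]


@dataclass(frozen=True)
class Asset:         # step (4): ownership record (hypothetical shape)
    owner: str       # business unit
    revenue_generating: bool


def over_threshold(readings, inlet_limit_c=27.0):
    """Group servers whose inlet temperature meets or exceeds the limit by rack.

    Each row is (server, u_position, outlet-minus-inlet rise in C), sorted so
    the highest U positions (top of the rack) come first.
    """
    by_rack = defaultdict(list)
    for r in readings:
        if r.inlet_c >= inlet_limit_c:
            by_rack[r.rack].append((r.server, r.u_position, round(r.outlet_c - r.inlet_c, 1)))
    return {rack: sorted(rows, key=lambda row: row[1], reverse=True)
            for rack, rows in by_rack.items()}


def overdue_cracs(units, today):
    """Names of CRAC units that are offline past their scheduled repair date."""
    return [u.name for u in units
            if not u.online and u.repair_due is not None and today > u.repair_due]


def revenue_at_risk(hot, inventory):
    """Business units that own affected servers running revenue-generating apps."""
    return sorted({inventory[server].owner
                   for rows in hot.values()
                   for server, _, _ in rows
                   if inventory[server].revenue_generating})
```

Given readings from a couple of racks, a list of CRAC units, and an inventory map, these three functions answer the forensic questions in order: which servers are hot (top of rack first), which cooling unit is overdue for repair, and whose revenue is exposed.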
Resolution: Migrating the Applications to a Safer Environment
(5) Using DCIM’s predictive what-if scenarios, you quickly determine where to move the applications running on the endangered servers: they need to go to cooler IT assets with the appropriate power and connectivity.
(6) You confirm your migration strategy in the safety of the DCIM software and give your technicians a clear set of instructions, complete with automatically generated 3-D diagrams, so they know exactly what to do.
(7) You confirm with the business unit and secure approval to move the applications, then dispatch your technical teams to execute the move.
(8) Once you’ve confirmed that the CRAC maintenance has been completed and the CRAC unit is back online, you migrate the applications back to the original servers.
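The what-if analysis in step (5) can be approximated, for illustration, by a simple greedy placement heuristic. This is a swapped-in technique, not the what-if engine of any DCIM product: each workload (largest power draw first) goes to the coolest candidate host that is under an assumed 25 °C inlet limit and still has enough spare power and network ports. `Host`, `Workload`, `plan_migration`, and the limit are hypothetical. The plan is computed on a copy of the capacity figures, so, in keeping with step (6), nothing in the inventory changes until the plan is confirmed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Host:          # candidate target server (hypothetical shape)
    name: str
    inlet_c: float
    spare_power_w: float
    spare_ports: int


@dataclass(frozen=True)
class Workload:      # application to be migrated (hypothetical shape)
    name: str
    power_w: float
    ports: int


def plan_migration(workloads, hosts, max_inlet_c=25.0):
    """Greedy what-if plan mapping each workload name to a target host name.

    Raises ValueError if some workload has no safe target. The input hosts are
    never modified; capacity is tracked in a local copy.
    """
    remaining = {h.name: [h.spare_power_w, h.spare_ports] for h in hosts}
    inlet = {h.name: h.inlet_c for h in hosts}
    plan = {}
    # Place the most power-hungry workloads first so they get first pick.
    for w in sorted(workloads, key=lambda x: x.power_w, reverse=True):
        candidates = [name for name, (power, ports) in remaining.items()
                      if inlet[name] <= max_inlet_c
                      and power >= w.power_w
                      and ports >= w.ports]
        if not candidates:
            raise ValueError(f"no safe target for {w.name}")
        target = min(candidates, key=lambda name: inlet[name])  # coolest host wins
        remaining[target][0] -= w.power_w
        remaining[target][1] -= w.ports
        plan[w.name] = target
    return plan
```

Largest-first greedy placement is easy to reason about and to explain to the business unit in step (7), but it does not guarantee an optimal packing; a production tool may weigh further constraints such as redundancy or network topology.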
The Bottom Line – Uninterrupted Revenue for the Business
Identifying and resolving hot spot issues is relatively easy when you have the right DCIM tools. And the benefits are quantifiable:
- Optimum business continuity – the organization’s revenue-generating applications continue operating with no impact on customer service or revenue streams.
- Uninterrupted service levels – you continue to meet your SLAs.
- Lower maintenance-related risk – you use the incident to improve maintenance scheduling and reduce future risk to operations.
Industry Perspectives is a content channel at Data Center Knowledge highlighting thought leadership in the data center arena.