Through DCIM software, organizations can measure, control, and optimize data center operations, reducing risk and costs while improving efficiency. Putting a data center infrastructure management program in place is on the wish list of every data center manager. However, for many, purchasing DCIM software may not be in the budget. Yet even on a shoestring budget, significant steps can be taken to optimize data center operations without capital expense.
In the first installment of this two-part series, we explored the “measuring” aspect of the DCIM equation. Specifically, we discussed methods for collecting asset intelligence, benchmarking efficiency, and leveraging existing monitoring. In this final installment, we’ll explore the remaining elements of the equation, control and optimization: specifically, how to use the knowledge gained through measurement to control and optimize data center operations.
Read Part I here: Data Center Optimization Without Breaking the Bank
Getting Things Under Control
Without effective operations processes, a data center manager will be surprised each time they walk through the facility. They’ll constantly encounter messes, cables installed like a plate of spaghetti, and brand-new devices haphazardly racked in cabinets. Rarely will anyone own up to these messes. Running a data center this way undermines all efforts to implement DCIM because there’s no way to maintain accurate data. Before a DCIM program of any size can be implemented (even on a shoestring budget), several processes need to be in place.
Managing Access. The most important control mechanism is restricting access to the data center. At the very least, most data centers have badge readers on data center doors. Badge access should be shut off for the majority of employees. Only the few who physically run the data center should be allowed in, such as facilities staff responsible for power and cooling and data center technicians responsible for racking and cabling. Other IT functions, such as activating ports or configuring servers, can be performed remotely. If on rare occasions other IT staff truly need physical access to IT equipment, their access should be granted via a request ticket, and they should be escorted by a data center technician. Restricting badge access will dramatically improve adherence to best practices and keep data accurate.
Recording Changes. Another important control mechanism is recording ALL changes to the data center through change, incident, and service request tickets. Tickets provide awareness and communication to all stakeholders and generate important audit trails. While most IT departments already have some form of ticket system and processes in place, it’s critical that the process cover each data center move, add, and change as well. The benefits far outweigh the added red tape. Accountability is instantaneous, as both the requester’s and the fulfiller’s names are tied to each ticket. Additionally, changes in the data center can be reviewed and approved to ensure that changes occurring simultaneously won’t conflict, and thorough planning and risk mitigation can be undertaken at the same time, reducing errors and increasing uptime.
Policies, processes, and procedures. Assuming data center access has been limited to facilities and data center technicians only, the “3 P’s” should be in place (policies, processes, and procedures). Following them will increase service quality, reduce human error (a major cause of data center outages), and provide the consistent results essential to maintaining DCIM. Facilities and data center technicians should be equally involved in creating the policies, processes, and procedures, and strictly follow them once in place.
- Policies are the rules, such as “badges must be worn at all times” and “no food, drinks, or combustibles allowed.” Once policies are created, leadership needs to support and help enforce them. Typically, the best enforcement comes from tying compliance to performance reviews, with termination as the consequence for repeated infractions.
- Processes are high-level, step-by-step workflows for all activities affecting the data center. Having these steps documented will reveal who (including the customer) is responsible for each sub-task, and when they need to do their work.
- Procedures are the detailed instructions for doing specific tasks. Procedures serve as great training manuals to ensure new hires learn how to perform work the right way on day one. Even seasoned veterans will find procedures useful: when certain tasks are performed infrequently, key “how to” details can be easily forgotten, and these details are quickly found in a good set of procedures.
Starting the Optimization Journey
Once all data center assets are tracked, efficiency benchmarks are established, and the “3 P’s” are in place, it’s time to look for ways to optimize the data center. This is an ongoing process that needs periodic attention; optimization is a never-ending journey that continues to improve data center operations over time.
Decommissioning idle IT equipment. One very effective way to optimize operations is decommissioning idle IT equipment. Removing idle equipment frees up every kind of data center capacity: space, power, cooling, ports, and weight. Idle IT equipment draws a flat amount of power (visible through power strip outlet monitoring), which can add up quickly in overall power consumption and cooling needs. Many servers, IT appliances, and network gear have their own monitoring points, including power consumption, that can help identify idle equipment, and IT systems administrators can often provide data for equipment they manage. Another sign that a device is idle is that its ports may not be lit up green and blinking with flowing data; instead, ports may be amber or dark, and/or have no cables connected.
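The screening described above can be sketched as a short script. This is a hypothetical example, not a vendor tool: the threshold, device names, and field names are illustrative assumptions, and real outlet-level readings would come from your PDU or monitoring exports.

```python
# Hypothetical sketch: flag potentially idle devices by combining
# outlet-level power readings with port-activity observations.
# All names, numbers, and thresholds are illustrative assumptions.

IDLE_WATTS_THRESHOLD = 150  # assumed flat draw typical of an idle 1U server

readings = [
    {"device": "web-01", "avg_watts": 310, "ports_active": True},
    {"device": "old-db-02", "avg_watts": 140, "ports_active": False},
    {"device": "app-03", "avg_watts": 95, "ports_active": False},
]

def flag_idle(devices, threshold=IDLE_WATTS_THRESHOLD):
    """Return names of devices drawing below the threshold with no active ports."""
    return [
        d["device"]
        for d in devices
        if d["avg_watts"] < threshold and not d["ports_active"]
    ]

# Candidates for owner review before decommissioning
print(flag_idle(readings))  # → ['old-db-02', 'app-03']
```

Flagged devices are candidates only; as the next section notes, the final call belongs to each device’s owner.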
Connecting to “owners.” Within a data center, “zombie servers” often lurk within your cabinets: equipment that no longer serves a purpose but is still drawing power and taking up space. Often these zombies have been forgotten and aren’t tied to any specific department. The simplest (and least expensive) way to identify IT equipment that isn’t serving a purpose is to add an “Owner” field to your inventory spreadsheet and then find out which team “owns” each piece of equipment. Then, meet with each owner, review their list of equipment, and identify any devices that are no longer needed. Not only do these meetings eliminate extraneous hardware, but they also send a clear message that greater scrutiny is now in place to ensure only essential equipment resides in the data center.
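Preparing for those owner meetings can be as simple as grouping an inventory export by its “Owner” column. A minimal sketch, assuming a CSV export with illustrative column and team names:

```python
import csv
import io
from collections import defaultdict

# Hypothetical sketch: group an inventory export by "Owner" to produce
# per-team review lists. Column names and rows are illustrative; an
# "Unknown" owner is an immediate zombie-server candidate.

inventory_csv = """Device,Cabinet,Owner
web-01,A01,Web Team
old-db-02,A03,Unknown
app-03,B02,App Team
"""

by_owner = defaultdict(list)
for row in csv.DictReader(io.StringIO(inventory_csv)):
    by_owner[row["Owner"]].append(row["Device"])

for owner, devices in sorted(by_owner.items()):
    print(f"{owner}: {', '.join(devices)}")
```

Devices that land under an unknown or disbanded owner are the first to question in review meetings.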
Cleaning up the cabinet. In addition to removing unnecessary IT equipment, all cables that are either disconnected at one end, or plugged into dark ports should be removed. Having unwanted cables out of the way also makes cable tracing much easier and thus reduces time spent troubleshooting connectivity issues. Additionally, removing unnecessary cables increases airflow which improves cooling performance and saves energy. Fewer cables also means less weight on floors and cable trays.
Power redundancy. With data showing kW estimates or actual power draw through monitoring, imbalances in loads between cabinets are revealed. A best practice is to provide each cabinet with both A and B power for redundancy. The total load of the redundant circuits combined should not exceed 80% of the maximum capacity of one circuit. For example, a 30A 208V circuit has a maximum capacity of 6,240 watts, and 80% of that is 4,992 watts. Therefore, the total load for a cabinet fed by A and B 30A 208V circuits should not exceed 4,992 watts. That way, if one of the circuits fails, the second circuit can carry the total load. Plans should be made to relocate IT equipment that pushes cabinets over their redundant capacity, ensuring critical IT infrastructure stays up and running during a circuit failure.
Reducing Hot Spots. Cabinet power data can also be used to ensure cooling is in balance. Temperature readings can be taken at the cold aisle in front of cabinets where power loads are heavily concentrated. If hot spots are discovered, discussions with the equipment owners should be held to plan for relocating some of their equipment to more lightly loaded areas of the data center.
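As a rough sketch, cold-aisle inlet readings can be screened against a ceiling. The 27 °C limit below reflects the upper end of ASHRAE’s recommended inlet range for most air-cooled IT equipment; the cabinet names and readings are illustrative assumptions:

```python
# Hypothetical sketch: flag hot spots from cold-aisle inlet temperatures.
# 27 degrees C is the top of ASHRAE's recommended inlet range; cabinet IDs
# and readings are illustrative.

MAX_INLET_C = 27.0

inlet_temps = {"A01": 24.5, "A03": 28.2, "B02": 26.9}

hot_spots = [cab for cab, temp in inlet_temps.items() if temp > MAX_INLET_C]

# Cabinets whose owners should discuss relocating some load
print(hot_spots)  # → ['A03']
```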
By decommissioning idle devices, removing dead cable, and rebalancing power and cooling loads, valuable data center resources can be recouped: more U space becomes available within cabinets, less power is consumed, and the weight of equipment and cables is reduced. Also, once hot spots are eliminated, consider turning up the data center temperature; raising it just a degree or two can equate to significant energy savings.
The greatest advantage of any DCIM effort is the ability to accurately forecast the impact of future data center projects. Even on a shoestring budget, inventory spreadsheets can help with forecasting. These spreadsheets can include reservation of space and wattage for all known future projects. Armed with wattage estimates per device, maximum load capacities per cabinet, and quantities of available ports, forecasts of how much data center resources future projects will consume or free up can then be calculated. This empowers data center managers to accurately calculate if new hardware will fit in the data center, determine optimal locations, and predict impact to data center resources (space, power, cooling and ports).
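The fit check described above is the same calculation a spreadsheet formula would perform. A minimal sketch, with illustrative cabinet figures and a hypothetical project estimate:

```python
# Hypothetical sketch: forecast whether a planned project fits a cabinet,
# using the per-cabinet figures an inventory spreadsheet would hold.
# All names and numbers are illustrative assumptions.

cabinet = {
    "capacity_watts": 4992,  # redundant ceiling for A/B 30A 208V circuits
    "capacity_u": 42,
    "used_watts": 3200,      # current load plus reserved wattage
    "used_u": 30,
}

project = {"watts": 900, "u": 6}  # estimated draw and rack units of new gear

def fits(cab, proj):
    """True if the project fits within the cabinet's remaining power and space."""
    watts_left = cab["capacity_watts"] - cab["used_watts"]
    u_left = cab["capacity_u"] - cab["used_u"]
    return proj["watts"] <= watts_left and proj["u"] <= u_left

print(fits(cabinet, project))  # → True (900 W within 1,792 W; 6U within 12U)
```

Running the same check across every cabinet row turns a plain inventory spreadsheet into a basic placement and forecasting tool.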
When it comes to DCIM, you don’t have to “break the bank” in order to make significant improvements to your data center’s operations. By working through the three aspects of the DCIM equation – measurement, control and optimization – your organization will likely realize immediate reductions in risk and costs while improving overall efficiency. It’s okay to start small.
About the Authors:
Tim Kittila is Parallel Technologies’ Director of Data Center Strategy. Before joining Parallel Technologies in 2010, he was vice president at Hypertect, a data center infrastructure company. Kittila earned his bachelor of science in mechanical engineering from Virginia Tech and holds a master’s degree in business from the University of Delaware’s Lerner School of Business.
Nathaniel Josephs is Senior Data Center Manager at Parallel Technologies. He joined the company with more than 25 years of experience in data center development and management. Over the course of his career, Josephs has managed multiple data centers totaling nearly 70,000 square feet. Josephs has an undergraduate degree in Business Information Systems from Bellevue University in Omaha, Nebraska.