Sherman Ikemoto is General Manager, Future Facilities North America, which is a supplier of data center design and modeling software and services.
Data center capacity is the amount of IT equipment that is intended to be loaded in the data center and is typically expressed in terms of kW/sqft or kW/cabinet. This specification is derived from projections from the business units for the amount of computing capacity required over the long term.
But most data centers never achieve the capacity for which they were designed. This is a financial challenge for owner/operators because of the partial return on the capital expenditure and the need to raise additional capital to add capacity years sooner than originally planned. The cost of this lost capacity dwarfs all other financial considerations for a data center owner/operator.
Often 30 percent or more of data center capacity is lost in operation. On a global scale, out of 15.5 GW of available data center capacity, a minimum of 4.65 GW is unusable. At industry averages, this amounts to about 31 million square feet of wasted data center floor space and $70B of unrealized capital expense in the data center. The losses are staggering.
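The arithmetic behind these figures can be checked with a short sketch. The inputs are the numbers quoted above; the two "implied" averages are derived here for illustration and are not stated in the article, so treat them as a consistency check rather than as industry data.

```python
# Figures quoted in the article.
TOTAL_CAPACITY_GW = 15.5      # available global data center capacity
LOST_FRACTION = 0.30          # "30 percent or more", so this is a lower bound
WASTED_FLOOR_SQFT = 31e6      # wasted data center floor space
UNREALIZED_CAPEX_USD = 70e9   # unrealized capital expense

lost_gw = TOTAL_CAPACITY_GW * LOST_FRACTION
lost_watts = lost_gw * 1e9

# Implied averages (derived here, not stated in the article).
implied_density_w_per_sqft = lost_watts / WASTED_FLOOR_SQFT
implied_capex_usd_per_watt = UNREALIZED_CAPEX_USD / lost_watts

print(f"Lost capacity: at least {lost_gw:.2f} GW")
print(f"Implied density: {implied_density_w_per_sqft:.0f} W/sqft")
print(f"Implied capital cost: ${implied_capex_usd_per_watt:.2f} per watt")
```

Under these inputs, the quoted floor space and capital figures correspond to roughly 150 W per square foot and about $15 per watt of lost capacity.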
Given the stakes, why isn’t much being said about lost capacity? Because these losses are due to fragmentation of infrastructure resources – space, power, cooling and networking – that builds slowly and imperceptibly early in the data center life span. As resources fragment, the data center becomes less and less able to support the full, intended IT load. Only well into the operational life of the facility, when the margin on capacity has closed, is the problem discovered. Lack of visibility and the delay between cause and detection conceal the elephant in the room: Lost Capacity.
Compute Capacity Fragmentation
Fragmentation occurs when the actual IT configuration build-out differs physically from the assumptions used to design the facility. For example, suppose a data center was designed on the assumption that standard server hardware would be the installed IT equipment form factor, but changing requirements led to blade servers being selected and installed instead. A blade server may draw the same power as the standard server hardware, but its space and cooling (airflow) utilization can be substantially different. Because these differences were not accounted for in the design of the infrastructure, space, power and/or cooling fragment and data center capacity is reduced.
To better understand fragmentation, consider a computer hard drive. Given that you pay per unit of storage, your goal is to fully utilize the capacity you have before buying more. However, hard drive capacity will fragment incrementally as you load and delete programs and files. The amount of fragmentation that occurs depends on how the hard drive is used. Eventually, a point is reached at which the remaining available capacity is too fragmented to be of use. Only with defragmentation tools can you reclaim what has been lost and fully realize your investment in the device.
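The hard drive analogy can be made concrete with a minimal sketch. It assumes a deliberately simplified model in which a file must occupy contiguous blocks (real file systems can split files, at a performance cost); the block layout and the function names are hypothetical.

```python
from itertools import groupby

def free_runs(blocks):
    """Return the lengths of contiguous free runs (True = used, False = free)."""
    return [len(list(group)) for used, group in groupby(blocks) if not used]

def fits_contiguously(blocks, size):
    """True if a file of `size` blocks fits in a single free run."""
    return max(free_runs(blocks), default=0) >= size

# Hypothetical 10-block disk after a series of writes and deletes.
disk = [True, False, False, True, False, True, False, False, True, False]

total_free = disk.count(False)
largest_run = max(free_runs(disk), default=0)

print(f"Free blocks: {total_free}, largest contiguous run: {largest_run}")
print("4-block file fits:", fits_contiguously(disk, 4))
```

In this layout the total free space exceeds four blocks, yet no single free run is large enough to hold a four-block file. That gap between capacity on paper and usable capacity is the effect the analogy describes.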
The concept of resource fragmentation also applies to the data center. Data center capacity, like hard drive storage capacity, will fragment through use, at a rate that depends on how it is used. The simple answer to fully realizing the capacity potential of the data center is continuous defragmentation. This, however, is where the similarities between hard drives and data centers end.
The first difference is that hard drive capacity is defined by space only while data center capacity is defined by the combination of space, power, cooling and networking. This makes defragmentation of data center capacity significantly more complicated as it requires coordinated management of four data center resources that are traditionally managed independently.
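One way to picture why four independently managed resources fragment is a toy model in which each cabinet can accept only as many servers as its most constrained resource allows. This is an illustrative sketch, not the author's method: the cabinet figures, the server profile and the assumption that a server's cooling demand equals its power draw are all hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Cabinet:
    free_space_u: int      # free rack units
    free_power_w: int      # free power, watts
    free_cooling_w: int    # free cooling, watts of heat load
    free_ports: int        # free network ports

@dataclass
class Server:
    space_u: int
    power_w: int           # assumed equal to the heat load to be cooled
    ports: int

def deployable(cab, srv):
    """Servers a cabinet can accept, limited by its scarcest resource."""
    return min(cab.free_space_u // srv.space_u,
               cab.free_power_w // srv.power_w,
               cab.free_cooling_w // srv.power_w,
               cab.free_ports // srv.ports)

# Hypothetical remaining headroom in three cabinets.
cabinets = [Cabinet(20, 6000, 1000, 24),   # cooling-limited
            Cabinet(2, 5000, 5000, 24),    # space-limited
            Cabinet(20, 500, 4000, 4)]     # power-limited
server = Server(space_u=2, power_w=500, ports=2)

# "On paper": pool every resource across the room, then take the tightest.
pooled = Cabinet(sum(c.free_space_u for c in cabinets),
                 sum(c.free_power_w for c in cabinets),
                 sum(c.free_cooling_w for c in cabinets),
                 sum(c.free_ports for c in cabinets))
on_paper = deployable(pooled, server)

# In practice: each cabinet is limited by its own scarcest resource.
actual = sum(deployable(c, server) for c in cabinets)

print(f"On paper: {on_paper} servers, actually deployable: {actual}")
```

Tracking each resource as a room-wide total suggests far more headroom than exists once the constraints are evaluated together, cabinet by cabinet. The model still treats cooling as a simple per-cabinet number, which understates the difficulty described in the following paragraph.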
The second difference is that unlike hard drive capacity, data center capacity cannot be tracked by traditional means. This is because cooling – a component of capacity – is dependent on airflow that is invisible and impractical to monitor with sensors. Cooling problems therefore can be addressed only if airflow is made “visible.” A simulation technique called computational fluid dynamics (CFD) is the only way to make airflow visible. Therefore, the only means to defragment data center capacity that has been affected by cooling problems is through the use of CFD simulation.
The final difference is that, unlike the hard drive, defragmentation of data center capacity is often not an option because it puts IT service availability at risk. Therefore, to protect against data center capacity loss, fragmentation issues must be predicted and addressed before IT deployments are physically implemented.
These differences have a significant impact on the techniques required to protect against data center capacity fragmentation.
Just Because You Don't See It, Doesn't Mean It's Not There
At first glance, capacity management seems straightforward. Early in the data center lifecycle, cabinets loaded with IT equipment are positioned, and power and space utilization are tracked with spreadsheets and DCIM tools. Unfortunately, as the IT configuration approaches 50 percent of full load, fragmentation problems often materialize. Hot spots and power distribution problems form even when best practices have been followed and when (on paper) plenty of space, power and cooling capacity remains.
Hot spots are caused by an IT configuration that has evolved differently from the original design. This can completely change the airflow paths within the data center and inside of the cabinets. The result is a fragmented supply of cooling.
Given that airflow is not easily monitored and is fluid, cooling fragmentation is invisible and often severe. Without the ability to visualize cooling/airflow, data center capacity protection is exceptionally difficult. The end result is a data center that is stuck at perhaps sixty percent of full capacity, with no obvious path to full IT deployment.
You Can't Manage What You Can't Predict
The ability to simulate and visualize fragmentation before it occurs is essential to protecting data center capacity. This predictive capability makes it easier to find a balance between short-term drivers and the long-term goal of fully utilizing the intended capacity. Simulation must be used not only during the design phase of a data center, but also on a recurring basis as IT modifications are made. A data center “design” will change many times during a facility’s lifespan, so it is critical to verify continuously that the original capacity specifications are still being met.
Modern simulation techniques can fulfill these requirements by integrating live data, data from DCIM systems and the latest IT configuration roadmap information as input to predict the impact of proposed changes before they are implemented. In effect, data collection and the simulation model combined provide operational validation of a continuously changing data center design. The focus of validation in this form is the most critical data center performance metrics: IT service availability and long-term data center capacity.
The benefits of this approach include:
- Compute capacity utilization increased to above 90 percent from the typical 60-70 percent.
- IT service availability increased by addressing redundancy problems and environmental risks before changes are implemented in the data center.
- Operational decisions made that extend the lifespan of the data center.
Industry Perspectives is a content channel at Data Center Knowledge highlighting thought leadership in the data center arena.