The world is littered with examples of the problems caused by data center strategy mistakes around capacity and performance.
For example, Lady Gaga fans overwhelmed the vast server resources of Amazon.com soon after her album “Born This Way” was offered online for only 99 cents. Similarly, a deluge of online shoppers crashed Target.com during a mammoth sales event. And, of course, there was the famous healthcare.gov debacle, when an ad campaign prompted millions of Americans to rush to the website for healthcare coverage only to face long virtual lines and endless error messages. At the height of the rush, an estimated 40,000 people at a time were forced to sit in virtual waiting rooms because available capacity had been exceeded.
Each of these examples highlights why data center managers have to make sure their data center strategy stays ahead of the organization's expansion needs while also watching for sudden peak requirements that could overwhelm current systems. The way to achieve that is via data center capacity planning.
“When organizations lose sight of what is happening or what might happen in their environment, performance problems and capacity shortfalls can arise, which can result in the loss of revenue, reduced productivity, and an unacceptable customer experience,” says John Miecielica, former product marketing manager at capacity management vendor TeamQuest, now a consultant for Stratagem, Inc.
“Data center managers need to ensure that business capacity, service capacity and component and resource capacity meet current and future business requirements in a cost-effective manner. This has everything to do with managing and optimizing the performance of your infrastructure, applications and business services.”
If It Ain’t Broke ...
The old saying, “If it ain’t broke, don’t fix it,” might be a workable principle in many different scenarios. When it comes to data center strategy for capacity, however, it can be a deadly philosophy as the above examples illustrate.
One European data center, says Miecielica, implemented capacity planning to transition from only being able to fix things when they broke to being able to right-size its virtual environment based on accurate capacity forecasts. Result: That organization avoided infrastructure costs totaling $65,000 per month. Further, its ability to pinpoint bottlenecks helped it eliminate hundreds of underperforming virtual machines (VMs).
Users tell a similar story. Enterprise Holdings, Inc. (EHI), corporate parent of Enterprise Rent-A-Car, Alamo Rent A Car, National Car Rental and Enterprise CarShare, is the largest car rental service provider in the world. In the past, forecasting and modeling of data center capacity was done with manually collected data typed into Microsoft Excel and Access. Besides being resource-intensive and error-prone, the process tended to produce inaccurate forecasts. This was something EHI could ill afford in a competitive marketplace. Slow systems could mean hundreds of car rentals lost within a few minutes, as well as delays in getting vehicles to the places they were needed most, leading to low customer satisfaction ratings.
“Dozens of resources and countless hours were consumed in data collection, guesstimating growth and presenting a forecast on a quarterly and annual basis,” says Clyde Sconce, former IT systems architect at EHI.
His company had been guilty of a common data center strategy mistake: oversimplification of demand. One example was the practice of creating a forecast by taking current CPU usage and then using a linear trend to predict all future requirements.
“If you do it that way, you will be mostly wrong,” says Sconce.
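A toy example shows why. The sketch below, with invented utilization numbers, fits a straight line to eleven months of average CPU usage and extrapolates to a December that actually brings a seasonal surge; the trend line never sees the spike coming.

```python
# Hypothetical illustration: a pure linear trend on average CPU usage
# badly underestimates a seasonal peak. All numbers are invented.

def linear_fit(xs, ys):
    """Ordinary least-squares slope and intercept."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / \
            sum((x - mx) ** 2 for x in xs)
    return slope, my - slope * mx

# Eleven months of average CPU utilization (%), January through November.
months = list(range(11))
cpu = [30, 31, 32, 32, 33, 34, 35, 35, 36, 37, 38]

slope, intercept = linear_fit(months, cpu)
forecast_dec = slope * 11 + intercept   # extrapolate to December
actual_dec = 70                         # a holiday-campaign surge

print(f"linear forecast for December: {forecast_dec:.1f}%")
print(f"actual December peak:         {actual_dec}%")
```

The straight line predicts roughly half the utilization the surge actually brings, which is exactly the "mostly wrong" outcome Sconce warns about.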
EHI implemented TeamQuest Surveyor to streamline forecasting, automate the process and heighten accuracy. This made it possible for forecasts and reports to be made available and updated weekly and daily if necessary. That enabled the data center to move out of reactive mode, understand changes as they happened and take action to ensure its systems never suffered from a Lady Gaga-like event.
Capacity forecast inputs were obtained from Surveyor, and combined with a variety of business metrics and data gathered from a collection of Java tools. This was then translated into projections for CPU and business growth, dollar cost per server, forecasts relevant to different lines of business and executives, and even ways to check the accuracy of earlier forecasts.
The point here is not to try to predict the future based on one or two metrics. Instead, EHI extracted a wide range of parameters from a variety of sources that included database information such as server configuration (current and historical), resources consumed (CPU, memory, storage) and business transactions (via user agents). Specific to its UNIX AIX environment, metrics like rPerf (relative performance) helped the data center to understand whether it needed to add or remove CPUs to improve performance.
Sconce cautions data center managers to watch out for exceptions that can trip up forecasting. Take the case of historical data that is incomplete or non-existent for a new server. That can result in an anomaly such as a fairly new server being forecast as having 300 percent growth.
“We go in and override numbers like that in our forecasts, correcting them to a known growth rate for servers that house similar applications,” says Sconce. “Bad data, too, needs to be removed, and you have to watch out for baseline jumps such as shifts in resource consumption without changes in growth rates.”
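The override step Sconce describes can be sketched in a few lines. This is a hypothetical illustration, not EHI's actual tooling; the history threshold, application names and growth rates are all invented for the example.

```python
# Hypothetical sketch: servers with too little history can report absurd
# growth rates, so fall back to the known rate of peers hosting the same
# application. All names and rates are invented.

MIN_MONTHS_OF_HISTORY = 6

def corrected_growth(server, peer_growth_by_app):
    """Return an annual growth rate, overriding anomalous values."""
    if server["months_of_history"] < MIN_MONTHS_OF_HISTORY:
        # Not enough data to trust the raw trend: use the peer-group rate.
        return peer_growth_by_app[server["app"]]
    return server["raw_growth"]

peer_growth = {"booking": 0.12, "billing": 0.08}  # assumed known rates

servers = [
    {"name": "srv-new", "app": "booking", "months_of_history": 2,
     "raw_growth": 3.0},   # a 300% "growth" artifact of sparse data
    {"name": "srv-old", "app": "booking", "months_of_history": 24,
     "raw_growth": 0.15},
]

for s in servers:
    print(s["name"], corrected_growth(s, peer_growth))
```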
An example of such a baseline jump is two servers being merged into one; the workload has doubled, but the growth rate has not changed. The biggest lesson, says Sconce, is to align forecasting with current as well as historical business transactions, because that is ultimately the whole point of the exercise: understanding how the business drives the resources consumed in the data center, and how business or market shifts might overhaul internal resource requirements.
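A baseline jump of that kind can be flagged automatically. The sketch below uses made-up usage numbers and a crude median-based test; a production tool would apply more robust change-point detection.

```python
# Hypothetical sketch: flag a sudden level shift in consumption that is
# not explained by the normal month-over-month drift, e.g. when two
# servers' workloads are merged onto one. Numbers are invented.

def baseline_jumps(series, threshold):
    """Indexes where the month-over-month change far exceeds typical drift."""
    diffs = [b - a for a, b in zip(series, series[1:])]
    typical = sorted(abs(d) for d in diffs)[len(diffs) // 2]  # median drift
    return [i + 1 for i, d in enumerate(diffs)
            if abs(d) > threshold * max(typical, 1)]

# CPU-seconds per day: steady ~1%/month growth, then the workload
# doubles in month 6 when a second server is folded in.
usage = [100, 101, 102, 103, 104, 105, 210, 212, 214, 216]

print(baseline_jumps(usage, threshold=5))
```

The flagged month is then reviewed by a human rather than fed into the growth trend as if demand had genuinely doubled.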
The most important statistic at EHI is the number of cars rented per hour. Therefore, instead of feeding executives incomprehensible technical metrics, Sconce always translates them into how they relate to the cars per hour statistic to facilitate better understanding with management. Being able to achieve this, he says, requires close contact with business heads to accurately correlate business transactions to resources consumed in the data center and to then create a realistic estimate of their cost to the organization.
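That translation can be as simple as a fitted ratio between observed business transactions and the resources they consume. The figures below are purely illustrative; the point is the shape of the calculation, not the numbers.

```python
# Hypothetical sketch: correlate the business metric executives track
# (cars rented per hour) with CPU consumption, then express remaining
# capacity in business terms. All observations are invented.

observations = [  # (cars rented per hour, CPU utilization %)
    (400, 20), (800, 40), (1200, 60), (1600, 80),
]

# Least-squares fit through the origin: CPU% consumed per car/hour.
cpu_per_car = (sum(c * u for c, u in observations) /
               sum(c * c for c, _ in observations))

safe_cpu_ceiling = 90  # leave headroom below 100% utilization
max_cars_per_hour = safe_cpu_ceiling / cpu_per_car

print(f"{cpu_per_car:.3f} CPU% per car/hour; "
      f"capacity is about {max_cars_per_hour:.0f} cars/hour")
```

A report phrased as "the current fleet supports about 1,800 rentals per hour" lands with management in a way that raw CPU percentages never will.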
“Throwing all your data and inputs into a blender won’t work very well,” says Sconce. “An accurate forecast must employ a sophisticated analytical tool that can do things like cyclical trending, anomaly removal, baseline shifts, hardware changes, cost correlations and flexible report groupings.”
The values EHI relies on the most are peak hourly averages at the server level. The organization has also found it useful to have exception reports generated to flag servers with missing data or anomalies that need to be investigated.
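Both practices can be combined in one reporting pass, as in this hypothetical sketch: compute each server's peak hourly average, and divert servers with missing samples to an exception list for investigation.

```python
# Hypothetical sketch: peak hourly averages per server, plus an
# exception report for servers with gaps in their collected data.
# Server names and values are invented.

def peak_hourly_average(hourly):
    """Peak of the hourly averages; None signals missing samples."""
    if any(v is None for v in hourly):
        return None
    return max(hourly)

fleet = {
    "web-01": [35, 40, 72, 55],    # hourly CPU averages (%)
    "db-01":  [60, None, 65, 61],  # a collector gap
}

peaks, exceptions = {}, []
for name, hours in fleet.items():
    p = peak_hourly_average(hours)
    if p is None:
        exceptions.append(name)
    else:
        peaks[name] = p

print(peaks)
print(exceptions)
```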
One final tip from Sconce: Base data center capacity forecasts on both cyclical growth as well as linear projections. EHI calculates annual growth but applies a cyclical pattern to that forecast based on monthly usage. This approach to data center strategy accounts for potential leaps in demand due to seasonal peaks, or campaign launches. A linear projection, for example, may show that a purchase should be made in June, but cyclical data highlights where surges in business usage may occur. This allowed EHI to defer capital expenditures or speed up purchases based on actual business needs instead of just projecting usage forward as an orderly progression.
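The combination Sconce recommends can be sketched as an annual growth trend overlaid with a monthly seasonal index. All figures below are invented; the point is that the cyclical view can move a purchase decision the linear view misses entirely.

```python
# Hypothetical sketch: annual linear growth overlaid with a monthly
# seasonal index (1.0 = an average month). A December surge pushes the
# cyclical forecast over capacity even though the linear trend never is.

current_avg = 50.0    # current average utilization (%)
annual_growth = 0.20  # 20% year-over-year

# Seasonal index per month, derived from last year's usage (invented).
seasonal = [0.9, 0.9, 1.0, 1.0, 1.0, 1.1, 1.0, 0.95, 1.0, 1.05, 1.1, 1.4]

linear = [current_avg * (1 + annual_growth * (m + 1) / 12) for m in range(12)]
cyclical = [v * s for v, s in zip(linear, seasonal)]

capacity = 70.0
first_linear = next((m for m, v in enumerate(linear) if v > capacity), None)
first_cyclical = next((m for m, v in enumerate(cyclical) if v > capacity), None)

print("linear forecast breaches capacity in month:", first_linear)
print("cyclical forecast breaches capacity in month:", first_cyclical)
```

Here the straight-line forecast says no purchase is needed this year, while the seasonal overlay shows capacity being breached in December, so hardware must arrive before the surge.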
“By implementing capacity planning in this way, we dramatically reduced our resource time commitments; we were able to automate the forecasting process and implement daily/weekly reporting,” says Sconce. “TeamQuest Surveyor enabled us to develop a standardized forecasting strategy and to conduct historical forecast tracking to identify areas of improvement.”
Data Center Complexity
While capacity planning has always been important, its star has risen in the era of virtualization, cloud computing, BYOD, mobility and Big Data. To cope with this, Gartner analyst Will Cappelli says capacity planning needs to be supported by predictive analytics technology.
“Infrastructures are much more modular, distributed and dynamic,” he says. “It is virtually impossible to use traditional capacity planning to effectively ensure that the right resources are available at the right time.”
This entails being able to crunch vast amounts of data points, inputs and metrics in order to analyze them, quantify the probabilities of various events and predict the likelihood that certain occurrences will happen in the future. Therefore, data center managers are advised to lean toward capacity planning tools that enable them to conduct that analysis in such a way that they can run a variety of “what if” scenarios. This allows them to determine their precise requirements, thereby reducing both cost and risk.
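A minimal version of such a "what if" run is a Monte Carlo simulation: assume a demand distribution, then estimate the probability of exceeding capacity under each candidate configuration. The normal distribution and all numbers below are illustrative assumptions, not any vendor's method.

```python
# Hypothetical "what if" sketch: simulate demand draws and estimate the
# probability of overload for each candidate capacity. Distribution and
# numbers are invented for illustration.

import random

def p_overload(capacity, mean, stdev, trials=100_000, seed=42):
    """Estimated probability that simulated demand exceeds capacity."""
    rng = random.Random(seed)
    hits = sum(1 for _ in range(trials) if rng.gauss(mean, stdev) > capacity)
    return hits / trials

for label, capacity in [("current fleet", 100), ("add one server", 125)]:
    print(label, p_overload(capacity, mean=80, stdev=15))
```

Running scenarios this way turns "should we buy?" into "this purchase cuts overload risk from roughly 9 percent to well under 1 percent," which is a statement the business can weigh against cost.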
Miecielica agrees. He says that the challenge for organizations is to understand how they can slice and dice all of the data coursing through the data center and the organization. By compartmentalizing all this data into actionable information, capacity planners can share this in the form of a dashboard with metrics that the business can understand and use to make strategic business decisions.
However, the need to solve the issue of future capacity requirements is urgent. Bernd Harzog, CEO of OpsDataStore, says that conversations with enterprise users confirm that the typical data center server operates at 12 percent to 18 percent of capacity. This number is borne out by an extensive data center survey by Anthesis Consulting Group in a report entitled “Data Center Efficiency Assessment.”
“The standard method for adding capacity is to use resource utilization thresholds as triggers to purchase more hardware, but this results in excess hardware purchases as it does not factor in the requirements of the workloads (the applications) running on the infrastructure,” says Harzog. “The trick is to be able to drive up utilization without risking application response time and throughput issues.”
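The trap Harzog describes can be illustrated with two toy purchase rules (all thresholds and numbers hypothetical): a bare utilization threshold flags a healthy server for new hardware while missing one whose response time is already near its service-level target.

```python
# Hypothetical sketch: a utilization-only trigger versus a workload-aware
# trigger based on application response time. All figures are invented.

def threshold_says_buy(cpu_util, threshold=0.60):
    """Naive rule: buy hardware whenever utilization passes a threshold."""
    return cpu_util > threshold

def workload_says_buy(response_ms, sla_ms=200, safety=0.8):
    """Workload-aware rule: buy only when response time nears the SLA."""
    return response_ms > sla_ms * safety

servers = [
    {"name": "app-01", "cpu": 0.70, "resp_ms": 90},   # busy but healthy
    {"name": "app-02", "cpu": 0.55, "resp_ms": 180},  # quiet but struggling
]

for s in servers:
    print(s["name"],
          "threshold:", threshold_says_buy(s["cpu"]),
          "workload:", workload_says_buy(s["resp_ms"]))
```

The utilization rule buys hardware for the server that does not need it and ignores the one that does, which is how fleets end up both overprovisioned and slow.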
One possible way to minimize the complexity inherent in the modern data center is via the creation of dashboards. The data center manager from a large telecom company, for example, recently implemented capacity management with goals set for cost reduction, risk avoidance and efficiency.
“The project leader focused on dashboards first, and the visibility of the project changed in a dramatic way leading to the capacity management project team becoming in demand,” says Bill Berutti, president of cloud management, performance and availability and data center automation at BMC.
Previously within this telecom data center, various storage, server and operations managers had periodic meetings to decide where to spend money in the data center. The first dashboard produced by BMC for the storage team provided actual usage numbers that led to about 40TB of storage being eliminated from a purchasing contract.
As organizations strive to curtail data center costs, the first places they are likely to snip are planning and management tools such as capacity planning. Yet cutting that one little line in the expense budget could result in millions overspent on hardware, software or networking.
“Most organizations are underinvested in capacity management, both as a process discipline and also in the tools required to support the process,” says Ian Head, an analyst at Gartner.