7 Attributes That Help Counter Data Center Downtime

Peter Panfil is Vice President Global Power Sales, Emerson Network Power. With more than 30 years of experience in embedded controls and power, he leads global market and product development for Emerson’s Liebert AC Power business.

As computing demands and complexity in the data center continue to rise, unplanned data center outages remain a significant threat to organizations in terms of business disruption, lost revenue and damaged reputation.

A recently completed survey of U.S.-based data center professionals, conducted by the Ponemon Institute and sponsored by Emerson Network Power, shows that an overwhelming majority of respondents (91 percent) experienced an unplanned data center outage in the past 24 months. Respondents experienced an average of two complete data center outages during the past two years. Partial outages, or those limited to certain racks, occurred six times in the same timeframe.

However, the survey does have a bright spot. The results show that many companies are more aware of the causes of downtime and are taking steps to minimize the risk. In fact, the survey took a closer look at the high-performing data centers that experienced the least downtime and identified seven common attitudes and actions largely shared by those organizations.

Not every data center will be able to adopt all seven of these attributes. But even implementing a few of them might greatly decrease the frequency of unplanned downtime and mitigate its impact.

1. Consider data center availability your No. 1 priority – even above minimizing costs.
Given tightening budgets, this might be one of the hardest attitudes for many organizations to adopt. However, with the increase in reliance on IT systems to support business-critical applications, a single downtime event now has the potential to significantly impact the profitability of an enterprise. In fact, for enterprises with revenue models that depend on the data center’s ability to deliver IT and networking services to customers, downtime can be particularly costly.

2. Utilize best practices in data center design and redundancy to maximize availability
It all comes down to the fundamentals. A number of proven best practices serve as a good foundation for data center design and redundancy. They cover the use of cooling, power and management technologies to improve overall data center performance, and include everything from matching cooling capacity and airflow to IT load, to utilizing local design and service expertise to extend equipment life.

3. Dedicate ample resources to recovery in case of an unplanned outage
This is more than having enough people to reset breakers and cycle the power on servers following an outage. It involves site preparedness (food, lodging, alternate transportation) for personnel in the event the outage is the result of a natural disaster. Hurricane Sandy taught us that having enough generator fuel on hand, along with an established supply chain for replenishment during an outage that could stretch into days, was critical to some facilities staying up.

4. Have complete support from senior management on efforts to prevent and manage unplanned outages
The Ponemon Institute survey exposes a difference in perception that often exists between senior management and those reporting to them when it comes to downtime. Forty-eight percent of senior-level survey respondents had greater confidence that leadership is supportive of efforts to prevent outages, while 71 percent of respondents at the supervisor level and below believe their organization has sacrificed availability to improve efficiency or reduce costs inside their data center. Respondents at the supervisor level and below were also more likely than senior management to believe that unplanned outages happen frequently. This disparity shows the importance of frank discussions about unplanned outages and the level of support and investment needed to prevent and manage these incidents.

5. Regularly test generators and switchgear to ensure emergency power in case of utility outage
The most rigorous form of this testing is commonly referred to as “pull the plug.” This sort of routine testing is mandated to meet local codes for some industries, such as healthcare. It confirms that the automatic transition from utility to battery to generator and back operates properly during a utility outage. It keeps the facility team's training current should an unplanned outage occur. It also confirms for the facility management team that the data center will ride through a utility outage without incident, and gives them time to correct any deficiencies in a controlled manner.

6. Regularly test or monitor UPS batteries
Having a dedicated battery monitoring system is a must. According to Emerson Network Power’s Liebert Services business, battery failure is the leading cause of UPS system loss of power. Utilizing a predictive battery monitoring method can provide early notification of potential battery failure. The best practice is to implement a monitoring system that connects to and tracks the health of each battery within a string.
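As a rough illustration of what per-battery tracking within a string might look like, the Python sketch below flags batteries whose internal resistance has risen well above a commissioning baseline. The choice of internal resistance as the health indicator, the 25 percent threshold, and every name in the code (`BatteryReading`, `flag_degraded`, the battery IDs and readings) are assumptions made for illustration; they do not come from the survey or from any Emerson product.

```python
from dataclasses import dataclass


@dataclass
class BatteryReading:
    battery_id: str
    baseline_milliohms: float  # internal resistance recorded at commissioning (assumed metric)
    current_milliohms: float   # most recent internal resistance measurement


def flag_degraded(readings, rise_threshold=0.25):
    """Return IDs of batteries whose resistance rose more than rise_threshold over baseline.

    rise_threshold is a fraction (0.25 = 25 percent); the default is an
    illustrative assumption, not a vendor recommendation.
    """
    flagged = []
    for r in readings:
        rise = (r.current_milliohms - r.baseline_milliohms) / r.baseline_milliohms
        if rise > rise_threshold:
            flagged.append(r.battery_id)
    return flagged


# Hypothetical readings for one battery string.
string_a = [
    BatteryReading("A-01", 4.0, 4.2),
    BatteryReading("A-02", 4.0, 5.4),
    BatteryReading("A-03", 4.0, 4.1),
]

print(flag_degraded(string_a))  # ['A-02']
```

Because each battery is compared against its own baseline, a single weak unit stands out even when the string as a whole still appears healthy.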

7. Implement data center infrastructure management (DCIM)
It is important to ensure the foundation for effective management of the data center is in place in the form of an up-to-date visual model of the facility and centralized monitoring of infrastructure systems. This will likely include the deployment of a DCIM platform capable of providing a holistic view of data center operations based on real-time data that spans facilities and IT systems.

Industry Perspectives is a content channel at Data Center Knowledge highlighting thought leadership in the data center arena.

2 Comments

  1. Carlos Nieves

    I would like to add an eighth attribute: 8- Know and understand your environment. There is no substitute for having the knowledge and expertise needed to think on your feet during a crisis.

  2. I think these are great points and don't have an issue with any of them. I'd like to offer up a higher order of thinking however for business leaders. Availability starts with the application. I would encourage CEOs, COOs and LOB leaders to work with their CIO on each main business app. Let the application factors drive the level of infrastructure and availability needed. Never slam everything into expensive floor space and then wrestle with DR or parallel sites. Availability analysis starts by analyzing each app and if it warrants your own floor space then points 1-7 apply. http://www.RunThisProject.com