
The Fundamental Principles of a Data Center Operations Plan

Developing an effective, adaptable plan for successful operations requires adopting specific principles that guide these efforts and push IT to think holistically about its operational goals and how it intends to achieve them.

Jose Ruiz is VP of Compass Datacenters.

In the 1972 movie The Candidate, Robert Redford plays an idealistic young man who is recruited to run against an incumbent senator. Through the shrewd and calculated efforts of his campaign team, the young neophyte emerges victorious and closes the movie by asking his team, “What do we do now?” Beyond making a mental note to add this to your list of movies to watch on Netflix, you’re probably wondering how this relates to the principles of a data center operations plan. The point of intersection between a 45-year-old movie and a data center operations plan is this: in both instances the vast majority of planning and effort goes into the design and development of the structure, a campaign in the former and a data center in the latter. What is often sacrificed to the completion of that initial goal is what happens after we’ve achieved it.

While a great deal of attention is paid to the importance of ongoing data center operations, in practice they often appear to be an ad hoc marriage between personnel and a few rather large three-ring binders, with, in the case of multi-tenant data centers (MTDCs), an SLA presiding over the whole affair. To improve the level of your operational planning, there are five fundamental principles that should be part of your plan, or your provider’s, so that the first question you ask after taking the keys to your new data center isn’t, “What now?”

Principle 1: Experience is the Best Teacher

As with so many of the important things in life, the first step is looking in the mirror and admitting that you’ve made a few operational mistakes in your career. Those experiences lay the groundwork for understanding what you should, or shouldn’t, do in operating your data center. In some instances, this may mean admitting that you don’t have the personnel to effectively operate the facility or that the proposals from potential providers fall short of your requirements. In any case, past experience supporting mission-critical environments leads to the understanding that operational excellence is a comprehensive and ongoing process that reflects the confluence of:

  • Efficient facilities design
  • Effective post-handover and ongoing training
  • The right tools in the right hands

Principle 2: Designed Through the Eyes of an Operator

If you’ve greeted this one with a “Duh,” I encourage you to look at your shelf of binders and ask yourself when you last actually reviewed or, better yet, updated one of them.

Effective operational planning begins with the end in mind, or more simply, with the question, “What do you need to be successful?” Although the question may seem designed to elicit a simple answer, you’ll find that the answer is typically a compilation of the answers to a number of supporting inquiries.

Obviously, the facility itself needs to be optimized for effective maintenance and troubleshooting. In other words, Tier III concurrent maintainability isn’t optional, it’s essential. Your procedures should be simple and straightforward, written through an operator’s lens, not an engineer’s. The Japanese have a term, poka-yoke (mistake-proofing), that best describes the ultimate goal in the development of processes and procedures. Put nicely, it means focusing on the lowest common denominator to reduce human error. In other words, idiot-proof. Let’s face it, when over 70% of outages can still be traced to operator error, we all still have a long way to go on the whole simplification thing.

The ongoing nature of operations should be embraced and accommodated. Data centers are dynamic environments, if for no other reason than that the average data center goes through a hardware refresh every 3-5 years, and “we’ve always done it this way” doesn’t exactly address the goal of constant improvement. A feedback loop is an effective mechanism for eliminating unnecessary steps and identifying more efficient methods of performing an operation.

Principle 3: Flexibility and Control

While seemingly broad, this concept is really quite straightforward. The scheduling of activities by your personnel, and especially by your provider’s, must align with your operational tempo. Operational requirements must revolve around your specific needs and not the other way around. This same principle also applies to staffing levels for both operations and security.

Principle 4: Training and Certification Program

Nowhere is the goal of continual improvement more applicable than to personnel. An increasing level of expertise, and a formal means to gain it, both motivates your personnel and raises the overall skill level of the people responsible for ensuring the reliability of your operations.

Organically developing a more confident, competent, and effective maintenance staff requires a role-based training program that consists of:

  • A formal curriculum
  • Objective measures of understanding
  • Ongoing processes that are continually updated and refined

The goal of such a program should be to develop “subject matter experts” through escalating levels of certification, with each level determined by:

  • Procedural difficulty
  • Importance
  • Frequency of performance

Principle 5: Focus on Eliminating Errors

Requiring a technician to diagnose and repair a reported CRAC problem while holding a flashlight in one hand and a three-inch manual in the other is not conducive to quick and effective resolution, and yet this typifies the standard mode of operation in many existing data centers. Obviously, in this situation, the opportunities for human error are myriad.

There are undoubtedly any number of ways to eliminate errors. One is to use a technology solution like Icarus Ops, which converts all procedures into step-by-step digital checklists. The checklists are accessible via tablet, Android or Google Glass, and they include alerts for hazardous steps along with access to video, images and documentation for on-the-spot reference. A technician follows each step of an operation and must acknowledge its completion before proceeding to the next, which dramatically reduces the potential for human error.
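The core idea of an acknowledge-before-proceeding checklist can be sketched in a few lines of Python. This is only a minimal illustration of the gating logic, not how Icarus Ops or any other product is implemented; the `Step` and `Checklist` classes, their fields, and the sample steps are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Step:
    """One procedure step. All names here are hypothetical."""
    text: str
    hazardous: bool = False      # hazardous steps get a visible alert
    acknowledged: bool = False


@dataclass
class Checklist:
    """A procedure whose steps must be acknowledged strictly in order."""
    name: str
    steps: List[Step] = field(default_factory=list)

    def current_index(self) -> Optional[int]:
        """Return the index of the first unacknowledged step, or None when done."""
        for i, step in enumerate(self.steps):
            if not step.acknowledged:
                return i
        return None

    def prompt(self) -> str:
        """Return the text to show the technician for the current step."""
        i = self.current_index()
        if i is None:
            return f"{self.name}: complete"
        step = self.steps[i]
        prefix = "HAZARD: " if step.hazardous else ""
        return f"Step {i + 1} of {len(self.steps)}: {prefix}{step.text}"

    def acknowledge(self, index: int) -> None:
        """Mark a step done; refuse anything other than the current step."""
        if index != self.current_index():
            raise ValueError("steps must be acknowledged in order")
        self.steps[index].acknowledged = True


# Illustrative placeholder steps, not a real maintenance procedure.
procedure = Checklist("Sample procedure", [
    Step("Confirm the work order and notify the operations desk"),
    Step("Open the unit's electrical disconnect", hazardous=True),
    Step("Inspect and record findings"),
])
print(procedure.prompt())
procedure.acknowledge(0)
print(procedure.prompt())
```

Because `acknowledge` rejects any step other than the current one, a technician cannot skip ahead or mark a later step complete by accident, which is the property the checklist approach relies on.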

Summary

Planning for the operation of your data center is a critical, and often overlooked, element of the new data center process. Effective operational processes and procedures are not the result of rigid adherence to past modes of operation or of ad hoc efforts devoid of any guiding foundation. Developing an effective, adaptable plan for successful operations requires adopting specific principles that guide these efforts and push IT to think holistically about its operational goals and how it intends to achieve them.

Opinions expressed in the article above do not necessarily reflect the opinions of Data Center Knowledge and Informa.

Industry Perspectives is a content channel at Data Center Knowledge highlighting thought leadership in the data center arena. See our guidelines and submission process for information on participating. View previously published Industry Perspectives in our Knowledge Library.
