Proactive vs. Reactive: Assessing Risks in the Data Center
October 9th, 2013 By: Jason Verge
ORLANDO, Fla. - David Boston of Boston Consulting knows a lot about how to evaluate data center risk. At one point, he managed more than 100,000 square feet of space for GTE Data, a company that was later acquired by Verizon. But it was touring and assessing risks for hundreds of data centers as a consultant that honed his eye. He discovered that each data center is unique, no matter how uniformly a company chooses to build them. All data centers have strengths and weaknesses, and the most important objective of an assessment is to reveal risks that were not previously evident.
At Data Center World last week, he spoke about why data centers should perform risk assessments not only on a reactive basis, but on a proactive one. Currently, 90 percent of risk assessments are reactive, and the audience reluctantly confirmed as much. “Pain generates immediate concern,” said Boston.
“No matter how well the data center is planned, we still find single points of failure in almost all assessments,” he said. While internal assessments can identify some problems, “they tend to be effective in identifying what they know to look for,” said Boston. “Most have seen far fewer facilities and issues over their careers. Personal visits to several facilities add to best practices.”
Previously, Boston served as Program Director – Site Uptime Network, at Uptime Institute. He says that data centers still see an average of one downtime event per year. Downtime is overwhelmingly caused by human error, which accounts for 60-80 percent of events. Boston stresses that processes are just as critical as equipment, and that you often cannot see problem points when it’s your own data center.
Those who do choose to proactively assess generally schedule reviews every 3-5 years.
There are three major reasons to seek out a risk assessment:
- Validate the function of infrastructure system capacities
- Identify single points of failure
- Determine which systems fail to match design objectives
The benefits of an assessment extend beyond just prevention. Assessments confirm that systems are being utilized as expected. A report will consist of an executive summary, recommended actions, and a list of commendable best practices. Assessors will often pick up on details that aren’t immediately evident, such as effective color coding or unprotected breaker handles, to give two small examples. Boston also described something he once found that wasn’t immediately evident to the data center operator: the generators were specified without the option to change filters while in operation, which could have caused serious problems during an extended outage.
Boston provided a checklist of objectives that assessors look for, which extend beyond just systems to questions such as who holds responsibility at each point. “There’s a greater need for cooperation in the data center and defined expectations,” he added.
He recommended going into an outside risk assessment with a list of what to ask for. “The schedule will vary based on what’s on the checklist,” said Boston. “Most facilities require on-site review that lasts 2-4 days, with a subsequent report produced in 2-3 weeks.” These time frames depend on facility size and complexity.
“How much time it takes to recover is more important than how long you’re down,” Boston noted. Risk assessments help operators have a master plan in place “to get the most bang for your buck.” Outside assessors help identify what data center operators might not be seeing right in front of them. They have more experience viewing a variety of facilities, and it is often hard to see problems up close without taking a few steps back from the situation.
As the consultant emphasized, there’s a greater need for cooperation and defined expectations in the data center. Different people are in charge of different systems, and outside parties can provide a holistic view of the data center.
Excellent perspective by Jason Verge and David Boston on the importance of the process of periodic risk assessment. A refreshing contrast to the herd, increasingly obsessed with this tier rating or that tier certification or concurrent maintainability, etc.