The Ergonomic Data Center: Save Us from Ourselves

Chris Crosby is CEO of Compass Datacenters.

CHRIS CROSBY
Compass Datacenters

I recently saw a study on data center failures that found that the vast majority of outages are due to human error. Although it’s nice to have some numbers to back up the assumption, was anybody really surprised? After all, we are only human and, as a result, we make mistakes. Usually not on purpose, mind you, but sometimes even our best efforts can result in some degree of mayhem. No one wants to be the one to bring an entire facility to its knees, but the explanation in a service interruption post-mortem is usually “who” and not “why”. Since we know that we are our own worst data center enemies, doesn’t it seem like someone would start designing these things to help reduce the margin for “human error”?

While no design may be foolproof, there are a few things that you should be looking for to help reduce the likelihood of those “oh s**t” moments that can ruin everyone’s day in the blink of an eye. Providing front access to equipment to make it easier to maintain is something to insist on. Having to service a CRAH (computer room air handler) should not require the average technician to possess the flexibility of a member of the US women’s gymnastics team. Those girls are all about 12 years old and four feet tall; the average data center professional is…well, a little older, bigger and vaguely remembers the day he could touch his toes. Thus front access should be a standard feature in your data center, for reliability certainly, but also out of pure human compassion.

Data centers by definition are complex environments. Finding and correcting problems within a jungle of conduit that would have forced Stanley to leave Dr. Livingstone to fend for himself is not the best way to ensure efficient maintenance and the quick resolution of issues when they arise. All of your conduit should be color-coded and labeled. Not only would it make navigating through the facility a lot easier, it also looks pretty cool. I think we can all agree that any time you can marry ergonomics and visual appeal, you’ve got a winning combination.

It might also be a nice touch for your data center provider to give you detailed written operating procedures, a sequence of operations, and settings before they turn the facility over. Although you’d think that this would be a given, most data center customers get the equivalent of a couple of paper-clipped pages documenting their new facility along with the keys to the joint upon turnover. On-the-job training and trial and error are both effective tools for learning in the right environment. Unfortunately, your new data center isn’t one of those environments. Let’s face it, when your new car comes with more documentation than your multi-million dollar data center, you’ve got a problem.

I guess the fundamental question here is why aren’t data centers designed with their users in mind? If human error is the biggest obstacle to data center reliability, then build facilities that minimize that potential. In the near future, more customer-oriented, ergonomic features that reduce the possibility of human error will undoubtedly become standard requirements, not merely for the sake of reliability but to help save us from ourselves.

Don’t Put That There

If you’ve ever watched some of the house-hunting shows on TV, you know that a lot of homes don’t subscribe to what most of us would feel are the standard rules of design. For example, I’ve seen houses that required their owners to reach bedrooms by passing through others, hallways that can only be navigated by moving sideways, and patios populated by all manner of appliances. Data centers tend to be like some of these homes.

The reasons for this user unfriendliness are pretty straightforward. In multi-tenant environments, the goal of providers is to maximize the area of rentable raised floor. All other considerations thus become tangential, so necessary but non-revenue-producing elements are located wherever they can be accommodated. For those of you familiar with this rationale, this should help answer the question of “Why do I need a map to find the POP room?” The rationale for the architectural idiosyncrasies of pre-fabricated solutions (“It’s a 12-foot by 40-foot box”) provides little solace when you’re carting boxes of new servers through the facility to “your data center.”

Since data centers are far from static environments, and activities like performing moves/adds/changes and unboxing and staging new hardware are regular events, you should insist that your provider use tighter guidelines than “within the same zip code” in locating site features like the loading dock and storage and staging facilities. Unfortunately, “user friendly” doesn’t seem to have made it past the whiteboard stage for most data center designs.

While attributes like a low PUE and the type of fire suppression system used are certainly important customer considerations, ergonomics is going to become more important as data center environments grow increasingly dynamic. Data centers that facilitate the ease of customer operations are the next logical step within the industry, albeit to the detriment of providers whose architectures reflect their own requirements and not their customers’.

Industry Perspectives is a content channel at Data Center Knowledge highlighting thought leadership in the data center arena.

2 Comments

  1. What if, a big WHAT IF, we had all operational metrics (power, excess cooling capacity by room or CRAC, temperature compliance map, AHU efficiency ratings) on a single pane of glass? Provide visibility = reduce human error. What if, instead of just reporting these metrics, we tied them to a real-time temperature map of every single REVENUE-producing asset on the white space, working in conjunction with closed-loop control that allowed us to dynamically adjust cooling output and airflow to mimic IT load, + or - 3 degrees, and that the operator could override at any point? Dynamically manage airflow = HARD ROI. What if we could see how they interact, and understand (and document) the exact environmental impact the changes we make have on our facilities at the most critical point? Measure, Model, Manage = Efficiency, +vis, +communication. Oh yeah! We already do that! 220 projects, 11,000,000 sq. ft. of mission critical management and growing.

  2. Sean McPherson

    Chris, excellent points! One other related item is to watch out for cases of things being 'confusingly similar'. People *expect* to find patterns. Even when we insist that the only way to know what breaker you're working on or what device you're about to touch is to check and double check first, people still let their unconscious expectation of a pattern cause them to make mistakes.

    I'll give an example of a large pair of electrical switchgear panels that was designed to support critical mechanical loads. Although any 1 of 3 pieces of (very efficient, variable speed) HVAC equipment could 'carry' the max heat load, at least 2 were preferred for efficiency and redundancy, and all 3 were run at part load regularly. The A bus and B bus each supported a single-source piece of equipment wired straight into a breaker, and then there was a 3rd matching piece of gear that could swing from side to side via an ATS which was fed from 1 breaker in each bus. In this fashion, even if the 'worst' happened and 2 of the 3 pieces of gear were on a bus at the time of a failure of that bus, the ATS would take care of the swing piece and get it powered back up off the surviving bus, returning the HVAC to N+1 redundancy from a gross standpoint.

    However, the vendor assembling the bus placed the breakers in such a fashion that in the A bus, the breaker feeding piece #1 was on the top of a section, and the breaker feeding the ATS towards #3 was at the bottom. On Bus B, piece #3 was fed from a breaker on the *top* of the identically placed section, and piece #2 was fed from a breaker on the *bottom* of the gear! You can see that the breakers are inverted. Electrically, this will work fine.

    If a human is told, in a crisis when 'things are going wrong', to go figure out what is wrong with #3, there are plenty of things they might do wrong, and one of them might be to look at the A bus, see there's a breaker for #3 in the bottom of the middle section, then rush to the B bus, see the right-sized breaker in that spot, and expect that to be related to #3, when in fact (for no *electrical* reason) it's actually for #2. Even though both are *labeled* correctly, humans expecting a pattern (single-source devices fed from the top of the switchgear, ATS-fed devices out of the bottom of the switchgear) would be more likely to operate the wrong breaker, potentially negatively affecting the uptime of a site which is effectively 2(N+1).

    An amazingly trivial *design* change, but one that I've seen cause issues in the real world. EVERY placement decision should be reviewed by someone who *isn't* thinking like an engineer, but instead is trying to perceive the gear from the point of view of the target end user.