Vince Pelly is an Associate Partner at Citihub, a global IT consultancy. He has more than 25 years of IT executive experience, having held senior posts in program management, infrastructure and data center design. He performs data center and network assessments, designs, migrations and consolidations.
The week of October 29th tested the resiliency of data center infrastructure, critical systems and disaster recovery plans. Most notable were the numerous stories, from the press and from customers, of organizations that ran out of diesel fuel or suffered failed fuel pumps, flooded basements and recovery plans that were not executed in a timely manner. In lower Manhattan, New York’s Con Edison pre-emptively shut down power, forcing data centers onto generator backup.
One interesting note on backup generators and the use of diesel fuel to sustain critical systems operations: The New York Times recently criticized these organizations on environmental grounds for burning diesel fuel during power outages.
Sustaining Critical Operations
This event confirms that three factors are significant to sustaining an organization's critical operations: the use of diesel fuel to run backup generators, the location of the critical systems and primary data center, and the geographic location of backup sites.
The gaps and risks uncovered over the past two weeks demonstrate numerous vulnerabilities and design failures in IT mechanical and electrical systems, business continuity plans and operational procedures. In addition, these critical systems were located in the basements or lower levels of the primary buildings, a design flaw that prevented these firms from continuing their operations. This doesn’t apply to all firms; some implemented rigorous backup and fail-over plans, looked ahead and planned around risk accordingly. Others failed to execute their backup plans, which resulted in long outages and exposed gaps in their disaster recovery designs.
Best Practices for Being Able to Survive and Execute a Recovery
The following highlights key areas necessary to minimize future outages:
- Organizations should not place critical backup systems and fuel pumps in high-risk areas such as building basements within flood zones.
- Review regional and local FEMA flood zone maps (U.S.) or the international equivalent to determine the level of risk for your data center and critical systems.
- Conduct regular backup exercises to test fail-over capabilities. Confirm critical power has been tested and generators are functioning with sufficient fuel levels.
- Gain an understanding of fuel delivery schedules and ensure contracts are in place for emergency fuel delivery. Keep in mind that hospitals and emergency facilities have priority for fuel deliveries.
- Backup data centers should be located outside the primary geographic area on separate utility grids, if possible.
- Assess disaster recovery plan designs and architectures. Execute recovery plans early and don’t wait to switch to backup power in the middle of the event.
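The fuel-related checks above lend themselves to simple arithmetic. The following is a minimal Python sketch, not a definitive tool, of how an operator might estimate generator runtime from the current fuel level and decide whether to request an emergency delivery. The class and function names, the 12-hour safety margin and all figures are illustrative assumptions; real burn rates depend on generator model and load and should come from the manufacturer's data.

```python
from dataclasses import dataclass


@dataclass
class FuelStatus:
    """Hypothetical snapshot of one generator's fuel supply."""
    tank_capacity_gal: float  # total tank capacity in gallons
    fuel_level_pct: float     # current fill level, 0 to 100
    burn_rate_gph: float      # gallons per hour at the expected load


def runtime_hours(status: FuelStatus) -> float:
    """Estimate how long the generator can run on the fuel on hand."""
    if status.burn_rate_gph <= 0:
        raise ValueError("burn rate must be positive")
    usable_gal = status.tank_capacity_gal * status.fuel_level_pct / 100.0
    return usable_gal / status.burn_rate_gph


def needs_emergency_delivery(
    status: FuelStatus,
    delivery_lead_hours: float,
    safety_margin_hours: float = 12.0,  # assumed buffer, tune per site
) -> bool:
    """Return True if fuel may run out before a delivery can arrive.

    delivery_lead_hours should be pessimistic: hospitals and emergency
    facilities have priority, so commercial deliveries may be delayed.
    """
    return runtime_hours(status) < delivery_lead_hours + safety_margin_hours
```

For example, a 4,000-gallon tank at 75 percent burning 60 gallons per hour gives roughly 50 hours of runtime. With a 48-hour delivery lead time and the assumed 12-hour margin, the check returns True, which suggests ordering fuel before the event rather than during it.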
Going forward, organizations should periodically review their fail-over architectures and incorporate redundancy models via active-active and/or multi-site fail-over designs, using multiple data center sites located both inside and outside the primary regional power grid.
Multi-site disaster recovery designs provide better protection against data loss over long distances. As an example, implementing a remote asynchronous (out of region) replication model can provide recovery with low data loss when both the primary and local recovery sites are impacted (as in the case of a regional outage such as Hurricane Sandy).
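The multi-site model described above can be illustrated with a short Python sketch. It assumes a three-site layout that the article does not spell out: a primary site, a local recovery site using synchronous replication (assumed here to give zero data loss) and a remote site using asynchronous replication (low data loss). The site names, regions and the `choose_recovery_site` function are hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable, Optional, Set


@dataclass(frozen=True)
class Site:
    name: str
    region: str       # geographic region / utility grid the site sits in
    replication: str  # "primary", "sync" (local) or "async" (remote)


def choose_recovery_site(
    sites: Iterable[Site], impacted_regions: Set[str]
) -> Optional[Site]:
    """Pick the best surviving recovery site, or None if none survive.

    Sites in an impacted region are excluded. Among the survivors, a
    synchronous replica is preferred over an asynchronous one because
    it is assumed to hold a more current copy of the data.
    """
    preference = {"sync": 0, "async": 1}
    candidates = [
        s for s in sites
        if s.replication != "primary" and s.region not in impacted_regions
    ]
    candidates.sort(key=lambda s: preference.get(s.replication, 2))
    return candidates[0] if candidates else None


# Illustrative layout only; names and regions are made up.
SITES = [
    Site("nyc-primary", "northeast", "primary"),
    Site("nj-local", "northeast", "sync"),
    Site("chicago-remote", "midwest", "async"),
]
```

With this layout, a failure confined to the primary building leaves the local synchronous site as the preferred target, while a regional event that takes out the whole "northeast" region leaves only the remote asynchronous site. That second case is the one an out-of-region replica is meant to cover.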
Industry Perspectives is a content channel at Data Center Knowledge highlighting thought leadership in the data center arena.
Note: Reference architectures provided by the Uptime Institute and the Telecommunications Industry Association (TIA) describe the standards and best practices concerning site selection and the design and location of Critical Equipment Rooms (CER). Organizations should refer to the TIA specifications and best practices when planning and assessing their IT infrastructure, and to the Uptime Institute for data center certification and standards.