Disaster Recovery Lessons from Hurricane Irene
Richard Dolewski is a certified systems integration specialist and disaster recovery planner, and is Chief Technology Officer and Vice President of Business Continuity Services for WTS.
Can you really know the value of a disaster recovery service before a disaster hits? Irene tells the story.
Every time a disaster occurs—whether a natural disaster or equipment failure or site loss—businesses get a not-so-subtle reminder of how important it is to have a tested disaster recovery plan.
Irene has finished her dance along the eastern coastline of the United States. In her wake, she left many businesses and their employees and families with huge damage and losses. Whether you were in her destructive path or not, Irene’s impact is a reminder to review your plan and make sure your recovery solution includes guaranteed standby equipment, skilled technical resources to perform the recovery, and a facility in an alternate geographic region.
Wind + Water = No Travel
Irene was massive. She crossed 4 FEMA regions and 12 states. The storm covered one-third of the U.S. East Coast, measuring 580 miles wide—nearly the distance from Baltimore to Portland, Maine. The winds were up to 110 mph and Irene’s slow movement north created travel problems as tropical-storm-force winds impacted the Mid-Atlantic coastline for a full day.
Many companies have plans that address their equipment requirements and recovery processes but underestimate the number of staff required to successfully execute their plan. Equipment only works if somebody is able to operate it. With Irene, key personnel were displaced or unavailable due to safety risks or personal priorities. When regional disasters hit, transportation within the area can be difficult and may result in your staff being unable to reach their assigned locations. Equipment may be accessible, but it will be ineffective if your staff cannot access the recovery site.
Having a recovery location in an alternate geographic region with local support staff is the only sustainable solution for large regional disasters.
The Human Element
In a disaster, you can’t predict whether your key personnel will be available to assist with the recovery. So companies need a fail-safe process that can be followed by any IT professional.
First, it is a good idea to test your disaster recovery plan during normal business operations, using a group of assigned individuals. Too many companies, especially those that perform recovery tests with no more than their data center staff, count on IT heroics to pull them out of a crisis. Expecting your staff to perform a miracle during an outage is unrealistic and often impossible. This scenario is avoidable today. When a full recovery can be tested before an outage, without impacting your production users, it can likely be performed successfully without heroics in a real disaster. The success or failure of such a test will be a good indicator of your corporate disaster readiness.
Second, it is important to have detailed and precise documentation, especially when your disaster recovery plan includes cross-departmental staffing. Companies should create recovery documentation so that anyone in the organization’s IT department, or at its IT services provider, can start a recovery. In a well-tested plan, an IT professional from another company should be able to start the recovery if your own IT staff are not available.
When to Declare
Can you put your recovery team “On Alert” that a disaster is possible (or probable), to signal that someone should prepare your systems in case the outage actually occurs?
If you work with a third-party provider, you must actually “declare” an emergency to access equipment and facilities. Declaring a disaster means your systems are down and you need immediate support, triggering staff to perform key steps in your disaster recovery plan. A declaration fee and additional daily charges could be assessed to your organization, so check the fine print of your disaster recovery agreement.
If your DR plan includes an in-house alternate site recovery, your alternate location may be another office, warehouse or manufacturing facility where you stage the recovery. Remember how broad Irene’s emergency situation was and how many states it covered. If a Boston company had an alternate location 200 miles away, both sites were likely impacted by Irene. By the time you put your staff on alert, even your alternate locations may be at risk.
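The separation question raised by the Boston example can be checked with simple arithmetic. The sketch below is one rough way to do it in Python: it computes the great-circle distance between two sites and compares it with a storm width, using the 580-mile figure quoted for Irene. The function names, the city coordinates, and the rule of thumb itself (treating a separation smaller than the storm's width as a shared risk) are illustrative assumptions, not part of any provider's method.

```python
import math

EARTH_RADIUS_MILES = 3958.8  # mean Earth radius


def great_circle_miles(lat1, lon1, lat2, lon2):
    """Haversine distance in miles between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(a))


def sites_share_footprint(primary, alternate, footprint_miles):
    """Assumed heuristic: two sites closer together than the storm is wide
    could both fall inside the same regional disaster."""
    return great_circle_miles(*primary, *alternate) < footprint_miles


# Illustrative coordinates (latitude, longitude); hypothetical site choices.
BOSTON = (42.3601, -71.0589)
PHILADELPHIA = (39.9526, -75.1652)
CHICAGO = (41.8781, -87.6298)

IRENE_WIDTH_MILES = 580  # storm width quoted in this article

for name, site in [("Philadelphia", PHILADELPHIA), ("Chicago", CHICAGO)]:
    at_risk = sites_share_footprint(BOSTON, site, IRENE_WIDTH_MILES)
    print(f"{name}: {great_circle_miles(*BOSTON, *site):.0f} miles, "
          f"shared footprint: {at_risk}")
```

A real assessment would also weigh storm track, flood zones, and shared power and network dependencies; distance alone is only a first filter.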
Managed service providers accept an “On Alert” status from clients who anticipate issues with their production systems from wind, water or lack of electrical power. Technical teams are immediately put into action: recalling tapes from offsite storage, validating replicated data at the data center, verifying systems availability for all production-critical servers, and checking network connectivity to these systems. These providers may also work with you to estimate run times for your own production battery backups and generators, to request emergency fuel delivery, and to manage alternate power solutions for an extended period of time. Your plan and managed provider should allow recovery to start in advance of the pending disaster.
In the event of a disaster like Irene, customers of managed service providers have system availability within hours of declaring. In addition, company staff don’t need to travel to a recovery site.
Evaluating Your Readiness
If your organization does not have a disaster recovery plan, you are putting the business at great risk for downtime caused by water, wind or loss of power. In this time when the frequency and intensity of natural disasters seem to be increasing, that’s a risk most businesses cannot afford to take. If the staff responsible for recovery are evacuated away from the disaster and return to their homes only after the event, it will likely be several days before the business can begin a substantive recovery.
Post Irene, ask yourself the following:
- Does our organization have a documented disaster recovery plan?
- Does the recovery site have equipment for future recovery needs?
- Does the standby equipment match our current production configuration?
- Have we successfully tested in the past year?
- What staff will perform the full recovery in the event of a disaster, and do we have the proper documentation to allow other staff to step in?
- Is our recovery data center in a geographically diverse location?
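For teams that prefer to track these questions in a script rather than on paper, the checklist above can be captured as a small data structure. The following Python sketch is only one possible encoding: the class name, the field names, and the 365-day reading of "tested in the past year" are assumptions made for illustration.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import List, Optional


@dataclass
class ReadinessAnswers:
    """One field per checklist question (names are hypothetical)."""
    has_documented_plan: bool
    site_covers_future_needs: bool
    standby_matches_production: bool
    last_successful_test: Optional[date]  # None if never tested
    backup_staff_documented: bool
    site_geographically_diverse: bool


def readiness_gaps(answers: ReadinessAnswers, today: date) -> List[str]:
    """Return a description of each checklist item that is not satisfied."""
    gaps = []
    if not answers.has_documented_plan:
        gaps.append("No documented disaster recovery plan")
    if not answers.site_covers_future_needs:
        gaps.append("Recovery site lacks equipment for future needs")
    if not answers.standby_matches_production:
        gaps.append("Standby equipment does not match production")
    # Assumption: "in the past year" means within the last 365 days.
    last = answers.last_successful_test
    if last is None or today - last > timedelta(days=365):
        gaps.append("No successful test in the past year")
    if not answers.backup_staff_documented:
        gaps.append("Documentation does not let other staff step in")
    if not answers.site_geographically_diverse:
        gaps.append("Recovery data center is not geographically diverse")
    return gaps


# Example with made-up answers.
example = ReadinessAnswers(
    has_documented_plan=True,
    site_covers_future_needs=True,
    standby_matches_production=False,
    last_successful_test=date(2010, 3, 1),
    backup_staff_documented=True,
    site_geographically_diverse=True,
)
for gap in readiness_gaps(example, today=date(2011, 9, 1)):
    print("GAP:", gap)
```

An empty list means every question was answered favorably; it says nothing about how well the plan would hold up in an actual recovery test.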
Now that Irene has passed, every IT department should look for lessons learned from this actual disaster scenario. Your organization can benefit from the experiences gained by other IT staff and their exposure to a devastating hurricane. If you were actually hit by Irene or felt the pressure of this real disaster, your organization may have witnessed the value of a solid disaster recovery plan. If not, you may want to look into managed disaster recovery, so you can experience systems availability within minutes of the actual “declaration” when your time comes.
Industry Perspectives is a content channel at Data Center Knowledge highlighting thought leadership in the data center arena.
I actually love your first point, which is testing. A lot of companies already have the best technologies and processes in place, yet they forget to do one essential thing: determine if they do really work. Of course, assessments don’t really create the actual and real picture later on. But they help identify weak and strong points of their disaster recovery program, then take steps to reinforce them.