How The Cloud Changes Disaster Recovery

Mike Klein is president and COO of Online Tech, which provides colocation, managed servers and private cloud services.


Is your company prepared to save critical business data in the event of a disaster? The US government estimates that 1 in 4 businesses won’t survive a disaster, making an IT disaster recovery plan an invaluable investment for any business owner. A decade ago, businesses could open a filing cabinet and easily retrieve paper records if they lost their electronic data. Today, businesses are critically dependent on IT systems without the ability to reference print files, simply because it is an outdated and inefficient way of keeping records. Could you imagine how eBay, Google or an electronic medical records company could operate if they lost their IT infrastructure?

Conventional Disaster Recovery

Conventional disaster recovery (DR) has always been expensive, time-consuming and error-prone. It typically includes off-site backup, often to tape that is shipped and stored off-site. Businesses then contract for access to “similar” hardware servers from a cold site disaster recovery vendor on a first-declared, first-served basis. This conventional approach to disaster recovery presents a number of challenges.

  • Recovery delays – Significant delays to the recovery process can be attributed to the retrieval and delivery of off-site backup tapes to the DR data center.
  • Tedious – Each cold site server has to be loaded with the operating systems and patched to the last configuration used in production. Additionally, the application software needs to be installed onto the servers and patched to the last used configuration.
  • Time-consuming – If the patch management records aren’t up to date or available when a disaster strikes, the patches need to be aligned and debugged to match the last production configuration, which can be a lengthy process.
  • Error-prone – Data must be recovered from backup tapes, which have failure rates as high as 40% when read on a different drive from the one that wrote them. The network also needs to be configured at the cold site to match the network configuration of the production site, including VLANs, VPNs, DNS and firewall rules.

Another problem with conventional disaster recovery is that many plans are written as a one-time fail-over process. The key step missing from most plans is how to return to the production site once it has been re-established. Annual DR testing is another often overlooked element of recovery success. Because executing a disaster recovery plan is so time-consuming, many tests are only partially run and are almost never carried through a full fail-over.

Conventional Disaster Recovery Tradeoffs

Disaster recovery alternatives can range from simple tape backup with recovery time measured in days to fully replicated sites with recovery time measured in minutes. Generally speaking, the faster the recovery time, the more expensive the solution, as shown in Figure 1. We often find this is an effective way to explain the cost/benefit trade-offs to a CEO or CFO when proposing an IT disaster recovery project.

Figure 1: Cloud-DR Trade-Offs

Tape backup is a cost-effective first step for disaster recovery, but it can take days or weeks to recover if the hardware needs to be found before the recovery process can begin. On the other hand, disaster recovery to a fully replicated site can provide very fast recovery times, but is much more expensive. Providing both hardware and software at the disaster recovery site as well as a high speed network between sites for data replication can double the cost of the IT infrastructure.
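The trade-off described above can be sketched as a simple selection problem: given a maximum acceptable recovery time, pick the cheapest option that meets it. This is only an illustration. The option names, recovery times and relative costs below are hypothetical placeholders, not figures from this article or from any vendor.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class DrOption:
    name: str
    recovery_hours: float  # hypothetical time to recover
    relative_cost: float   # hypothetical cost as a fraction of production IT spend


# Placeholder figures for illustration only.
OPTIONS = [
    DrOption("tape backup", recovery_hours=72.0, relative_cost=0.1),
    DrOption("warm standby", recovery_hours=8.0, relative_cost=0.5),
    DrOption("fully replicated site", recovery_hours=0.25, relative_cost=1.0),
]


def cheapest_option(options: List[DrOption],
                    max_recovery_hours: float) -> Optional[DrOption]:
    """Return the lowest-cost option that recovers within the target time."""
    candidates = [o for o in options if o.recovery_hours <= max_recovery_hours]
    # default=None covers the case where no option is fast enough.
    return min(candidates, key=lambda o: o.relative_cost, default=None)
```

Calling `cheapest_option(OPTIONS, 24)` with these placeholder numbers selects the warm standby, while tightening the target to one hour leaves only the fully replicated site.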

What Changes in the Cloud?

The cloud, specifically virtualization, takes a very different approach to disaster recovery. With virtualization, the entire server, including the operating system, applications, patches and data is encapsulated into a single software bundle or virtual server. This entire virtual server can be copied or backed up to an off-site data center and spun up on a virtual host in a matter of minutes.

Since the virtual server is hardware independent, the operating system, applications, patches and data can be safely and accurately transferred from one data center to a second data center without the burden of reloading each component of the server.

The cloud shifts the disaster recovery trade-off curve to the left, as shown in Figure 2. With cloud computing (as represented by the red arrow), disaster recovery becomes much more cost-effective with significantly faster recovery times.

Figure 2: Cloud Shifts

Given the cost-effectiveness of online backup between data centers, tape backup no longer makes sense in the cloud. Tape storage may still be helpful where multi-year data archiving is needed for regulatory requirements. Otherwise, the cost-effectiveness and recovery speed of online, off-site backup make tape backup difficult to justify.

The cloud makes cold site disaster recovery (as traditionally offered by third parties) look like a dinosaur in the cloud computing world. With cloud computing, warm site disaster recovery becomes a very cost-effective option: backups of your critical servers can be spun up in minutes on a shared or dedicated host platform.

With SAN-to-SAN replication between sites, hot site DR with very short recovery times also becomes a much more attractive, cost-effective option. One of the most exciting capabilities of disaster recovery in the cloud is the ability to deliver multi-site availability. SAN replication not only provides rapid fail-over to the disaster recovery site, but also the capability to return to the production site when the DR test or disaster event is over. This is a capability that was rarely delivered with conventional DR systems due to the cost and testing challenges.

One of the added benefits of disaster recovery in the cloud is the ability to more finely tune the costs and performance of the DR platform. Applications and servers that are deemed less critical in a disaster can be tuned down with fewer resources, while the most critical applications still get all of the resources they need to keep the business running through the disaster.
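As a rough sketch of that tuning, the function below keeps full resources for critical servers and scales down everything else at the DR site. The `Server` fields, the single critical/non-critical flag and the 50% default scale factor are all assumptions made for illustration; a real platform would expose its own sizing controls.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Server:
    name: str
    vcpus: int
    memory_gb: int
    critical: bool  # hypothetical flag: must run at full size during a disaster


def dr_allocation(server: Server, scale: float = 0.5) -> Tuple[int, int]:
    """Return (vcpus, memory_gb) to reserve for this server at the DR site."""
    if server.critical:
        # Critical workloads keep their full production sizing.
        return server.vcpus, server.memory_gb
    # Less critical workloads are scaled down, but never below 1 vCPU / 1 GB.
    return (max(1, int(server.vcpus * scale)),
            max(1, int(server.memory_gb * scale)))
```

For example, a critical 8 vCPU, 32 GB server keeps its full allocation, while a non-critical 4 vCPU, 8 GB server is reserved at 2 vCPU and 4 GB.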

The New Critical Path in Disaster Recovery – Networking

With the sea change in disaster recovery delivered by cloud computing, the long pole in the tent becomes network replication. With fast server recovery at an off-site data center, the critical path for a disaster recovery operation is replicating the production network at the DR site, including IP address mapping, firewall rules and VLAN configuration.
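One piece of that network work, IP address mapping, can be sketched with Python's standard `ipaddress` module. The example assumes the simplest possible scheme, in which each production host keeps the same host offset inside a DR subnet of the same size. The function name and the subnets are hypothetical, and real deployments may use NAT or stretched VLANs instead.

```python
import ipaddress


def map_to_dr(address: str, prod_subnet: str, dr_subnet: str) -> str:
    """Map a production IP to the same host offset within the DR subnet."""
    prod = ipaddress.ip_network(prod_subnet)
    dr = ipaddress.ip_network(dr_subnet)
    ip = ipaddress.ip_address(address)

    # The one-to-one mapping only works between subnets of equal size.
    if prod.version != dr.version or prod.prefixlen != dr.prefixlen:
        raise ValueError("production and DR subnets must be the same size")
    if ip not in prod:
        raise ValueError(f"{address} is not in production subnet {prod_subnet}")

    # Preserve the host's position within the subnet.
    offset = int(ip) - int(prod.network_address)
    return str(dr.network_address + offset)
```

With the assumed subnets `10.1.2.0/24` (production) and `172.16.2.0/24` (DR), the host `10.1.2.15` maps to `172.16.2.15`. Firewall rules and DNS records that reference the production address could then be rewritten through the same function.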

Smart data center operators are providing full disaster recovery services that not only replicate the servers between data centers, but also replicate the entire network configuration in a way that recovers the network as quickly as the backed up cloud servers.

Disaster Recovery Changes in the Cloud

There are a lot of benefits with cloud computing – cost-effective resource use, rapid provisioning, scalability and elasticity. In my opinion, one of the most significant advantages to cloud computing is the sea change it delivers for disaster recovery. Disaster recovery in the cloud becomes much more cost-effective, lowering the bar for many more enterprises to provide comprehensive DR plans for their entire IT infrastructure. Disaster recovery in the cloud provides faster recovery times and multi-site availability at a fraction of the cost of conventional disaster recovery.

I predict we’re going to hear much more about the changes in DR strategies with the cloud over the next year as more and more enterprises revisit their DR plan in light of the advantages of cloud hosting.

Industry Perspectives is a content channel at Data Center Knowledge highlighting thought leadership in the data center arena.



  1. I went through a pretty big evaluation of cloud service providers (about a half dozen) about a year ago for a small "cloud" setup. Equipment-wise it was 1.5 racks of equipment, roughly $650k of gear. There was not one cloud provider where the ROI was less than about 8 months. One cloud solution involved a $3 million install fee (yes, you read that right; I was surprised too). All of the cloud providers required you to pay for the capacity whether or not you're using it at that time, so it can be reserved. Which in a way makes sense, because if your city burns to the ground and 200 companies all rush to the same cloud providers, they quite possibly won't have the capacity. On the flip side, you can take the risk and try to do a more dynamic setup and hope your cloud provider(s) have the capacity when disaster strikes. It all comes down to this: someone has to buy the hardware. That rack and a half of gear was going to be the DR site, which had a primary site of roughly 25 racks of gear. Massive consolidation at work. Eventually the project got scrapped due to budget constraints; the "cloud" was a last-ditch effort to try to do it for less, but there was no combination of cloud providers that could even come close to doing it ourselves. If you're doing your own "cloud" of sorts, though, I agree virtualization and storage replication technologies have come a long way in recent years, making things a lot easier to manage and deploy. The last disaster I went through, it turns out we weren't backing up several important bits of data that we needed. Nobody communicated that these bits needed to be backed up, and like most folks we lacked the resources and capacity to perform any real end-to-end testing on our backups. All we could do was sample restores (from tape) to see that the data that was backed up was intact (it always was, and for recent backups everything was duplicated to two different tapes just in case).

  2. Nate- Good observations. D/R choices include dedicated and shared environments. Selecting the best fit depends on your company’s risk tolerance. That said, it would be hard to D/R 25 racks into a shared environment. As you point out, testing disaster recovery in the cloud is much easier than testing a cold site recovery scenario off tape backups. We replicate our private clouds between data centers and run a full test including fail over every 6 months. The ability to fully replicate, test and fail back faster and cheaper is a key advantage the cloud brings to disaster recovery.

  3. We are getting a tremendous response/interest from our existing MSP clients and prospective clients regarding the "cloud". That said, I think the biggest misconception is that their "information/applications/intellectual property" is somewhere out in cyberspace, which can make them feel a bit uncertain as to where and how it's physically housed. As a result, there is a lot of education 101 necessary to ease them into the cloud. However, when we reduce it to its simplest form and use analogies like how they bank online or how they access investment information, we help them understand that "their" entire company's business can now be accessed the same way, knowing that it's safe and secure on enterprise-class technology, much like a Walmart or B of A would use. This begins to put their minds at ease and bring the entire "cloud" into better perspective. In addition, as Mike mentioned in this excellent article, the cloud allows businesses to benefit from worry-free and seamless disaster recovery, eliminating the dynamics related to traditional BDR methods that are old, tedious and certainly not guaranteed. The cloud is the ultimate business continuity solution when it comes to ensuring that their business is up and running regardless of power outages, licensing issues, hardware/software failures, hackers and thieves. Over the past 12 months we have done a lot of research in an attempt to uncover the best solution to recommend to our clients. We found very few true H.A., end-to-end business solutions that leave nothing out, and only one that we feel provides our clients with the level of performance, security and scalability that growing companies require and deserve.

  4. Great article, Mike. The cloud is something that business owners will have to watch for their disaster recovery plans; it can provide reliable and cost-effective disaster recovery operations.