Mike Klein is president and COO of Online Tech, which provides colocation, managed servers and private cloud services.
Is your company prepared to save critical business data in the event of a disaster? The US government estimates that 1 in 4 businesses won’t survive a disaster, making an IT disaster recovery plan an invaluable investment for any business owner. A decade ago, businesses could open a filing cabinet and easily retrieve paper records if they lost their electronic data. Today, businesses depend critically on IT systems and no longer have print files to fall back on, because paper has become an outdated and inefficient way of keeping records. Could you imagine how eBay, Google or an electronic medical records company could operate if it lost its IT infrastructure?
Conventional Disaster Recovery
Conventional disaster recovery (DR) has always been expensive, time-consuming and error-prone. It typically includes off-site backup, often to tape that is shipped and stored off-site. Businesses then contract for access to “similar” hardware servers from a cold site disaster recovery vendor on a first-declared, first-served basis. This conventional approach to disaster recovery presents a number of challenges.
- Recovery delays - Significant delays to the recovery process can be attributed to the retrieval and delivery of off-site backup tapes to the DR data center.
- Tedious - Each cold site server has to be loaded with the operating systems and patched to the last configuration used in production. Additionally, the application software needs to be installed onto the servers and patched to the last used configuration.
- Time-consuming - If the patch management records aren’t up to date or available when a disaster strikes, the patches need to be aligned and debugged to match the last production configuration, which can be a lengthy process.
- Error-prone - Data must be recovered from backup tapes, which can have failure rates as high as 40% when read from a different drive than the one that wrote them. The network also needs to be configured at the cold site to match the network configuration of the production site, including VLANs, VPNs, DNS and firewall rules.
Another problem with conventional disaster recovery is that many plans are written as a one-time fail-over process. The key step missing from most plans is how to return to the production site once it has been re-established. Annual DR testing is another often overlooked element of recovery success. Because executing a disaster recovery plan is so time-consuming, many tests are only partially run, and plans are almost never tested through a full fail-over.
Conventional Disaster Recovery Tradeoffs
Disaster recovery alternatives can range from simple tape backup with recovery time measured in days to fully replicated sites with recovery time measured in minutes. Generally speaking, the faster the recovery time, the more expensive the solution, as shown in Figure 1. We often find this is an effective way to explain the cost/benefit trade-offs to a CEO or CFO when proposing an IT disaster recovery project.
Tape backup is a cost-effective first step for disaster recovery, but it can take days or weeks to recover if the hardware needs to be found before the recovery process can begin. On the other hand, disaster recovery to a fully replicated site can provide very fast recovery times, but is much more expensive. Providing both hardware and software at the disaster recovery site as well as a high speed network between sites for data replication can double the cost of the IT infrastructure.
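To make the trade-off concrete, here is a minimal Python sketch that picks the cheapest DR option able to meet a target recovery time. The `DrOption` type, the `cheapest_option` helper and every number in `OPTIONS` are assumptions made up for illustration, not measurements or vendor pricing. The only figure taken from the discussion above is that a fully replicated site can add roughly 100% to infrastructure cost.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class DrOption:
    name: str
    recovery_hours: float  # illustrative recovery time, not a measurement
    added_cost: float      # extra cost as a fraction of production IT cost


# Hypothetical figures chosen only to show the shape of the curve.
# added_cost=1.0 means the DR option doubles the infrastructure cost.
OPTIONS = [
    DrOption("tape backup", recovery_hours=72.0, added_cost=0.1),
    DrOption("cold site", recovery_hours=48.0, added_cost=0.3),
    DrOption("fully replicated site", recovery_hours=0.25, added_cost=1.0),
]


def cheapest_option(options: Sequence[DrOption],
                    max_recovery_hours: float) -> Optional[DrOption]:
    """Return the lowest-cost option that recovers within the target time.

    Returns None when no option is fast enough.
    """
    fast_enough = [o for o in options if o.recovery_hours <= max_recovery_hours]
    return min(fast_enough, key=lambda o: o.added_cost, default=None)
```

Calling `cheapest_option(OPTIONS, 96)` selects tape backup, while tightening the target to one hour leaves only the fully replicated site. That jump in cost is the conversation to have with the CEO or CFO.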
What Changes in the Cloud?
The cloud, specifically virtualization, takes a very different approach to disaster recovery. With virtualization, the entire server, including the operating system, applications, patches and data, is encapsulated into a single software bundle or virtual server. This entire virtual server can be copied or backed up to an off-site data center and spun up on a virtual host in a matter of minutes.
Since the virtual server is hardware independent, the operating system, applications, patches and data can be safely and accurately transferred from one data center to a second data center without the burden of reloading each component of the server.
The cloud shifts the disaster recovery trade-off curve to the left, as shown in Figure 2. With cloud computing (as represented by the red arrow), disaster recovery becomes much more cost-effective with significantly faster recovery times.
Given the cost-effectiveness of online backup between data centers, tape backup no longer makes sense in the cloud. Tape storage may still be helpful where regulatory requirements call for multi-year data archiving. Otherwise, the cost-effectiveness and recovery speed of online, offsite backup make it difficult to justify tape backup.
The cloud makes cold site disaster recovery (as traditionally offered by third parties) look like a dinosaur. With cloud computing, warm site disaster recovery becomes a very cost-effective option: backups of your critical servers can be spun up in minutes on a shared or dedicated host platform.
With SAN-to-SAN replication between sites, hot site DR with very short recovery times also becomes a much more attractive, cost-effective option. One of the most exciting capabilities of disaster recovery in the cloud is the ability to deliver multi-site availability. SAN replication not only provides rapid fail-over to the disaster recovery site, but also the capability to return to the production site when the DR test or disaster event is over. This is a capability that was rarely delivered with conventional DR systems due to the cost and testing challenges.
One of the added benefits of disaster recovery in the cloud is the ability to more finely tune the costs and performance of the DR platform. Applications and servers that are deemed less critical in a disaster can be tuned down to use fewer resources, while the most critical applications are assured of all the resources they need to keep the business running through the disaster.
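As a rough illustration of that tuning, the Python sketch below gives critical workloads their full production allocation at the DR site and splits whatever capacity remains among the less critical ones. The `Workload` type, the `plan_dr_vcpus` function and the proportional-share rule are all assumptions for the sake of the example. A real cloud platform would expose its own controls for this.

```python
from dataclasses import dataclass
from typing import Dict, Sequence


@dataclass(frozen=True)
class Workload:
    name: str
    critical: bool
    production_vcpus: int  # vCPUs the workload uses in production


def plan_dr_vcpus(workloads: Sequence[Workload],
                  dr_capacity_vcpus: int) -> Dict[str, int]:
    """Return a {workload name: vCPUs} plan for the DR site.

    Critical workloads keep their production allocation. The rest share
    the leftover capacity in proportion to their production size, rounded
    down. A share of 0 means the workload stays powered off.
    """
    critical_total = sum(w.production_vcpus for w in workloads if w.critical)
    if critical_total > dr_capacity_vcpus:
        raise ValueError("DR capacity cannot cover the critical workloads")

    plan = {w.name: w.production_vcpus for w in workloads if w.critical}
    others = [w for w in workloads if not w.critical]
    others_total = sum(w.production_vcpus for w in others)
    remaining = dr_capacity_vcpus - critical_total

    for w in others:
        share = remaining * w.production_vcpus // others_total
        # Never give a workload more than it used in production.
        plan[w.name] = min(w.production_vcpus, share)
    return plan


# Hypothetical inventory: 24 vCPUs in production, 18 available at the DR site.
EXAMPLE_WORKLOADS = [
    Workload("billing", critical=True, production_vcpus=8),
    Workload("web", critical=True, production_vcpus=4),
    Workload("reporting", critical=False, production_vcpus=8),
    Workload("dev-wiki", critical=False, production_vcpus=4),
]
EXAMPLE_PLAN = plan_dr_vcpus(EXAMPLE_WORKLOADS, dr_capacity_vcpus=18)
```

In this example the two critical workloads keep all 12 of their vCPUs, and the remaining 6 are split 4 and 2 between reporting and the wiki.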
The New Critical Path in Disaster Recovery – Networking
With the sea change in disaster recovery delivered by cloud computing, the slowest remaining step becomes network replication. With fast server recovery at an offsite data center, the critical path for a disaster recovery operation is replicating the production network at the DR site, including IP address mapping, firewall rules and VLAN configuration.
Smart data center operators are providing full disaster recovery services that not only replicate the servers between data centers, but also replicate the entire network configuration in a way that recovers the network as quickly as the backed up cloud servers.
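One simple way to check that a DR network really mirrors production is to compare the two configurations section by section before a disaster strikes. The Python sketch below assumes each site's configuration has already been exported into a plain dictionary of sets. That schema, the section names and the `diff_network_config` function are hypothetical and do not correspond to any particular vendor's format.

```python
from typing import Dict, Iterable


def diff_network_config(production: Dict[str, Iterable],
                        dr_site: Dict[str, Iterable]) -> Dict[str, dict]:
    """Report entries that differ between production and the DR site.

    Each argument maps a section name (for example "vlans" or
    "firewall_rules") to a collection of hashable, sortable entries.
    Sections that match exactly are left out of the report.
    """
    report = {}
    for section in sorted(set(production) | set(dr_site)):
        prod_entries = set(production.get(section, ()))
        dr_entries = set(dr_site.get(section, ()))
        missing = prod_entries - dr_entries
        unexpected = dr_entries - prod_entries
        if missing or unexpected:
            report[section] = {
                "missing_at_dr": sorted(missing),
                "unexpected_at_dr": sorted(unexpected),
            }
    return report


# Hypothetical exports: VLAN 30 was never created at the DR site.
PRODUCTION = {
    "vlans": {10, 20, 30},
    "firewall_rules": {("allow", "tcp", 443), ("allow", "tcp", 22)},
}
DR_SITE = {
    "vlans": {10, 20},
    "firewall_rules": {("allow", "tcp", 443), ("allow", "tcp", 22)},
}
DRIFT = diff_network_config(PRODUCTION, DR_SITE)
```

An empty report means the two sites agree on every section that was exported. Anything else is work that would otherwise land on the critical path during a fail-over.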
Disaster Recovery Changes in the Cloud
Cloud computing has many benefits: cost-effective resource use, rapid provisioning, scalability and elasticity. In my opinion, one of the most significant advantages of cloud computing is the sea change it delivers for disaster recovery. Disaster recovery in the cloud becomes much more cost-effective, lowering the bar for many more enterprises to provide comprehensive DR plans for their entire IT infrastructure. Disaster recovery in the cloud provides faster recovery times and multi-site availability at a fraction of the cost of conventional disaster recovery.
I predict we’re going to hear much more about the changes in DR strategies with the cloud over the next year as more and more enterprises revisit their DR plan in light of the advantages of cloud hosting.
Industry Perspectives is a content channel at Data Center Knowledge highlighting thought leadership in the data center arena.