Can You Achieve Zero-Downtime Cutover with Data Migrations?

Wayne Lam is CEO of Cirrus Data.

When migrating data for enterprises, the most important requirement is not to disrupt 24x7 operations in any way. These systems are typically allowed only short downtimes for maintenance, planned once a year. Data migration involves moving data from an old storage system to a new one, then switching all the application servers from the old to the new.

In most cases the two storage systems are different, both in make and model, and certainly in firmware versions. With advanced migration appliances, insertions can be made into the live environment transparently and without disruption, and data is then migrated from the old storage to the new. The migration process can be set to have minimal impact on the ongoing I/O by yielding to production traffic intelligently. Up to the point of application server cutover, zero disruption can be achieved.
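One way to picture "yielding to production traffic" is as a budget calculation: the migration uses whatever headroom production leaves, plus a small guaranteed floor so the copy still makes progress. The Python sketch below is only an illustration under that assumption; the function name, the parameters, and the 500 IOPS default are hypothetical and not taken from any product.

```python
def migration_budget(production_iops: float,
                     capacity_iops: float,
                     floor_iops: float = 500.0) -> float:
    """Return the IOPS the migration may consume right now.

    Hypothetical policy: give the migration the headroom that production
    is not using, but never less than a small floor so the copy still
    finishes eventually.
    """
    if capacity_iops <= 0:
        raise ValueError("capacity_iops must be positive")
    # Headroom is what the storage can deliver beyond current production load.
    headroom = max(capacity_iops - production_iops, 0.0)
    return max(headroom, floor_iops)


if __name__ == "__main__":
    for load in (0, 9000, 12000):
        print(load, migration_budget(load, capacity_iops=10000))
```

A real appliance would presumably measure latency as well as throughput; this sketch only shows the yielding idea.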

Cutting over to the new storage is another story. Here's a review of exactly what is involved for one typical server cutover.

When a host is switched from one storage system to another, even if the LUNs hold exactly the same data, the OS sees a whole new set of LUNs. There is no mechanism to just “switch” from one LUN to another. For SANs, the closest possibility is to fool the multipath driver by presenting the new storage as just another set of paths to the original LUNs. If the new LUNs impersonate the identity of the original LUNs, the driver treats them as new paths to the original LUNs. The multipath driver can then switch to the new set of paths dynamically, and the original LUNs are replaced by the new LUNs.
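The path-impersonation idea can be modeled in a few lines. The Python sketch below is a toy model, not a real multipath driver: it assumes the driver groups paths purely by the identity string the target reports, and every class, method, and label in it is hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Path:
    name: str          # hypothetical label such as "old-A"
    lun_identity: str  # identity the target reports for the LUN on this path
    active: bool = True


@dataclass
class MultipathDevice:
    identity: str
    paths: list = field(default_factory=list)

    def add_path(self, path: Path) -> bool:
        # Paths are grouped by reported identity; a mismatch would show up
        # on the host as a separate, brand-new LUN instead.
        if path.lun_identity != self.identity:
            return False
        self.paths.append(path)
        return True

    def fail_path(self, name: str) -> None:
        for p in self.paths:
            if p.name == name:
                p.active = False

    def usable_paths(self) -> list:
        return [p.name for p in self.paths if p.active]


device = MultipathDevice(identity="lun-0001")
device.add_path(Path("old-A", "lun-0001"))
device.add_path(Path("old-B", "lun-0001"))
accepted = device.add_path(Path("new-A", "lun-0001"))  # impersonates the original
rejected = device.add_path(Path("new-B", "lun-9999"))  # reports its own identity
device.fail_path("old-A")
device.fail_path("old-B")
print(device.usable_paths())  # ['new-A']
```

Only the path that reports the original identity joins the device, so I/O continues on it once the old paths are withdrawn.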

In theory, this is zero-downtime cutover. Mission accomplished! As someone once said: “In theory, there is no difference between practice and theory. In practice, there is.”

To do what is described above, the new storage would need so many additions that it is doubtful any storage vendor will have the stomach to even contemplate it. To impersonate another storage system, the inquiry string, the vital product data pages, and many mode pages need to be dynamically changeable. In addition, the ALUA information has to be aggregated and emulated perfectly. Not to mention the path configuration, which must preserve and support SCSI reservations for cluster operation. And then, think about all the different operating systems on all the hosts. Furthermore, all these operations would have to be conducted by third-party migration appliances through standard APIs that the storage vendors provide.
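To make the scale of the problem concrete, here is a minimal Python sketch that compares the identity a host would see from the original LUN with what an impersonating LUN presents. The field list is a deliberately small, hypothetical subset of what the paragraph above names (inquiry data, ALUA state, reservation keys), and all of the values are invented for illustration.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class LunIdentity:
    # Hypothetical subset of what a host can observe about a LUN.
    vendor: str
    product: str
    serial: str
    alua_state: str
    reservation_keys: frozenset


def identity_mismatches(original: LunIdentity, impersonated: LunIdentity) -> list:
    """Return the names of the fields on which the two identities differ."""
    return [
        f.name
        for f in fields(LunIdentity)
        if getattr(original, f.name) != getattr(impersonated, f.name)
    ]


old_lun = LunIdentity("VENDOR_A", "ARRAY_X", "0001", "active/optimized",
                      frozenset({"key-node1", "key-node2"}))
new_lun = LunIdentity("VENDOR_A", "ARRAY_Y", "0001", "active/non-optimized",
                      frozenset({"key-node1", "key-node2"}))
print(identity_mismatches(old_lun, new_lun))  # ['product', 'alua_state']
```

In this toy model any non-empty result means the host could tell the two LUNs apart, which is exactly what impersonation has to prevent.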

It is probably easier to get the United Nations to agree on some peace plan for the world than to have all the vendors support such an initiative. One also has to remember that this dynamic transfer of state has to be done for every LUN, with every I/O in flight at that moment. If any I/O is not handled properly, the result is data corruption.

People tend to be unaware that zero-downtime cutover is generally not a permanent requirement. That is, even the most critical operation has scheduled downtime for maintenance. More often than not, people want to migrate and cut over because they need to remove the old storage, whether because of a problem, an end of lease, or a need for the better performance of the new storage.

An approach that is both theoretically sound and practically feasible is to enhance the migration appliance to be able to sustain the operation, using the new storage it migrated to, and allowing the old storage to be removed. This way, the user can control the timing of cutting over to the new storage, and at the same time, wait for the annual scheduled maintenance window to remove the appliances.
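The approach can be read as a small lifecycle in which the old storage becomes removable well before the appliance itself is taken out. The Python sketch below encodes that reading as a state machine; the phase names and the `in_maintenance_window` flag are hypothetical and only illustrate the ordering described above.

```python
from enum import Enum


class Phase(Enum):
    INSERTED = 1           # appliance placed in the live data path
    MIGRATING = 2          # data being copied from old to new storage
    SERVING_FROM_NEW = 3   # appliance sustains I/O using the new storage
    APPLIANCE_REMOVED = 4  # hosts talk to the new storage directly


def advance(phase: Phase, in_maintenance_window: bool = False) -> Phase:
    """Move to the next phase; removing the appliance waits for a window."""
    if phase is Phase.APPLIANCE_REMOVED:
        return phase
    next_phase = Phase(phase.value + 1)
    if next_phase is Phase.APPLIANCE_REMOVED and not in_maintenance_window:
        return phase  # keep serving until the scheduled downtime
    return next_phase


def old_storage_removable(phase: Phase) -> bool:
    # The old storage can go as soon as the appliance serves from the new one.
    return phase in (Phase.SERVING_FROM_NEW, Phase.APPLIANCE_REMOVED)
```

The point of the model is that `old_storage_removable` becomes true one phase before any downtime is needed.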

Of course, all the technical details mentioned above still need to be handled, but it is the migration appliance that implements all the necessary functions, rather than every new storage system. No cooperation is required from anyone, as long as their storage meets the overall standard specifications, which it must, otherwise it would not be working to start with. This is a lot more realistic. All the burden is now on a single party: the people who build the migration appliances.

This is still a huge technical challenge. Zero-downtime data migration, from installation to cutover, is what you should expect to see in next-generation data migration appliances.

Industry Perspectives is a content channel at Data Center Knowledge highlighting thought leadership in the data center arena. See our guidelines and submission process for information on participating. View previously published Industry Perspectives in our Knowledge Library.
