For some Verizon Cloud customers this coming weekend’s prolonged downtime will be a resiliency test for their architecture; some will simply experience a period of downtime they knew could come at some point; yet for some, the outage may serve as a wakeup call.
Last week, Verizon notified users of the new Verizon Cloud services that all nodes of the cloud will undergo maintenance, beginning early morning Saturday, 10 January. The provider told users to be prepared for up to 48 hours of downtime, although the process is anticipated to take less time than that, Verizon Enterprise Solutions spokesman Kevin King said in an email.
Unlike the widespread cloud reboots in September, which numerous providers, including Amazon Web Services, Rackspace, and IBM SoftLayer, carried out to apply an unplanned security update to the XenServer hypervisor, Verizon’s outage is for planned, routine maintenance.
The company announced the new cloud services, built on an entirely new platform, in 2013. This weekend’s maintenance will not affect customers of the company’s “legacy” cloud platforms, which include enterprise, managed, and federal cloud services.
“The upgrade will involve installation of routine maintenance updates to Verizon Cloud production platforms at Culpeper, Virginia, and other data center locations supporting Verizon Cloud,” King said. “Updates of this nature typically require some system downtime, and we notified customers in advance, so they could plan accordingly.”
Unusually Long Maintenance Window
While planned cloud outages for maintenance are common, what’s uncommon about this one is its duration and scope.
Bill Peldzus, vice president of consultancy Cloud Technology Partners, said even though Verizon did not expect the maintenance to take the full 48 hours, the worst-case scenario would make for an unusually long outage. “Something that takes you down for up to two days is fairly significant,” he said.
Customers that have built the capability to fail over to another cloud provider into their architecture will simply get to see how well their resiliency scheme works when Verizon takes its cloud down. For customers who don’t have a disaster recovery scheme in place but have critical applications that need to stay up around the clock, “this could be their wakeup call,” Peldzus said.
Failover Options May Be Limited
Big cloud providers generally have their own failover systems in place that enable customers to move their workloads from one data center to another during maintenance periods. Peldzus said it was uncommon for a provider to simply notify users that their entire cloud would not be available during a maintenance window without providing some place to transfer users’ workloads temporarily.
The option to fail over within Verizon Cloud may not be available to customers, however, since all nodes of the cloud are being upgraded. King did not respond to a request to clarify this in time for publication.
Understanding Cloud SLAs Is Paramount
It is likely that Verizon Cloud has some users who simply accept a potential period of downtime as part of their agreement with the provider. “It is up to the cloud user to understand the SLA of their cloud provider and design [or] plan accordingly,” Peter Roosakos, principal at Foghorn Consulting, another cloud consultancy, said in an email.
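To put a 48-hour window in SLA terms, the short Python sketch below converts downtime into an availability percentage. It is illustrative arithmetic only: the function name and the choice of a 30-day month and a 365-day year are assumptions made here, not terms of Verizon's agreement, and many SLAs exclude announced maintenance from the calculation.

```python
def availability_percent(downtime_hours: float, period_hours: float) -> float:
    """Return the share of a period during which a service was up, in percent.

    Hypothetical helper for illustration; real SLAs define their own
    measurement periods and may exclude planned maintenance.
    """
    if period_hours <= 0:
        raise ValueError("period_hours must be positive")
    if not 0 <= downtime_hours <= period_hours:
        raise ValueError("downtime_hours must lie between 0 and period_hours")
    return 100.0 * (period_hours - downtime_hours) / period_hours


HOURS_PER_30_DAY_MONTH = 30 * 24   # 720 hours (assumed measurement period)
HOURS_PER_YEAR = 365 * 24          # 8,760 hours (assumed measurement period)

# Worst case from the notice: the full 48 hours of downtime.
monthly = availability_percent(48, HOURS_PER_30_DAY_MONTH)
yearly = availability_percent(48, HOURS_PER_YEAR)

print(f"Over a 30-day month: {monthly:.2f}%")  # 93.33%
print(f"Over a full year:    {yearly:.2f}%")   # 99.45%
```

Measured this way, a single full-length window would leave a customer well below "three nines" (99.9 percent) for the year, which is the kind of gap a user needs to check against the actual agreement.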
If you have an application that needs to be up around the clock, and your cloud provider practices “rolling maintenance” (taking individual zones offline one by one), then you can serve your high-availability workload from zones within that same cloud that are not offline. If your provider doesn’t offer such an operating model, you have to use multiple providers for your critical applications, Roosakos explained.
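The decision Roosakos describes can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not any provider's API: the `Zone` record, the `pick_serving_zone` function, and the provider and zone names are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Zone:
    """Hypothetical record describing one zone at one cloud provider."""
    provider: str
    name: str
    in_maintenance: bool


def pick_serving_zone(zones: Sequence[Zone], primary_provider: str) -> Optional[Zone]:
    """Prefer an available zone at the primary provider (rolling maintenance);
    otherwise fall back to any available zone at another provider.
    Returns None when nothing is available."""
    available = [z for z in zones if not z.in_maintenance]
    for zone in available:
        if zone.provider == primary_provider:
            return zone
    return available[0] if available else None


# Rolling maintenance: one zone is offline, another zone in the same cloud serves.
rolling = [
    Zone("primary-cloud", "zone-a", in_maintenance=True),
    Zone("primary-cloud", "zone-b", in_maintenance=False),
    Zone("second-cloud", "zone-x", in_maintenance=False),
]
print(pick_serving_zone(rolling, "primary-cloud").name)  # zone-b

# Whole-cloud maintenance: only a second provider keeps the workload up.
whole_cloud = [
    Zone("primary-cloud", "zone-a", in_maintenance=True),
    Zone("primary-cloud", "zone-b", in_maintenance=True),
    Zone("second-cloud", "zone-x", in_maintenance=False),
]
print(pick_serving_zone(whole_cloud, "primary-cloud").name)  # zone-x
```

With a single provider and every zone in maintenance, the function returns `None`, which corresponds to the outage a customer without a second provider would have to accept.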
“As you’re implementing your applications on external cloud providers, you should always plan for and architect for failure,” Peldzus said.
While a multi-provider architecture is the best option for mission-critical applications, managing such an infrastructure is significantly more complex. Ultimately, the nature of the business and the application should dictate how much thought each company puts into architecting for resiliency.