Outages in the Cloud: A Learning Experience

Lucas Roh founded Hostway in 1998 and has since charted the company’s growth to an international presence. Hostway is ranked as one of the top-five Web hosting companies globally.

LUCAS ROH
Hostway

As the newer kid on the block for data storage, cloud services receive significant attention after outages. In spring 2011, Amazon Web Services experienced a widespread outage that disrupted sites such as Reddit, HootSuite, Quora and Foursquare. Word of the issue quickly spread, reopening the discussion about cloud providers' ability to maintain uptime and protect sensitive data.

Outages can directly impact the finances of cloud providers, who are consistently looking for new ways to limit the reach and duration of outage events.

Best Practices to Limit Outages

Companies that are already in the cloud and those that are considering cloud solutions should test their own critical systems to be sure their internal architecture can handle failure. Applications should also be tested for their ability to be restored quickly, which helps ensure that the companies’ end users experience zero or minimal interruption in service.

Companies should consider randomly introducing failures, so the internal IT team can test their responsiveness under real-world conditions and find the best solutions to mitigate outages. Sharing the results of these stress tests and failures with the cloud provider will help build a more integrated system for managing outages.
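As a rough illustration of randomly introducing failures, the sketch below simulates a small drill in Python. Everything in it is hypothetical: the service names, the `inject_random_failure` and `run_drill` helpers, and the trivial `simple_restore` routine are stand-ins for whatever systems and recovery procedures a company actually runs. It is a minimal sketch of the idea, not a production tool.

```python
import random


def inject_random_failure(services, rng):
    """Mark one randomly chosen healthy service as failed and return its name.

    `services` is a hypothetical dict mapping service name -> True if healthy.
    Returns None when no healthy service is left to fail.
    """
    healthy = [name for name, up in services.items() if up]
    if not healthy:
        return None
    victim = rng.choice(healthy)
    services[victim] = False
    return victim


def simple_restore(services, name):
    """Placeholder recovery routine: bring the service back and report success."""
    services[name] = True
    return services[name]


def run_drill(services, restore, rng, rounds=5):
    """Run several failure drills and record whether each restore succeeded."""
    results = []
    for _ in range(rounds):
        victim = inject_random_failure(services, rng)
        if victim is None:
            break
        recovered = restore(services, victim)
        results.append((victim, recovered))
    return results


if __name__ == "__main__":
    rng = random.Random(42)  # fixed seed so a drill can be repeated exactly
    services = {"web": True, "database": True, "cache": True}
    for victim, recovered in run_drill(services, simple_restore, rng):
        print(f"failed {victim}: recovered={recovered}")
```

In a real drill, the restore step would be the team's actual recovery procedure, and the recorded results are the kind of data worth sharing with the cloud provider.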

Top solution providers are consistently implementing new controls and safeguards that can limit the frequency and severity of outages. For example, they might upgrade their cooling or electrical systems, or introduce new security devices to control breaches. The provider should also have enough redundant and backup capacity to absorb the demand that follows an outage. Their flexibility in scaling up or down to meet demand should also allow them to move data to redundant systems in case of an outage or disaster.

Mistakes Provide Key to Better Service

After an outage, quality service providers will analyze the data to identify any weak processes. They need to find out if hardware, human error, or perhaps internal documentation is to blame for the break in service. Once the root cause is identified, the provider needs to learn and adapt by implementing new procedures and safety checks.

While outages are significant events and should be avoided, customers should not react by moving back to on-premises solutions. Remember that outages occur every day in internal server rooms; they are simply not reported by the blogosphere or media outlets. Neither solution offers true 100% uptime, but the cloud does offer unmatched flexibility and efficiency.

Picking a Partner

In a crowded marketplace, where every provider makes claims about uptime, security and reliability, finding the right partner can be a challenge. Transparency of information is vital. Ask the provider to give detailed information about past outages, including what steps were taken during the outage and what new procedures were subsequently put in place. You want to stay informed, so be sure the provider has channels such as Twitter feeds or automated emails to give clients status updates.

Without transparency, conjecture takes over, and clients can quickly lose confidence in the provider. Top providers will be proactive, both in their dissemination of outage information and in their willingness to introduce redundancies to prevent outages.

Moving Past Outages

Despite the risks of outages, for most companies the reward of lowered costs and greater efficiencies with the cloud is worth it. Going back to an in-house data center means higher capital and maintenance costs, and in many cases would require hiring back more IT staff. Cloud computing is still in a growing stage, and while failures can be damaging in the short term, they serve a greater purpose by allowing cloud providers to evolve and become more proactive.

Outages will continue to occur. The best approach is to accept the risk and use outages as an internal learning tool to be sure your company’s data is backed up and secure. Cloud computing remains the future for business, as the combination of flexibility and lowered costs simply cannot be beaten.

Industry Perspectives is a content channel at Data Center Knowledge highlighting thought leadership in the data center arena. See our guidelines and submission process for information on participating. View previously published Industry Perspectives in our Knowledge Library.
