
Rethinking the Cloud After Recent Microsoft Azure Outages

In a span of just eight days, two different Microsoft Azure outages impacted customers who use Multi-factor Authentication for Azure-based resources. Is this the cost of doing business in the cloud?

The reliability of a cloud-based IT operation can mean a lot to a company's bottom line and employee productivity. That is why which services to use, and how often they are down, are critical factors when deciding whether or not to go to the cloud -- and which vendor to choose. Recently, two high-profile Microsoft Azure outages relating to multi-factor authentication in an eight-day period negatively impacted customers using Azure-based services such as Office 365, Active Directory, and Dynamics. Situations like this, when spread out across a 12-month period, do not seem bad. However, when they happen in rapid succession, they begin a dialog about our dependence on cloud-based products and services.

Microsoft tracks all of its Microsoft Azure outages and other service issues on its Azure status history page. This is where you will see all updates during the course of an outage, along with an evaluation of the preliminary root cause, how Microsoft mitigated the outage/degradation, and the next steps related to the issue. That last element usually entails a plan of action to prevent a similar occurrence in the future. The company's listing for November has 10 entries, October has 13, and September contains seven. The issues cover a wide range of cloud-based services, and the detailed summaries and planned solutions are public and transparent. Amazon has a similar page for Amazon Web Services, as does Google for the Google Cloud Platform.

For a prospective customer considering cloud services, this can be very helpful in evaluating not just how often the company experiences issues but how it responds to them. When you depend on cloud-based services, this type of communication matters not only from Microsoft but also from Google, Amazon, and other providers in the market. If an IT manager or professional is unable to communicate in a timely manner why certain services are not working, then trust in those same services will be hard to achieve.

All cloud providers have Service Level Agreements (SLAs) that stipulate the company's commitments for service uptime and connectivity. Microsoft has one for its Azure services, Google has a central one for its cloud platform, and Amazon has a similar centralized page as well. These agreements lay out the uptime guarantees for each service and usually include how the company will compensate customers if that percentage is not met. These are legally binding documents, and reviewing them should be part of your decision about moving some or all of your services to the cloud.
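To make an uptime guarantee concrete, it can help to translate the percentage into the amount of downtime it actually permits. The following is a minimal sketch of that arithmetic, not any provider's official calculation: the function name `allowed_downtime_minutes`, the 30-day period, and the sample percentages are assumptions chosen for illustration, and each provider's SLA defines its own measurement period and exclusions.

```python
def allowed_downtime_minutes(uptime_percent: float, period_days: int = 30) -> float:
    """Return the minutes of downtime an uptime percentage permits in a period.

    Hypothetical helper for illustration; real SLAs define their own
    measurement windows and what counts as downtime.
    """
    if not 0 <= uptime_percent <= 100:
        raise ValueError("uptime_percent must be between 0 and 100")
    total_minutes = period_days * 24 * 60
    # The permitted downtime is the share of the period NOT covered by the guarantee.
    return total_minutes * (100 - uptime_percent) / 100


if __name__ == "__main__":
    # Sample percentages are illustrative, not quoted from any specific SLA.
    for pct in (99.0, 99.9, 99.99):
        minutes = allowed_downtime_minutes(pct)
        print(f"{pct}% uptime allows about {minutes:.2f} minutes of downtime per 30 days")
```

Under these assumptions, a 99.9% guarantee over a 30-day period works out to roughly 43 minutes of permitted downtime, which puts an outage lasting most of a business day in perspective.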

The decision whether to place your trust in fully cloud-based services, a hybrid solution, or a solely on-premises setup is an important one and is not to be taken lightly. The pace of cloud adoption has been blazing, and you only need to look at how those services are contributing positively to the bottom line of the companies providing them to understand that. These days it seems the question is not if, but when you will make a move into the cloud for your company.

However, there are choices, and the right one depends on how much risk you are willing to accept around access to your services. Outages happen in all three of the scenarios I mentioned above, but when a significant security-related service is impacted twice in a very short period of time, it will cause many to pause and consider the ramifications of being purely in the cloud.

I spoke with Jessica Ortega, a web security analyst with SiteLock, about the recent Microsoft Azure outages and she addressed a few different areas. Ortega said, “What is getting glazed over is the seriousness of having Azure MFA go down and what that means for businesses from SMB all the way up to Enterprises who use Azure. This outage meant that they could not access Office 365 which runs their business. You are talking about business email, reporting, and data storage – all of it - being down and completely inaccessible due to an outage. That could cause a business to grind to a halt.”

Relating to this specific Microsoft Azure outage, Ortega said it would be wise for companies to consider whether they should be using a local physical key-based token or cloud-based MFA for securing their accounts. By the way, all three big cloud players (Microsoft, Google, and Amazon) support the use of a physical security key as a second factor of authentication for MFA. Having both options enabled, if possible, would be a good way to mitigate the potential risk of a cloud-based MFA outage on any provider's platform.

However, having an alternate administrator account that does not use MFA may not be the best option for avoiding a similar situation in the future. This idea was floated on social media and other channels last week during the first Microsoft Azure MFA outage.

Ortega said, “It is definitely an option, but what you have done is opened yourself up to a potentially vulnerable administrator account that now only has one layer of protection. So yes, you can do that, but it is just a band-aid to get the job done in the event this happens again.”

This is where a business or enterprise must start to weigh the benefit of these less secure workarounds against the risk they introduce.

“Unless you can afford to be down for an entire business day like some were last week and today, if you can’t afford that you have to weigh your vulnerability against the cost of being down for the day. Is the cost of having someone get into that alternate account with nefarious intent less than, greater than, or equal to the cost of lost revenue and productivity while down for a day?” Ortega said.
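The comparison Ortega describes can be framed as a rough expected-cost calculation. The sketch below is only one way to set it up, assuming the risk of the alternate account can be summarized as a yearly probability of compromise times its cost; the function name `compare_breach_to_downtime`, its parameters, and every figure in the usage example are hypothetical and not drawn from the interview.

```python
def compare_breach_to_downtime(
    outage_days_per_year: float,
    cost_per_outage_day: float,
    breach_probability_per_year: float,
    breach_cost: float,
) -> str:
    """Say whether the expected cost of a compromised non-MFA admin account is
    "less", "greater", or "equal" compared to the expected cost of downtime.

    All inputs are estimates the organization must supply; this is an
    illustrative model, not a formal risk methodology.
    """
    expected_downtime_cost = outage_days_per_year * cost_per_outage_day
    expected_breach_cost = breach_probability_per_year * breach_cost
    if expected_breach_cost < expected_downtime_cost:
        return "less"
    if expected_breach_cost > expected_downtime_cost:
        return "greater"
    return "equal"


if __name__ == "__main__":
    # Hypothetical figures: two lost business days a year at $50,000 each,
    # versus a 1% yearly chance of a $1,000,000 compromise.
    verdict = compare_breach_to_downtime(2, 50_000, 0.01, 1_000_000)
    print(f"Expected breach cost is {verdict} than/to expected downtime cost")
```

A result of "less" would suggest the workaround is cheaper in expectation under these estimates, but the output is only as good as the probability and cost figures fed into it.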

ITPro Today also asked Ortega whether businesses, enterprises, and even end users are putting too much trust in these service providers and will only be satisfied with 100% availability. Her reply:

“On one hand, if it was any other business, we would say that 99.99999% ad infinitum percent of time is unreasonable, but it is just not how the world works. Even if you have completely on-premises storage, weather is still a thing that exists. We would say that it is unreasonable if it were anybody else, but we have a culture of expectation from these tech giants because they have set that expectation. They have come out with these bold claims and now as a tech culture we assume they are capable of delivering it because they are giant sized, and they have infinite resources that others don’t have.”

This is why cloud took off so quickly: Businesses saw it as a way to use someone else’s resources, allow their business to thrive, and save money in the process.

Ultimately, every process owner in a business or enterprise organization must evaluate all of the risks versus benefits when deciding on a move to the cloud.

Although this story has its origins in the recent Microsoft Azure outages around MFA, similar issues can hit, and have hit, the other big names in the industry. Most of the time, they all meet the service expectations laid out in their SLAs or provide appropriate compensation for the service being unavailable.

Bottom line: This goes back to the level of risk your organization can absorb without negative impacts if one of the cloud services powering the business were to go down for any period. Deciding whether to go to the cloud is neither quick nor easy when all things are considered.
