What's Causing Cloud Outages? A Network Managers' Guide

From fat-finger errors to fishing boats, here are the leading reasons cloud outages at AWS, Microsoft, and others are a growing network resilience challenge.

Salvatore Salamone, Managing editor

August 7, 2023

As enterprises rely more and more on cloud services to meet their network infrastructure, compute, data storage, and security needs, cloud computing outages have a significant impact on operations.

Many believe (or hope?) that moving services to the cloud will eliminate some issues. After all, you would assume cloud providers use the latest technologies, have staff with expertise in these technologies, and build in lots of redundancy.

Unfortunately, cloud outages have a lot in common with their data center counterparts. Many occur due to human error, power outages, malicious acts, Mother Nature, or plain bad luck.

What's Causing Cloud Outages?

There are several common culprits behind cloud outages. Over the last few years, we have seen examples of each. All have had a significant impact on the enterprises using the services. Here are some of the top problems that keep recurring.

Configuration mistakes

We're in the age of graphical user interfaces (GUIs) and automation. Yet many critical IT chores, such as deploying a new server, provisioning storage for an application, or setting up new routing tables, are still done manually via command-line interfaces (CLIs). As one would expect, that can lead to configuration mistakes.

That is often the case with cloud outages. One such mistake, a routing protocol configuration issue, caused a six-hour outage of Facebook, Instagram, Messenger, WhatsApp, and Oculus VR in October 2021. As we wrote at that time: "The outage was the result of a misconfiguration of Facebook's server computers, preventing external computers and mobile devices from connecting to the Domain Name System (DNS) and finding Facebook, Instagram, and Whatsapp."

Essentially, the Border Gateway Protocol (BGP) routes that advertise Facebook's networks were no longer recognized, preventing traffic destined for those networks from being routed properly. Resolving the problem was more challenging than normal because not only was communication between routers interrupted, but so too were DNS traffic and all applications.
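One common safeguard against this kind of failure is a pre-change check that compares the prefixes a network advertises today with those it would advertise after a proposed change. The Python sketch below illustrates the idea only. The function names, the 25 percent threshold, and the documentation-range prefixes are assumptions made for this example, not details of Facebook's tooling or of any vendor's CLI.

```python
import ipaddress


def withdrawn_prefixes(current, proposed):
    """Return prefixes advertised now that the proposed config would drop."""
    cur = {ipaddress.ip_network(p) for p in current}
    new = {ipaddress.ip_network(p) for p in proposed}
    # Sort by string form so mixed IPv4/IPv6 sets can be ordered together.
    return sorted(cur - new, key=str)


def is_change_safe(current, proposed, max_withdrawn_fraction=0.25):
    """Reject a change that withdraws too large a share of current prefixes.

    The 0.25 default is an arbitrary illustrative threshold.
    """
    cur = {ipaddress.ip_network(p) for p in current}
    if not cur:
        return True  # nothing is advertised, so nothing can be withdrawn
    lost = withdrawn_prefixes(current, proposed)
    return len(lost) / len(cur) <= max_withdrawn_fraction


# Hypothetical prefixes taken from the reserved documentation ranges.
current = ["192.0.2.0/24", "198.51.100.0/24", "203.0.113.0/24", "2001:db8::/32"]

print(is_change_safe(current, current[:3]))  # drops one of four prefixes
print(is_change_safe(current, []))           # drops every prefix
```

A check like this would sit in front of the manual CLI step, so a change that withdraws every route is stopped for review rather than applied.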

The problem here was that everything ran over the same network. As a result, IT staff could not correct the problem remotely because they could not reach the impacted systems. Making matters worse, IT staff were locked out of facilities because their access control system also ran over the same network.
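The shared-network dependency described above can be audited ahead of time. The sketch below, in Python, flags recovery systems that rely on the same network as production. The system names, network names, and dependency map are all hypothetical and exist only to illustrate the check.

```python
def shared_fate(dependencies, production_network, recovery_systems):
    """Return recovery systems that depend on the production network."""
    return sorted(
        name
        for name in recovery_systems
        if production_network in dependencies.get(name, set())
    )


# Hypothetical inventory: each system maps to the networks it relies on.
dependencies = {
    "remote-admin-access": {"backbone"},
    "door-badge-readers": {"backbone"},
    "console-servers": {"out-of-band-lte"},
}

at_risk = shared_fate(dependencies, "backbone", dependencies.keys())
print(at_risk)  # ['door-badge-readers', 'remote-admin-access']
```

Any system on the resulting list would fail along with the production network, which makes it a candidate for an independent, out-of-band path.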

Read the rest of this article on Network Computing.

About the Author

Salvatore Salamone

Managing editor, Network Computing

Salvatore Salamone is the managing editor of Network Computing. He has worked as a writer and editor covering business, technology and science; written three business technology books; and served as an editor at IT industry publications including Network World, Byte, Bio-IT World, Data Communications, LAN Times and InternetWeek.
