Lessons Learned from Recent Major Outages

Today’s more interconnected business world makes infrastructure and cloud outages all the more impactful. Here’s a recap of recent outages and their root causes.

Salvatore Salamone, Managing editor

August 1, 2022

In 1988, one broken power line kicked off a series of events that cut off phone service to more than 50,000 Chicago-area businesses, hospitals, consumers, and Chicago's O'Hare and Midway airports for more than two weeks. At the time, that event, the Hinsdale Central Office Fire, was called the greatest telecommunications disaster ever.

Yet even the impact of the largest pre-Internet/cloud event ever does not compare to what happens on a regular basis these days with cloud outages.

Today’s more interconnected business world makes cloud infrastructure and service disruptions more damaging. In the past, an outage was typically restricted to a small geographical area, and there were relatively easy ways to minimize the impact. For example, a cable cut would disrupt service only to those on that one circuit. Many companies would routinely protect themselves by using services from two providers, such as a leased T1 line from one and an ISDN line from another. If the primary line went down due to a cable cut, a site could still run core traffic over the lower-speed link until service was restored.

Putting an Outage’s Impact into Perspective

Cloudflare, June 2022

The provider suffered a roughly one-hour outage that affected many companies and sites, including Discord, Shopify, Fitbit, and Peloton. A change to the network configuration at 19 of Cloudflare’s data center locations caused the outage and disrupted traffic in those locations.

Microsoft Azure and M365 Online, June 2022

East Coast companies that accessed services via Microsoft’s Virginia data center suffered a 12-hour outage. The cause of the outage, according to Microsoft, was “an unplanned power oscillation in one of our data centers” … “Components of our redundant power system created unexpected electrical transients, which resulted in the Air Handling Units (AHUs) detecting a potential fault, and therefore shutting themselves down pending a manual reset.” Customers with always-available or zone-redundant services in that region were not impacted.

...

Read the full article on our sister site, InformationWeek.

About the Author

Salvatore Salamone

Managing editor, Network Computing

Salvatore Salamone is the managing editor of Network Computing. He has worked as a writer and editor covering business, technology and science; written three business technology books; and served as an editor at IT industry publications including Network World, Byte, Bio-IT World, Data Communications, LAN Times and InternetWeek.
