How A Switch Failure in Utah Took Out Four Big Hosting Providers

Continued roll-up and consolidation in the mass market hosting world, as well as the rise of cloud, mean that an outage at one data center can act as a single point of failure for multiple services.

On Friday morning, two network switches failed in a data center near Provo, Utah. As the impact of the failed switches rippled through the facility's network, the downtime spread across four major U.S. web hosting firms, affecting millions of customers.

How could an equipment failure in a single facility knock out four large national brands, including BlueHost, HostGator, HostMonster and JustHost? The simultaneous downtime reflects the ongoing consolidation in the hosting industry, as well as the tendency for large firms to congregate in many of the same data center facilities. It's not a new trend, and can also be seen in cloud computing, where power problems at a single Amazon facility can quickly ripple across popular start-ups and social media sites.

The answer lies in the growth of the Endurance International Group (EIG), which is well known in the hosting industry, but not a household name outside of it. Endurance has grown through a series of acquisitions, as it has pursued a "roll-up" of shared hosting companies. In 2012, Endurance made a huge splash in the industry, acquiring HostGator and its $100 million business, as well as Intuit’s $70 million web hosting business, turning the company into one of the biggest mass market hosting operations in the world. However, EIG's operations remain something of an enigma, as it owns and operates so many brands that there isn’t even a definitive list of its properties.

Endurance owns and operates several big name brands in the market, including A Small Orange, Bluehost, Fatcow, iPower and JustHost. With each acquisition, Endurance has maintained the acquired company's established brand. This strategy has its advantages, allowing it to target specific markets with specific brands. It also means that as EIG moves these brands onto the same hosting platform, an outage at a single data center can take out several services.

Simultaneous Outages

One result of the consolidation of the shared hosting industry is the convergence of infrastructure into fewer data centers. When those data centers suffer downtime, several brands can be knocked offline. Previously, the hosting landscape was spread across multiple data centers, but roll-up plays such as EIG, as well as the emergence of cloud platforms such as AWS (particularly its East region), mean that outages are felt by more customers and could prove more damaging.
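The concentration risk described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the brand and data center names are hypothetical placeholders, not Endurance's actual footprint, and the two helper functions are invented for this example. Given a map of which brands run in which facilities, the sketch reports any facility whose failure alone would take two or more brands fully offline.

```python
# Hypothetical brand -> data center mapping, for illustration only.
# These names are placeholders and do not describe any real provider.
BRAND_SITES = {
    "brand-a": {"dc-west"},
    "brand-b": {"dc-west"},
    "brand-c": {"dc-west", "dc-east"},
    "brand-d": {"dc-east"},
}


def brands_down_if_site_fails(brand_sites, failed_site):
    """Return the brands left with no working site once failed_site is offline."""
    return sorted(
        brand
        for brand, sites in brand_sites.items()
        if sites and not (sites - {failed_site})
    )


def shared_single_points_of_failure(brand_sites):
    """Map each site to the brands it would take fully offline.

    Only sites whose failure knocks out two or more brands are kept.
    """
    all_sites = set().union(*brand_sites.values())
    result = {}
    for site in sorted(all_sites):
        down = brands_down_if_site_fails(brand_sites, site)
        if len(down) >= 2:
            result[site] = down
    return result


if __name__ == "__main__":
    print(shared_single_points_of_failure(BRAND_SITES))
```

In this made-up layout, the brand present in both facilities survives the loss of either one, while the brands that live only in the shared facility go down together. That is the pattern behind a multi-brand outage.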

The Provo, Utah data center outage began Friday morning, and by 5:30 p.m., many sites were still experiencing problems. The outage knocked out some of Endurance's best-known brands, including BlueHost and HostGator.

Company Response

Endurance created a dedicated web site to update customers, and an executive also addressed the incident in comments and a statement at The WHIR.

"During routine data center network maintenance, two of our core switches failed," Ron LaSalvia, COO, Endurance International, posted. "This resulted in a significant service disruption for many of our customers, for our own websites, and for our phone systems. Our entire team spent the day diagnosing and repairing the switches and restoring customer sites. At this point, almost every site is back online. We will continue work to ensure that our services are fully restored for every customer, and will do an extensive analysis to improve our network stability."

"Clearly today was not good enough," LaSalvia added. "Nothing bothers me more than when we do not deliver what you expect and deserve."

The chatter around the outage from customers was that Endurance was migrating one of its major 2012 acquisitions, HostGator, from SoftLayer to this Ace Data Center facility in Provo when this happened.

Customer Relations, Communications Need to Be Prioritized

Whatever the cause, outages happen in this industry and they’re terrible for all involved, both the company and its customers. In a crisis, the best way for a company to navigate out of an outage is to communicate excessively with customers. A Twitter feed with granular details on the recovery progress goes a long way toward preserving transparency and alleviating customer concerns. Even if customers don’t understand the technical language, frequent updates let them know that you’re working hard to recover.

Endurance has always been somewhat mysterious, and perhaps this doesn’t lend itself to transparency. The company was apologetic, but customers appear to have plenty of appetite for a detailed post mortem before their trust is regained.

The separate branding on the part of Endurance means that customers aren’t aware that many of these properties operate on the same platform. The outage knocked out several major brands and several million websites, and many of the site owners probably didn’t realize that they were Endurance customers.

Consolidation and roll-up are inevitable in the hosting industry, as competitive pressures come from all angles, including cloud, and even social media such as Facebook, LinkedIn, Google+ or Pinterest. These social networks have taken a lot of personal web page business away from the giants of the hosting world. The focus has shifted in the mass market hosting industry to small business clients, yet these new bread-and-butter customers might not fully understand what’s going on behind the curtain.

As more companies, large and small, use cloud service providers and as the shared hosting industry consolidates to a few major players, the Endurance outage provides a reminder of the need to address single points of failure, and their close relative, the perfect storm of impossible events. The Endurance outage hit seemingly disparate hosting brands, amplifying an incident in a single location across multiple services. When an outage happens to Amazon Web Services' East region, it impacts multiple services using its cloud infrastructure.

This seems to point to a need for increased transparency and information to the customer about underlying infrastructure. Maybe the old adage of "Let the buyer beware," should be modified to "Let the customer be informed."
