How A Switch Failure in Utah Took Out Four Big Hosting Providers
August 5th, 2013 By: Jason Verge
On Friday morning, two network switches failed in a data center near Provo, Utah. As the impact of the failed switches rippled through the facility’s network, the downtime spread across four major U.S. web hosting firms, affecting millions of customers.
How could an equipment failure in a single facility knock out four large national brands, including BlueHost, HostGator, HostMonster and JustHost? The simultaneous downtime reflects the ongoing consolidation in the hosting industry, as well as the tendency for large firms to congregate in many of the same data center facilities. It’s not a new trend, and can also be seen in cloud computing, where power problems at a single Amazon facility can quickly ripple across popular start-ups and social media sites.
The answer lies in the growth of the Endurance International Group (EIG), which is well known in the hosting industry, but not a household name outside of it. Endurance has grown through a series of acquisitions, as it has pursued a “roll-up” of shared hosting companies. In 2012, Endurance made a huge splash in the industry, acquiring HostGator and its $100 million business, as well as Intuit’s $70 million web hosting business, turning the company into one of the biggest mass market hosting operations in the world. However, EIG’s operations remain something of an enigma, as it owns and operates so many brands that there isn’t even a definitive list of its properties.
Endurance owns and operates several big-name brands in the market, including A Small Orange, Bluehost, Fatcow, iPower and JustHost. With each acquisition, Endurance has kept the acquired company's established brand. This strategy has its advantages, allowing it to target specific markets with specific brands. It also means that as EIG moves these brands onto the same hosting platform, an outage at a single data center can take out several services at once.
One result of the consolidation of the shared hosting industry is the convergence of infrastructure into fewer data centers. When those data centers suffer downtime, several brands can be knocked offline at once. Previously, the hosting landscape was spread across many data centers, but roll-up plays such as EIG's, along with the emergence of cloud platforms such as AWS (particularly its East region), mean that outages are felt by more customers and can prove more damaging.
The Provo, Utah data center outage began Friday morning, and as of 5:30 p.m., many sites were still experiencing problems. The outage knocked out some of Endurance's best-known brands, including BlueHost and HostGator.
“During routine data center network maintenance, two of our core switches failed,” Ron LaSalvia, COO, Endurance International, posted. “This resulted in a significant service disruption for many of our customers, for our own websites, and for our phone systems. Our entire team spent the day diagnosing and repairing the switches and restoring customer sites. At this point, almost every site is back online. We will continue work to ensure that our services are fully restored for every customer, and will do an extensive analysis to improve our network stability.”
“Clearly today was not good enough,” LaSalvia added. “Nothing bothers me more than when we do not deliver what you expect and deserve.”
Customer chatter around the outage suggested that Endurance was in the middle of migrating HostGator, one of its major 2012 acquisitions, from SoftLayer to this Ace Data Center facility in Provo when the switches failed.
Customer Relations, Communications Need to Be Prioritized
Whatever the cause, outages happen in this industry, and they're terrible for everyone involved: the company and its customers alike. In a crisis, the best way for a company to navigate out of an outage is to over-communicate with customers. A Twitter feed with granular details on the recovery progress goes a long way toward preserving transparency and alleviating customer concerns. Even if customers don't understand the technical language, frequent updates let them know that you're working hard to recover.
Endurance has always been somewhat mysterious, and perhaps this doesn't lend itself to the best transparency. The company was apologetic, but there seems to be plenty of customer appetite for a detailed post-mortem that would help Endurance regain their trust.
Because Endurance keeps each brand separate, many customers aren't aware that these properties operate on the same platform. The outage knocked several major brands and several million websites offline, and many of the affected site owners probably didn't realize that they were Endurance customers.
Consolidation and roll-up are inevitable in the hosting industry, as competitive pressures come from all angles, including cloud, and even social media such as Facebook, LinkedIn, Google+ or Pinterest. These social networks have taken a lot of personal web page business away from the giants of the hosting world. The focus has shifted in the mass market hosting industry to small business clients, yet these new bread-and-butter customers might not fully understand what’s going on behind the curtain.
As more companies, large and small, use cloud service providers, and as the shared hosting industry consolidates into a few major players, the Endurance outage is a reminder of the need to address single points of failure and their close relative, the perfect storm of impossible events. The outage hit seemingly disparate hosting brands, amplifying an incident in a single location across multiple services. Similarly, when an outage hits the East Region of Amazon Web Services, it affects the many services running on that cloud infrastructure.
This points to a need for greater transparency with customers about the underlying infrastructure. Maybe the old adage “Let the buyer beware” should be modified to “Let the customer be informed.”
This is another example of the importance of having a monitoring system in a separate location, so that in the unfortunate event of an outage, you'll at least be aware of it. Here's a related post on what to think about so you can recover from such an event and be sure that everything is indeed back up and running: http://blog.logicmonitor.com/2012/07/02/when-lightning-strikes-your-cloud-good-monitoring-means-great-disaster-recovery/
Great article. I am writing a piece about this very thing, and I will be linking to your article. Many of the clients I had referred to HostGator went down during this outage. I am a HostGator reseller, and it was not until this happened that I learned Brent Oxley had sold HG to EIG, with whom I'd had a terrible experience when they bought my former hosting company before I came to HG. After much research and a test run, I found site5.com, which is on SoftLayer in Dallas, where my old HG accounts still reside until they move them (any time now). That is where I will be moving my 200-plus clients. I have taken all HG ads down and will never refer anyone to them again, because it is not HG anymore; it is EIG. Period. It is time to go to the next level in educating people on how hosting works: the data center is the heart of your website, and it matters. To have all those companies in that one location is nuts.
Also, I have heard bad things about ACE Data Center.