Fires. Floods. Power problems. Software updates gone bad. Thermal events. There was a wide range of causes for data center downtime in 2013. The year's major outages covered the spectrum, affecting clouds, companies, payment networks and governments at the federal, state and local level.
Each incident caused pain for customers and end users, but also offered the opportunity to learn lessons that will make data centers and applications more reliable. Here’s a look at our list of the Top 10 outages of 2013:
1. The Healthcare.gov Disaster - Downtime doesn't get much more epic than this. The federal government's online insurance marketplace has become the poster child for an IT project gone wrong. It wasn't just a matter of a single downtime incident; it was a series of hard outages and an ongoing soft outage in which the site was barely functional. The site's operators tried adding more hardware, but it wasn't until the Obama administration's "IT surge" addressed software and data bottlenecks that the site became usable in early December. Given the status of the Affordable Care Act as the administration's signature legislation, and the accompanying political scrutiny, the web site's performance amounted to a perfect storm of the many ways in which key systems can fail. If nothing else, Healthcare.gov transformed web site performance into front page news.
2. Major Outage for BlueHost, HostGator, HostMonster - The year's most extensive web hosting downtime occurred Aug. 2, when a Utah data center supporting some of the industry's best known brands suffered an extended networking outage. The problems at a Provo, Utah facility operated by Endurance International Group led to downtime for customers of BlueHost, HostGator and HostMonster. Endurance attributed the incident to a hardware failure during routine server maintenance that "quickly cascaded throughout the network."
3. Visa Downtime Across Canada - Downtime is particularly costly in the financial sector, as many Canadians learned Jan. 28 when they were unable to use their Visa cards for much of the day due to a data center power outage at Total System Services Inc. (TSS), one of the largest processors of card-payment transactions in North America. The issue affected Visa cards issued by CIBC, Royal Bank of Canada and TD Canada Trust.
4. Windows Azure, Xbox Live Problems as Xbox One Launches - Xbox One launch day in November turned out to be a rough ride for the Windows Azure cloud computing service, which helps power Xbox Live. The platform was plagued by problems for much of the day, including storage and network issues. It wasn't the only high-visibility hiccup for Microsoft's cloud operations. In March a heat spike in a data center caused a major outage for Microsoft's web-based email services. Both Hotmail and Outlook.com were offline for up to 16 hours after a failed software update caused the heat to spike in one part of a data center supporting those services.
5. Power Outage Knocks DreamHost Customers Offline - Web hosting provider DreamHost experienced an extended outage on March 20 when power systems failed at its data center in Irvine, Calif. The incident created hours of downtime across two days for DreamHost's more than 350,000 customers.
6. Ripple Effect Felt from Amazon Cloud Outages - Microsoft wasn't the only cloud provider to experience uptime issues. Amazon Web Services had several attention-grabbing outages this year, most notably downtime in August that affected both AWS and the Amazon.com home page. In September, networking issues caused a Friday the 13th outage for AWS that affected services for Heroku and GitHub, among other sites. Despite those incidents, Amazon's performance in 2013 was a significant improvement over 2012, when the platform closed the year with a major outage on Christmas Eve that even affected Netflix.
7. Data Center Fire Knocks Michigan County Offline - IT services in Macomb County, Michigan were knocked offline after an April 17 fire damaged the building that houses the county's data center. Macomb County, which is just north of Detroit and has 850,000 residents, did not have a backup data center. Officials resorted to pen and paper, carbon copies, and makeshift networks of laptops to try to maintain services. The building remains closed, with a new IT operations center expected to come online this month. Assistance from nearby counties, state government and Macomb Community College helped the county resume operations.
8. Toronto Flooding KOs Data Center Cooling Systems - A massive rainstorm caused widespread flooding and power outages in Toronto, which created challenges for some tenants at the city's largest data center hub. When the utility power from Toronto Hydro went offline, the carrier hotel at 151 Front Street was able to successfully switch over to generator power. However, the building’s district cooling system experienced problems, causing the heat to rise in some data centers at the building. 151 Front Street is among the Toronto buildings served by district cooling from Enwave's Deep Lake Water Cooling (DLWC) system, which taps cool water from the depths of Lake Ontario.
9. NJ State Government Slowed by Data Center Outage - If Chris Christie runs for president, it probably won't be as "Governor Uptime." A power outage in a New Jersey state data center in September knocked out services for a number of state agencies, including the Motor Vehicle Commission and many state web sites. New Jersey had one of the worst outage track records among state and local governments in 2013, as it also suffered lengthy outages in January and July.
10. Yahoo's Tough December - Users of Yahoo Mail experienced up to four days of availability problems last week, prompting a personal apology from CEO Marissa Mayer. "Unfortunately, the outage was much more complex than it seemed at first, which is why it’s taking us several days to resolve the compounding issues," said Mayer, who blamed a "particularly rare" hardware outage in the company's storage systems for the downtime.