Data Center Fire Disrupts Key Services in Calgary
July 16th, 2012 By: Rich Miller
The city of Calgary is recovering from the impact of a data center fire that crippled city services and delayed hundreds of surgeries at local hospitals. The explosion and fire last Wednesday in a Shaw Communications facility knocked out both the primary and backup systems that supported key public services for local government and medical institutions.
The incident serves as a wake-up call for government agencies to ensure that the data centers that manage emergency services have recovery and failover systems that can survive a series of adversities – the “perfect storm of impossible events” that combine to defeat disaster management plans.
The outage took out the city’s 311 emergency services, and Alberta’s property and vehicle information databases, which were maintained in an IBM data center at the Shaw building. The outage also knocked out a medical computer network for Alberta Health Services, forcing the postponement of hundreds of elective surgical procedures. Service was restored late Friday, but Alberta Health Services said it would take some time to work through the backlog of procedures.
Fire Suppression Takes Out Backup Systems
The problems began when a transformer exploded at Shaw’s downtown Calgary headquarters on Wednesday afternoon. The fire set off the building’s sprinkler system, which took out the backup systems housed on site. On Saturday morning, Shaw officials said service had been restored to all customers.
IBM Canada, which provides many services for the province of Alberta, reportedly had to fly backup tapes holding vehicle and property registration data to a backup facility in Markham, Ontario.
Calgary family doctor Dr. John Fernandes was among those whose practice was affected by the 36-hour outage. “It’s daunting because, in my practice, I have a lot of frail, sick, elderly folks. I have organ transplant recipients and people on dialysis,” Fernandes told the Calgary Herald. Fernandes said his office maintains a server that mirrors the off-site server, and wondered why the province’s health superboard didn’t take similar precautions for a system failure of this magnitude.
But it’s not the only recent example of an incident taking out key government resources.
IT systems in Dallas County were offline for more than three days last week after a water main break flooded the basement of the Dallas County Records Building, which houses the UPS systems and other electrical equipment supporting the data center on the fifth floor of the building. The county did not have a backup data center, despite warnings that it faced the risk of service disruption without one.
Hmmm, it seems that outages like this and AWS recently are revealing that some of the country’s largest organizations are not taking redundancy to heart. We believe that one datacenter is no longer good enough. Two datacenters and our cloud load balancing solution would have taken care of this: http://totaluptimetech.com/solutions/cloud-load-balancing/
This is a classic example of why the fire suppression industry recommends waterless fire protection in a data center. Redundancy plays a part, and Shaw had backup systems, but unfortunately they put both systems in rooms that did not have a properly engineered waterless clean agent fire suppression system. These systems are specifically designed to suppress this type of fire without taking out electrical systems; in this case the backup systems. FM200 or NOVEC 1230 are the clean agents predominantly used for data center fire protection.
Local002 Posted July 26th, 2012
What’s more troubling is Shaw’s attitude in this entire ordeal. Instead of taking the blame for a poorly engineered facility and lack of proper DR, they have stated this was a “Force Majeure” situation that they had zero control or responsibility over, and too-bad-so-sad to those clients (IBM) who were paying for a level of continuity they were not provided.
For those who are not aware, Shaw is the local cable-co monopoly and also provides consumer-grade Internet connectivity.
Also one would assume IBM would have known better.