There are multiple areas of potential risk in data center environments that can cause incidents resulting in an insurance claim. Risks include:
- Accidents that damage the facility
- Potential for workplace injuries
- Business risks from downtime events that impact the data center’s or its customers’ business continuity.
Organizations depend on 24 x 7 x 365 IT infrastructure availability to ensure that services to customers/end-users are available whenever needed.
Providing and maintaining this availability is not only a matter of designing and building the right facility infrastructure; it also depends on how that facility is managed and operated day to day to safeguard the business-critical infrastructure.
Importance of Insurance and Risk Management
Relying solely on the physical characteristics of the data center, such as construction, type of fire protection system, and proximity to flood- and earthquake-prone areas, leaves out critical considerations in evaluating the effectiveness of a service provider’s risk management program, important as those characteristics are. The redundant infrastructure of engineered data centers typically does produce a low frequency of loss compared to other types of operations. However, end users rely on these data centers far more heavily as more companies outsource to the cloud or house their primary or backup networks offsite. This increasing dependency of end users on a centralized, outsourced infrastructure presents opportunities for technology service providers to set themselves apart from the competition and manage risks by formally addressing operational controls.
In framing the risks that service providers are exposed to, and that insurers will be concerned with, it is important to view the operation in terms of which part of the “data supply chain” the service provider occupies or is responsible for. Infrastructure providers, such as colocation providers, have a distinct but related set of exposures compared to a software as a service (SaaS) provider at the other end of the supply chain. The various entities in these increasingly complex supply chains must decide whether to accept, avoid, mitigate, or transfer these risks. The risks to the data supply chain include not only first-party direct losses but third-party liability losses as well. Even the first-party losses will differ based on the services provided. The primary risks to the data supply chain can be categorized as:
Third Party (Liability)
- Service Interruption: Service providers may be responsible for customer losses incurred as the result of unplanned outages.
- Data Security / Privacy: Service providers may have a statutory, contractual, or implied duty to protect data from unauthorized access or disclosure. Service providers may also be responsible for appropriate backup and recovery of data to prevent customer loss.
- Damage to Property of others in care, custody, or control: Infrastructure service providers’ facilities typically house multiple customers’ assets with values in the millions of dollars for each customer. Contract terms, customer insurance requirements, and state laws may impact the degree to which the provider is responsible for damage to customer equipment.
- Premises Liability: Responsibility to ensure that owned properties remain free of unsafe conditions.
First Party (direct losses to insured)
- Property Damage: Loss or damage to owned property, which in the case of an infrastructure provider may include real property and business personal property. Service providers who are dependent on others to provide infrastructure services may also have significant property values related to IT assets at widely distributed locations.
- Business Interruption: Service interruptions create potential for direct loss of income. Interruptions may be caused by perils impacting an infrastructure or service provider directly or as contingent loss caused by perils impacting a service provider on which the operation depends.
- Extra Expense: Additional expenses incurred to resume operations after a loss event can include additional staff, overtime costs, leased equipment, etc.
- Equipment Breakdown: Service interruptions may also be caused by the breakdown or failure of machinery or equipment as opposed to the standard property insurance perils (fire, theft, weather related, etc.).
Employee Health and Safety: Providing a safe workplace is the responsibility of all types of employers, and a key cost management strategy related to workers’ compensation and health benefits costs.
Regulations: Regulations create compliance risks at all levels of the data supply chain. Regulatory impact depends greatly on the types of services offered, the industries served, and the complex shared responsibilities of infrastructure and service providers and their clients. Examples of regulatory frameworks that may have an impact down to the infrastructure level include U.S. regulations such as HIPAA, GLBA, and FISMA; international regulations such as the EU Data Protection Directive; and industry standards such as the PCI DSS. In these complex regulatory environments, regulatory enforcement actions are common and the impact of fines and penalties is growing.
Management and Operations Critical
Even the best facility infrastructure will not keep a site from having an outage or accident if the individuals running it fail to define effective policies and procedures, maintain staff training, and apply those procedures in practice. Additionally, existing centers may have vulnerabilities due to aging facilities or equipment, yet can still minimize downtime risk and limit exposure if the operations team is working effectively.
From 20 years of collecting incident data, Uptime Institute has determined that human error (i.e., bad operations) is responsible for approximately 70% of all data center incidents. By comparison, fire is a minor root cause: the data show that only 0.14% of data center losses are due to fire. By these figures, bad operations practices are roughly 500 times more likely to negatively impact a data center than fire. Moreover, an outage at a mission-critical facility can result in hundreds of thousands of dollars or more in losses for everything from equipment damage and worker injuries to lost business and penalties for failure to maintain contractual Service Level Agreements.
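The 500-times comparison follows directly from the two percentages quoted above. A minimal sketch of the arithmetic is shown here; the function and constant names are illustrative only, and the calculation assumes, as the comparison in the text does, that both percentages are shares of the same population of events.

```python
from fractions import Fraction

# Figures quoted in the text (both in percent). Treating them as shares of
# the same population of events is an assumption carried over from the text.
HUMAN_ERROR_PCT = Fraction("70")
FIRE_PCT = Fraction("0.14")


def relative_likelihood(cause_pct, baseline_pct):
    """Return how many times more often one cause appears than a baseline cause."""
    return cause_pct / baseline_pct


# Exact rational arithmetic avoids floating-point rounding in the ratio.
ratio = relative_likelihood(HUMAN_ERROR_PCT, FIRE_PCT)
print(f"Human error vs. fire: {float(ratio):.0f}x")
```

Run as written, this prints `Human error vs. fire: 500x`, matching the figure in the text.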
For both data center operators and insurers, there are some key questions to ask:
- Are we looking in the right place to assess and mitigate data center risk?
- Are we adequately protected from claims and losses due to data center downtime?
- How can we improve data center risk management?
- How do we know efforts are focused on those operating factors that have the most impact on risk and availability?
Managing Liability Risks
Managing liability risks starts with contracts. A clear scope of work and allocation of risk between the contracting parties is essential. Clauses such as service level agreements, limitation of liability, force majeure, waiver of subrogation, and indemnification wording reinforce the intended allocation of risk. Complex multiparty contract disputes are common, particularly when significant losses are incurred. Claims of negligence are non-contractual, so even well-executed contracts may not mitigate significant liability losses.
Data center operations credentials are another means of mitigating liability risks. In addition to reducing the probability of loss, clearly defined, repeatable procedures and processes demonstrate adherence to the duty of care that underlies most standards of care. As with any human endeavor, residual risk will remain regardless of mitigation efforts. Insurance provides a means of risk transfer that is particularly effective for high-severity risks.
About the Authors
Lee Kirby is President of Uptime Institute, an advisory organization focused on the performance and efficiency of business-critical infrastructure and administration of the global Tier Standards & Certification for data centers. He has more than 30 years of information technology and leadership experience in the military and private sector.
Stephen Douglas is Risk Control Director for CNA – Technology, focused on the Technology Industry Segment. CNA provides insurance and risk control solutions to businesses in the software and IT services, electronics manufacturing, and communications industries. He has over 20 years of experience in risk engineering and information technology.