Updated to clarify that lightning had struck the local utility grid and not the actual Google data center.
Not even the mighty Google data centers are immune to acts of God, it turns out.
A series of lightning strikes in Belgium last Thursday briefly knocked some cloud storage systems offline, causing errors for some users of Google's cloud infrastructure services.
In the initial version of the incident report, published Tuesday, Google said lightning had struck the electrical systems of one of its three data centers in St. Ghislain, a small town about 50 miles southwest of Brussels. Late Wednesday afternoon, however, a company spokesperson told us that lightning had actually struck the local utility grid, causing the power interruption at the data center. Google has since updated its incident report accordingly.
The affected data center hosts europe-west1-b, the Google Compute Engine zone that experienced the issues.
Besides failover systems that switch to auxiliary power when the primary power source goes offline, servers in Google data centers have on-board batteries for additional backup. That was the case with the servers supporting Persistent Disk, Google's cloud storage service that works like network-attached storage, meaning it exists independently of compute.
But some of the servers failed anyway because of “extended or repeated battery drain,” according to the incident report. “In almost all cases the data was successfully committed to stable storage, although manual intervention was required in order to restore the systems to their normal serving state,” the report read.
Google engineers estimated that about five percent of the persistent disks in the zone saw at least one I/O failure, on either a read or a write, during the roughly five days over which the problems occurred. A tiny fraction of the zone's persistent-disk space, 0.000001 percent according to Google, suffered permanent data loss.
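To put the 0.000001 percent figure in perspective, the short sketch below converts it to a fraction (1e-8) and applies it to a hypothetical 1 PB of persistent-disk capacity. Google did not disclose the zone's actual capacity, so the 1 PB figure is purely an illustrative assumption, as is the helper name `lost_bytes`.

```python
# Percentage of the zone's persistent-disk space that Google said was permanently lost.
LOSS_PERCENT = 0.000001

# Hypothetical capacity chosen only for illustration; the real figure was not disclosed.
ASSUMED_CAPACITY_BYTES = 10**15  # 1 PB, decimal


def lost_bytes(capacity_bytes: int, loss_percent: float) -> float:
    """Return how many bytes are lost when loss_percent of capacity_bytes is gone."""
    fraction = loss_percent / 100  # convert percent to a fraction: 0.000001% == 1e-8
    return capacity_bytes * fraction


if __name__ == "__main__":
    lost = lost_bytes(ASSUMED_CAPACITY_BYTES, LOSS_PERCENT)
    # Report the result in decimal megabytes, rounded to a whole number.
    print(f"{lost / 10**6:.0f} MB lost per PB")  # -> 10 MB lost per PB
```

Under that assumption, every petabyte of disk space would have lost roughly 10 MB of data.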
The company's infrastructure teams are now replacing storage systems with hardware that is more resilient to power failure, and most Persistent Disk storage already runs on the new hardware, Google said.
Offering advice that cloud service providers commonly give after outages, Google reminded users that it operates multiple cloud regions around the world, each with multiple isolated zones, precisely so that users can build resilient infrastructure that fails over from one zone to another when a single zone goes down.
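As a rough illustration of the zone-failover pattern Google describes, the sketch below picks the first healthy zone from an ordered preference list. Only europe-west1-b comes from the incident; the other zone names, the health map, and the function `pick_serving_zone` are hypothetical. A real deployment would rely on health checks and load balancing rather than a hard-coded dictionary.

```python
from typing import Mapping, Optional, Sequence

# Zones in one region, in order of preference. Only europe-west1-b is taken from
# the incident; the other two names are hypothetical placeholders.
ZONES = ["europe-west1-b", "europe-west1-c", "europe-west1-d"]


def pick_serving_zone(preferred: Sequence[str], healthy: Mapping[str, bool]) -> Optional[str]:
    """Return the first zone in preference order that is marked healthy, or None."""
    for zone in preferred:
        # Zones missing from the health map are treated as unhealthy.
        if healthy.get(zone, False):
            return zone
    return None


if __name__ == "__main__":
    # Simulate a single-zone outage like the one that hit europe-west1-b.
    status = {"europe-west1-b": False, "europe-west1-c": True, "europe-west1-d": True}
    print(pick_serving_zone(ZONES, status))  # -> europe-west1-c
```

This only decides where to serve from. Data on a zonal disk would still need to be replicated or backed up to the other zone ahead of time to be available after a failover.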
Google Compute Engine has three regions: Central US, in Council Bluffs, Iowa; Western Europe, in St. Ghislain; and East Asia, in Changhua County, Taiwan. The Central US region has four zones, and Western Europe and East Asia have three each.