The largest companies that operate data centers in Texas appear to have weathered last month’s extreme and prolonged cold snap – and the ensuing statewide electricity chaos – without major outages. Some data centers in the state did fail, according to local news reports and public notices by organizations whose services were interrupted because of the outages.
Data center facilities and operating processes are designed to prevent outages when electrical utilities fail, and overall it looks like most of them worked as designed during the third week of February, when temperatures plunged, knocking out a huge portion of the state grid’s generation capacity and dragging down its operating frequency. Bill Magness, the CEO of the Electric Reliability Council of Texas (the grid operator), who was fired last week, said that at one point the frequency was so low that the grid appeared to be minutes away from complete failure. It was saved only by the load reduction that energy providers achieved through ERCOT-ordered blackouts.
Not everything went smoothly for Texas data center operators. It was colder than at any time in recent memory, and the amount of generation capacity that went offline – about half – was unprecedented.
Local sources told DCK that some data center operators had problems with electrical failover and cooling systems that didn’t behave as expected because (not unlike the state’s electrical infrastructure) they weren’t adequately “winterized,” or designed to operate at such low temperatures. Many operators also had a hard time securing timely deliveries of generator fuel to top off their tanks in case the blackouts outlasted their on-site fuel supplies.
Primary Concern: Generator Fuel
Not all data centers in the state lost utility power, but if you operated one that did and you wanted to get some diesel to your site, you had little chance of getting it delivered from within the state’s borders. Securing a timely enough fuel delivery required deep pockets, tight relationships (preferably with national-scale fuel suppliers), and some creativity.
There was no shortage of diesel in the state – which has far more oil refineries than any other – but getting it out of storage and to where it was needed was a different matter. Even Texas truck stops, where delivery trucks fill up their own tanks, were mostly out of commission, either because they had run out of fuel and couldn’t get more delivered or because they had lost power and couldn’t run their pumps.
A person who works for a data center operator in the state, and who spoke with DCK on condition of anonymity, said the operator’s entire data center capacity in Texas ran fully on generators for two days that week. No fuel was available from the operator’s typical sources, the person said, and suppliers that could source it wouldn’t guarantee a timely delivery. The operator eventually managed to source fuel from an out-of-state supplier.
If a data center operator contacted their normal in-state fuel supplier during that week, “they either were told it was going to be a longer time and they could wait, or they were told, ‘we just don’t have any way to get to you,’” Scott Fisher, senior VP of policy and public affairs for the Texas Food & Fuel Association, told DCK. In many parts of the state, roads were so icy that not a single truck was moving for days, he said.
“Believe it or not, all of Texas was impacted by that winter storm,” Fisher said. “Every county was impacted, either with snow-ice or well-below-normal winter temperatures, all the way into the Rio Grande Valley, which is subtropical. They rarely get down to 40 degrees (Fahrenheit) – let alone 20 degrees. Everything you can imagine that can freeze in that kind of situation pretty much did.”
‘It Would Affect the Internet, Period’
Akamai Technologies, which operates one of the world’s largest content delivery networks, keeps most of its computing capacity in the region in six data centers in the Dallas-Fort Worth metro. None of them worried Todd Lawrence, the company’s VP of Americas infrastructure, as much as the one building in the area where Akamai’s local network interconnects with the rest of the internet: Equinix-owned Infomart, at 1950 N. Stemmons Freeway in Dallas.
“It was the number-one concern,” Lawrence told DCK. “For me it wasn’t about servers [in the six other data centers in the area], it was about routers [at Infomart] going down, and that would’ve been a real problem.”
It was worrisome because Infomart had switched to generators, but Akamai’s local team wasn’t getting solid information from Equinix about when fuel deliveries were scheduled. (The team had created a spreadsheet to track fuel status and staff access at each of its Texas sites amid all the chaos.)
“During this very dynamic crisis, Equinix provided regular, transparent, and conservative estimates of fuel levels by generator to all customers in data centers that had switched to generator power,” Equinix spokesman David Fonkalsrud wrote in an email to DCK. “These communications included the confirmed information we had at the time.”
Infomart didn’t experience an outage and the building was eventually switched back to utility power.
Asked what the impact of an Infomart outage would be, Lawrence said, “It would affect the internet, period. Lion’s share of the connectivity runs through that building. I would think there would be a pretty major disruption. This was probably the closest we all got to it.”
‘Convoy of Trucks’
Digital Realty Trust, which operates 13 data centers in the Dallas market and one each in Houston and Austin (and which recently moved its headquarters from San Francisco to Austin), saw its years-long, nationwide contract with the fuel supplier Foster Fuels really pay off during the Texas power crisis.
Four of its data centers in Dallas and the one in Houston ran on generators for an extended period because utility power was unstable, and Foster literally went hundreds of extra miles to make fuel available to the operator, trucking it in from several neighboring states.
“We are a top-priority, first-response customer with Foster Fuels, right up there with FEMA and the DoD,” David Sukinik, director of data center operations at Digital Realty, told DCK.
The supplier guarantees fuel delivery to Digital sites within 24 hours of request, and it’s never failed to make good on that guarantee, Benny Furtick, a Digital Realty technical operations manager in Texas, told us. The incident in February “was unique, and it was a lot larger than I think we expected,” but Foster did what was necessary to get fuel to where it was needed, he said, recalling that at one point during the week an entire “convoy of trucks” carrying fuel for Foster customers crossed the border from Louisiana into Texas.
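The fuel arithmetic behind this kind of arrangement is simple enough to sketch. The Python below is only an illustration: the `GeneratorSite` record, the tank size, the burn rate, and the 12-hour reserve are hypothetical values chosen for the example, not figures from any operator quoted here. Only the 24-hour delivery window comes from the guarantee described above.

```python
from dataclasses import dataclass


@dataclass
class GeneratorSite:
    """Hypothetical record for one site running on generator power."""
    name: str
    usable_fuel_gal: float       # fuel currently in the tanks, in gallons
    burn_rate_gal_per_hr: float  # combined burn rate of the running generators


def runtime_hours(site: GeneratorSite) -> float:
    """Hours the site can keep running on the fuel it has right now."""
    if site.burn_rate_gal_per_hr <= 0:
        raise ValueError("burn rate must be positive")
    return site.usable_fuel_gal / site.burn_rate_gal_per_hr


def needs_order_now(site: GeneratorSite,
                    delivery_lead_hr: float = 24.0,
                    reserve_hr: float = 12.0) -> bool:
    """True when remaining runtime no longer covers the delivery window
    plus a safety reserve (the reserve is an assumed planning margin)."""
    return runtime_hours(site) <= delivery_lead_hr + reserve_hr


# Made-up example values for two sites.
site_a = GeneratorSite("site-a", usable_fuel_gal=10_000, burn_rate_gal_per_hr=200)
site_b = GeneratorSite("site-b", usable_fuel_gal=6_000, burn_rate_gal_per_hr=200)

for site in (site_a, site_b):
    print(site.name, runtime_hours(site), needs_order_now(site))
```

In a week like the one described here, the delivery window itself is the uncertain input, so an operator would likely want to rerun the check with a much longer `delivery_lead_hr` than the contractual figure.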
Watch the Weather
Operators that saw their cooling systems or emergency power failover infrastructure malfunction didn’t have the “luxury” of generator fuel being their main concern. DCK hasn’t been able to identify the operators that experienced these issues, but according to our sources, the problems arose because their systems weren’t designed to operate in temperatures as cold as those Texas saw that week.
“I don’t think anyone thought it was going to get as bad as it did, so far south,” Chris Brown, CTO at Uptime Institute and a Texas native (he now lives in Oklahoma), told DCK.
Some Texas operators, for example, saw their generator fuel start “gelling” in the cold, causing generators to malfunction, he said, explaining that this usually happens when an operator omits a certain fuel treatment process.
“Some of them had engine generators not wanting to start,” Brown said. That can happen for several temperature-related reasons, a common one being generator start batteries failing in extreme cold, he explained.
Organizations that said their applications were disrupted because of Texas data center outages include Greyhound, the bus carrier, which said it couldn’t sell tickets because of a weather-related data center power outage in Texas; healthcare tech vendor Availity, which said its primary network went down because its Dallas data center provider’s attempted transfer from utility to backup power failed; and Medi-Cal, the California state health insurance provider for low-income individuals, which said its website went down because of “extreme weather affecting its primary data center in Dallas…”
Aside from hoping that ERCOT and the State of Texas finally winterize the state grid (uniquely untethered from the national grid), the incident’s biggest and most obvious lesson for data center operators is that historical temperature ranges are no longer a reliable guide when designing mission-critical systems.
Uptime Institute, in its reliability certification standards, requires data centers to be designed to withstand temperature ranges within ASHRAE’s 20-year extreme minimums and maximums, Brown said. But “even if you design to that, eventually mother nature is going to show you that we’re still insignificant,” he said.
“Now we’re pushing 100-year temperature records. Just because your data center’s been designed to certain extreme ambient conditions isn’t going to be a guarantee that you’re never going to exceed those ambient conditions,” Brown said.
It’s important to understand where your system’s limits are and be prepared for when it’s pushed over those limits. “You’re only going to be as successful as your creativity,” he said. What is your company’s plan for when an entire data center goes offline? Is there another site you can switch to that’s far enough from the affected area?
Another obvious lesson is to heed weather warnings by national and local authorities. “My suggestion to data centers – if they haven’t already figured that out – is they need to be watching the weather a lot closer,” Texas Food & Fuel Association’s Fisher said. “This system that hit us was forecasted 10 days out to be exactly what it was.”
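One way to act on both lessons is to compare each forecast against the ambient limits a site was actually designed for. The sketch below is a minimal illustration under stated assumptions: the function name, the design minimum, the safety margin, and the forecast values are all made up for the example, and the forecast lows would in practice come from whatever weather service an operator already uses.

```python
def days_below_design_min(forecast_lows_f, design_min_f, margin_f=0.0):
    """Return (day_offset, low) pairs for forecast days whose low falls
    under the site's design minimum plus an optional safety margin.

    forecast_lows_f: daily forecast lows in Fahrenheit, day 0 = today.
    design_min_f:    lowest ambient temperature the site was designed for.
    margin_f:        assumed extra buffer for early warning.
    """
    threshold = design_min_f + margin_f
    return [(day, low) for day, low in enumerate(forecast_lows_f)
            if low < threshold]


# Made-up 7-day forecast and a hypothetical 10 F design minimum.
forecast = [35, 28, 14, 5, -2, 9, 22]
at_risk = days_below_design_min(forecast, design_min_f=10)
early_warning = days_below_design_min(forecast, design_min_f=10, margin_f=5)

print(at_risk)
print(early_warning)
```

A non-empty result several days out is the cue to top off fuel tanks, treat fuel against gelling, check generator start batteries, and confirm that a failover site outside the affected area is ready.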