No industry has been pummeled by the global chip shortage like the automotive industry, but few haven’t felt any pain. So, how has the pain manifested in the data center industry, which exists solely to ensure computer processors can go on crunching numbers, uninterrupted?
The supply of server CPUs looks steady – mostly because they produce more profit for chip manufacturers than their other products, so they’ve been prioritized. Things look bleaker in other industry subsectors. Network-switch vendors are dealing with extraordinarily long silicon lead times, leaving their executives trying hard to convince stock analysts of their ability to source enough chips to meet their revenue forecasts for the year. Companies across the board have been spending a lot more time and money than normal on supply chain management. One large power and cooling infrastructure equipment vendor said it will likely pass this extra cost to its customers.
As the pandemic drove most of the developed world to work, study, and play from home, demand for personal computers and for the servers underpinning all the digital services skyrocketed. Meanwhile, chip manufacturers, like everyone else, had to learn how to operate with COVID-19 running wild. Chinese companies, fearing that trade tensions with the US might lead to sanctions, have reportedly been overbuying and stockpiling chips. Others, as is typical in a period of shortage, have been trying to build larger stockpiles too, artificially making the bottlenecks even worse.
Protecting their businesses, chip foundries focused on pleasing customers that bring the most profit.
“Silicon for the data center provides good margin for the fabs compared to more commoditized silicon, which is power electronics or any other segment,” Manoj Sukumaran, senior enterprise IT analyst with Omdia, said. While server and PC sales spiked, some car makers delayed orders in the first half of last year, not knowing what the pandemic would do to their demand, he explained. So, chip fabrication companies like TSMC, GlobalFoundries, and Samsung Foundry prioritized high-margin orders from data center and PC customers and “basically sidelined” everyone else.
‘It’s Little Things’
That’s not to say that server chip vendors have been unaffected. While there haven’t been major issues with server CPU silicon wafers, other components necessary to put together a CPU are in short supply, and even giants like Intel and AMD have had to make operational changes and spend money on mitigating the situation.
Servers themselves require more than CPUs, of course, and supply of everything from BMC chips down to resistors, capacitors, and circuit boards is tight, Sukumaran said.
Charlie Boyle, who runs Nvidia’s DGX unit, which sells full AI server systems, said that while he hasn’t been short on CPUs or GPUs, it’s taken a lot of extra work by the company’s operations team to source other components. “It’s little things. It’s like, resistor; it’s a little transistor somewhere; it’s a power module,” he told us in an interview for The Data Center Podcast.
The shortage hasn’t impacted Nvidia’s ability to deliver DGX systems to customers who ordered them, but “that doesn’t mean that we haven’t had to do a lot more work,” Boyle said.
52-Week Lead Times
Arista Networks is one of the largest data center networking switch vendors and a major supplier of switches to cloud providers, including hyperscalers.
“The supply chain has never been so constrained in Arista history,” the company’s CEO, Jayshree Ullal, said on an earnings call earlier this month. “To put this in perspective, we now have to plan for many components with 52-week lead time. COVID has resulted in substrate and wafer shortages and reduced assembly capacity. Our contract manufacturers have experienced significant volatility due to country specific COVID orders. Naturally, we're working more closely with our strategic suppliers to improve planning and delivery.”
A supply chain source from another data center networking equipment maker confirmed 50-week-plus switch-silicon lead times.
These have translated to protracted lead times for final products being delivered to Arista customers. “Clearly, customers would like to have stuff sooner, and we would like to give it to them sooner,” Ullal said. “But we are facing extended lead time.”
Arista expects this to be a “pain point” for the rest of the year, according to the CEO.
Hock Tan, CEO of Broadcom, a key supplier of switch silicon (including to Arista) that itself relies on third-party chip manufacturers, acknowledged on an earnings call this month that the company had “started extending lead times.” Part of the problem, he said, was that customers were now ordering more chips and demanding them faster than usual, hoping to buffer against the supply chain issues.
Executives from networking equipment vendors Cisco and Juniper also cited major semiconductor supply constraints on their most recent earnings calls.
Power Equipment Is About to Get More Expensive
Supply chain conditions have led Vertiv, one of the biggest sellers of data center power and cooling equipment, to delay previously planned “footprint optimization programs.”
Because of the combination of supply issues and high demand from colocation and cloud providers, who continue to build more and more data centers all over the world, Vertiv has “decided to delay some of those programs,” the company’s CEO, Robert Johnson, said on an earnings call.
Vertiv has been dealing with some parts and materials shortages, Johnson said, but added that the company has been able to find solutions for most of them.
The combination of supply chain constraints and inflation would cause “some incremental unexpected costs over the short term,” he said. Part of the solution will be “to share the cost with our customers where possible.”
At this point, he does not expect the constraints to prevent Vertiv from delivering on orders.
Jean-Pascal Tricoire, CEO of Schneider Electric, Vertiv’s largest competitor in the data center infrastructure equipment space, sounded optimistic on the company’s earnings call in February about its ability to manage through the supply chain issues, but added that 2020 was “an everyday test of the resilience of the supply chains.”
Underinvestment in Substrates Rears Its Ugly Head
One big contributor to the overall chip crisis has been a shortage of substrates, or packages that hold individual chip components.
Substrate manufacturers have underestimated demand and underinvested in production capacity in recent years, a problem that is now acutely felt across the industry as it faces a surge in demand. Both Intel and AMD have been spending in this area to help improve the situation.
“On the substrate side, in particular, I think there has been under-investment in the industry,” AMD CEO Lisa Su said on an earnings call in April. “And so, we've taken the opportunity to invest in some substrate capacity dedicated to AMD, and that will be something that we continue to do going forward.”
There has been “a major constraint in our substrate supply,” Intel CEO Pat Gelsinger said on his first-quarter earnings call. The company is working to ease the constraint by partnering with suppliers to “creatively utilize our internal assembly factory network.” He said he expected to see the fruits of this labor during the second quarter, when “this capability will increase the availability [by] millions of units in 2021.”
Gelsinger also noted the company’s plans to start manufacturing chips as a third-party supplier, the first big change he has made since stepping into the role earlier this year. The timing could hardly be better.
Supply constraints have been primarily impacting Intel’s client, IoT, and FPGA businesses – not its data center group, George Davis, the company’s CFO, said on the call.
2022? 2023? No One Knows
While none of the executives mentioned here said that they expected the chip shortage to cut into their ability to meet their 2021 revenue expectations, there isn’t a clear picture of how the situation will continue playing out and, most importantly, when supply chains might get back to normal.
“The capacity at foundries is growing at about 1 to 3 percent per year, but demand for compute is growing faster,” Vlad Galabov, director of cloud and data center research at Omdia, said. “All types of computing devices are in high demand, and COVID has actually accelerated people’s refresh cycles and purchasing cycles. But capacity at the foundries doesn’t accelerate overnight.”
Each vendor is tied to a particular foundry for a particular product, so it’s not easy to switch from one manufacturer that’s at capacity to another one that may have some free bandwidth, Omdia’s Sukumaran explained. If you contract TSMC to produce a particular chip, TSMC sets up a production line specifically for that chip.
These production lines take a long time to set up, and they don’t fire on all cylinders from day one. “It’s building the infrastructure and actually optimizing it to create the yields that you are looking for,” Sukumaran said. It can take close to a year before CPU yields are at the right level, but much less time for other, passive components.
Because it takes so long to expand manufacturing capacity and because demand for compute isn’t slowing down, it’s anyone’s guess when supply chain pressure might ease. Gelsinger’s estimate is “a couple of years.” Other estimates have ranged from mid-2022 to 2023.
Omdia’s Galabov thinks even 2023 may be too optimistic. “I personally am skeptical on capacity release in 2023. This could be the new normal.”