Critical Thinking is a weekly column on innovation in data center infrastructure design and management.
AMD took a hit earlier this week when an analyst note from Morgan Stanley raised some tough questions about its near-term growth prospects.
The chipmaker’s stock price dropped by nine percent on 30 October after the analyst forecast slowing demand for AMD’s GPUs in cryptocurrency mining and game consoles in 2018.
The stock stumble comes at a time when the chipmaker is trying to reestablish itself across the board, including in the data center. In June this year, AMD launched its EPYC line of server processors, a direct challenge to Intel’s highly profitable Xeon range. It has already seen some initial traction, with support from Chinese data center operators, such as Baidu and Tencent, as well as OEM partners, such as Lenovo.
AMD has its work cut out, however. According to estimates from IDC, the chipmaker’s share of data center CPU shipments was less than 1 percent in 2016. It is also facing strong competition from NVIDIA in the market for GPUs.
In an interview with Data Center Knowledge this week, Forrest Norrod, AMD’s senior VP and general manager of the enterprise, embedded, and semi-custom business group, laid out the chipmaker’s plan to get back into data center contention.
The discussion was skewed towards silicon-specific roadmaps and innovation, but not exclusively. Norrod’s previous role was VP and general manager of Dell’s server business, so he was able to put some of the discussion in the context of the potential impact on overall data center facilities design and operation. He gave his views on topics including increasing rack power densities, cooling technologies, and data center management software.
AMD first contemplated getting back into the high-performance space about six years ago, says Norrod. AMD chief executive Lisa Su and CTO Mark Papermaster decided to essentially reset the company’s high-performance processor roadmap. “They made the hard call to discontinue the former CPU core development and stop doing the server chips using those cores, to free up enough dollars to do a reset. That really led to an entirely new core, the Zen core, which allowed us to get back into the game,” he says.
Fundamentally, AMD now wants to solve problems in the data center in a different way from its main CPU rival Intel. Its strategy is to understand specific customer requirement trends and then analyze how they might intersect with practices that Intel appears to be avoiding.
For example, according to a recent whitepaper from Tirias Research (sponsored by AMD), a proportion of data center customers buy more expensive, energy-hungry two-socket systems to get access to I/O capacity or memory rather than the second socket’s additional compute resources. AMD hopes to undercut Intel by building more capabilities into lower-cost single-socket systems.
“Intel has, or has had, a fantastic near monopoly position in the two-socket server business, and they would much rather sell two pieces of silicon than one. I don’t want to be too flippant about it, but it is almost that simple. They don’t want to see the continued migration down the curve,” says Norrod. “We wanted to offer a two-socket configuration -- as that’s the market right now -- but we also wanted to offer something disruptive, which is a one-socket configuration that was truly optimized. We wanted to eliminate the artificial constraints of memory or bandwidth or number of cores, all of which I think Intel has applied to their one-socket roadmap over the years.”
Although less of a strategic priority, another area where AMD hopes to compete is in data center management software. Intel currently licenses its Intel Data Center Manager (DCM) software to a number of data center infrastructure management (DCIM) suppliers as well as directly to some large operators. The software provides more granular power and thermal monitoring as well as capabilities such as power capping.
Uptake of Intel DCM to date by end users has been hard to gauge, and some DCIM suppliers have chosen not to engage with Intel because of its licensing terms. Norrod says AMD has plans to work with DCIM suppliers but will not be productizing its power monitoring and management capabilities in the same way. “We do have some pretty strong power management capabilities at the node level. However, we are not going to productize a separate product and be proprietary. Our focus has been to expose those capabilities up through the operating system and out to DCIM partners,” he says. “You will see some announcements in the not-too-distant future.”
Looking more widely at how rack power densities might increase over time, Norrod believes the combination of EPYC and GPU accelerators could enable extremely dense systems. “The most interesting configurations from a power density perspective I have seen is that we have some customers that are building out EPYC 1P systems that are utilizing all of our PCI capabilities to put four or six accelerator cards in each EPYC system. So you think about a 1U slice with a 1P EPYC, up to 2TB of RAM if you need it, and four accelerator cards; you can cram up to 90 GPUs into a rack along with 40 EPYC procs and easily get over 50kVA.”
That increase in density could eventually lead to selective demand for high-density cooling technology such as direct liquid cooling. “At a certain point, as rack densities increase, there is no way around it. You don’t have enough space to get the sufficient mass of air through to cool the equipment,” says Norrod. “At a certain point, and I don’t know where that point is -- certainly below 100kW but probably above 50kW -- there is simply no alternative. I think there will be folks experimenting in their machine learning farms where the compute density on these GPUs is just outrageous. I think it will happen there, but there will be a balance.”
Ultimately, Norrod believes that if AMD can provide stronger competition in the data center, it will be good for the industry and, ironically, also good for Intel. “I think they will respond very strongly to our competitive entry, and I think that will be very good for the industry, and frankly I think very good for Intel,” he says. “I think they have focused previously on how they can maximize their share of wallet rather than how to build and develop the server ecosystem.”