A change is coming to the internal architecture of servers, one that is set to significantly reduce data center costs, increase application performance, and even introduce new rack-level data center architectures. “The effect on data centers will be profound,” says Manoj Sukumaran, principal analyst at Omdia. The agent of change is a protocol that has been created by a very broad and heavy-hitting industry consortium that includes Cisco, Intel, Dell, Oracle, Lenovo, HPE, IBM, Microsoft, Samsung, NVIDIA, and Google. The protocol is called Compute Express Link (CXL).
The Promise of CXL
CXL promises big changes by radically improving the way that servers use memory. This is set to benefit a wide range of applications, not only because memory accounts for around 50% of the cost of a typical data center server, but also because the ultimate limit on server performance is very often the amount of memory that its CPUs can access. The protocol and its potential impact on data centers are on the very near horizon.
“CXL evaluations are going on now,” said Sukumaran. “New generation of CPUs from Intel and especially AMD that support CXL are being deployed and will ramp up in the second half of this year. This means that CXL deployments will begin towards the end of the year and scale up during 2024.”
Marc Staimer, IT industry analyst and president at Dragon Slayer Consulting, said: “It’s very difficult to make a general prediction about how much money CXL will save in data centers. But if CXL performs as promised, the savings will certainly be more than enough to guarantee its widespread adoption.” Staimer is confident that CXL will live up to those performance promises “because there’s so much momentum behind it.”
Breaking Down the Memory Wall
The architectural problems that CXL is designed to address begin with what is often called the memory wall: the difficulty of feeding processors data as fast as they can process it, so that they are not left idling while they wait for work. The biggest causes are not the speed of the interface between processors and memory. They are the high cost of memory, the difficulty of using it efficiently or at high utilization rates, and the limit on the amount of memory that an individual CPU can rapidly access, especially in the majority of servers that tie memory tightly to CPUs.
Within low-cost industry-standard servers, if a CPU needs data that its memory is too small to hold, the data must be retrieved from far slower flash or disk storage, heavily reducing performance. This problem is becoming more acute and affecting more applications. On one side of the wall, the processing power of CPUs is growing with their increasing core counts, driving up their appetites for data. On the other side, data-intensive applications such as AI and analytics require increasing volumes of data to be held in memory to achieve adequate performance.
CXL vs. NUMA
One solution has been the development of so-called non-uniform memory architecture (NUMA) or big-iron servers, which, unlike other servers, allow CPUs to access each other’s memory at high speeds. But NUMA servers are expensive because of the extra silicon needed to do this. They also do not allow memory to be expanded without adding more CPUs, which drives costs even higher: extra CPUs must be installed whenever more memory is needed, even if extra processing power is not required.
CXL provides an alternative by running on top of the Peripheral Component Interconnect Express (PCIe) links that are already present in servers. It may seem surprising that the PCIe links used to connect flash drives to CPUs can also provide the far faster data transfer speeds needed for memory, but CXL achieves this by moving data as memory load-store operations rather than as I/O transfers.
The result is that when CXL runs over the latest PCIe 5.0 standard, it moves data very quickly, at memory-style speeds. Because it runs over PCIe, it does not just introduce a new way of connecting processors to memory and increasing the amount of memory they can address. It is also set to work in the opposite direction and allow data centers to deploy less memory.
According to Jim Handy, IT industry analyst and general director at Objective Analysis, “The memory makers say CXL will allow servers to use more memory, which is good for their businesses because it means they’ll sell more memory. Meanwhile data center operators want to use it to reduce the amount of memory they need to buy. This is the conundrum of CXL.”
Reducing Stranded Memory Waste
Servers use different amounts of memory at different times, and standard data center servers cannot share unused memory with other servers. However, they must be fitted with enough memory to meet their needs at any time. As a result, data center operators have no choice but to overprovision servers with memory capacity that may often be unused. This creates so-called stranded memory.
Sukumaran cites estimates of how this affects hyperscalers: Microsoft has estimated that 25% of memory in Azure is stranded at any given time, while Google states that only 40% of memory is used in its production clusters.
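To get a rough sense of what those percentages mean in money terms, they can be combined with the article's earlier figure that memory accounts for around 50% of the cost of a typical server. The short Python sketch below does that arithmetic. It assumes, purely for illustration, that the 50% figure applies equally to the hyperscalers' servers, and the function and constant names are made up for this example.

```python
def unused_memory_share_of_server_cost(memory_cost_share, unused_fraction):
    """Fraction of total server cost tied up in memory that sits unused."""
    return memory_cost_share * unused_fraction


# Figure quoted earlier in the article: memory is around 50% of server cost.
# Assumption: the same share holds for the hyperscalers' servers.
MEMORY_COST_SHARE = 0.50

# Microsoft's estimate: 25% of Azure memory is stranded at any given time.
AZURE_STRANDED = 0.25

# Google's figure: only 40% of memory is used, so 60% is unused.
GOOGLE_UNUSED = 1 - 0.40

azure_share = unused_memory_share_of_server_cost(MEMORY_COST_SHARE, AZURE_STRANDED)
google_share = unused_memory_share_of_server_cost(MEMORY_COST_SHARE, GOOGLE_UNUSED)

print(f"Azure:  {azure_share:.1%} of server cost is stranded memory")
print(f"Google: {google_share:.1%} of server cost is unused memory")
```

Under these assumptions, roughly an eighth of server spending in the Azure case, and closer to a third in the Google case, pays for memory that is idle at any given moment. These are back-of-the-envelope figures, not estimates published by either company.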
Hyperscalers suffer especially heavily from stranded memory because of the variety and unpredictability of the workloads they host. However, they are not the only ones facing this problem. Stranded memory is not caused only by mixed and unpredictable workloads, so other data center operators suffer from it too. It is also caused by the dynamics of server virtualization and the fact that applications need different amounts of memory at different times.
CXL is set to significantly reduce this wastage by allowing multiple servers to share pools of memory and use only what they need, and only when they need it. This allows the overprovisioning of memory to be sharply reduced. Spare capacity to meet peak demands is still required but can be smaller because it is shared across multiple servers that will not make their maximum demands all at the same time.
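The reasoning behind that saving can be shown with a small worked example. The Python sketch below compares two ways of sizing memory for four servers: giving each server enough for its own peak, or sizing a shared pool for the highest combined demand. The demand figures are invented for illustration and are not measurements. The model is also simplified, because it treats all memory as poolable, whereas real servers would keep some local memory.

```python
# Hypothetical memory demand in GB for four servers across four time slots.
# These numbers are illustrative only.
DEMAND_GB = {
    "server_a": [64, 128, 256, 96],
    "server_b": [192, 64, 96, 128],
    "server_c": [96, 224, 64, 64],
    "server_d": [128, 96, 128, 256],
}


def dedicated_capacity(demand):
    """Memory needed when every server is sized for its own peak."""
    return sum(max(series) for series in demand.values())


def pooled_capacity(demand):
    """Memory needed when servers share one pool sized for the combined peak."""
    # zip(*...) groups the demands of all servers by time slot.
    return max(sum(slot) for slot in zip(*demand.values()))


dedicated = dedicated_capacity(DEMAND_GB)
pooled = pooled_capacity(DEMAND_GB)
saving = 1 - pooled / dedicated

print(f"Dedicated: {dedicated} GB, pooled: {pooled} GB, saving: {saving:.0%}")
```

With these made-up figures, the dedicated approach needs 928 GB while the pool needs 544 GB, because the servers reach their peaks at different times. The more closely the peaks coincide, the smaller the saving becomes.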
CPU support for CXL is currently at version 2.0 of the protocol, which supports PCIe networks that allow multiple servers to share access to the same memory. CXL version 3.0 was released last year and extends this support. This is how CXL is expected to allow multiple servers to share pooled memory, and how it is set to change rack-level architectures, by creating data center racks that include chassis that will provide CXL-connected memory for use by the servers in the same rack.
About the Author
Tim Stammers is an independent industry analyst who has been writing about IT for almost three decades. He has previously been a senior analyst at 451 Research, and at Omdia.