The holy grail of rack-scale component disaggregation, as Ampere told Data Center Knowledge just last Monday, would be a fabric that breaks the connection between DRAM and locked-tight system busses. CPUs could be collected into “buckets,” DPUs or IPUs converged into cabinets, and storage arrays could be fronted by super-fast networks, if only system memory could be pooled into aggregated arrays.
On Wednesday, that goal came into somewhat sharper view. Two of the principal innovators in the Compute Express Link (CXL) space, Toronto-based interconnect transceiver designer AnalogX and Aix-en-Provence, France-based interconnection designer PLDA (until recently, collaborators in each other's projects), agreed to be acquired by legendary memory technology firm Rambus. The deal is private and is projected to close in Q3 2021.
CXL, a technology standard first championed by Intel, is a developing method for building a shared memory system. It runs over an expansion bus that components can treat as PCI Express, but it lets multiple components in the system read and write values in the same pool of memory, while the interconnection scheme sorts out the order of those writes much as a database would (only much faster). The key attribute here is cache coherency: every component sees a consistent value for each memory location, even when several of them hold copies of it in their own caches.
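To make cache coherency concrete, here is a minimal Python sketch of one classic approach: a directory that tracks which hosts have cached each address, and invalidates every other copy whenever one host writes. This is a simplified, single-threaded toy, not the CXL protocol itself; the class, its methods (`attach`, `read`, `write`), and the host labels are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class SharedMemory:
    """Toy shared memory pool with a directory-based, write-invalidate scheme.

    Hypothetical model for illustration only; not the CXL protocol itself.
    """
    data: dict = field(default_factory=dict)     # address -> value held in the pool
    sharers: dict = field(default_factory=dict)  # address -> set of hosts caching it
    caches: dict = field(default_factory=dict)   # host -> {address: cached value}

    def attach(self, host):
        """Register a host (CPU, GPU, IPU...) with an empty private cache."""
        self.caches[host] = {}

    def read(self, host, addr):
        cache = self.caches[host]
        if addr not in cache:
            # Cache miss: fetch from the pool and record this host as a sharer.
            cache[addr] = self.data.get(addr, 0)
            self.sharers.setdefault(addr, set()).add(host)
        return cache[addr]

    def write(self, host, addr, value):
        # Requests are handled one at a time, so the directory decides the order
        # of writes. Before this write lands, every other cached copy of the
        # address is invalidated, so no host can keep reading a stale value.
        for other in self.sharers.get(addr, set()) - {host}:
            self.caches[other].pop(addr, None)
        self.sharers[addr] = {host}
        self.caches[host][addr] = value
        self.data[addr] = value  # write-through to the pool, for simplicity


mem = SharedMemory()
mem.attach("cpu")
mem.attach("gpu")
mem.read("cpu", 0x10)            # both hosts cache address 0x10 (value 0)
mem.read("gpu", 0x10)
mem.write("cpu", 0x10, 42)       # invalidates the GPU's copy
print(mem.read("gpu", 0x10))     # 42: the GPU misses and refetches
```

Without the invalidation step, the GPU would keep returning its cached 0 after the CPU's write, which is exactly the kind of inconsistency that coherency rules out.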
If you're familiar with PC-style architecture (which still applies closely to data center server design), you know that CPUs have direct lines to system memory (the prevailing architecture being DDR), while GPUs utilize their own dedicated memory (usually GDDR). The accelerator industry is heating up, as Google makes its case for a tensor processing unit (TPU) dedicated to machine learning tasks. Just this week, Intel embraced the notion of an independent infrastructure processing unit (IPU) that would break with decades of precedent in system bus-oriented design.
Each new class of accelerator threatens to bring yet another dedicated memory array with it, and that is something system designers would prefer to avoid.
Intel may already be on top of this problem, as it prepares to make further announcements in the disaggregated systems space. Rambus' move could help the company catch up fast, giving it direct access to the technology it needs to build an effective CXL 2.0 controller.
In fact, PLDA and AnalogX had just announced their successful design of such a controller. On June 2, the two companies declared their design capable of reducing memory access latency to as low as 12 nanoseconds (ns).
That approaches DRAM-like performance, but isn't quite there: today's high-speed DDR3 memory yields latencies of about 7 ns on an ordinary system bus.
A recent Rambus white paper explains the benefits of pooling this way:
With low-latency direct connections, attached memory devices can employ DDR DRAM to provide expansion of host main memory. This can be done on a very flexible basis, since a host is able to access all or portions of the capacity of as many devices as needed to tackle its workload. Analogous to ridesharing, memory is available to hosts on an “as needed” basis, delivering greater utilization and efficiency of memory. And this architecture would provide the option to provision server main memory for nominal workloads, rather than worst case, with the ability to access the pool when needed for high-capacity workloads, offering further benefits to TCO [total cost of ownership].
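The pooling model in that passage can be sketched as a simple allocator: a host borrows capacity from as many pooled devices as it needs, then returns it when the workload ends. This Python sketch is illustrative only. The class, the device names, the first-fit policy, and the sizing numbers at the end (8 servers, 128 GB nominal, 512 GB peak, at most one server peaking at a time) are all assumptions, and real CXL 2.0 pooling involves switches and fabric management that this toy ignores.

```python
class MemoryPool:
    """Toy model of pooled memory shared by several hosts (hypothetical)."""

    def __init__(self, devices):
        self.free = dict(devices)  # device name -> free capacity in GB
        self.leases = {}           # host -> list of (device name, GB) slices

    def allocate(self, host, gb):
        """Lend `host` `gb` GB, spanning as many devices as needed (first fit)."""
        if gb > sum(self.free.values()):
            raise MemoryError(f"pool cannot supply {gb} GB to {host}")
        slices = []
        for dev, avail in list(self.free.items()):
            if gb == 0:
                break
            take = min(avail, gb)
            if take > 0:
                self.free[dev] -= take
                slices.append((dev, take))
                gb -= take
        self.leases.setdefault(host, []).extend(slices)
        return slices

    def release(self, host):
        """Return everything `host` borrowed to the shared pool."""
        for dev, gb in self.leases.pop(host, []):
            self.free[dev] += gb


pool = MemoryPool({"dev0": 256, "dev1": 256})
print(pool.allocate("hostA", 64))    # [('dev0', 64)]
print(pool.allocate("hostB", 300))   # [('dev0', 192), ('dev1', 108)]
pool.release("hostB")                # hostB's burst is over; capacity returns

# Rough sizing under assumed numbers: 8 servers that each need 128 GB
# normally and 512 GB at peak, with at most one server peaking at a time.
servers, nominal_gb, peak_gb = 8, 128, 512
worst_case_local = servers * peak_gb  # every server sized for its peak: 4096 GB
# Each server sized for nominal load, plus a pool covering one server's extra.
pooled = servers * nominal_gb + (peak_gb - nominal_gb)  # 1024 + 384 = 1408 GB
print(worst_case_local, pooled)      # 4096 1408
```

First fit is only the simplest possible policy. The point of the sketch is that capacity flows back to the pool once a high-capacity workload ends, which is what lets each server be provisioned for its nominal load rather than its worst case.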
Should this trend continue as data center equipment makers foresee, the formulas capacity planners use to provision space and cooling, which are based on the compute capacity of individual servers, may soon need to be replaced entirely.
Wednesday’s news comes just in time for Rambus’ annual Design Summit, due to be held virtually this year, on June 23 and 24.