To a large extent, Intel is positioning its 2nd Gen Xeon Scalable chips as ideal for processing the growing volume of data, emphasizing the value of a CPU that can run machine learning inferencing jobs as well as mainstream workloads.
The range of new Xeon SKUs with varying numbers of cores is bewilderingly large. You can get everything from a system-on-a-chip specialized for embedded networking and network-function virtualization to the doubled-up Cascade Lake-AP, which combines two processors for up to 56 cores per socket and supports terabytes of memory. Aimed at high-performance computing, AI, and analytics workloads, Cascade Lake-AP is delivered as a complete system with motherboard and chassis.
There are Xeon processors that are specialized for cloud search or VM density – although to Intel, that can mean bigger, beefier virtual machines for workloads like SAP HANA as well as cramming in more VMs for running infrastructure-as-a-service as cheaply as possible.
But the more general-purpose CPUs fit into the same sockets as their “Skylake” predecessors and include options that make them more customizable in use, promising operational efficiencies alongside performance improvements. While the typical data center utilization of just 20 percent seen in 2010 has improved, it’s not yet at the 60 to 70 percent that Intel Principal Engineer Ian Steiner, lead architect on the new Xeons, said he would like to see.
One way of getting higher utilization is to make the hardware more flexible. The Speed Select option in the new Xeons lets you mix and match the base core frequency, thermal design power, and maximum temperature for groups of cores, instead of running them all at the same levels.
Speed Select helps “if you're a service provider who has different customers with different needs, and some of them have a high-performance computing workload that needs high frequency, or [other times] you need to switch that infrastructure over to more IaaS, hosting VMs. In enterprise, you may be doing rendering work or HPC work at night, but during the day you need the broader use,” explained Jennifer Huffstetler, Intel’s VP and general manager of data center product management. “You can ensure you're delivering the SLA [for] high-priority customer workloads and have a little lower frequency on [the] rest of the cores.”
Instead of needing different hardware for different workloads, that configuration can be set in the BIOS, remotely through a management framework like Redfish, or automatically through orchestration software like Kubernetes, letting you set the frequency at which priority applications and workloads run.
“Or, if you’re building a big pipeline of work where some of the tasks are a bottleneck, you can use a higher frequency on some of the cores [to run those tasks],” Steiner explained. “You can run a single CPU in different modes, so you can have three profiles that you define ahead of time and set at boot time.”
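As a rough illustration of what selecting one of those boot-time profiles remotely might look like, here is a hedged Python sketch that composes a Redfish-style BIOS settings request. The endpoint path and the `SpeedSelectProfile` attribute name are illustrative assumptions, not a documented schema; real platforms expose vendor-specific attribute names.

```python
# Hypothetical sketch: composing a Redfish-style BIOS settings PATCH to
# select one of the three predefined performance profiles at next boot.
# The URL path and "SpeedSelectProfile" attribute are assumptions for
# illustration, not a documented Redfish schema.
import json

def build_profile_request(bmc_host: str, profile: int) -> tuple[str, str]:
    """Return the (URL, JSON body) for a BIOS settings PATCH."""
    if profile not in (0, 1, 2):  # three profiles are defined ahead of time
        raise ValueError("expected one of the three predefined profiles")
    url = f"https://{bmc_host}/redfish/v1/Systems/1/Bios/Settings"
    body = json.dumps({"Attributes": {"SpeedSelectProfile": profile}})
    return url, body

url, body = build_profile_request("bmc.example.com", profile=2)
# The request would be sent as an authenticated HTTP PATCH; the selected
# profile takes effect at the next boot.
```

In practice the same call would be driven by orchestration software, switching profiles on a schedule (HPC at night, general hosting by day) rather than by hand.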
The existing Resource Director Technology can now control memory bandwidth allocation in the new Xeons, finding “noisy neighbor” workloads and stopping them from using so many resources that other workloads suffer. That improves performance consistency and means you can run lower-priority workloads instead of leaving infrastructure standing idle, without worrying that workloads that need the full performance of the server will suffer.
“In private cloud, we often see underutilized clusters,” said Intel’s Das Kamhout, senior principal engineer for cloud software architecture and engineering. “You usually have a latency-sensitive workload; anything that you’ve got end users or IoT [internet of things] devices interacting with and it needs fast response time. So, people build out their infrastructure to make sure the latency-sensitive workloads always get enough compute cycles to get the work done, but often that means underutilized clusters. Now I can add low-priority or batch work onto the node and make sure it doesn’t impact the latency of my SLA-critical jobs, because my batch job training for a download model can happen overnight for a long period of time.”
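On Linux, Resource Director Technology is exposed through the resctrl filesystem, where a memory-bandwidth cap for a group of tasks is written as a one-line “schemata” entry. The sketch below only builds that line; the filesystem operations shown in comments assume root privileges and RDT-capable hardware, and the group name is an example.

```python
# Sketch of how a memory-bandwidth cap for a low-priority group is
# expressed via the Linux resctrl filesystem (Intel RDT / Memory
# Bandwidth Allocation). The helper just builds the schemata line;
# actually applying it needs root and supporting hardware.
def mb_schemata(percent_per_socket: dict[int, int]) -> str:
    """Build an 'MB:' schemata line, e.g. {0: 20, 1: 20} -> 'MB:0=20;1=20'."""
    parts = ";".join(f"{sock}={pct}"
                     for sock, pct in sorted(percent_per_socket.items()))
    return f"MB:{parts}"

line = mb_schemata({0: 20, 1: 20})  # cap batch work at ~20% bandwidth per socket
# On a real system (as root, with resctrl mounted at /sys/fs/resctrl):
#   mkdir /sys/fs/resctrl/batch
#   echo "MB:0=20;1=20" > /sys/fs/resctrl/batch/schemata
#   echo <pid> > /sys/fs/resctrl/batch/tasks
```

With the batch jobs confined this way, the latency-sensitive workloads keep their headroom while the otherwise-idle cycles still get used.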
Similarly, the Optane DC persistent memory that many of the new Xeons support is designed to be an affordable alternative to DRAM with what Intel calls “near-DDR-like performance” (especially when using DDR as cache) that allows you to increase memory size, consolidate workloads and improve TCO.
One of the most obvious benefits is that the contents of memory are persistent; when a server reboots, the OS restart time will be much the same, but you don’t have to wait while the in-memory database is loaded back into memory. For some HPC workloads, loading the data can take longer than the compute time.
Reading from Optane is also faster than reading from storage. It’s less about the speed relative to SSDs and more about not having to go through the storage stack in the operating system.
But depending on your workload, you can run Optane hardware in different modes and switch between them on the same server. (Intel’s VTune Amplifier software can help you characterize workloads and see if you’re compute-bound or limited by memory capacity.)
Memory mode is for legacy workloads. The application doesn’t need to be rewritten, the contents of memory stay volatile even though they’re stored in Optane hardware, and because Optane is cheaper than RAM, you can put more of it in a server to do things like run more VMs, with the faster DRAM acting as a cache. Instead of 16GB DIMMs, you can put 128GB DIMMs in the same commodity 2U platform and get near-DRAM performance (70 nanoseconds if the data is in the DRAM cache, 180 nanoseconds if it’s in the Optane hardware).
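Those two figures make it easy to estimate the effective latency you would see in Memory mode for a given DRAM-cache hit rate. The 90 percent hit rate below is an illustrative assumption; real hit rates depend on how the workload’s hot working set compares to the DRAM cache size.

```python
# Back-of-the-envelope effective latency in Memory mode, using the
# figures above: 70 ns on a DRAM-cache hit, 180 ns on a miss to Optane.
# The hit rate passed in is an assumption, not a measured value.
def effective_latency_ns(hit_rate: float,
                         dram_ns: float = 70.0,
                         optane_ns: float = 180.0) -> float:
    return hit_rate * dram_ns + (1.0 - hit_rate) * optane_ns

effective_latency_ns(0.9)  # -> 81.0 ns: close to DRAM when the cache hits often
```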
On Windows Server 2019, Intel suggests that moving from 768GB of DDR4 to 1TB of Optane plus 192GB of DDR4 in a 2nd Gen Xeon system will take a third off the cost per VM while supporting up to 30 VMs rather than 22 on a single node, all while keeping the same SLA.
That’s on top of the up to 3.5-times improvement in VM density you could see by upgrading from a 2013 “Ivy Bridge” server, so in theory you can either do more on equivalent hardware or consolidate onto fewer servers to support the same workload. The minimum requirements for an Optane system are still high, so it may be beyond the budget of some consolidation projects.
But Optane also works in App Direct mode, which uses DRAM and persistent memory as separate memory regions. Without the DRAM cache, memory performance is slightly lower (10 to 20 percent, depending on the workload), and applications have to be rewritten to use App Direct mode. That’s worth doing for analytics and in-memory databases, where you can now address massively more memory than before, again at a much lower price than DRAM. I/O-intensive workloads see much lower overheads, and you can reduce network traffic by doing away with many storage accesses.
SAP HANA, for example, can move its main data store into persistent memory while the table and working memory set stays in DRAM. Redis, which stores key value pairs, keeps the keys in DRAM but moves the values into persistent memory.
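The App Direct access pattern (loads and stores against a memory-mapped region, rather than read and write calls through the storage stack) can be sketched in a few lines of Python. A regular temporary file stands in here for a file on a DAX-mounted persistent-memory filesystem, so the sketch runs anywhere; real App Direct code would also handle cache flushing for durability, for example via Intel’s PMDK libraries.

```python
# Minimal sketch of the App Direct access pattern: memory-map a file and
# read/write it as ordinary memory, with no per-access syscalls. On real
# persistent memory the file would sit on a DAX-mounted pmem filesystem;
# a plain temp file is used as a stand-in so this runs on any machine.
import mmap
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "values.pmem")
with open(path, "wb") as f:
    f.write(b"\0" * 4096)  # size the region up front

with open(path, "r+b") as f:
    with mmap.mmap(f.fileno(), 4096) as pmem:
        pmem[0:5] = b"hello"      # store a value directly into the mapping
        value = bytes(pmem[0:5])  # load it back; no storage stack involved
```

This is the kind of restructuring Redis does when it keeps keys in DRAM but moves the values into a persistent-memory region.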
Mixed mode allows the system to use Optane in both Memory and App Direct modes at once. There’s also Storage over App Direct mode, which treats Optane as slightly faster storage with higher endurance than an enterprise-class SSD rather than as slightly slower memory, using an NVDIMM driver so existing applications can save to it.
That means that if your needs change over time – or if you run a mix of workloads on the same hardware – you can optimize the Optane configuration for a workload like SAP HANA, where the memory capacity has a major impact. When that memory-intensive workload isn’t running, the same system can be optimized for, say, VM density, giving you better utilization of what will be a fairly major investment.
This kind of flexibility will appeal to a wide range of customers, Patrick Moorhead, president and principal analyst at Moor Insights & Strategy, told Data Center Knowledge. “I believe this feature is valuable to both cloud service providers and enterprises, as it enables optimization for the workload but more importantly improves fungibility of the compute fleet. CSPs enable this kind of feature through a brute-force method of moving workloads to a more optimized fleet, but this enables a more elegant solution closer to the metal.”