Phil White is CTO of Scale Computing.
The tech industry is holding on to its collective hat after the revelation in early 2018 that nearly every computer chip manufactured in the last 20 years contains major security flaws, variants of which are known as Spectre and Meltdown. These flawed chips run all the essential processes on your computer, handling extremely sensitive data, including passwords and encryption keys, the fundamental tools for keeping your computer secure. Spectre and Meltdown weaken the isolation between these processes, potentially allowing one to steal information from another. With such a significant security hole, IT administrators may be forced to upgrade systems they had no previous plans to touch.
For those who design and integrate software and hardware systems, here are three things to keep in mind about the Meltdown and Spectre patches:
Full Extent of Performance Hits May Vary Dramatically
Impacts are going to be seen across the industry. Depending on the application, initial testing and benchmarking have shown anywhere from negligible impacts (less than 1 percent) to catastrophic ones (800 percent). As it relates to hyperconverged infrastructure (HCI) and software-defined storage, a well-designed and efficient architecture with plenty of CPU to spare will be your best friend. Remember, in a hyperconverged system the virtual storage controller shares the same CPU resources as the primary workload. Each I/O that traverses that path likely invokes many syscalls, and syscalls are the Achilles' heel of the Meltdown and Spectre mitigations: once the mitigations are enabled, each one suddenly becomes much more expensive. Storage stack designers and integrators must therefore work to reduce or eliminate these calls. Depending on which software vendor you talk to, and the level of optimization they have already done to date, there could be either sheer panic or clear skies and smooth sailing.
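One common way storage stacks cut syscall counts is to batch many small operations into a single kernel crossing. The sketch below is purely illustrative (it is not Scale Computing's implementation): it contrasts issuing one write() per buffer against submitting the same buffers in a single writev() call, which amortizes the now-higher per-syscall cost across the whole batch. writev() is Unix-only and may perform partial writes; this sketch ignores that for brevity.

```python
import os
import tempfile

def write_chunks_naive(fd, chunks):
    # One write() syscall per chunk: each call crosses the
    # user/kernel boundary and pays the full post-mitigation cost.
    for chunk in chunks:
        os.write(fd, chunk)

def write_chunks_batched(fd, chunks):
    # writev() submits all chunks in a single syscall, amortizing
    # the per-syscall overhead across the whole batch.
    # (Sketch only: a real stack would handle partial writes.)
    os.writev(fd, chunks)

# 1,000 small buffers: 1,000 syscalls naively vs. 1 batched.
chunks = [b"x" * 64 for _ in range(1000)]

fd, path = tempfile.mkstemp()
try:
    write_chunks_batched(fd, chunks)
finally:
    os.close(fd)
assert os.path.getsize(path) == 64 * 1000
os.unlink(path)
```

The same idea shows up throughout post-Meltdown I/O design: scatter/gather interfaces, request queues, and polling all exist to get more work done per kernel entry.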
Come see DCK editor Yevgeniy Sverdlik interview Scale Computing co-founder Jason Collier (colleague of this article's author) live on stage at Data Center World this March in San Antonio, Texas.
Don’t Run a System at or Near Full CPU Capacity
As a system designer or integrator, it is critical to build sufficient CPU headroom into any system. Customers who have been sold underpowered systems may find themselves in serious trouble once these mitigations have been enabled, and in some cases may be forced to run without them if the systems become overloaded. Additionally, if high availability is a requirement for any system, extra CPU headroom should already exist; the entire software workload must be able to run on a subset of the CPUs (total CPUs minus any tolerated failure). These thresholds will need to be re-evaluated post-Meltdown/Spectre.
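The headroom rule above can be turned into simple back-of-the-envelope arithmetic. The sketch below is a hypothetical sizing helper, not a vendor tool, and the 15 percent mitigation overhead is an illustrative assumption only; real overheads vary widely by workload, as the benchmarking range quoted earlier shows.

```python
def max_safe_utilization(total_cpus, cpus_per_node, failures_tolerated,
                         mitigation_overhead=0.15):
    """Fraction of the full cluster's CPU that can safely be used.

    The workload must fit on the CPUs that survive the tolerated
    node failures, after being inflated by the assumed
    Meltdown/Spectre mitigation overhead (hypothetical default: 15%).
    """
    surviving_cpus = total_cpus - cpus_per_node * failures_tolerated
    return surviving_cpus / (total_cpus * (1 + mitigation_overhead))

# Example: a 4-node cluster with 16 CPUs per node, tolerating the
# loss of one node, under the assumed 15% overhead.
budget = max_safe_utilization(total_cpus=64, cpus_per_node=16,
                              failures_tolerated=1)
print(f"Safe steady-state utilization: {budget:.1%}")
```

Run with and without the overhead term, the gap between the two budgets is exactly the headroom that must be re-evaluated once the mitigations are enabled.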
There May Be More To Come
Meltdown and Spectre have introduced us to a new class of architectural vulnerabilities. These are clearly not easily fixed, and the fixes thus far have been handled clumsily in some cases. Now that researchers (both good and evil) have their eyes on these vulnerabilities, it’s entirely possible we will see additional similar vulnerabilities appear in the future. The industry should prepare itself for this by adjusting processes to allow for the rapid deployment of critical fixes at the firmware and microcode level. Cross-vendor lines of communication should also be opened, as it is critical to share detailed information regarding vulnerabilities such as Meltdown and Spectre as soon as they are discovered.
Let’s all learn from Meltdown and Spectre, and improve the industry as a result.
Opinions expressed in the article above do not necessarily reflect the opinions of Data Center Knowledge and Informa.
Industry Perspectives is a content channel at Data Center Knowledge highlighting thought leadership in the data center arena. See our guidelines and submission process for information on participating.