Despite the way SSD prices have fallen, the all-flash data center continues to be something of a rarity - although Gartner has predicted that up to a quarter of data centers might use all-flash arrays for primary storage by 2020, with ROI of under six months in some cases. By then, you’ll be using NVMe protocols over PCIe rather than SAS or SATA to connect those to the server or storage array, or maybe an NVMe over Fabric (NVMf) architecture using RDMA or Fibre Channel. You might also be considering next-generation non-volatile storage like Intel’s Optane.
Rather than replacing hard drives directly, these low-latency, low-power technologies are likely to give you a high-performance storage tier for your most demanding workloads.
Flash’s smaller size and lower power and cooling requirements looked like the perfect fit for data centers constrained by space and power. With no moving parts, it’s not as susceptible to shock and vibration, either; while humidity turns out to be the cause of most disk drive failures in hyperscale cloud data centers, vibration is as much of a problem in enterprise data centers. That also makes SSDs an attractive option in less traditional data center locations, like oil rigs and container ships, where hard drives would struggle.
But it wasn’t just the higher price of SSDs that held back adoption, especially as flash arrays use data deduplication and compression to keep their raw media costs down. Taking advantage of the higher performance of flash for the applications that use that storage means a lot of integration work. You might be able to remove existing layers of overprovisioning and caching to reduce costs when you adopt flash, but that means changes to your data center layout. And with faster storage performance, it’s easy for networking to become the bottleneck.
Plus, the SATA and SAS protocols used to connect storage were designed for tapes and hard drives that can’t handle the large numbers of simultaneous I/O requests that flash can. As flash capacity increases, connecting SSDs using protocols that only allow for a limited number of storage request queues becomes increasingly inefficient, which is why the storage industry is currently switching over to NVMe and PCIe to get much higher IOPS and throughput.
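To put rough numbers on that queue limit, here is a minimal Python sketch of the arithmetic. The figures are not from this article: they are the commonly quoted protocol ceilings (one queue of 32 commands for SATA under AHCI, and up to 65,535 I/O queues of up to 65,536 commands each for NVMe), and real devices and drivers typically expose far fewer. The `QueueModel` class is a hypothetical helper for illustration only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class QueueModel:
    """Hypothetical helper: how many commands a protocol can keep in flight."""
    name: str
    queues: int  # number of command queues the protocol allows
    depth: int   # commands each queue can hold

    @property
    def max_outstanding(self) -> int:
        # Upper bound on simultaneous requests: queues times depth.
        return self.queues * self.depth


# Assumed, commonly quoted protocol ceilings (not measured values).
AHCI_SATA = QueueModel("SATA (AHCI)", queues=1, depth=32)
NVME = QueueModel("NVMe", queues=65_535, depth=65_536)

for model in (AHCI_SATA, NVME):
    print(f"{model.name}: up to {model.max_outstanding:,} commands in flight")
```

The point of the comparison is the order of magnitude, not the exact ceiling: a device that can service many requests in parallel is starved by a single shallow queue.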
“A huge difference between NVMe and SCSI is the amount of parallelism that NVMe enables, which means that there will potentially be a huge bandwidth difference between NVMe and SCSI devices (in densely consolidated storage environments, for example),” IDC Research Director for Enterprise Storage Eric Burgener told Data Center Knowledge. “NVMe is built specifically for flash and doesn’t have anything in it to deal with spinning disk, so it’s much more efficient (which means you get a lot more out of your storage resources). Latencies are lower as well, but that difference (around 200 microseconds faster than 12Gb SAS devices) may not make that much of an impact because of other bottlenecks.”
So far, most NVMe SSDs have been directly attached to servers. “At this point, 99 percent of all the NVMe devices being purchased are bought after market by customers who put them into x86 servers they own that have a lot of PCIe slots,” says Burgener.
A number of storage vendors are already using some NVMe technology in their arrays, for cache cards and array backplanes, and marking their existing all-flash arrays as “NVMe ready”. Pure Storage had already put NVMe cache cards (customized to be hot pluggable) in their FlashArray//M, and now has announced the FlashArray//X, using NVMe throughout the array - for the devices, controller and backplane, and with a dedicated 50Gb/s RDMA over Converged Ethernet (RoCE) NVMe Fabric. Pure has already demonstrated that fabric working with Cisco UCS servers and virtual interface cards, and Micron recently announced its SolidScale platform using a Mellanox RoCE NVMe Fabric.
“I think over the next three years we will see more array vendors using NVMe cache cards, NVMe backplanes, NVMe controllers, NVMe over fabric, and then finally all NVMe devices,” Burgener predicts. Vendors who already have software to manage tiered data placement in hybrid flash arrays (like Dell EMC, HDS, HPE, IBM, and NetApp) might also introduce multi-tier all-flash arrays, using a small NVMe cache with SAS SSDs; these would be cheaper but more complex.
Optane - the brand Intel uses for the 3D XPoint persistent memory it developed with Micron when it’s packaged as storage rather than memory - will also start out as direct-attached SSDs using PCIe, and move into arrays as capacities and volumes increase. In early tests, Optane is comparable to fast SSDs on throughput and latency but much better at sustaining those under load – even with a large number of writes being performed, Intel claims the read latency stays low. The write endurance is also far better than NAND flash, and Optane can read and write individual bytes, rather than the pages of flash and the sectors of a hard drive.
“Having Optane SSDs as the caching layer within a high performance storage system will enable infrastructure teams to move even the most demanding OLTP database workloads onto a simplified shared storage pool, without having to worry about these applications being disrupted by other data center services,” claims James Myers, director of data center NVM Solutions Architecture at Intel.
Initially you’ll either treat Optane like an SSD or use it as slightly slower DRAM. You can use Optane SSDs in Storage Spaces Direct in Windows Server 2016 and Azure Stack as well as VMware vSAN, and the next release of Windows Server will support it as storage class memory.
But in time it will show up as something between storage and memory. “Instead of doing I/O (reads and writes of 4K blocks for example) we’ll be doing loads and stores on bytes of data on our new byte-addressable persistent memory,” explains Alex McDonald from the Storage Networking Industry Association Europe. “We need a new programming model for that, because it’s not like regular DRAM. You can’t clear it by removing the power, for example, so clearing problems using the big red switch won’t work. That’s just one consideration of something that’s nearly as fast as DRAM but doesn’t drop bits when the power goes off.”
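The difference between block I/O and byte-level loads and stores can be sketched in Python. This is an illustration under loud assumptions, not how persistent memory is actually programmed: an ordinary temporary file and the standard `mmap` module stand in for byte-addressable persistent memory, the operating system still moves whole pages underneath, and the function names are hypothetical.

```python
import mmap
import os
import tempfile

BLOCK = 4096  # the 4K block size used in the quote's example


def update_via_block_io(path, offset, value):
    """Change one byte the block way: read a 4K block, patch it, write it back."""
    start = (offset // BLOCK) * BLOCK
    with open(path, "r+b") as f:
        f.seek(start)
        block = bytearray(f.read(BLOCK))
        block[offset - start] = value
        f.seek(start)
        f.write(block)
    return 2 * len(block)  # bytes the application moved (read plus write)


def update_via_store(path, offset, value):
    """Change one byte with a single store into a memory mapping."""
    with open(path, "r+b") as f:
        with mmap.mmap(f.fileno(), 0) as mm:
            mm[offset] = value  # a one-byte store, no explicit read or write call
            mm.flush()          # ask the OS to make the change durable
    return 1  # bytes the application touched


def demo():
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "pmem-standin.bin")  # hypothetical stand-in file
        with open(path, "wb") as f:
            f.write(bytes(4 * BLOCK))
        moved_by_block = update_via_block_io(path, 5000, 0xAB)
        moved_by_store = update_via_store(path, 5001, 0xCD)
        with open(path, "rb") as f:
            f.seek(5000)
            result = f.read(2)
    return moved_by_block, moved_by_store, result


print(demo())
```

In this sketch the block path moves 8,192 bytes to change one, while the mapped path touches a single byte, which is the shift in programming model McDonald describes.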
The Need for Speed
Before you invest in a new storage tier, you need to understand your workloads. Are writes more important than reads? Is latency an issue? Making the wrong choice can significantly impact application performance. As Brian Bulkowski, co-founder and CTO at database company Aerospike, notes, “transactional systems are becoming more and more complex; they now process a high volume of data while a transaction is happening.”
You also need to make sure that your applications can take advantage of better storage performance. In-memory databases, for example, are designed on the assumption that storage is slow and likely to be distributed across different systems.
“There are very few workloads that today require an end-to-end NVMe system, with an NVMe host connection, NVMe controllers and backplane, plus NVMe devices,” says Burgener. There are correspondingly few vendors selling them (like Apeiron Data, E8 Storage, Excelero and Pavilion Data).
“Customers that have bought these systems tend to use them for extremely high-performance databases and real-time big data analytics where a lot of the data services needed by an enterprise (RAID, compression, encryption, snapshots, replication, and so on) are provided by the database or are not used,” he explains. Where those data services are included, he finds “they lack maturity and broad features,” making these systems more of a niche play for at least the next three years (although overall market revenues for these systems will grow every year).
One exception is Pure Storage, which has been offering what Burgener calls “the full complement of enterprise-class functionality” on its arrays for a number of years. The all-NVMe FlashArray//X is Pure’s highest-end array and other models still use SAS rather than NVMe. “Most customers probably won’t need the performance of the X70 for quite a while,” he notes, but when they do, Pure may have the advantage of a more mature offering.
Bulkowski suggests there’s “a bifurcation in the flash market” due to vendors prioritizing the read-optimized, cheaper, slower, high density MLC flash that hyperscale customers want, over the faster, low density, write-optimized SLC flash. He expects Optane, with its higher performance, to eventually replace SLC (which is dropping out of the market), though it does require rethinking how your software works with memory; for example using Optane to store indexes with other, slower, storage handling the bulk of your data.
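The idea of keeping indexes on the fastest tier while slower storage holds the bulk of the data can be sketched as follows. This is a hypothetical illustration, not Aerospike's implementation or any vendor's API: a Python dict stands in for the fast tier, an ordinary file stands in for the slower tier, and the `TieredStore` name is invented for the example.

```python
import os
import tempfile


class TieredStore:
    """Hypothetical sketch: small index on the fast tier, records on the slow tier."""

    def __init__(self, data_path):
        self.index = {}                     # fast-tier stand-in: key -> (offset, length)
        self.data = open(data_path, "w+b")  # slow-tier stand-in: append-only record file

    def put(self, key, value):
        # Append the record to the slow tier and remember where it landed.
        self.data.seek(0, os.SEEK_END)
        offset = self.data.tell()
        self.data.write(value)
        self.data.flush()
        self.index[key] = (offset, len(value))

    def get(self, key):
        # One fast index lookup, then exactly one read from the slow tier.
        offset, length = self.index[key]
        self.data.seek(offset)
        return self.data.read(length)

    def close(self):
        self.data.close()


with tempfile.TemporaryDirectory() as d:
    store = TieredStore(os.path.join(d, "records.bin"))
    store.put("user:1", b"hello")
    store.put("user:2", b"world!")
    print(store.get("user:2"))
    store.close()
```

The design choice is that every lookup costs a single access to the slow tier, so the latency of the index tier dominates how quickly a record can be located.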
Some Aerospike customers are using the database for workloads that will be able to take advantage of faster storage, like web advertising (where low latency is crucial), as well as for fraud detection, threat prevention and network optimization in telecoms. IoT and machine learning will also drive some demand for extremely high write throughput, as well as large data sets. An enterprise hybrid transaction/analytical processing system might have a 10TB in-memory database; a machine learning data set for fraud detection would be closer to 100TB.
At the other end of the scale, as Burgener points out, “for handling mainstream workloads, there is a lot of room for growth still with SAS-based SSDs and arrays.” The full range of NVMe standards for dual port, hot plug and device drivers is still in development, and he suggests “there will likely still be some shifting there before those standards settle. Those features are needed for widespread enterprise use, and SCSI is already very mature in those areas.”
Between the initially high cost of NVMe and the need for standards to mature and volumes to increase (bringing those costs down), IDC predicts that NVMe will only replace SATA and SAS in mainstream systems by 2021. Optane doesn’t ship until Q3 this year and may not ship in volume until 2018.
“It’s an interesting product, it pushes performance limits, and it will enable the development of more real-time workloads, but volumes will be limited for the next several years.”
“Optane will be sparingly deployed in server-based storage designs (putting Optane devices into x86 servers) and won’t start to appear in any arrays until the end of 2018 at the earliest,” Burgener predicts. “It’s just not needed for most workloads. Even at that point it may be used for cache cards and possibly a small high-performance tier to front end slower SSDs: HPE plans to use Optane as a caching layer in its 3PAR SANs. It’s also going to be very expensive relative to SAS SSDs for probably at least the next few years. In-memory databases are one area where Optane will first be used.” That’s going to offer a boost for machine learning too.
If you do have a workload that needs this faster storage tier, Burgener cautions that you’ll also need to look at your network architecture. One NVMe device will max out a 40GbE host connection, and a single Optane device can max out four 100GbE connections. “Unless you increase the host connection bandwidth significantly, you leave most of the performance of the array inaccessible to the application workloads,” he warns. That means to get the full performance of NVMe and Optane arrays, you’ll need to move to NVMe over Fabric – which again means changes to your network infrastructure.
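The bottleneck Burgener describes is simple arithmetic, sketched below in Python. The array size is hypothetical, and the per-device figure of 40Gb/s simply takes the statement above at face value (one NVMe device saturating a 40GbE connection); the `accessible_fraction` function is an invented helper, not part of any tool.

```python
def accessible_fraction(devices, device_gbps, links, link_gbps):
    """Share of an array's aggregate bandwidth that the host links can carry."""
    aggregate = devices * device_gbps  # what the devices could deliver together
    host = links * link_gbps           # what the host connections can carry
    if aggregate <= 0:
        raise ValueError("array must have positive aggregate bandwidth")
    return min(1.0, host / aggregate)


# Hypothetical 24-device array; each device assumed to deliver 40Gb/s.
one_link = accessible_fraction(devices=24, device_gbps=40.0, links=1, link_gbps=40.0)
four_fast = accessible_fraction(devices=24, device_gbps=40.0, links=4, link_gbps=100.0)

print(f"1 x 40GbE:  {one_link:.1%} of array bandwidth reachable")
print(f"4 x 100GbE: {four_fast:.1%} of array bandwidth reachable")
```

Under these assumptions a single 40GbE link exposes only a few percent of what the array could deliver, and even four 100GbE links leave more than half of it unreachable.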
You may also need to design your storage tiers to extend beyond the traditional data center. Eventually, McDonald predicts, “this stuff [will be] so fast (and hopefully cheap relative to its performance and capacity) that it will be pretty ubiquitous as the compute storage layer.
“In other words, your data will be in one of three categories; super-hot and on something like these technologies; super-cold and being stored on much slower media; or super-distant, being generated on peripheral and the edge by remote devices (think IoT, sensors, smart city, retail outlets and so on) that will use cheap flash.” Your storage tiers could extend from super-fast persistent memory for the hottest data that’s being processed all the way to cloud storage, but making that work will need a smart, automated data fabric to make it seamless.