You will see much more of this word from now on: decoupling. Data center architects familiar with software-defined infrastructure will recognize it as referring to the separation of software's resource requirements from the hardware that hosts it. Software-defined networking architects accomplish this by separating the data that comprises network traffic from the instructions that control and manipulate it — the data plane and control plane, respectively.
Datera is not the first company to attempt to apply SDN principles to data storage. As it emerged from stealth mode on Tuesday, Datera introduced its key product: a software-defined data storage platform called Elastic Data Fabric. Think of EDF as a kind of “data stack” that provides capacity to applications on an as-needed basis, endeavoring to provide flexibility and responsiveness similar to Amazon’s Elastic Block Store but with on-premises, commodity x86 hardware.
As you’ve already surmised, Datera’s methodology is based on decoupling software from infrastructure. The ideal implied here is that any application that consumes data should not be responsible for determining the design of the infrastructure that supports it — especially given how rapidly the demands of modern applications change.
Pomp and Circumstance
The EDF product is commercially ready now, with Datera boasting of four customers having successfully deployed it during the company’s stealth period: logic and IC engineering firm Cadence Design Systems, block storage service provider Packet, IT consultant Schuberg Philis, and Germany-based OpenStack hosting firm Teuto.net. Its financial backing, the company announced Tuesday, includes a $40 million round led by Sun Microsystems co-founder Andy Bechtolsheim, Juniper Networks vice chairman and CTO Pradeep Sindhu, plus backing from Khosla Ventures and Samsung Ventures.
If that’s not reputation enough, Datera’s founding CEO is Marc Fleischmann. Run a search for “decoupling” in the U.S. Patent Office database, and you’re likely to run into his name.
In 2001, as an engineer for a firm whose name IT veterans will recall — Transmeta Corporation — Fleischmann led the design of a kind of “dimmer switch” for CPUs. This switching mechanism enabled Transmeta’s Crusoe line of processors to throttle power consumption during periods of light workload. It was decoupling, just at another level. And it sent competitor Intel back to the drawing board, changing how its engineers perceived power, and helping to trigger the “Tick-Tock” process revolution that remade the company. (Transmeta, you’ll recall, was the employer of one Linus Torvalds.)
In 2008, Fleischmann co-founded RisingTide Systems, along with Linux SCSI project maintainer Nicholas Bellinger and Transmeta veteran Claudio Fleiner. It’s the heart of RisingTide that descended into stealth for a bit, to re-emerge this week as Datera. You could say a huge chunk of Linux’s heart and soul is responsible for powering this new storage venture.
“The goal of RisingTide was to turn Linux into a viable storage operating system,” wrote Fleischmann in a note to Datacenter Knowledge, “and achieve wide distribution for our block storage stack (LIO) to make our stack the industry standard. By 2013, we had achieved our goals, and helped storage startups, including Pure Storage, to become very successful.
“However, while the industry took great advantage of our open source stack to replace proprietary hardware with less-proprietary hardware (and called it ‘software-defined’),” Fleischmann continued, “we think they missed the point. Mapping the rigid infrastructure of the past into software is simply not enough to deliver the DevOps-centric infrastructure operations model of the future. We imagined a fundamentally different, continuous delivery model for storage, to create a modern, agile data fabric for enterprises and service providers building cloud data centers.”
This Time, Floating All Boats
Fleischmann’s reputation, and those of his colleagues and financial backers, may not matter nearly as much to data center architects today as the future operations model the CEO mentions. More to the point, all the stagecraft and marketing wizardry may pale in comparison to the question of whether Datera can scale.
In today’s hyperscale data centers, where the demands of applications are shifting and adjusting rapidly, configuration management platforms from the 2000s are failing. Decoupling is becoming vitally necessary here, because configuration scripts can no longer be pinned to an infrastructure layout that changes from day to day.
Now that it’s in the public eye, Datera must quickly work to distinguish itself in a market where “decoupling,” “software-defined,” and “Amazon-like” are skirmishing with each other for the top of the buzzword list. To that end, Fleischmann told us, he aims for Datera to bring to storage infrastructure the counterpart of what containerization and orchestration have brought to workload infrastructure.
When RisingTide tried this path once before, it met fierce competition — and, in some circles, outright opposition — from Red Hat, which acquired SDS pioneer Gluster back in 2011 and established itself as a dominant force in that market space. How will Datera differentiate itself with this go-round?
“Red Hat indeed uses our open source block storage software stack LIO successfully, together with other open source software, to create software-defined storage products,” responded Datera CEO Fleischmann. “However, all of these products require extensive manual configuration and substantial support to deploy and operate them. This is by design, as open-source software companies rely on a services and support business model.”
Here is where Fleischmann’s newly remade company picks up one of the mantles from SDN: intent-based configuration.
Inside Datera’s Intent
Datera gave Datacenter Knowledge a look at its API User’s Guide, which reveals the company’s EDF management information model (MIM) in full. As with most modern APIs, this one is programmable through calls placed over HTTP.
In EDF’s model, three principal constructs relate to the storage process: applications, storage nodes, and volumes. Each construct is divided into two classes of object: one for the construct in general, the other for each specific instance (“an application instance,” for example, as opposed to “the application”). Counterintuitively, in the way EDF breaks down its devices, a volume descriptor is contained by the storage node category, and a storage node descriptor by the application category. This containment reflects what’s truly important in this system: The application (or, more accurately, the orchestrator making space and time for the application) sets the rules for how resources are to be utilized, and whose user access policies apply to them.
Rather than presenting Datera with a colossal manifest detailing the present state of the data center, EDF listens for API calls. Each of these calls specifies some part of the intent of the object — its “expectations” for how it will utilize resources. EDF responds by adjusting the MIM, which represents the “live” configuration for the entire data storage space.
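To make the idea concrete, here is a minimal sketch of what such an intent-declaring API call might look like. The endpoint URL, path, and every field name below are illustrative assumptions, not Datera's documented schema; the sketch only shows the shape of the pattern — an application instance containing a storage instance containing a volume, expressing expectations rather than hardware configuration.

```python
import json
from urllib import request

# Hypothetical "application intent" payload. Field names are illustrative,
# not Datera's actual schema. The orchestrator declares what the application
# expects (capacity, durability), not how to configure the hardware.
app_intent = {
    "name": "orders-db",
    "storage_instances": [{
        "name": "primary",
        "volumes": [{
            "name": "data",
            "size_gb": 100,        # desired capacity
            "replica_count": 3,    # durability expectation
        }],
    }],
}

# EDF-style platforms expose this as an HTTP API; here we only construct the
# request object to show the shape of the call (no network I/O is performed).
req = request.Request(
    url="http://edf.example.local/v2/app_instances",  # placeholder endpoint
    data=json.dumps(app_intent).encode("utf-8"),
    headers={"Content-Type": "application/json"},
    method="POST",
)
print(req.get_method(), req.full_url)
```

The platform's job is then to reconcile the live MIM against this declared intent, rather than executing a step-by-step configuration script supplied by the operator.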
“We take the complexity out of storage operations,” wrote Marc Fleischmann. “Users can simply define the goals they desire for their applications (in ‘application intents’), and let our intelligent software do the rest, instantly, automatically, and at any scale.”
With Docker, a script can be written to represent the intent of an object, one example being a storage volume. This is especially important for Docker, whose containers are designed to be “ephemeral,” and whose data was originally not supposed to survive the termination of the container (this “stateless” model has since been intentionally circumvented in many ways).
As the company’s containers solution brief [PDF] shows, an intent template for a volume may specify the maximum number of I/O events per second, or the bandwidth in MB/sec, or the maximum number of replicas to be allowed — elements of the volume’s policy. Since this template is just text, it’s conceivable that a UI can be readily created to compose such a template on the spot. The operator then invokes the Datera volume driver from the Docker command line, and passes the template through that command.
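A short sketch of how such a template and its hand-off to Docker might look follows. The parameter names mirror the policy elements the solution brief describes (an IOPS cap, a bandwidth cap, a replica count), but the exact keys and the shape of the `docker volume create` invocation are assumptions for illustration, not verbatim from Datera's documentation.

```python
import json
import shlex

# Hypothetical volume intent template. Keys are illustrative stand-ins for
# the policy elements described in the solution brief.
volume_intent = {
    "name": "pgdata",
    "max_iops": 5000,         # ceiling on I/O events per second
    "max_bandwidth_mb": 200,  # ceiling on throughput, in MB/sec
    "replica_count": 2,       # replicas permitted by policy
}

# Since the template is just text, a UI could compose it on the spot.
template = json.dumps(volume_intent, indent=2)

# An operator would then pass the policy to the vendor's Docker volume
# driver, roughly like this (command shape is assumed, not documented):
cmd = ("docker volume create --driver datera "
       + " ".join(f"--opt {k}={v}"
                  for k, v in volume_intent.items() if k != "name")
       + f" {shlex.quote(volume_intent['name'])}")
print(template)
print(cmd)
```

The point of the pattern is that the policy travels with the volume request, so the ephemeral container never needs to know how the storage beneath it is provisioned.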
Datera EDF is not a data lake. However, that doesn’t mean a customer could not use it to build one, said Fleischmann.
“Our elastic data fabric is designed for both modern data lakes and more traditional storage use models,” the CEO told us. “We contributed our block storage stack LIO to Linux to make it an industry standard connector, behind which we can build an elastic data fabric to unify such a wide spectrum of use cases. We have more storage protocols on our roadmap, to make our data fabric very broadly usable. On top of it, we built a powerful policy-based management plane, to allow an equally broad spectrum of applications to automatically consume the data fabric, while we automatically configure all of its elements and continuously optimize them.”
It’s an effort to apply two of the SDN field’s most compelling concepts, decoupling and intent-based configuration, to data storage.