The sheer amount of data companies now store and want to put to use, be it for training AI models or for heavy-duty analytics, in combination with regulators’ increasing focus on physical data location means they spend more time thinking about where to store data today than they ever have before.
The goal is to strike a balance between ensuring the data is stored cost effectively and storing it in a way that makes it possible to analyze huge swaths of it quickly, using the best available computing resources for the job, all while remaining compliant with regulations both existing and upcoming.
But what if the physical location of the data didn’t matter as much? What if companies could store data where it makes the most sense in terms of cost and regulatory compliance without worrying whether they’d be able to crunch through it in real time?
RStor, a startup created by former data center technology leaders from some of the biggest names in the space – the likes of Facebook, Amazon Web Services, Intel, and EMC – says it’s possible.
Saratoga, California-based RStor came out of stealth today, announcing completion of a Cisco Investments-led $45 million funding round and unveiling a distributed computing platform it says will give users complete freedom to decide where they store data and what computing resources they use to process it, be they Amazon’s cloud servers, their own on-premises compute muscle, or a university supercomputer.
“We don’t believe there’s a single location that has everything that you possibly need and the best possible configuration,” Giovanni Coglitore, RStor founder and CEO, said in an interview with Data Center Knowledge. According to him, the company’s technology “stitches together the world’s cloud service providers and supercomputer centers” via a unique network fabric and identifies the best compute resource for each workload at any given moment, applying that resource to the data at hand, wherever the data may reside.
That’s RStor’s vision, and the startup isn’t sharing much about how its platform works or its early users. But the size of its ambition and the reputation of its leadership team have been enough to attract an unusually large funding round for a data center startup.
Roots in High-End Hardware Design
Coglitore, 50, ran the hardware engineering team at Facebook that designed the first servers the social network open sourced through its Open Compute Project organization, now an independent non-profit that’s become a disruptive force for some of the world’s largest hardware vendors.
While at Facebook, he was also involved in the development of a “cold storage” technology that uses Blu-ray discs to store infrequently accessed user data. That technology became the basis for Optical Archive, a startup Coglitore cofounded and led as CTO until it was acquired by Sony Corporation of America in 2015. He served as CTO of Sony’s optical archive division after the acquisition.
Before Coglitore joined Facebook in 2010, he had been CTO at Rackable Systems, a hardware maker he founded, known for its high-performance computing systems. He left following Rackable’s acquisition of Silicon Graphics, Inc., better known as SGI. Rackable eventually changed its name to SGI and in 2016 sold to Hewlett Packard Enterprise for $275 million.
Eliminating ‘Chatty Overhead’ from the WAN
The technology at the heart of what’s called the RStor Multicloud Platform is the company’s network fabric, which can interconnect gear sitting in data centers thousands of miles away from each other at extremely low latency. RStor’s latency on a roundtrip connection between the West Coast and the East Coast of the US, for example, is 80 milliseconds (40ms each way), Coglitore said. For comparison, average latency between two AWS availability zones on opposite coasts is in the neighborhood of 60ms one way. It’s the low latency that can make physical distance between data and compute less of a factor in site-selection decisions.
Have data in Ashburn, Virginia, that you’d like to crunch through using an HPC cluster at the San Diego Supercomputer Center? RStor offers two different ways to do this. One is to upload the data to its Data Lake, a distributed storage system hosted in multiple colocation data centers with access to compute resources in supercomputing centers and public clouds.
The other may sound like science fiction. RStor claims it can perform compute on data stored remotely, delivering the kind of performance you normally see when compute and storage are on the same local network. The platform can take computing power of a processor core sitting in a West Coast data center, for example, and use it to crunch through data stored on the opposite coast – without moving the data, Coglitore said.
One of RStor’s customers, a pharmaceutical company he did not name, is using the platform to do just that, he said. The customer’s data being processed never enters the startup’s Data Lake or its network fabric. “We’re actually able to bring the compute slices to them,” Coglitore said.
RStor achieves this by eliminating “the chatty overhead” associated with transporting data over a WAN, according to the company. It manages the factors responsible for poor WAN performance, such as congestion and packet loss, in a way that makes the WAN perform like a LAN, utilizing 90 percent of the available bandwidth at all times. The result is the ability to mount storage to compute over extremely long distances.
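A rough back-of-the-envelope sketch shows why round-trip time weighs so heavily on long-haul throughput. It is not RStor's method, which the company has not disclosed; it only illustrates the standard relationship that a sender waiting for acknowledgments can move at most one window of data per round trip. The 10 Gbps link speed and the 64 KiB window are assumed values chosen for illustration; the 80ms round trip is the coast-to-coast figure Coglitore cited.

```python
# Illustrative only: window-limited throughput over a long-distance link.
# LINK_GBPS and WINDOW_BYTES are assumptions, not figures from RStor.

LINK_GBPS = 10.0          # assumed link capacity
RTT_SECONDS = 0.080       # coast-to-coast round trip quoted in the article
WINDOW_BYTES = 64 * 1024  # assumed fixed window, no window scaling


def max_throughput_gbps(window_bytes: int, rtt_seconds: float) -> float:
    """Upper bound on throughput when one window is sent per round trip."""
    return window_bytes * 8 / rtt_seconds / 1e9


def window_needed_bytes(link_gbps: float, rtt_seconds: float,
                        utilization: float) -> float:
    """Data that must be in flight to keep the link at the given utilization."""
    return link_gbps * 1e9 * utilization * rtt_seconds / 8


if __name__ == "__main__":
    capped = max_throughput_gbps(WINDOW_BYTES, RTT_SECONDS)
    print(f"Window-limited throughput: {capped:.4f} Gbps "
          f"({capped / LINK_GBPS:.2%} of the link)")
    needed = window_needed_bytes(LINK_GBPS, RTT_SECONDS, 0.90)
    print(f"In-flight data for 90% utilization: {needed / 1e6:.0f} MB")
```

Under these assumptions, the window-limited sender uses well under 1 percent of the link, while keeping the link 90 percent full requires roughly 90 MB of data in flight at any moment.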
What makes that possible is the startup’s secret sauce, which Coglitore and colleagues wouldn’t share much detail about. Jesse Barnes, RStor’s senior director of engineering, said only that the fabric uses a combination of RDMA (remote direct memory access, a technique that lets one machine tap into the memory of another while bypassing both machines’ operating systems) and a number of proprietary protocol extensions the startup developed in-house.
“By using an RDMA technique, we’re able to get the speed of UDP-like protocols but in a much more flexible manner,” Coglitore said. UDP, or User Datagram Protocol, accelerates machine-to-machine data transfer because it doesn’t require the machines to first establish and agree on a communication channel.
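The connectionless behavior of UDP can be seen in a minimal sketch using Python's standard socket module. It has nothing to do with RStor's proprietary protocol extensions or with RDMA; it only shows that a UDP sender can transmit its payload in the very first packet, with no handshake. The function name and the payload are hypothetical, and the example runs entirely on the local loopback interface.

```python
import socket


def send_datagram_over_loopback(payload: bytes) -> bytes:
    """Send one UDP datagram to a local receiver and return what arrived."""
    # Receiver: bind to an OS-assigned port on the loopback interface.
    receiver = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    receiver.bind(("127.0.0.1", 0))
    receiver.settimeout(2.0)
    address = receiver.getsockname()

    # Sender: no connect() and no handshake; the first packet carries the data.
    sender = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        sender.sendto(payload, address)
        received, _source = receiver.recvfrom(4096)
        return received
    finally:
        sender.close()
        receiver.close()


if __name__ == "__main__":
    print(send_datagram_over_loopback(b"hypothetical data block"))
```

The trade-off is that UDP itself offers no delivery or ordering guarantees, so any protocol built on this style of transfer has to handle loss and reordering on its own.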
Today, the nerve centers of the fabric (which also host RStor’s Data Lake) are Equinix data centers in Ashburn; San Jose, California; and the UK, as well as a CoreSite facility in Los Angeles. Expansion to Hong Kong, Singapore, and Brazil is in the works.
Equinix is both a data center provider and a partner, Coglitore said, but RStor values being vendor-agnostic, and there’s no exclusive agreement with any single data center provider.
Not only is RStor offering the capability to process data located anywhere using compute power located anywhere, it’s also built an analytics engine that automatically determines what compute resource is best to apply to a specific dataset for a specific application at any given moment. It takes into consideration a number of factors, but the main ones are performance and cost.
In combination with the network fabric, this engine makes possible an automated computing spot market of sorts.
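RStor has not said how its engine weighs performance against cost, so the following is only a toy sketch of the general idea: among the resources that can finish a job by a deadline, pick the one with the lowest total cost. The class, the function, the resource labels, and all the numbers are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class ComputeOffer:
    name: str                 # hypothetical resource label
    est_runtime_hours: float  # estimated time to finish the job
    price_per_hour: float     # current price in dollars

    @property
    def total_cost(self) -> float:
        return self.est_runtime_hours * self.price_per_hour


def pick_offer(offers: Sequence[ComputeOffer],
               deadline_hours: float) -> Optional[ComputeOffer]:
    """Return the cheapest offer that meets the deadline, or None."""
    feasible = [o for o in offers if o.est_runtime_hours <= deadline_hours]
    if not feasible:
        return None
    return min(feasible, key=lambda o: o.total_cost)


# Made-up offers for a single job at a single moment in time.
OFFERS = [
    ComputeOffer("public-cloud", est_runtime_hours=6.0, price_per_hour=12.0),
    ComputeOffer("on-premises", est_runtime_hours=10.0, price_per_hour=4.0),
    ComputeOffer("supercomputer", est_runtime_hours=2.0, price_per_hour=30.0),
]

if __name__ == "__main__":
    for deadline in (12.0, 8.0, 1.0):
        choice = pick_offer(OFFERS, deadline)
        print(deadline, choice.name if choice else "no feasible resource")
```

With these made-up numbers, a 12-hour deadline selects the slow but cheap on-premises option, an 8-hour deadline shifts the job to the supercomputer, and a 1-hour deadline leaves no feasible resource. Re-running the selection as prices and availability change is what would give such a system its spot-market character.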
Vision and Pedigree
It’s hard to predict at the moment how RStor’s technology will be received in the market, Michael Porat, director at Cisco Investments who leads the Data Center and Storage Domain for the network technology giant’s venture capital unit, told us.
“It’s early days,” he said, but he and his colleagues found the pedigree of the team and the fact that it was trying to address a major pain point for customers convincing enough when they decided to back the startup. “It wasn’t at that point about any one component of the technology; it was more belief in the team and their ability to come up with strong technology,” Porat said.
Barnes, in charge of engineering at RStor, is a veteran Intel engineer. The company’s VP of strategy and general manager is Tim Harder, who was a founding member of EMC’s technology venture capital arm and later spent four years as head of block and file storage services at AWS. In a slide deck, RStor lists Microsoft, Google, VMware, Dropbox, Yahoo, and Samsung as companies other members of its team have worked for in the past.
Porat declined to disclose Cisco’s portion of RStor’s $45 million Series A round. An earlier report that said Cisco had poured $80 million into RStor and was the startup’s sole investor was not true, he said.
The pain point RStor is addressing is managing data location and transitioning to a hybrid-cloud architecture amid an unprecedented data explosion, Porat explained. Where its data resides is a question every enterprise is asking itself, and there’s no simple answer. “It’s fairly obvious that it’s not just in the cloud or just in the [on-premises] data center,” he said. “The premise that everything is going to be in AWS is pretty much outdated.” Neither is it all going to be in private cloud. “That’s just not economically efficient. It hasn’t been solved by anyone.”
RStor is tackling this problem in a way that hasn’t been tried before, and if its technology works as promised, it may change the nature of data center infrastructure as we know it.