MapR, a Google Capital-backed Hadoop distribution provider, announced that its software now supports Apache Drill, an open source SQL query engine for large datasets that the company is deeply involved in developing.
Support for Drill 0.5 comes as part of the latest release of MapR’s software, MapR 4.0.1. The release also includes updated versions of Apache Spark and HBase and uses Hadoop 2.4, including YARN.
Drill is a framework for distributed applications that analyze large datasets. It is an open source version of Dremel, a system Google built for itself and now offers as BigQuery, a service available through its cloud platform. According to the project’s Wiki page, Drill supports a broader range of query languages, data formats and data sources than Dremel.
Support for Drill brings SQL to MapR’s Hadoop distro, meaning users can run SQL queries against data stored on Hadoop clusters. According to MapR, Drill enables querying of complex data in native formats, including schema-less data, nested data or data with quickly changing schemas, with little involvement from the IT team.
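To illustrate what querying schema-less, nested data in its native format means in practice, here is a minimal sketch in plain Python. It does not use Drill or any MapR API; the sample records and the `get_path` and `select` helpers are hypothetical, and the point is only that fields are resolved at read time, so a record missing a field yields an empty value rather than an error.

```python
import json

# Hypothetical newline-delimited JSON records; the fields vary from row to row.
RAW = """\
{"id": 1, "user": {"name": "ana", "geo": {"country": "BR"}}, "amount": 30}
{"id": 2, "user": {"name": "li"}, "amount": 12, "coupon": "X1"}
{"id": 3, "user": {"name": "sam", "geo": {"country": "US"}}}
"""


def get_path(record, dotted):
    """Follow a dotted path such as 'user.geo.country'; return None if any step is missing."""
    value = record
    for key in dotted.split("."):
        if not isinstance(value, dict) or key not in value:
            return None
        value = value[key]
    return value


def select(records, columns, where=lambda r: True):
    """Project the dotted columns from every record that passes the filter."""
    return [{c: get_path(r, c) for c in columns} for r in records if where(r)]


# No schema is declared up front; each line is parsed as it is read.
records = [json.loads(line) for line in RAW.splitlines() if line.strip()]

# Roughly: SELECT id, user.geo.country FROM records WHERE amount > 10
rows = select(
    records,
    ["id", "user.geo.country"],
    where=lambda r: (get_path(r, "amount") or 0) > 10,
)
```

In this sketch the second record has no `geo` field, so its country comes back as `None`, and the third record has no `amount`, so the filter drops it. A real SQL-on-Hadoop engine would do this across files spread over a cluster rather than over an in-memory list.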
Six of the eight people listed as core developers on the Drill Wiki page are MapR employees, including the company’s co-founder and CTO MC Srivas. “We are kind of the lead drivers, in terms of the committers [to the open source code base],” MapR Chief Marketing Officer Jack Norris said.
But the Drill developer community extends far beyond the MapR team. There are more than 40 contributors to the project in total, Norris said.
“It’s an actual formal Apache Software project, incubated within the Apache Software Foundation and really started there, so the design and the APIs, and basically the architecture, had complete exposure to the open source community,” he said.
Some of Drill’s design goals are to process “petabytes of data and trillions of records in seconds” and to scale to more than 10,000 servers. Norris said the current architecture of Drill addresses these goals, but indicated it was still too early to offer solid proof that it can meet them, since the version that is out now is 0.5, not a 1.0 general availability release.
With support for Hadoop as well as Drill and Spark (a general-purpose cluster computing framework), MapR offers a wide range of data analytics tools to choose from. The distribution now includes several batch processing frameworks, five SQL-on-Hadoop technologies, two NoSQL technologies and three machine-learning and graph libraries.
In June, MapR landed a $110 million financing round, which included an $80 million equity investment by Google Capital.