Databricks Unveils Spark as a Cloud Service

Databricks, founded by the creators of Apache Spark, recently raised a $33 million Series B and revealed a new cloud-based platform built around the open source cluster computing framework. The company also announced partnerships with SAP and DataStax, bringing its Apache Spark 1.0 distribution to SAP’s HANA real-time analytics platform and improving compatibility with the Cassandra database.

Databricks offers Spark as a fully managed service: an integrated cloud platform intended to simplify Big Data analytics and processing.

Spark is an alternative to MapReduce, the technology that Google recently said it had stopped using. Hadoop, which was originally inseparable from MapReduce, is very much alive and thriving, however, and Spark is one of the most promising technologies in its ecosystem.

Along with announcing that it no longer uses MapReduce, Google introduced a new cloud service, called Cloud Dataflow, which combines batch analytics with streaming analytics. Databricks' new cloud service will be competing with Dataflow.

Spark supports in-memory analytics, which can be much faster than MapReduce's disk-based processing and also enables stream analytics. With more than 200 contributors, it’s one of the most active projects in the Hadoop ecosystem.
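To make the in-memory point concrete, here is a toy sketch in plain Python. It is not Spark or MapReduce code, and every function name in it is hypothetical. It only illustrates the general idea: when several analyses run over the same data, a disk-oriented approach re-reads the input for each one, while an in-memory approach loads it once and reuses it.

```python
import json


def write_events(path, n):
    """Write n toy event records to disk, one JSON object per line."""
    with open(path, "w") as f:
        for i in range(n):
            f.write(json.dumps({"user": i % 3, "value": i}) + "\n")


def load_events(path):
    """Read and parse every record from disk."""
    with open(path) as f:
        return [json.loads(line) for line in f]


def total_value(events):
    """First analysis: sum of all values."""
    return sum(e["value"] for e in events)


def count_per_user(events):
    """Second analysis: number of events per user."""
    counts = {}
    for e in events:
        counts[e["user"]] = counts.get(e["user"], 0) + 1
    return counts


def run_disk_style(path):
    """Each analysis re-reads its input from disk (a stand-in for
    chained batch jobs; an assumption made for illustration)."""
    reads = 0
    events = load_events(path)
    reads += 1
    total = total_value(events)
    events = load_events(path)  # second job starts from disk again
    reads += 1
    counts = count_per_user(events)
    return total, counts, reads


def run_in_memory_style(path):
    """Load once, keep the data in memory, run both analyses on it."""
    events = load_events(path)
    reads = 1
    return total_value(events), count_per_user(events), reads
```

Both functions return the same results; the difference is how many times the input is read from disk, which is where much of the cost sits for large data sets and multi-step pipelines.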

Spark in the cloud

The Databricks cloud platform is a turnkey solution that brings Spark to a wider audience. It helps companies provision a Spark cluster easily, with the platform handling all the details: provisioning servers on the fly, streamlining import and caching of data, handling all elements of security and continually patching and updating Spark.

In addition to letting users deploy and use the rapidly growing ecosystem of third-party Spark applications, the Databricks cloud comes with a set of built-in applications that help customers access and analyze data faster.

“One of the common complaints we heard from enterprise users was that Big Data is not a single analysis; a true pipeline needs to combine data storage, ETL, data exploration, dashboards and reporting, advanced analytics and creation of data products. Doing that with today’s technology is incredibly difficult,” said Databricks founder and CEO Ion Stoica. “We built Databricks Cloud to enable the creation of end-to-end pipelines out of the box while supporting the full spectrum of Spark applications for enhanced and additional functionality.”

Spark provides support for interactive queries (Spark SQL), streaming data (Spark Streaming), machine learning (MLlib) and graph computation (GraphX) natively with a single API across the entire pipeline.

The Databricks cloud is currently in limited availability.

A deeper war chest

The Series B funding round in late June was led by New Enterprise Associates (NEA) with follow-on investment from Andreessen Horowitz. Both firms are active investors in the Big Data space.
