IBM has made a huge commitment to what it called the most significant open source project of the next decade, Apache Spark. Spark is an open source processing engine built around speed, ease of use, and sophisticated analytics. IBM is donating its IBM SystemML machine learning technology to the Spark ecosystem, incorporating Spark extensively into its offerings, and committing significant resources to Spark-related projects.
IBM sees Apache Spark as the analytics operating system of the future and is investing to grow the still-nascent project into a mature platform, according to Joel Horwitz, director of portfolio marketing at IBM. IBM is dedicating 3,500 researchers and developers to Spark-related projects at more than a dozen labs worldwide. IBM also hopes to educate more than one million data scientists and data engineers on Spark.
Spark is a fast and general engine for big data processing, with built-in modules for streaming, SQL, machine learning and graph processing. IBM’s donated technology advances machine learning in Spark, while the rest of the commitment advances Spark as a whole.
“IBM is investing in the Apache Spark core processing technology because the market of intelligent applications represents a huge opportunity, ranging from the Internet of Things (IoT) to the digital, connected, and social needs that are transforming businesses everywhere,” said Horwitz.
IBM will also embed Spark into its analytics and commerce platforms and offer Spark as a Service on Bluemix, its platform as a service (PaaS). Spark as a Service on Bluemix lets developers quickly load data, model it, and derive the predictive artifacts they need for intelligent applications.
Spark is considered an alternative to MapReduce, a technology that suffered a blow last year when Google said MapReduce was no longer sufficient for its needs. The Google news was mistakenly viewed as a blow to Hadoop, since MapReduce and Hadoop were originally inseparable. In fact, Hadoop is very much alive and thriving, and Spark is one of the most promising technologies in its ecosystem, with the potential to supplant the more complex MapReduce.
The battle isn’t just Spark versus MapReduce. There are also other solutions like Apache Tez and Apache Flink in the mix. However, Spark is undergoing a meteoric rise.
The creators of Spark formed Databricks, a company that offers a cloud-based platform built around the open source cluster computing framework. IBM also announced it will collaborate with Databricks to advance Spark’s machine learning capabilities.
IBM isn’t the only big technology company that sees Spark’s potential. Dell and Cloudera brought a Spark-powered in-memory processing appliance to market last year, and earlier this year Piston Cloud Computing, now owned by Cisco, expanded its support from private OpenStack to Spark and others. MapR supports Spark, Rackspace offers Spark on bare metal, and many others have gone to market with Spark-focused offerings.
“We believe strongly in the power of open source as the basis to build value for clients, and are fully committed to Spark as a foundational technology platform for accelerating innovation and driving analytics across every business in a fundamental way,” said Beth Smith, General Manager, Analytics Platform, IBM Analytics. “Our clients will benefit as we help them embrace Spark to advance their own data strategies to drive business transformation and competitive differentiation.”
IBM has also partnered with AMPLab, DataCamp, MetiStream, Galvanize, and the Big Data University MOOC to educate data scientists and engineers on Spark.