A young woman walks past the IBM logo at the 2009 CeBIT technology trade fair in Hanover, Germany. (Photo by Sean Gallup/Getty Images)

IBM Makes Huge Apache Spark Commitment

IBM has made a huge commitment to what it called the most significant open source project of the next decade, Apache Spark. Spark is an open source processing engine built around speed, ease of use, and sophisticated analytics. IBM is donating its SystemML machine learning technology to the Spark ecosystem, incorporating Spark extensively in its offerings, and committing significant resources to Spark-related projects.

IBM sees Apache Spark as the analytics operating system of the future, and is investing to grow nascent Spark into a mature platform, according to Joel Horwitz, director of Portfolio Marketing at IBM. IBM is dedicating 3,500 researchers and developers to Spark-related projects at more than a dozen labs worldwide. IBM also hopes to educate more than one million data scientists and data engineers on Spark.

Spark is a fast and general engine for big data processing, with built-in modules for streaming, SQL, machine learning and graph processing. IBM’s donated technology advances machine learning in Spark, while the rest of the commitment advances Spark as a whole.

“IBM is investing in the Apache Spark core processing technology because the market of intelligent applications represents a huge opportunity, ranging from the Internet of Things (IoT) to the digital, connected, and social needs that are transforming businesses everywhere,” said Horwitz.

IBM will also embed Spark into its analytics and commerce platforms, and offer Spark as a Service on Bluemix, IBM’s PaaS. Spark as a Service on Bluemix enables developers to quickly load data, model it, and derive the predictive artifact to use in intelligent applications.

Spark is considered an alternative to MapReduce, a technology that suffered a blow last year when Google said MapReduce was no longer sufficient for its needs. The Google news was wrongly viewed as a blow to Hadoop, as MapReduce and Hadoop were originally inseparable. Hadoop is very much alive and thriving, and Spark is one of the most promising technologies in the ecosystem, with the potential to supplant the more complex MapReduce.

The battle isn’t just Spark versus MapReduce. There are also other solutions like Apache Tez and Apache Flink in the mix. However, Spark is undergoing a meteoric rise.

The creators of Spark formed Databricks, a company that offers a cloud-based platform based around the open source cluster computing framework. IBM also announced it will collaborate with Databricks to advance Spark’s machine learning capabilities.

IBM isn’t the only big technology company that sees Spark’s potential. Dell and Cloudera brought a Spark-powered in-memory processing appliance to market last year, and the now Cisco-owned Piston Cloud Computing expanded its support from private OpenStack to Spark and others earlier this year. MapR supports Spark, and Rackspace offers Spark on bare metal. Many others have gone to market with Spark-focused offerings.

“We believe strongly in the power of open source as the basis to build value for clients, and are fully committed to Spark as a foundational technology platform for accelerating innovation and driving analytics across every business in a fundamental way,” said Beth Smith, General Manager, Analytics Platform, IBM Analytics. “Our clients will benefit as we help them embrace Spark to advance their own data strategies to drive business transformation and competitive differentiation.”

IBM has partnerships with AMPLab, DataCamp, MetiStream, Galvanize and Big Data University MOOC to educate data folks on Spark.

 


About the Author

Jason Verge is an Editor/Industry Analyst on the Data Center Knowledge team with a strong background in the data center and Web hosting industries. In the past he’s covered all things Internet Infrastructure, including cloud (IaaS, PaaS and SaaS), mass market hosting, managed hosting, enterprise IT spending trends and M&A. He writes about a range of topics at DCK, with an emphasis on cloud hosting.
