Hortonworks Data Platform (HDP) became available and supported on Google Cloud Platform late last week, a major development for Hadoop in the cloud. Engineers from both companies collaborated to make it easier to provision HDP clusters on Google’s cloud.
Ajay Singh, director of technical channels at Hortonworks, outlined the engineering work in a blog post about the announcement. Highlights include integrating “bdutil” (a command-line script used to manage Apache Hadoop instances on Google Compute Engine) with the Apache Ambari (Hadoop management project) plugin to provision and manage infrastructure, and a Google Cloud Storage connector for HDP.
The companies have made source code for the integration available for use and open contribution on GitHub.
Created by Yahoo, Hortonworks was one of the first companies to go after the enterprise Hadoop market, turning the open source technology into a software business. Hadoop enables users to turn cheap commodity servers into powerful compute clusters that can crunch through large volumes of data using parallel processing techniques. The partnership makes it easier to use Google’s cloud servers for that purpose.
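The parallel processing idea behind Hadoop is the MapReduce model: split the input into chunks, process each chunk independently, then combine the partial results. A minimal sketch of that idea, as a word count in plain Python, might look like the following. It runs sequentially on one machine and does not use Hadoop or HDP; the function names (`map_phase`, `shuffle_phase`, `reduce_phase`, `word_count`) are illustrative assumptions, not part of any Hadoop API.

```python
from collections import defaultdict


def map_phase(chunk):
    """Emit a (word, 1) pair for every word in one chunk of text."""
    return [(word.lower(), 1) for word in chunk.split()]


def shuffle_phase(mapped_chunks):
    """Group emitted values by key, as a framework would between phases."""
    groups = defaultdict(list)
    for pairs in mapped_chunks:
        for key, value in pairs:
            groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Collapse each key's list of values into a single count."""
    return {key: sum(values) for key, values in groups.items()}


def word_count(chunks):
    """Run the three phases in order over a list of text chunks."""
    # On a real cluster, each chunk would be mapped on a different node;
    # here the chunks are simply processed one after another.
    mapped = [map_phase(chunk) for chunk in chunks]
    return reduce_phase(shuffle_phase(mapped))


if __name__ == "__main__":
    sample = ["big data on the cloud", "big clusters crunch big data"]
    print(word_count(sample))
```

Because each call to `map_phase` depends only on its own chunk, the map step can be spread across many inexpensive servers, which is what makes commodity clusters practical for this kind of workload.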
Open source Hadoop forms the foundation of several companies, and Hortonworks is one of the leaders in the space. Other major players include Cloudera and MapR.
Cloudera and Google recently partnered to enable Google’s Dataflow system on Spark, a cluster-computing framework used for big data analytics, including stream processing.
Google has an interesting history when it comes to Hadoop. Last July, Google said it had stopped using MapReduce, the model it created and that served as the basis for Hadoop. This did not affect Hadoop’s momentum, however, as shown by the continuing interest and investment going into companies in the space.
Hortonworks’ partnership with Google speaks to several trends: big data functions are moving to the cloud because of economics and flexibility, and enterprises are embracing open source technologies for big data.
Other big cloud providers, such as Amazon Web Services (AWS), offer a variety of easy-to-adopt Hadoop setups. Amazon’s Elastic MapReduce (EMR) is a managed service that provides the Hadoop framework on EC2. MapR's platform is also available on EMR.
There are also startups like Xplenty providing easy-to-use Hadoop on AWS. It’s possible to deploy HDP on AWS as well.
“With Google Cloud Platform and Hortonworks Data Platform, enterprises benefit from limitless scalability and an enterprise-grade platform backed by community driven open source innovation,” Singh wrote.