Pivotal Open Sources Entire Big Data Suite
Paul Maritz, CEO, Pivotal, speaking at an industry event. (Source: Pivotal’s Facebook profile)

Partners with Hortonworks, unveils industry group formed around enterprise Hadoop

As enterprise IT grows increasingly comfortable with open source technologies, the presence of enterprise vendors in the open source ecosystem grows too.

Pivotal, the San Francisco-based company majority-owned by EMC and VMware, whose mandate is to enable enterprise developers to build and deploy analytics-enabled software using modern agile development methods, has led three open source projects: the well-known Platform-as-a-Service software Cloud Foundry; the enterprise messaging software RabbitMQ; and Redis, an in-memory key-value store.

Now, the company has decided to open source the remaining components of its suite of big data analytics software. Pivotal announced the decision Tuesday, the same day it announced a partnership with enterprise Hadoop company Hortonworks and the formation of a new industry association around open source big data technologies.

The components of Pivotal’s Big Data Suite being open sourced are its Hadoop distribution, called Pivotal HD; its massively parallel processing SQL engine called HAWQ; Greenplum Database; and GemFire, its in-memory NoSQL database.

The main reason to open source the entire suite is simply to give enterprise customers what they want, said Sundeep Madra, vice president of Pivotal’s Data Products Group. Traditionally averse to open source, enterprises are increasingly realizing that the approach has some big benefits: it helps them avoid lock-in to a single vendor and gives them a hand in shaping the technology they use. “They want to impact the roadmaps,” he said.

Open Source With Paid Premium Features

Pivotal is taking the common approach to making money on open source software: open the code, but sell a commercial distribution with advanced features. These include WAN replication in GemFire, which keeps a fully replicated copy of the database at a remote data center, ready to be queried without delay if the primary data center runs into problems. Another advanced GemFire feature is continuous query, which keeps retrieving the results of a SQL-like query as they become available. In a trading application, for example, a query might be used to display stocks trading above a certain price. As more stocks cross that price during the day, they are retrieved for the application to use. The open source version of HAWQ will be fully functional but will not include next-generation query optimization and management tools.
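
To make the continuous query idea concrete, here is a minimal Python sketch of the stock-price example. It is not GemFire’s API: the PriceStore and ContinuousQuery classes and the 100.0 price threshold are hypothetical. In a real system the query would be evaluated on the server and results pushed to clients over the network rather than through an in-process callback.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Set


@dataclass
class ContinuousQuery:
    """Hypothetical sketch (not GemFire's API): re-evaluates a predicate on
    every write and reports keys that newly enter the result set."""
    predicate: Callable[[float], bool]
    on_match: Callable[[str, float], None]
    matched: Set[str] = field(default_factory=set)

    def on_update(self, key: str, value: float) -> None:
        if self.predicate(value):
            # Report a key only when it first crosses into the result set.
            if key not in self.matched:
                self.matched.add(key)
                self.on_match(key, value)
        else:
            # The key dropped out (e.g. price fell back below the threshold),
            # so a later crossing will be reported again.
            self.matched.discard(key)


class PriceStore:
    """Hypothetical in-memory key-value store that pushes each write to
    every registered continuous query."""

    def __init__(self) -> None:
        self.data: Dict[str, float] = {}
        self.queries: List[ContinuousQuery] = []

    def register(self, query: ContinuousQuery) -> None:
        self.queries.append(query)
        # Evaluate existing entries so the initial result set is reported too.
        for key, value in self.data.items():
            query.on_update(key, value)

    def put(self, key: str, value: float) -> None:
        self.data[key] = value
        for query in self.queries:
            query.on_update(key, value)


# Usage: watch for stocks trading above a hypothetical 100.0 threshold.
alerts: List[tuple] = []
store = PriceStore()
store.put("AAA", 95.0)
store.put("BBB", 120.0)
cq = ContinuousQuery(predicate=lambda price: price > 100.0,
                     on_match=lambda k, v: alerts.append((k, v)))
store.register(cq)        # BBB is already above 100.0, reported at registration
store.put("AAA", 101.5)   # AAA crosses the threshold, reported
store.put("AAA", 102.0)   # still above, not reported again
print(alerts)             # [('BBB', 120.0), ('AAA', 101.5)]
```

The sketch reports only transitions into the result set, which mirrors the article’s description of stocks being retrieved “as more stocks cross the price point during the day” rather than the whole result set being re-sent on every price tick.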

Pivotal has not yet worked out exactly which components will be open source and which will be kept behind the paywall. “We’re not necessarily finalized on the list,” Madra said.

A Common Core for Hadoop

The new industry association, called Open Data Platform, will promote big data technology based on open source software in the Apache Hadoop ecosystem. The consortium has a hefty list of founding members, consisting of major vendors and service providers. In addition to Pivotal, it includes EMC, VMware, GE (which owns a minority stake in Pivotal), IBM, Teradata, Verizon, CenturyLink, Capgemini, Splunk, Hortonworks, and AltiScale.

The first order of business for ODP will be the creation of a tested core reference platform of Hadoop and Apache Ambari, the open source software for provisioning, managing, and monitoring Hadoop clusters. Member companies will build offerings based on this common core. The point is to create a standard platform that simplifies and accelerates the integration of applications and tools with Hadoop. Instead of testing tools from different vendors for compatibility on their own, users will know that the tools will work on any system that complies with the standard.

The goal of ODP is to avoid fragmentation in the Hadoop ecosystem, Madra explained. ODP wants to do for Hadoop what the creation of the Linux kernel did for the Unix ecosystem, which was becoming fragmented. “That’s what we want to do with the Open Data Platform,” he said. “We think this will be a really big advancement.”

Linking Up with Hortonworks

Pivotal is joining forces with Hortonworks, the Yahoo spinoff that last November became the first enterprise Hadoop company to file for an initial public offering, to integrate products and to collaborate on engineering and support. Pivotal plans to enable its Big Data Suite to run on Hortonworks’ platform, but it will continue supporting its own Hadoop distribution, Madra said.

Notably, Cloudera and MapR, the other two major enterprise Hadoop players, were not part of the ODP announcement. Madra said he would love for the two Hortonworks competitors to be part of the group.

Customers Set Tone in Hadoop Ecosystem

Hadoop, the open source framework for storing data across clusters of commodity servers and processing it in parallel, is enjoying widespread use in the enterprise nowadays. Mike Hoskins, CTO at Actian, which provides a SQL analytics solution based on Hadoop, said nearly every company has a Hadoop cluster, and these are no longer experimental deployments. “It’s hard to find a major account that doesn’t have a deep, serious project and investment in Hadoop clusters,” he said.

Customers are driving technology roadmaps more than ever before, he added. “People don’t appreciate to what degree the power pendulum has swung from the vendor to the customer. Customers are now more empowered to set the ground rules.”

Perhaps this explains Pivotal’s decision to let users play a bigger role in the further development of its “crown jewels” through participation in the open source development process. Another possible explanation, Hoskins said, is a desire to focus on higher-level tooling while letting go of control over lower-level infrastructure components. “The value is all higher in the stack.”
