Hortonworks Teams With Others on Hadoop Data Governance Framework

Other founding members of new Data Governance Initiative are Aetna, Merck, Target, and SAS

Jason Verge

February 13, 2015

Hortonworks founders: all members of the Hortonworks founding team worked on Hadoop and MapReduce development and deployment at Yahoo prior to 2011. (Photo: Hortonworks)

Hortonworks believes the only way to properly make Hadoop enterprise-hardened is through open source. The company recently partnered with a handful of others to form the Data Governance Initiative (DGI), which will tackle data governance, a much-needed piece of the puzzle in enterprise Hadoop adoption.

The consortium will ensure Hadoop data systems meet enterprise requirements for data governance. In addition to Hortonworks, the founding members of DGI are Aetna, Merck, Target, and Hortonworks’ technology partner SAS. However, success will be driven by the larger open source community.

The goal is to open source the new framework through the Apache Software Foundation and continue to build it out with the full support of the open source community.

DGI will work with the open source community to deliver a comprehensive solution, including metadata services, a deep audit store, and an advanced policy rules engine. It will have deep integration with Apache Falcon for data lifecycle management and Apache Ranger for global security policies.

“The Data Governance Initiative has the potential to supercharge Hadoop innovation, because industry leaders actually participate in open source development,” said Mike Gualtieri, principal analyst at Forrester Research. “If successful, that should result in a razor-sharp prioritization of data governance features that enterprises need the most for their Hadoop implementations.”

Hortonworks President Herb Cunitz stressed the importance of open source not only as a business model but also as a development model.

“It de-risks Hadoop for the entire industry,” he said. “Customers look at it and say, ‘You’re driving innovation, but if you ever take too much power, I can go back to the open source.’ It’s a more equitable balance of power. More companies join in and make that flywheel spin even faster.”

In his view, the big question now is not if Hadoop will succeed, but how it will succeed.

“No-one debates whether [Hadoop] is the storage layer for big data; it’s become that,” he said. “That being the case, two things need to happen: we need to make sure Hadoop is enterprise-grade and to do it all in open source for rapid ecosystem and customer development support.”

The enterprise Hadoop market consists of a few "pure-play" vendors offering a variety of flavors, and numerous others providing services and products built on top of Hadoop. The three biggest pure-play vendors are Hortonworks, Cloudera and MapR, but there are many others.

One big prediction for 2015 is that there will be fewer pure-play Hadoop vendors. Hadoop will continue to be massively successful, but consolidation and exits among the many players are inevitable. Vendors will abandon the “flavor” approach (Intel being a recent example), and open source will continue to fuel innovation.

Cunitz suggested this is something to watch in 2015. Will other companies align? Break apart? Or will there be one common way of doing it?

“What we’ve seen in the market is the market has embraced open source,” he said. “A lot of the companies originally thinking of doing Hadoop their own way are aligning with Hortonworks.”

Cunitz cites Microsoft as an example. Hortonworks is the default Hadoop configuration for Azure.

Hortonworks' handling of security is another example of this open source approach in practice. Hortonworks acquired XA Secure and released its formerly proprietary technology to the community under an open source license; it is now the Apache Ranger project.

“Before Ranger, there were several different ways and products to lock down the platform, and if you blow any of them, it was a big hole in the entire platform,” said Cunitz. “Ranger makes it easier to authenticate. You’ll see more and more from us on the security side.”

Enterprises adopting a modern data architecture are facing difficulty as legacy and new data from disparate platforms are brought under management. The answer is to make sure components such as YARN and Apache Spark plug in cleanly and that the process of bringing them together stays simple. Open source means more interoperability, faster innovation, and less fragmentation.
