Google Open Sources Dataflow Analytics Code through Apache Incubator

As an Apache Incubator project, the open source data analytics platform is expected to remain open and available for integrators

Christopher Tozzi, Technology Analyst

January 21, 2016

People stand in the lobby of Google’s Washington, DC, headquarters in January 2015. (Photo by Mark Wilson/Getty Images)

By The VAR Guy



Google is open-sourcing more code by contributing Cloud Dataflow to the Apache Software Foundation. The move, a first for Google, opens new cloud-based data analytics options and integration opportunities for big data companies.

Cloud Dataflow is a platform for processing large amounts of data in the cloud. It features an open source, Java-based SDK, which makes it easy to integrate with other cloud-centric analytics and Big Data tools.

The platform's main value for Big Data operations is providing compatibility with new technologies as they emerge while still integrating into existing workflows. That saves organizations from having to revamp their analytics infrastructure or code each time a new data processing framework appears.

Although the Dataflow SDK has been open source for more than a year, Google took the bigger step this week of proposing to turn the platform into an Apache Incubator project. That move paves the way for Dataflow's codebase to eventually become a full-fledged Apache Software Foundation project.

Google has partnered with Cloudera, data Artisans, Talend, Cask and PayPal in issuing the proposal. Those partners are already celebrating it. If approved, as it almost certainly will be, the proposal will make it simpler to build Dataflow's scalability and integration features into commercial Big Data platforms in an open source, vendor-neutral way.

Talend, for instance, had this to say: "Developers leveraging the Dataflow framework won't be 'locked-in' with a specific data processing runtime and will be able to leverage new data processing framework as they emerge without having to rewrite their Dataflow pipelines, making it Future-proof."

For the channel, Google's proposal means the cloud and big data are set to grow closer together -- and that it will be easier for open source big data companies to keep the future of data analytics open.

This first ran at


About the Author(s)

Christopher Tozzi

Technology Analyst, Fixate.IO

Christopher Tozzi is a technology analyst with subject matter expertise in cloud computing, application development, open source software, virtualization, containers and more. He also lectures at a major university in the Albany, New York, area. His book, “For Fun and Profit: A History of the Free and Open Source Software Revolution,” was published by MIT Press.
