Yahoo Rolls Out Hadoop Enhancements
June 30th, 2010 By: John Rath
Yahoo announced significant enhancements to the open source Hadoop software Tuesday at the third annual Hadoop Summit in Santa Clara. Yahoo said the new features will accelerate the potential for enterprise-wide adoption by mainstream businesses. Apache Hadoop is an open source project that develops software for reliable, scalable, distributed computing.
Over the years, Yahoo’s use of Hadoop has evolved from applied science projects to an enterprise-class platform used across a 35,000-server infrastructure to develop personalized content for tens of millions of users.
“Hadoop is where science meets big data – it’s the technical underpinning that powers our innovative consumer and advertiser products on the world’s most advanced digital canvas,” said Blake Irving, Executive Vice President and Chief Product Officer at Yahoo. “Yahoo’s cloud and Hadoop make it possible for Yahoo to rapidly personalize our content and advertising, and deliver highly relevant experiences, while maintaining the trust of our 600 million users.”
At the Summit, Yahoo announced beta releases of both Hadoop with Security and Oozie, Yahoo’s workflow engine for Hadoop. Yahoo reported that it has tested these two releases and deployed them across tens of thousands of servers.
Yahoo has also partnered with the global academic and scientific community as both a founding member of the Open Cirrus Testbed, which is advancing cloud computing research at an international scale, and the Open Cloud Consortium, a testbed for systems research on large-scale data clouds.
Several other companies announced Hadoop-related news at the summit:
Cloudera’s Hadoop Version 3
Hadoop-based data management software and services company Cloudera announced the third version of Cloudera’s Distribution for Hadoop (CDH). As a complete Hadoop-based data management platform, CDH version 3 contains core Apache Hadoop and eight additional open source projects in a package that is easy to install and use.
“Cloudera has gained deep experience in the market working with customers to deploy Hadoop in their organizations and has learned how to use Hadoop effectively,” said Doug Cutting, creator of Apache Hadoop and Architect at Cloudera. “CDH v3 is our response. It includes the most appropriate enterprise-grade add-on projects that enhance the core Apache Hadoop framework and make it easier for any organization to use.”
Two additional open source projects have been added as part of CDH: Flume, Cloudera’s data loading infrastructure, and the Hadoop User Environment (HUE). The code for both will be released under the Apache V2 open source license.
Cloudera also announced Cloudera Enterprise, the first product specifically designed to help organizations fully leverage the Apache Hadoop platform in a production environment, enabling them to cost-effectively store, manage and analyze all of their data.
“Businesses across all sectors are looking for ways to leverage the vast quantities of data they are accumulating, and Apache Hadoop is an efficient solution for processing data at scale,” said Melanie Posey, research director at IDC Research. “Hadoop has matured and is now becoming an enterprise-ready cloud computing technology with the addition of Kerberos authentication.”
MicroStrategy Announces Hadoop Support
Business Intelligence software company MicroStrategy (MSTR) announced that MicroStrategy 9 offers seamless access to Hadoop as a data source. The MicroStrategy 9 integration with Hadoop uses Hive, a data warehouse infrastructure that is a subproject of Hadoop. MicroStrategy’s extended data access architecture allows application developers to submit queries using HiveQL, the Hive query language.
“The combination of MicroStrategy’s enterprise-class BI software with Hadoop’s data scalability enables a broader range of users, such as business analysts and non-technical users, to gain valuable insights from data stored in Hadoop,” said Amr Awadallah, co-founder and CTO at Cloudera.