With the big data conferences Strata and Hadoop World convening in New York this week, there is plenty of big data news. Live streaming is available.
Splunk Analytics for Hadoop
This past summer, operational intelligence software provider Splunk (SPLK) launched Hunk, software that integrates exploration, analysis and visualization of data in Hadoop. On Tuesday, the company announced the general availability of Hunk: Splunk Analytics for Hadoop.
Hunk is a full-featured, integrated analytics platform for Hadoop that enables users to interactively explore, analyze and visualize historical data in Hadoop. Built with patent-pending technology, Hunk offers powerful, self-serve analytics without the need for specialized programming.
“Hunk is transforming the way organizations analyze their data in Hadoop by replacing drawn out development cycles with software that enables customers to deploy and deliver insights in hours instead of weeks or months,” said Sanjay Mehta, vice president of product marketing, Splunk. “Hadoop is an increasingly important technology and many organizations are storing vast amounts of data in Hadoop. However, this often creates a problem because the data sets become too big to move and more traditional approaches to analytics of raw data in Hadoop require brittle, fixed schemas. These are key reasons our customers consistently tell us about the cost, time and sheer difficulty of getting analytics out of their Hadoop clusters. With Hunk, we applied everything Splunk has learned from ten years of experience with more than 6,000 customers to this unique challenge.”
More than 100 participants took part in the Hunk beta program. “Hunk enables our enterprise customers to achieve their big data goals,” said Kou Miyake, President and CEO, NTT DATA INTELLILINK Corp. “Hunk accelerates insights from Hadoop with a much faster time-to-value than open source alternatives. Hunk also enables enterprise developers to build big data applications because of the rich developer environment and tooling.”
Red Hat Storage Team Adds Apache Hadoop Plug-in to Gluster
Red Hat's Apache Hadoop Plug-in has been added to the Gluster Community, the open software-defined storage community. Gluster users can deploy the Apache Hadoop Plug-in from the Gluster Community and run MapReduce jobs on GlusterFS volumes, making the data easily available to other toolkits and programs. Conversely, data stored on general purpose filesystems is now available to Apache Hadoop operations without the need for brute-force copying of data to the Hadoop Distributed File System (HDFS).
The Apache Hadoop Plug-in provides a new storage option for enterprise Hadoop deployments and delivers enterprise storage features while maintaining 100 percent Hadoop FileSystem API compatibility. By storing data in POSIX-compliant, general purpose filesystems, the Apache Hadoop Plug-in delivers significant disaster recovery benefits, industry-leading data availability, and NameNode high availability.
To download the Apache Hadoop Plug-in, users can go to https://forge.gluster.org/hadoop/. For the Apache Hadoop Ambari Project, users can visit the Apache Hadoop Community at http://hadoop.apache.org/.
GlusterFS Now Integrated with Intel Distribution for Apache Hadoop Software
Red Hat, Inc. also announced that it has contributed software to the Gluster Community that integrates the GlusterFS open software-defined storage filesystem with the Intel Distribution for Apache Hadoop software. The resulting code, the Apache Hadoop Enablement on GlusterFS plugin, delivers a big data analytics solution that easily integrates with existing IT infrastructure. The companies jointly validated the reference architecture integrating GlusterFS with the Intel Distribution.
Red Hat has contributed code to the Apache Hadoop community supporting the Hadoop Compatible File System standard, and the Intel Manager for Apache Hadoop can now configure, monitor, and manage GlusterFS as a Hadoop Compatible File System (HCFS). The Hadoop Enablement solution avoids the cost and complexity associated with creating and managing another data silo for analytics. The integrated solution was built using community-driven innovation to deliver an open and interoperable solution.
Combining the performance, security, and manageability of the Intel Distribution with the HDFS API compatibility and disaster recovery capabilities of GlusterFS, the integrated solution supports a scalable, cost-effective infrastructure for big data analytics. The Intel Distribution includes security mechanisms such as query authentication, data encryption, role-based access control, and auditing. GlusterFS maintains data locality as the cluster scales, avoids NameNode bottlenecks and the single point of failure in HDFS, and has built-in disaster recovery with its geo-replication feature.
Red Hat sees the evolution of analytics extending beyond Hadoop and traditional business intelligence systems into a comprehensive view of end-to-end big data analytics. Most enterprises try to manage big data from three sources (business-, machine- and human-generated data) through a workflow that includes three types of analytics systems: massively parallel processing, Hadoop clusters, and traditional business transaction processing. This broader view of big data requires a general purpose storage repository such as GlusterFS that can store a variety of data in its native format and serve it to a variety of analytics systems through multiple protocols. An end-to-end view of all the enterprise data and all the enterprise analytics systems offers a more comprehensive route to deep business insights and helps drive operational intelligence.