MapR has released version 5.0 of its Hadoop distribution, with enhancements aimed at real-time business analytics, security, and self-service data exploration.
The major platform upgrade adds several features aimed at providing a single platform for a variety of data needs. The company laid the groundwork across several incremental releases, improving distributed systems capabilities and introducing support for technologies such as Apache Spark and Drill. The 5.0 release brings these pieces together and adds extensions such as Elasticsearch integration.
New auto-provisioning templates have also been introduced to speed deployment of Hadoop clusters on the infrastructure of choice, whether in-house, hosted by a service provider, or in a private or public cloud.
While Big Data is often used to identify trends in historical data, there is increasing demand for real-time analytics and for the ability to tune a business in response to that real-time data. Many Hadoop players have improved their platforms with a focus on real-time needs. MapR’s platform is meant to handle all of these workloads side by side in a single system.
“The theme in 5.0 is around data agility and helping companies respond faster with informed data,” said Jack Norris, chief marketing officer for MapR. “A key aspect of that is real-time applications, as well as bringing agility in terms of administrators. All of this happens in the backdrop while maintaining a secure environment.”
The 5.0 release extends the MapR Real-time, Reliable Data Transport framework, used in the MapR-DB Table Replication capability, to deliver and synchronize data in real time to external compute engines. The MapR-DB capabilities are similar to an enterprise-grade HBase used for high-scale, low-latency real-time applications.
The first supported external compute engine is Elasticsearch, which means full-text search indices can be kept synchronized automatically, without writing custom code. Elastic raised $70 million last year and has been on a tear, winning big names such as Comcast, eBay, Facebook, Mayo Clinic, Foursquare, SoundCloud and Tinder, and connecting more deeply with the Hadoop space through a connector.
“We are pleased to be working with MapR on integrating its real-time delivery framework with Elasticsearch,” said Jobi George, global partner director at Elastic, in a press release. “Customers want search indices automatically synchronized with the latest data updates. The MapR architecture makes this easier for application developers who need to let their end users search for data almost immediately after it is updated.”
MapR 5.0 also includes comprehensive security auditing, Apache Drill support, and the latest Hadoop 2.7 and YARN features.
The auditing covers all data accesses and is written to JSON log files. This enables extensive reporting, validation and quick analysis with Apache Drill.
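To give a rough sense of the kind of reporting that JSON audit logs make possible, here is a minimal sketch in plain Python rather than Apache Drill. It assumes one JSON record per line, and the field names (`timestamp`, `operation`, `user`, `status`) and sample records are hypothetical, chosen for illustration only; they are not taken from the MapR audit log format.

```python
import json
from collections import Counter

# Hypothetical sample records. The field names are assumptions made for
# illustration; they are not the documented MapR audit log schema.
SAMPLE_LOG = """\
{"timestamp": "2015-07-01T10:00:00Z", "operation": "READ", "user": "alice", "status": 0}
{"timestamp": "2015-07-01T10:00:05Z", "operation": "WRITE", "user": "bob", "status": 0}
{"timestamp": "2015-07-01T10:00:09Z", "operation": "READ", "user": "alice", "status": 13}
"""


def summarize_audit(lines):
    """Count accesses per (user, operation) and collect failed accesses."""
    counts = Counter()
    failures = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        record = json.loads(line)
        counts[(record["user"], record["operation"])] += 1
        # Assumption: a non-zero status marks a failed or denied access.
        if record["status"] != 0:
            failures.append(record)
    return counts, failures


counts, failures = summarize_audit(SAMPLE_LOG.splitlines())
for (user, operation), n in sorted(counts.items()):
    print(f"{user} {operation} {n}")
print(f"failed accesses: {len(failures)}")
```

Because each record is a self-describing JSON object, the same per-user, per-operation summary could in principle be expressed as a query in a tool that reads JSON directly, which is the role the article attributes to Drill.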
Organizations are increasingly deploying multiple applications on a single Hadoop cluster, said Norris, adding that one in five MapR customers deploy more than 50 separate applications on a single cluster. The latest MapR release automatically synchronizes storage, database, and search indices to support complex, real-time applications.
The enhancements follow several others, such as better clustering across distributed systems, the addition of Apache Spark for real-time processing, and recently added support for Apache Drill for self-service data exploration. The Drill Views feature has also been added, allowing secure, field-level access to data in files so that analysts can analyze only the data they are authorized to see.
The company rolled out on-demand training for Hadoop earlier this year, which saw over 20,000 participants.