Hadoop takes its name from the toy elephant that belongs to the son of Doug Cutting, a chief architect at Cloudera and one of the engineering minds behind the open source architecture.

Real-Time Search and Analytics Platform Elasticsearch Gets Hadoopier

Fresh off a $70 million Series C funding round, real-time search and analytics platform provider Elasticsearch announced release 2.0 of its Hadoop connector, called Elasticsearch for Apache Hadoop, along with certification on Cloudera Enterprise 5. This means Elasticsearch is now compatible with all Apache-based Hadoop distributions, including the other two big distros, Hortonworks and MapR.

Elasticsearch helps pull data from any environment and get it into the hands of developers, engineering leads, CTOs and CIOs who need insight into the moving parts of their business as those parts change. The connector lets users read and write data between Hadoop and Elasticsearch. When Elasticsearch is used in conjunction with Hadoop, organizations no longer need to run a batch process and wait hours to analyze their data; analysis takes minutes.
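To make the read/write link more concrete, here is a minimal sketch of the kind of settings a Hadoop job passes to the connector. The property names `es.nodes`, `es.resource` and `es.query` are connector configuration keys; the helper functions, host names and the index name are hypothetical and chosen only for illustration.

```python
def connector_settings(nodes, resource, query=None):
    """Build connector job properties as a plain dict of strings.

    nodes    -- list of Elasticsearch hosts (hypothetical names in the example)
    resource -- the index (and type) the job reads from or writes to
    query    -- optional query that limits what a read job pulls into Hadoop
    """
    settings = {
        "es.nodes": ",".join(nodes),
        "es.resource": resource,
    }
    if query is not None:
        settings["es.query"] = query
    return settings


def as_hadoop_args(settings):
    """Render the settings as generic '-D key=value' Hadoop arguments."""
    args = []
    for key in sorted(settings):
        args.extend(["-D", "%s=%s" % (key, settings[key])])
    return args


if __name__ == "__main__":
    # Hypothetical cluster and index names, for illustration only.
    job = connector_settings(["es1:9200", "es2:9200"], "logs/event", "?q=status:500")
    print(" ".join(as_hadoop_args(job)))
```

The same handful of properties applies whether the job is written against MapReduce, Hive, Pig or Cascading; only the way they are supplied differs.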

Elasticsearch offers what it calls the ELK stack: Elasticsearch for search, the log management tool Logstash, and Kibana for data visualization, which together help businesses gain immediate insights from their data stores. Combined with Hadoop, the stack becomes more powerful.

There’s native integration and support for popular Hadoop libraries. Users can run queries natively on Hadoop through MapReduce, Hive, Pig or Cascading APIs. Another benefit is snapshot and restore. Together, the two systems make it easy to take a snapshot of data within Elasticsearch (perhaps a year’s worth) and archive it in Hadoop. At any time the snapshot can be restored to Elasticsearch for additional analysis.
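The snapshot-and-restore cycle described above can be sketched as three REST requests against Elasticsearch's snapshot API. The sketch below only builds and prints the requests; it does not send them. It assumes the HDFS snapshot repository plugin is installed and uses that plugin's `hdfs` repository type with `uri` and `path` settings; the repository, snapshot, index and host names are hypothetical, and exact settings may vary by version.

```python
import json


def register_hdfs_repo(repo, uri, path):
    """Request that registers an HDFS-backed snapshot repository."""
    body = {"type": "hdfs", "settings": {"uri": uri, "path": path}}
    return ("PUT", "/_snapshot/%s" % repo, body)


def create_snapshot(repo, snapshot, indices):
    """Request that archives the given indices into the repository."""
    path = "/_snapshot/%s/%s?wait_for_completion=true" % (repo, snapshot)
    return ("PUT", path, {"indices": ",".join(indices)})


def restore_snapshot(repo, snapshot, indices):
    """Request that restores the given indices from an earlier snapshot."""
    path = "/_snapshot/%s/%s/_restore" % (repo, snapshot)
    return ("POST", path, {"indices": ",".join(indices)})


def render(request):
    """Format a (method, path, body) tuple as readable text."""
    method, path, body = request
    return "%s %s\n%s" % (method, path, json.dumps(body, sort_keys=True))


if __name__ == "__main__":
    # All names below are hypothetical placeholders.
    steps = [
        register_hdfs_repo("hadoop_archive", "hdfs://namenode:8020/", "es/snapshots"),
        create_snapshot("hadoop_archive", "logs_2013", ["logs-2013"]),
        restore_snapshot("hadoop_archive", "logs_2013", ["logs-2013"]),
    ]
    for step in steps:
        print(render(step))
```

Keeping request construction separate from sending makes the sequence easy to inspect before pointing any HTTP client at a real cluster.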

“Hadoop was created to store and archive data at a massive scale, but businesses need to be able to ask, iterate and extract actionable insights from this data, which is what we designed our products for,” said Steven Schuurman, Elasticsearch cofounder and CEO. “With today’s certification from Cloudera, Elasticsearch now works with all Apache-based Hadoop distributions, and with it solves the last mile of Big Data Hadoop deployments by getting big insights fast.”

High-profile clients: check

Elasticsearch has an impressive and growing customer roster, featuring the likes of Comcast, eBay, Facebook, Mayo Clinic, Foursquare, SoundCloud and Tinder. The company highlighted some additional customers and how important Hadoop integration is to the way they use the Elasticsearch stack. Two examples named were Klout and MutualMind.

Klout, an online reputation management firm, connects petabytes of data on its 400 million-plus users, stored in the Hadoop Distributed File System (HDFS), to Elasticsearch so that it can deliver query results in seconds rather than minutes and quickly build targeted marketing campaigns for its customers.

“Elasticsearch has a very good integration with Hadoop,” said Felipe Oliveria, director of backend engineering at Klout. “It allows us to export a Hive table to an index on Elasticsearch very easily. HBase is a great data store, and it allows random access to the data, which Elasticsearch is perfect for. Elasticsearch fits very nicely into our data pipeline.”

MutualMind provides brand monitoring on social networks for customers like AT&T, Kraft, Nestle and Starbucks. After its Hadoop batches started taking more than 15 minutes, the company moved to Elasticsearch to power its real-time analytics, while still using Hadoop for statistical analysis.

About the Author

Jason Verge is an Editor/Industry Analyst on the Data Center Knowledge team with a strong background in the data center and Web hosting industries. In the past he’s covered all things Internet Infrastructure, including cloud (IaaS, PaaS and SaaS), mass market hosting, managed hosting, enterprise IT spending trends and M&A. He writes about a range of topics at DCK, with an emphasis on cloud hosting.
