Uber has open-sourced AthenaX, the streaming analytics platform that runs its business. Simply put, the platform is the ride-hailing company's way of channeling data from a variety of real-time sources while running streaming analytics on it using Structured Query Language (SQL). There's no need to ask if it'll scale. If Uber is using it, scale is the starting point.
This is only the latest example of a big corporation making the platform that powers its business available to anyone who needs it. Doing so puts Uber on a long list that includes Facebook, Walmart, and General Electric, to name just a few.
In production for six months, AthenaX currently runs more than 220 applications in multiple Uber data centers, where the company says it's processing billions of messages every day. It's being used with Michelangelo, the company's machine learning platform; with UberEATS Restaurant Manager, which analyzes data for restaurants that use Uber's food delivery service; and with UberPOOL, which drives its carpooling service.
"To meet the needs of Uber’s scale, AthenaX compiles and optimizes SQL queries down to distributed streaming applications that can process up to several million messages per second using only eight YARN containers," Haohui Mai, Bill Liu and Naveen Cherukuri said in a fairly detailed how-it-works article on Uber's website. "AthenaX also manages the applications end-to-end, including continuously monitoring their health, scaling them automatically based on the size of inputs, and gracefully recovering them from node failures or data center failovers."
Out of the box, it comes equipped with resource estimation and autoscaling, which keep it using just enough resources, but not too many. And because Uber's business requires four nines (99.99%) of uptime, it has built-in monitoring and automatic failure recovery.
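For readers wondering what "resource estimation" and "four nines" amount to in practice, here is a minimal sketch. It is not AthenaX's actual algorithm: the function names, the per-container capacity, and the 20% headroom are hypothetical assumptions chosen only to illustrate the general idea of sizing a job from its input rate, plus the arithmetic behind a 99.99% uptime target.

```python
import math


def estimate_containers(input_rate_msgs_per_sec: float,
                        per_container_capacity: float,
                        headroom: float = 0.2,
                        min_containers: int = 1) -> int:
    """Hypothetical sizing rule (not AthenaX's): enough containers to absorb
    the observed input rate plus a safety margin, never fewer than
    min_containers."""
    if per_container_capacity <= 0:
        raise ValueError("per_container_capacity must be positive")
    # Scale the observed rate up by the headroom, then divide by what one
    # container is assumed to handle, rounding up to whole containers.
    needed = input_rate_msgs_per_sec * (1 + headroom) / per_container_capacity
    return max(min_containers, math.ceil(needed))


def downtime_budget_minutes(availability: float, days: float = 365.0) -> float:
    """Minutes of downtime allowed over `days` at a given availability."""
    return (1 - availability) * days * 24 * 60


if __name__ == "__main__":
    # Assumed figures: 1M msgs/sec, 250k msgs/sec per container, 20% headroom.
    print(estimate_containers(1_000_000, 250_000))
    # Four nines over a 365-day year.
    print(round(downtime_budget_minutes(0.9999), 2))
```

With those assumed numbers, the sizing rule asks for 5 containers, and four nines leaves a budget of roughly 52.6 minutes of downtime per year. That thin margin is why automatic monitoring and recovery matter more than manual intervention at this level of availability.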
AthenaX has been released under the Apache 2.0 license, which means it can be used in projects licensed under most other open source licenses. And because Apache 2.0 is a "permissive" license, the code can also be rolled into proprietary projects.
The latter is why I expect this to see a lot of quick uptake. It's mobile-ready, it scales, it can analyze massive amounts of data, and it can be taken private. That's just what a lot of companies, startups and otherwise, need to launch a killer app that would otherwise be too complex to develop on a small budget.
Want to take a look? It's available on GitHub.