Nikita Ivanov founded GridGain Systems, which is funded by RTP Ventures and Almaz Capital, in 2007. He has led GridGain in developing advanced distributed in-memory data processing technologies, including a Java in-memory computing platform.
As a data storage and retrieval option, NoSQL, with its less restrictive consistency models (vs. relational databases), has long been viewed as a way to achieve design simplicity, horizontal scaling and more granular control over availability. Because they leave considerable room for optimization, NoSQL databases can perform quite well in terms of latency and throughput, making them key components of many big data and real-time applications. That said, the data demands of most organizations today, in terms of sheer volume and speed, may yet overwhelm NoSQL-based infrastructure.
Coming Up Short in the Face of a Data Avalanche
Consider, for example, fraud detection and market risk within the financial services industry: organizations may have only a fraction of a second to analyze a wide range of data sets and make a decision. Or think about the logistics demands of today’s global organization, which needs to calculate pickup and delivery routes in real time, based on package location, traffic and weather conditions, and countless other factors. Similarly, retailers often need to track every single item in each of their retail locations around the world and analyze sales patterns and trends in real time to stay ahead of the market.
All of these examples involve processing streaming data and deriving live action from it, and streaming data will soon become the lifeblood of business decisions. However, this is an area that is giving most organizations quite a bit of trouble, largely because traditional computing has basically reached its limit in this regard. It’s unlikely that companies will be able to gain the real-time insights they need from the massive amounts of data they’re taking in using traditional computing methods, which are too slow and inflexible. Therefore, it’s not entirely surprising that the majority of NoSQL usage has traditionally come from non-critical websites. In short, NoSQL has been somewhat limited in its potential. But it doesn’t have to be.
Taking NoSQL Further with In-Memory
There’s no logical reason for NoSQL to remain primarily in the non-critical realm, and certainly many of the partnerships and advances that MongoDB is putting forth demonstrate that NoSQL will play a major role in the emerging data-driven economy. The financial services industry, which I mentioned earlier, is among the first to employ In-Memory technology to analyze massive quantities of data in real time and solve problems that it otherwise couldn’t address through traditional computing methods.
Kirill Sheynkman of RTP Ventures defines In-Memory computing: “In-Memory Computing is based on a memory-first principle utilizing high-performance, integrated, distributed main memory systems to compute and transact on large-scale data sets in real-time – orders of magnitude faster than traditional disk-based systems.”
To put it simply, the traditional approach is to take the data to the computation. This is a time-consuming, resource-intensive process. With In-Memory, the idea is to bring the computation to your data. This is orders of magnitude faster and frees resources. It’s kind of like family visits over the holidays: do you pack all your kids, cousins and siblings in a bus and drive them across the country to visit your great uncle, or do you simply purchase a plane ticket for him and bring him to where everyone else already is to celebrate? Taking the data to the computation, particularly now that there is such an incredible amount of data to be processed, is like taking the bus across the country. With In-Memory, we have the opportunity to purchase a plane ticket, so to speak.
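The difference between the two approaches can be illustrated with a minimal, single-process Python sketch. It is only an illustration under stated assumptions: the `Node` class, the node names and the sample values are hypothetical, and no real grid product or network transfer is involved. The sketch simply counts how many values would have to leave the nodes in each case.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """A hypothetical grid node holding one partition of the data in memory."""
    name: str
    records: list = field(default_factory=list)

    def run(self, task):
        # The task executes where the data lives; only its result leaves the node.
        return task(self.records)


def data_to_computation(nodes):
    """Traditional approach: pull every record to one place, then compute."""
    moved = [record for node in nodes for record in node.records]
    return sum(moved), len(moved)        # (total, number of records moved)


def computation_to_data(nodes):
    """Memory-first approach: send the task out, collect one partial sum per node."""
    partials = [node.run(sum) for node in nodes]
    return sum(partials), len(partials)  # (total, number of values moved)


# Hypothetical sample data spread over three nodes.
nodes = [
    Node("node-a", [120, 75, 310]),
    Node("node-b", [42, 8]),
    Node("node-c", [990, 15, 60, 5]),
]

total_moved, shipped_records = data_to_computation(nodes)
total_local, shipped_partials = computation_to_data(nodes)
```

Both functions arrive at the same total, but the second moves one value per node rather than one value per record, which is the gap the bus-versus-plane analogy describes.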
Putting NoSQL on Steroids
While In-Memory data grids are generally more sophisticated than NoSQL, it is possible to “accelerate” NoSQL to work with In-Memory technology to achieve superior performance. With a natively distributed In-Memory architecture, organizations can achieve scale-out partitioning, increased performance and improved scalability. Because MongoDB is by far the most popular NoSQL database currently on the market, let’s talk about how a natively distributed In-Memory database with support for the MongoDB driver protocol can be pushed into overdrive to meet your NoSQL performance needs.
In order to leverage NoSQL In-Memory with minimum pain points, look for the following in your implementation:
- Configuration on the database or collection level. This requires no code changes to user applications, which makes integration much easier.
- Deployment between user applications and an optional MongoDB database. With this approach, user applications require no changes and continue to use their native MongoDB drivers for target languages.
- Configuration to work with specific MongoDB databases or individual collections. Collections that are not processed are simply passed through to the underlying MongoDB database, freeing computing power for what you really need it for rather than for redundant processing.
- Native support for MongoDB driver. This eliminates the need to change any application code.
- Native distribution and data partitioning. By leveraging an In-Memory database and allowing it to employ native distribution and data partitioning, you can scale up to several thousand nodes in production settings and avoid many of the shortcomings associated with sharding.
- Rely on a distributed memory-first architecture. Keeping all data in RAM with a secondary on-demand disk persistence allows you to completely eliminate memory-mapped file paging.
- Implement a field compaction algorithm. Though NoSQL works with unstructured data, field names will frequently repeat between documents and create excessive storage overhead. You can potentially save up to 40 percent of memory space by implementing a field compaction algorithm which internally indexes every field.
- Keep index and data in-memory. You can significantly increase performance and scalability for most MongoDB operations, including the ability to run on large commodity-based clusters.
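The field compaction idea in the list above can be sketched in a few lines of Python. This is a minimal illustration, not any product's actual algorithm: the `FieldCompactor` class, its methods and the sample documents are all hypothetical, and the sketch only handles top-level fields. It assigns each distinct field name a small integer ID the first time the name is seen, so that stored documents carry the ID rather than the full name.

```python
class FieldCompactor:
    """Maps repeated field names to small integer IDs (hypothetical scheme)."""

    def __init__(self):
        self._id_by_name = {}   # field name -> integer ID
        self._name_by_id = []   # integer ID -> field name

    def _field_id(self, name):
        # Assign the next free ID the first time a field name is seen.
        if name not in self._id_by_name:
            self._id_by_name[name] = len(self._name_by_id)
            self._name_by_id.append(name)
        return self._id_by_name[name]

    def compact(self, doc):
        """Replace each top-level field name with its integer ID."""
        return {self._field_id(key): value for key, value in doc.items()}

    def expand(self, compacted):
        """Restore the original field names from their IDs."""
        return {self._name_by_id[i]: value for i, value in compacted.items()}


# Hypothetical sample documents that share most of their field names.
compactor = FieldCompactor()
docs = [
    {"customer_name": "Ann", "account_balance": 120.5},
    {"customer_name": "Bob", "account_balance": 87.0, "branch_code": "NY1"},
]
stored = [compactor.compact(d) for d in docs]
restored = [compactor.expand(s) for s in stored]
```

Each field name is kept once in the shared index no matter how many documents use it. How much memory this saves in practice depends on the data and on the storage format, so the sketch makes no claim about the actual figure.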
As the business landscape becomes more competitive, organizations will need to begin gaining real-time insight from their data in order to meet a host of challenges, some of which may be mission critical. In an era in which customers can switch banks in a matter of minutes, financial services firms, for example, have only a small window of time in which to approve a loan before a competitor swoops in. Additionally, with the advent of social media, a minor snafu can turn into a major revenue-draining disaster within hours. Companies need to stream insight from multiple sources and analyze it immediately.
Essentially, the steps above will enable you to skip over a number of processes, so that you can see increased performance, improved scalability and scale-out partitioning, and get straight to the business of drawing conclusions from NoSQL data.
Industry Perspectives is a content channel at Data Center Knowledge highlighting thought leadership in the data center arena.