Krishna Kallakuri is a founding partner, owner and vice president of DataFactZ. He is responsible for executing strategic planning, improving operational effectiveness, and leading strategic initiatives for the company.
Today, we collect and store data from a myriad of sources: Internet transactions, social media activity, mobile devices and automated sensors, to name a few. Software paves the way for new and improved hardware, and in this case Big Data, with all of its computing and storage needs, is driving the development of storage hardware, network infrastructure and new ways of handling ever-increasing computing demands. The most important infrastructure aspect of Big Data analytics is storage.
Data over a petabyte in size is generally considered Big Data. Because the amount of data grows rapidly, storage must be highly scalable as well as flexible, so the entire system does not need to be brought down to add capacity. Big Data also translates into an enormous amount of metadata, more than a traditional file system can support. To achieve that scalability, object-based storage systems should be leveraged.
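One way to picture the difference: an object store keeps a flat namespace of keyed objects, each carrying its own metadata, rather than a deep directory tree whose metadata must be traversed and rebalanced as it grows. The sketch below is a minimal, hypothetical in-memory model; the class and method names are illustrative and do not belong to any vendor's API.

```python
import hashlib


class ObjectStore:
    """Minimal sketch of a flat, object-based store: each object is
    addressed by a single key and carries its own metadata, so there is
    no directory hierarchy to walk as the store scales out."""

    def __init__(self):
        self._objects = {}  # key -> (data, metadata)

    def put(self, key, data, **metadata):
        # Store the payload with arbitrary metadata; a content hash is
        # computed up front so later integrity checks never require
        # re-reading the payload.
        metadata["sha256"] = hashlib.sha256(data).hexdigest()
        self._objects[key] = (data, metadata)

    def get(self, key):
        data, _ = self._objects[key]
        return data

    def head(self, key):
        # Metadata-only lookup: no payload I/O, which is what keeps
        # metadata operations cheap at scale.
        _, metadata = self._objects[key]
        return metadata
```

Real object stores follow the same put/get/head shape, which is why they cope with metadata volumes that would overwhelm a hierarchical file system.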
Big Data analytics often involves social media activity and transactions that feed tactical, real-time decision making. Big Data storage therefore cannot introduce latency, or the data risks going stale before it is used. Some applications may require real-time data for real-time decisions. Storage systems must be able to scale out without sacrificing performance, which can be achieved by implementing a flash-based storage system.
Since Big Data analytics is used across multiple platforms and host systems, there is a greater need to cross-reference data and tie it all together in order to give the big picture. Storage must be able to handle data from various source systems at the same time.
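As a toy illustration of that cross-referencing, the snippet below ties records from two hypothetical source systems together on a shared customer ID. The field names and sample records are invented for the example.

```python
# Records from two hypothetical source systems, keyed by customer ID.
transactions = [
    {"customer_id": 1, "amount": 120.0},
    {"customer_id": 2, "amount": 75.5},
]
social = [
    {"customer_id": 1, "mentions": 4},
    {"customer_id": 3, "mentions": 9},
]


def cross_reference(transactions, social):
    """Join records from both systems on customer_id, yielding a
    combined view only where both sources have data."""
    by_id = {rec["customer_id"]: rec for rec in social}
    combined = []
    for txn in transactions:
        match = by_id.get(txn["customer_id"])
        if match is not None:
            combined.append({**txn, **match})
    return combined
```

At Big Data scale this join runs across distributed storage rather than in memory, which is why the storage layer must serve many source systems concurrently.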
Because cross-referencing data at this new level yields a bigger picture, it may also require data-level security considerations beyond those of existing IT scenarios. Storage should be able to meet these data-level security requirements without sacrificing scalability or latency.
Big Data also translates into big prices. The most expensive component of Big Data analytics is storage. Techniques such as data de-duplication, using tape for backup, data redundancy and building custom hardware, instead of buying off-the-shelf storage appliances, can significantly bring down costs.
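Of these techniques, de-duplication is the simplest to sketch: store each unique chunk of data once, identified by a content hash, and keep only references for the repeats. The function names and the tiny chunk size below are chosen purely for illustration.

```python
import hashlib

CHUNK_SIZE = 4  # unrealistically small, so the example actually shows duplicates


def dedup_store(data, chunk_size=CHUNK_SIZE):
    """Split data into fixed-size chunks and store each unique chunk
    once; the original becomes an ordered list of chunk hashes."""
    chunks = {}   # sha256 hex digest -> chunk bytes (stored once)
    recipe = []   # ordered hashes needed to reassemble the data
    for i in range(0, len(data), chunk_size):
        chunk = data[i:i + chunk_size]
        digest = hashlib.sha256(chunk).hexdigest()
        chunks.setdefault(digest, chunk)  # duplicates cost only a hash
        recipe.append(digest)
    return chunks, recipe


def restore(chunks, recipe):
    # Reassemble the original bytes from the stored chunks.
    return b"".join(chunks[d] for d in recipe)
```

For highly repetitive data, such as backups of the same systems day after day, the chunk store grows far more slowly than the raw data, which is where the storage savings come from.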
Big Data typically feeds a Business Intelligence application, which requires data integration and migration. Given the scale of Big Data, however, the storage system should be designed so that data never needs to be migrated, while remaining flexible enough to accommodate different types and sources of data, again without sacrificing performance or latency. Care should be taken to consider all possible current and future use cases and scenarios while planning and designing the storage system.
Industry Perspectives is a content channel at Data Center Knowledge highlighting thought leadership in the data center arena. See our guidelines and submission process for information on participating. View previously published Industry Perspectives in our Knowledge Library.