Data is more important than ever, yet the speed of data generation, along with the struggle to collect, prepare, organize, store, manage, protect and analyze it effectively, continues to challenge IT departments. In the keynote and in separate expert sessions at this month’s Interop Digital: Data Management, Storage & Disaster Recovery Event, IT pros and industry analysts discussed ways to meet these challenges.
"Increasingly it's becoming all about the data in today's world," said Dennis Hahn, a senior analyst with the Omdia Research Group, in the Interop event’s keynote. "There has been a shift in IT from being just the operational brains of the organization to also becoming the data processing factory for business insights."
The first challenge with enterprise data processing today is the rate at which data keeps growing. Omdia research finds that data is growing at a 22% compound annual growth rate (CAGR) over five years. Data also is changing. There is more unstructured data, like email messages, videos, photos, web pages and audio files. There is also much more data produced at the edge, including data from sensors and other IoT devices, as well as data from mobile devices.
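To put that growth rate in perspective, the short sketch below compounds 22% a year over five years. The starting volume of 1.0 is an arbitrary unit chosen for illustration, not a figure from Omdia.

```python
def compound_growth(start: float, rate: float, years: int) -> float:
    """Return the value of `start` after compounding `rate` once per year."""
    return start * (1 + rate) ** years

# Assumed starting volume of 1.0 (any unit); 0.22 is the 22% CAGR cited above.
growth_factor = compound_growth(1.0, 0.22, 5)
print(f"Data volume after five years: {growth_factor:.2f}x the starting volume")
```

Under that assumption, an organization's data would grow to roughly 2.7 times its current size in five years.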
All of this disparate data has to go somewhere, and for most organizations, that includes some type of cloud. With data and storage now being distributed across different locations, hybrid cloud seems to be the model of choice. The industry response, Hahn said, is to create a data fabric, or holistic ability to manage the placement and the movement of data across the different locations.
That still leaves the question of where to store data. Hahn recommended that organizations evaluate each application individually to decide whether its data is best stored in a cloud or on-premises. In many cases, the best solution is one that allows data to be moved back and forth between on-premises and cloud environments. At the same time, many organizations are moving away from mission-critical, highly centralized SAN storage to more distributed storage.
At Interop Digital's "Managing the Data Flood with High-Capacity Storage" talk, ScaleFlux product management executive JB Baker noted that storage has come a long way.
"We’ve had massive increases in the speed that these devices can serve the data, but a lot of them really aren’t optimized around the advances we’ve had in storage technology," he said. Many are still tied to the legacy of hard drives and haven’t taken full advantage of flash—particularly NVMe flash, he added.
Distributed performance storage – storage that can provide the performance necessary to feed data to emerging functions like real-time analytics, machine learning, artificial intelligence and high performance computing (HPC) – also will become more important in enterprise data processing strategies.
"HPC compute implementations, for example, are designed to pull data in, process it very quickly and then send it out, with some kind of insight or some kind of a result," Hahn explained. "To do that, you have to feed the data very quickly. Traditional storage tends not to do a good job of that, especially at [acceptable] price points."
New Strategies for Data Collection and Management
While the advances in storage technology are good enablers for storing data, there are still plenty of challenges around data collection and management. In the past, Hahn noted, organizations would simply pull data out of their databases and put it in a data warehouse. But with more diverse and richer streams of data flowing from different places, things can quickly get muddled. From the data warehouse or a data lake, data sets must be prepared and pulled out for other uses, such as analytics that provide insights.
So how can organizations sort through the vast amounts of data they are dumping into data lakes or warehouses to determine what’s junk and what’s valuable? Instead of transforming data before loading it into storage (the traditional extract, transform, load approach), Hahn suggests transforming it after loading, which allows organizations to pick and choose the best data more effectively.
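A load-first, transform-later flow might look something like the sketch below. It uses Python's built-in sqlite3 module as a stand-in for a warehouse; the table names, sample records and the "keep numeric readings" rule are all hypothetical and only illustrate the ordering of the steps.

```python
import sqlite3

# Hypothetical raw records, loaded exactly as they arrive (junk included).
raw_events = [
    ("sensor-1", "21.5"),
    ("sensor-2", "not-a-number"),
    ("sensor-1", "22.1"),
]

conn = sqlite3.connect(":memory:")

# Load step: everything lands in a staging table without any cleanup.
conn.execute("CREATE TABLE raw_events (source TEXT, reading TEXT)")
conn.executemany("INSERT INTO raw_events VALUES (?, ?)", raw_events)

# Transform step, run after loading: keep only readings shaped like a
# decimal number and convert them from text to a numeric type.
conn.execute(
    """
    CREATE TABLE clean_events AS
    SELECT source, CAST(reading AS REAL) AS reading
    FROM raw_events
    WHERE reading GLOB '[0-9]*.[0-9]*'
    """
)

raw_count = conn.execute("SELECT COUNT(*) FROM raw_events").fetchone()[0]
clean_rows = conn.execute(
    "SELECT source, reading FROM clean_events ORDER BY reading"
).fetchall()
print(raw_count, clean_rows)
```

Because the raw table is kept intact, a different selection rule can be applied later without going back to the source systems.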
Metadata – the data about data – also becomes increasingly important in this context, said Steve McDowell, a senior analyst covering data and storage at Moor Insights & Strategy. McDowell suggests using a cataloging system that understands the metadata. "Until you know the data you have, you don’t know what you can do with it," he said.
Storage management definitely needs to mature, Hahn said, and automation and artificial intelligence are helping create smarter storage management. Imbued with these advanced capabilities, software can now perform predictive diagnostics and performance optimizations.
Smart storage – the ability to analyze stored data intelligently – is important for both efficiency and safety, said Dr. Narasimha Reddy, storage site director of the Center on Intelligent Storage at Texas A&M University and participant in the "Elevating Data Management" panel. It’s an effective way to spot anomalies as they occur so that organizations can react immediately, and the same applies to fraud detection and supply chain monitoring. Companies also could use natural language processing to analyze stored information related to voice calls, email or video.
Say Hello to Computational Storage & Other Advances
While advances in traditional storage are important, there is even more going on behind the scenes in the enterprise data processing journey. Computational storage, an architecture designed to offload some of the tasks from the CPU, can alleviate bottlenecks and reduce data movement, increasing efficiency. Computational storage drives, which Baker said will eventually replace ordinary SSDs, will be very helpful for accelerating databases.
"For high-performance databases, computational storage drives help by delivering better latency profile than ordinary NVMe SSDs, SAS or SATA SSDs. It’s orders of magnitudes better than hard drives," he said. It can also save companies real money by storing more data per drive because it performs the compression task in an extremely efficient way.
In addition to compression, computational storage can perform application-specific tasks like data filtering. For example, you can filter to reduce the amount of data you’re transferring across your network, which in turn reduces the amount of data the CPU has to filter.
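The effect of filtering before data leaves the drive can be approximated with a small simulation. This is only a sketch: the records, the temperature threshold and the use of JSON size as a proxy for network traffic are assumptions, and no real computational storage API is involved.

```python
import json

# Hypothetical records held on a drive; the fields are made up for illustration.
records = [{"id": i, "temp_c": 20 + (i % 50)} for i in range(1000)]

def size_in_bytes(rows):
    """Rough proxy for network traffic: the size of the rows serialized as JSON."""
    return len(json.dumps(rows).encode("utf-8"))

def is_hot(row):
    """Example filter: keep readings at or above an assumed 65 C threshold."""
    return row["temp_c"] >= 65

# Host-side filtering: every record crosses the network, then the CPU filters.
bytes_host_side = size_in_bytes(records)
host_result = [row for row in records if is_hot(row)]

# Drive-side filtering: the same predicate runs where the data lives,
# so only the matching records cross the network.
drive_result = [row for row in records if is_hot(row)]
bytes_drive_side = size_in_bytes(drive_result)

print(f"host-side filter moves  {bytes_host_side} bytes")
print(f"drive-side filter moves {bytes_drive_side} bytes")
```

Both paths return the same 100 matching records; the difference is how much data has to travel, and how much filtering is left for the host CPU.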
Another up-and-coming technology is DNA storage, which is well suited to easing scalability and capacity constraints. It’s much denser than any other storage technology today and uses very little power, explained Western Digital VP of strategic initiatives Steffen Hellmold during the "Managing the Data Flood With High-Capacity Storage" panel. (Hellmold is a member of the DNA Data Storage Alliance.)
Scalability is key, Hellmold said, since many organizations will outgrow what the existing tools in the storage toolbox, especially hard drives and SSDs, can scale to. This is especially relevant for data archives, which continue to grow exponentially over time.
That means these tools need to be replaced with something that can scale orders of magnitude further. That’s where DNA storage comes in. Essentially, the process works by generating DNA strands from the four nucleotide bases of DNA: adenine, cytosine, guanine, and thymine.
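One simple way to picture how digital data could map onto those four bases is to treat each base as two bits. The mapping below (00 to A, 01 to C, 10 to G, 11 to T) is an assumption made for illustration; it is not a scheme described by Hellmold, and practical encodings would also need error correction and other constraints that this sketch ignores.

```python
# Assumed mapping, for illustration only: index 0b00 -> A, 0b01 -> C, 0b10 -> G, 0b11 -> T.
BASES = "ACGT"

def encode(data: bytes) -> str:
    """Turn each byte into four bases, two bits per base, high bits first."""
    strand = []
    for byte in data:
        for shift in (6, 4, 2, 0):
            strand.append(BASES[(byte >> shift) & 0b11])
    return "".join(strand)

def decode(strand: str) -> bytes:
    """Reverse encode(): rebuild each byte from a group of four bases."""
    out = bytearray()
    for i in range(0, len(strand), 4):
        byte = 0
        for base in strand[i:i + 4]:
            byte = (byte << 2) | BASES.index(base)
        out.append(byte)
    return bytes(out)

strand = encode(b"Hi")
print(strand)          # CAGACGGC
print(decode(strand))  # b'Hi'
```

At two bits per base, every byte becomes a four-base sequence, which gives a feel for why the medium is considered so dense.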
There are plenty of use cases for DNA storage other than managing archives, including autonomous driving or video surveillance. If you have to prove that a car was involved in an accident or that a suspect committed a crime, you would probably rely on massive stores of data that have been collected over time, along with all of the data used to train the algorithm.
The cheapest way to store large amounts of data for long periods of time may be DNA storage. But that’s just the tip of the iceberg. Hellmold gave another example that really brought the possibilities home. "You can code and coat a medical tablet with DNA data storage that gives you the entire history of what’s in the tablet and where it was made."
With so many advances and possibilities, it’s virtually a full-time job to figure out how to move forward with enterprise data processing, storage and management.
There is no easy answer, Hahn said; IT simply has to find more efficient and effective ways to collect, prepare, organize and store data. Managing, preparing, securing and analyzing data come next. It’s an ongoing cycle, but new technologies and methods continue to emerge, helping organizations with these difficult decisions.