Global Warming of Data

Eric Bassier is Senior Director of Datacenter Solutions at Quantum.

It already reached 90 degrees in Seattle this year. In April. I’m not complaining – yet – but I’m definitely a believer that global warming is happening and that we need to make some changes to address it. But this article isn’t about climate change – it’s about data. Specifically, it’s about the growth of unstructured data and the gloomy fate ahead if we continue to deny the problem and ignore the warning signs. Sound familiar?

It’s hard to argue with the evidence of unstructured data growth. Estimates and studies vary, but the general consensus is that there will be 40-50 zettabytes of data by the year 2020, and 80-90 percent of that, or roughly 32 to 45 zettabytes, will be unstructured.

What’s Driving Unstructured Data Growth?

Data growth comes from many places. The obvious sources are 4K movies and TV shows, and the videos and pictures that all of us take on our smartphones every day, but unstructured data growth is much broader than that. Vast amounts of data are also generated every day by machines and sensors across a wide variety of data-driven industries such as research, engineering and design, financial services, geospatial exploration, healthcare, and more. Video surveillance alone is creating almost an exabyte of unstructured data every day as camera resolutions and retention times have increased.

These diverse datasets share some common characteristics. Typically, they are:

Composed of large files;
Incompressible – i.e., data-reduction techniques like deduplication are not effective at shrinking the data;
Valuable to the company, department, or users that created the data;
Stored for years.

The Parallels with Global Warming

So how is unstructured data growth like global warming?

People behave as if this problem doesn’t exist: Every day companies are spewing more and more unstructured data into their IT environments, but when it comes to managing this growth, it is business as usual. Despite all evidence to the contrary, many businesses are still attempting to manage and store unstructured datasets using the same approach to data storage they’ve always used – they put it all on disk. This approach is starting to break down in the face of both the size and scale of this data. Beyond growing costs, the ability to ingest content into a storage system quickly enough degrades over time, and traditional backup approaches are no longer sufficient to protect the data.

For these massive machine- and sensor-generated datasets, clearly a different approach to storing and managing this data is required.

Data that has been thought of as “cold” is starting to “warm up”: A really interesting dynamic is appearing across multiple industries. With all of these datasets, the data is generated, processed and then archived. But now more and more examples are surfacing where companies can get additional value out of this “cold” data:

Video content generated for movie or TV studios can be repurposed and redistributed – think “behind the scenes” episodes of your favorite reality TV show.
Retail companies are analyzing video surveillance footage to track shopping patterns, and using the insights to increase sales.
Scientists are able to run analyses on datasets generated years ago to gain new insights and advance new innovations in their fields.
Autonomous car developers are using video and sensor data generated during early test drives to make autonomous cars safer and more efficient.

The list goes on, but the point is that for these types of datasets, as cold data becomes more valuable or “warms up,” the storage approach for that data needs to change. Even archived data needs to remain accessible to the users.

There’s a need to act now: Before you place that next large order for more disk storage, stop and consider the alternatives. Sticking with the status quo is the easiest approach, but it is also one that leads to excess storage costs and inefficiencies.

What’s the Solution?

To tackle this problem, let’s first introduce what might be a new term: data workflow. In some industries this is a common term, but for many it might be a new concept, albeit an intuitive one. All of the unstructured datasets I’ve mentioned thus far have a workflow associated with them. It looks something like this: data is generated or captured, ingested into a storage system, and stored and processed to reach some result (often collaboration between many users is required); then the data is archived for long-term preservation and re-use. This process is more efficient when the storage system is customized from the outset for the specific dataset’s workflow.

Workflow storage must handle high-performance ingest when needed. Also key are the ability to share data across the network to enable collaboration and the ability to tier data to lower-cost tiers of storage such as tape, while preserving access on the network for the users and applications that need the data. This last piece is what really unlocks the ability to get more value out of archived data in a way that doesn’t break the bank.

This workflow-based approach to storage results in significant cost reductions compared to keeping all data on flash or spinning disk, and it enables organizations to do more with their data.
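
For readers who think in code, the tiering idea described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not any vendor’s implementation: the names FileRecord, apply_tiering and read_file, the 90-day threshold, and the example paths are all hypothetical. The point it shows is that a file keeps its original path in the catalog when it moves to tape, so users and applications can still reach it, and a read simply recalls it to disk.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class FileRecord:
    """One catalog entry (hypothetical); the path never changes when the tier does."""
    path: str
    size_bytes: int
    last_access: datetime
    tier: str = "disk"


def apply_tiering(catalog, now, cold_after=timedelta(days=90)):
    """Move files not accessed within `cold_after` from disk to tape.

    The 90-day default is an assumed policy value, not a recommendation.
    Returns the number of bytes moved off disk.
    """
    moved = 0
    for rec in catalog:
        if rec.tier == "disk" and now - rec.last_access > cold_after:
            rec.tier = "tape"
            moved += rec.size_bytes
    return moved


def read_file(catalog, path, now):
    """Look a file up by its original path, recalling it from tape if needed."""
    for rec in catalog:
        if rec.path == path:
            if rec.tier == "tape":
                rec.tier = "disk"  # cold data "warming up" again
            rec.last_access = now
            return rec
    raise FileNotFoundError(path)


# Example with made-up files and dates.
now = datetime(2016, 6, 1)
catalog = [
    FileRecord("/projects/survey/raw_001.dat", 4_000_000_000, datetime(2015, 1, 10)),
    FileRecord("/projects/survey/edit_002.dat", 1_000_000_000, datetime(2016, 5, 20)),
]
moved = apply_tiering(catalog, now)
print(moved, [rec.tier for rec in catalog])
print(read_file(catalog, "/projects/survey/raw_001.dat", now).tier)
```

A real system would also have to deal with recall latency, multiple copies and data integrity, all of which this sketch leaves out.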

And, One More Parallel…

By using tiered storage and keeping most of this data on low-cost, low-power storage like tape, you’re actually doing your part to help the environment and fight global warming.

Industry Perspectives is a content channel at Data Center Knowledge highlighting thought leadership in the data center arena.
