Data Growth Tests Backup Capabilities
Steven Rodin is the CEO of Storagepipe Solutions, which has been a leading provider of online backup services for business since 2001.
Companies today face numerous and constantly changing backup challenges. As a provider with more than a decade of experience in online backup services, we’re seeing new trends in the types of issues that organizations will need to overcome in the future.
In the early days, organizations were primarily concerned with data protection, encryption and automation.
But recently, we’ve noticed that rapid data growth has had a dramatic effect on what organizations are demanding from their backup systems. Now, we’re seeing increased demand for:
- Continuous data protection,
- Bare-metal recovery (recovering entire servers, including OS, files and configurations),
- Reduced backup windows, and
- Faster recovery speeds.
In fact, eWeek recently claimed that worldwide annual data production has actually exceeded worldwide storage capacity. And the gap between the data that organizations produce and their ability to store it will only continue to grow in the years to come.
We’ve identified a number of important trends which are accelerating the growth rate for corporate data. Listed below are a few of the most important.
Cheap storage is probably the most obvious place to start. The price of hard drive capacity has been falling exponentially for decades, at a pace often compared to Moore’s Law.
Until recently, organizations had to be very selective about what sort of information they stored, and what sort of information they would toss out. But now, this hardware is so cheap and abundant that attitudes have shifted to more of a “Better keep this. We may need it someday” mentality.
A number of new technologies – such as advancements in compression, deduplication and hardware virtualization – have improved overall storage utilization and further accelerated the rate at which the cost-per-gigabyte of storing data is falling.
Cheap and Abundant Bandwidth
Now that Internet bandwidth is no longer the bottleneck it once was, we’ve seen an explosion in the growth of streaming multimedia content. VoIP services like Skype are killing the long-distance phone industry, and YouTube is now adding over 2 days of video every minute.
This bandwidth availability has also accelerated the growth of file sharing and online storage. Now, large files are being copied and distributed at an exponentially-growing scale, which has caused duplicate data to become a major source of storage waste and data growth. If one person shares a 1GB file with 500 people, that’s half a terabyte of storage consumption.
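To make the duplicate-data problem concrete, here is a minimal sketch in Python of how content-based deduplication avoids storing the same file many times. The `DedupStore` class and its method names are hypothetical and written only for illustration. It assumes whole-file deduplication keyed on a SHA-256 hash and ignores metadata overhead, so it should not be read as a description of how any particular backup product works.

```python
import hashlib


class DedupStore:
    """Toy whole-file deduplicating store (illustrative only)."""

    def __init__(self):
        self._blobs = {}  # content hash -> file bytes, stored once
        self._refs = {}   # owner name -> content hash

    def put(self, owner: str, data: bytes) -> str:
        """Record that `owner` holds `data`; store the bytes only if they are new."""
        digest = hashlib.sha256(data).hexdigest()
        self._blobs.setdefault(digest, data)
        self._refs[owner] = digest
        return digest

    def logical_bytes(self) -> int:
        """Bytes consumed if every owner kept a private copy."""
        return sum(len(self._blobs[d]) for d in self._refs.values())

    def physical_bytes(self) -> int:
        """Bytes actually stored after deduplication."""
        return sum(len(blob) for blob in self._blobs.values())


# Scaled-down version of the example above: one file shared with 500 people.
store = DedupStore()
shared_file = b"x" * 1024  # stands in for the 1GB file
for i in range(500):
    store.put(f"user-{i}", shared_file)

print("Without deduplication:", store.logical_bytes(), "bytes")
print("With deduplication:   ", store.physical_bytes(), "bytes")
```

In this idealized case the 500 copies cost the same as one. Real systems usually deduplicate at the block or chunk level, which also catches files that are only partly identical.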
Business is Going Paperless
Email has replaced letters, eBooks and tablets have nearly replaced paper books, and digital imaging has replaced film photographs and X-ray film. Not only are paperless offices better for the environment, but they are also more productive, more flexible and better able to extract value from their business data.
This can be seen in the medical industry, where static physical imaging has been overtaken by rich video imaging. And these huge medical video files require very large amounts of storage.
In fact, the medical industry is not alone. Many industries are using more and more video (which is highly storage intensive) for marketing online, security and communication.
Enhanced Automated Data Collection Capabilities
Automated data collection is one of the fastest-growing areas in the “big data” space. With every move we make, we’re generating GPS data, web traffic statistics, power usage data, surveillance video, and a broad range of data which is being collected by companies.
Automated data collection is often called the “Pandora’s Box” of the big data revolution. The information being collected about us through the electronic devices we use every day could present a threat to our privacy, but it also has the potential to offer tremendous value to society.
A great example of this is Google Flu Trends, which is saving lives by helping to predict and combat global disease outbreaks.
New Advances in Data Analysis Technology
Until recently, data analysis was almost exclusively performed on structured relational databases which were maintained and organized by humans. But now, many organizations are running into performance limits as data growth is exceeding the capabilities of these systems.
Recently, we’ve seen a completely new approach to data storage which focuses on rapid analysis and processing of vast data volumes. Technologies like Hadoop, Cassandra, MapReduce and NoSQL have given birth to a whole new class of services, and have revolutionized the way organizations think about the data they collect.
Also, the Web 2.0 revolution popularized the use of APIs and shared third-party databases. Organizations can now get more insight into their internally-generated business data by integrating external feeds and databases into their reporting and analysis.
The Growing Strategic Importance of Data
In the past, data was simply a tool which assisted in decision making and helped companies execute on their strategic objectives. But recently, business headlines have increasingly been dominated by Google, Facebook, Apple’s iTunes and other brands which have built their entire corporate strategy around the data they own.
Information is power, and it’s now more powerful than ever.
Organizations with tight margins – such as manufacturing and retail – have long known the importance of big data. And now the rest of the business world is starting to catch up.
Many fields have also been held back by technology’s ability to handle the data that they produce. This is particularly true of scientific research – such as genomics, artificial intelligence and machine learning, satellite imaging, and high-energy particle physics – where petabytes of data can be produced very quickly. The discoveries generated by this research will eventually be used by technologists and engineers to create new data-intensive industries, companies and products.
Even if companies wanted to reduce the amount of data they store, they wouldn’t always be able to. Laws like PIPEDA, HIPAA, SOX 404 and many others are forcing companies to retain historical archives of their exponentially-growing business data going back several years.
As this data grows, storage increasingly becomes a major business problem. Also, companies must plan for cost-efficient search and retrieval of these large historical data volumes in order to remain prepared for an unexpected electronic discovery request.
Not only is data growing exponentially, but the relative value of this data is also growing. Tomorrow’s big-data applications will require a more strategic approach to data backup.
- As the scale and complexity of big data storage grows, it’ll quickly reach a point where manual handling is no longer practical, desirable, economical, or even possible. Automation will become absolutely essential when it comes to backing up big data.
- Many big data applications have serious privacy implications for the customers that benefit from their use. So security will become a top priority for backup administrators. Gone are the days of unencrypted backup tapes.
- One of the primary benefits of big data over relational databases is the fact that massive data pools can be analyzed with extreme speed, and at high volumes. This has created a whole new class of applications which are built on real-time data. Because “recency” is so important in these applications, backup frequency will have to be greatly increased in order to optimize Recovery Point Objectives (RPOs).
- Also, because of the increasing strategic importance of these applications, downtime will need to be minimized. This means smaller backup windows, built-in redundancy, and server failover to disaster recovery sites.
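The link between backup frequency and RPO in the list above can be sketched with some simple arithmetic. The Python below is a simplified model, not a sizing tool: the function names are hypothetical, and it assumes evenly spaced backups that complete instantly, so the worst-case data loss is simply the time between two consecutive backups.

```python
import math
from datetime import timedelta


def worst_case_data_loss(backup_interval: timedelta) -> timedelta:
    """Under the simplified model, a failure just before the next backup
    loses everything written since the previous one."""
    return backup_interval


def backups_per_day_for_rpo(rpo: timedelta) -> int:
    """Minimum number of evenly spaced backups per day needed to meet an RPO."""
    if rpo <= timedelta(0):
        raise ValueError("RPO must be a positive duration")
    return math.ceil(timedelta(days=1) / rpo)


# Compare a traditional nightly backup with tighter, real-time style targets.
for rpo in (timedelta(hours=24), timedelta(hours=1), timedelta(minutes=15)):
    print(f"RPO of {rpo}: at least {backups_per_day_for_rpo(rpo)} backups per day")
```

As the target RPO shrinks from a day to minutes, the number of backup runs climbs quickly, which is one reason continuous data protection and shorter backup windows tend to go hand in hand.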
That’s why many organizations are opting to outsource their data backups by partnering with experts who operate ahead of the trends and who can assist with the complexity of some situations.
This way, they’ll be empowered to adapt more quickly to rapid changes in both the growth and the nature of the data they’re managing.
Industry Perspectives is a content channel at Data Center Knowledge highlighting thought leadership in the data center arena.