Insight and analysis on the data center space from industry thought leaders.

Exposing the Six Myths of Deduplication

Companies are faced with data growing at increasing rates. That impacts the entire data protection strategy. It also makes data slow and dopey like a koala bear, writes Darrell Riddle of FalconStor. Deduplication can help address these mounting data storage issues.

Industry Perspectives

March 21, 2013


Darrell Riddle, senior director of product marketing for FalconStor Software, is a professional with more than 23 years of experience in the data protection industry. Darrell has an extensive understanding of both the technical and business aspects of marketing, product management and go-to-market strategies. Prior to joining FalconStor, Darrell worked at Symantec.




Most companies have lots of duplicate data. That’s a fact. Many companies are aware of it, but it falls in the category of cleaning out the garage or a spare room. You see the problem, but until you run completely out of space, it usually doesn’t get straightened up.

Many IT managers believe the software and/or hardware they purchased already deals with this kind of problem. The truth is, this may or may not be correct. In fact, many enterprises are not taking full advantage of how current technology can eliminate redundant data. In some cases, companies have not turned on the features that remove duplicate data (a process hereafter known as "deduplication" or "dedupe"), nor are they actively using deduplication as a key aspect of their data protection plans. The reluctance of IT administrators to embrace dedupe usually stems from a lack of knowledge of its potential benefits or from past experience with a less-than-robust solution.

However, deduplication is a critical aspect of every backup environment that brings cost savings and efficiency to the enterprise. Depending on which report you read, companies are faced with data growing anywhere from 50 percent a year to nearly doubling annually. That impacts the entire data protection strategy. It also makes data slow and dopey like a koala bear. Backup windows aren’t being met, and there is no way that disaster recovery testing can take place. Think of this entire problem like picking up a squirt gun to put out a fire – it just won’t work.

Deduplication solutions are also valuable to disaster recovery (DR) efforts. Once the data is deduplicated, it is transferred (or replicated) to the remote data center or offsite DR facility, ensuring that the most critical data is available at all times. Deduplication is crucial because it reduces storage and bandwidth costs, provides flexibility and data availability, and integrates with tape archival systems. It is a vital part of the future of data protection and needs to be integrated into the backup environment.
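To make the bandwidth point concrete, here is a minimal sketch of replicating deduplicated data to a DR site. It is an illustration only, not any vendor's implementation: the function names, the fixed 4-byte chunk size and the use of SHA-256 digests as chunk identifiers are all assumptions chosen for readability.

```python
import hashlib


def chunk_index(data: bytes, size: int = 4) -> dict:
    """Split data into fixed-size chunks keyed by SHA-256 digest.

    Hypothetical helper: real products typically use much larger,
    often variable-size chunks.
    """
    index = {}
    for i in range(0, len(data), size):
        chunk = data[i:i + size]
        index[hashlib.sha256(chunk).hexdigest()] = chunk
    return index


def replicate(local: dict, remote: dict) -> int:
    """Copy only the chunks the remote site lacks; return bytes sent."""
    sent = 0
    for digest, chunk in local.items():
        if digest not in remote:
            remote[digest] = chunk
            sent += len(chunk)
    return sent


dr_site = {}  # stands in for the offsite DR repository
monday = chunk_index(b"AAAABBBBCCCC")
tuesday = chunk_index(b"AAAABBBBDDDD")  # only the last chunk changed

sent_monday = replicate(monday, dr_site)
sent_tuesday = replicate(tuesday, dr_site)
```

In this toy run the first replication sends all 12 bytes, while the second sends only the 4 bytes of the one chunk the DR site has not seen before. That difference is where the bandwidth savings come from.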

In this article, I will dispel six myths attached to deduplication, bring clarity to the technology and outline the cost savings and efficiencies enterprises can reap.

Myth 1: Deduplication methodology is a life sentence with no chance of parole. Most enterprise IT admins feel that if they purchased a specific deduplication solution, they are stuck with that method for life.

Reality: Flexibility is at the core of modern deduplication solutions, which allow firms to choose the deduplication methods that are the best fit for specific data sets. Many companies offer portable solutions, similar to being able to move electronic music from one device to the next. By doing this, IT can align its backup policies with business goals.

Myth 2: Each server is its own island and there are no boats. The myth is that each server is its own island with separate deduplication processes and none of the islands talk to each other.

Reality: As the Internet has expanded our ability to communicate globally, deduplication solutions have also gone global to eliminate multiple copies of data. With global deduplication, each node within the backup system is deduplicated against all the data in the repository. Global deduplication spans multiple application sources, heterogeneous environments and storage protocols.
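The difference between "islands" and global deduplication can be sketched in a few lines. This is a simplified model under stated assumptions: the Repository class, the two helper functions and the 4-byte fixed chunk size are hypothetical and exist only to show the idea of one shared chunk index versus one index per server.

```python
import hashlib

CHUNK_SIZE = 4  # tiny fixed-size chunks, purely for illustration


class Repository:
    """Hypothetical chunk store keyed by SHA-256 digest."""

    def __init__(self):
        self.chunks = {}  # digest -> chunk bytes

    def store(self, data: bytes) -> list:
        """Store data and return its recipe (ordered list of digests)."""
        recipe = []
        for i in range(0, len(data), CHUNK_SIZE):
            chunk = data[i:i + CHUNK_SIZE]
            digest = hashlib.sha256(chunk).hexdigest()
            self.chunks.setdefault(digest, chunk)  # keep one copy only
            recipe.append(digest)
        return recipe

    def restore(self, recipe: list) -> bytes:
        return b"".join(self.chunks[d] for d in recipe)

    def stored_bytes(self) -> int:
        return sum(len(c) for c in self.chunks.values())


def island_total(streams) -> int:
    """Each server dedupes alone, so shared chunks are stored repeatedly."""
    total = 0
    for data in streams:
        repo = Repository()
        repo.store(data)
        total += repo.stored_bytes()
    return total


def global_total(streams) -> int:
    """All servers dedupe against one shared repository."""
    repo = Repository()
    for data in streams:
        repo.store(data)
    return repo.stored_bytes()


servers = [b"AAAABBBBCCCC", b"BBBBCCCCDDDD"]
islands = island_total(servers)
shared = global_total(servers)
```

With these two sample streams, the island approach stores 24 bytes and the shared repository stores 16, because the chunks common to both servers are kept once instead of twice.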

Myth 3: I don’t have the money to swap out or upgrade my hardware, and even if I did, I would spend it on something else. The perception is that deduplication servers need to be replaced when space on the server runs out. The system doesn’t allow for upgrades. To increase capacity, companies need to exchange the equipment and implement more servers and memory.

Reality: Scalability is key to all IT environments, as the volume of data is growing exponentially. IT administrators must be able to add capacity to the backup target disk pool and build disk-to-disk-to-tape backup architectures around the deduplication system. Rather than requiring a swap-out replacement, deduplication repositories can scale as needed with cluster and storage expansions.

Myth 4: Deduplication slows down performance worse than my antivirus product. IT admins feel that the performance of their systems will slow down because there is too much work for the deduplication server to handle. This performance will hamper the entire backup environment and cause issues when data needs to be recovered quickly.

Reality: Deduplication can scale up to high speeds, and it can be deferred to post-processing to take the pressure off the backup window and speed up the backup itself. In choosing a deduplication solution, IT administrators must consider how it will support the latest high-speed storage area networks (SANs). This is critical for achieving fast deduplication times. Those solutions with unique read-ahead technology provide fast data restore, even from deduplicated tapes.
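As a rough illustration of post-processing, the sketch below separates the fast ingest step from the slower deduplication step. The PostProcessStore class and its method names are hypothetical, and the landing area is modeled as a simple in-memory list; it only shows the ordering of the work, not how any particular product schedules it.

```python
import hashlib


class PostProcessStore:
    """Hypothetical store that dedupes after the backup window closes."""

    def __init__(self, chunk_size: int = 4):
        self.chunk_size = chunk_size
        self.landing = []   # raw backups awaiting deduplication
        self.chunks = {}    # digest -> chunk bytes
        self.recipes = []   # one digest list per processed backup

    def ingest(self, data: bytes) -> None:
        """During the backup window: write raw data, no hashing yet."""
        self.landing.append(data)

    def deduplicate(self) -> None:
        """After the window: chunk, hash and empty the landing area."""
        while self.landing:
            data = self.landing.pop(0)
            recipe = []
            for i in range(0, len(data), self.chunk_size):
                chunk = data[i:i + self.chunk_size]
                digest = hashlib.sha256(chunk).hexdigest()
                self.chunks.setdefault(digest, chunk)
                recipe.append(digest)
            self.recipes.append(recipe)

    def restore(self, index: int) -> bytes:
        return b"".join(self.chunks[d] for d in self.recipes[index])


store = PostProcessStore()
store.ingest(b"AAAABBBB")   # backup window: fast raw writes only
store.ingest(b"BBBBCCCC")
store.deduplicate()         # later: the expensive work happens here
```

The trade-off in this model is that the landing area temporarily needs room for the raw backups, in exchange for keeping hashing and index lookups out of the backup window.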

Myth 5: Deduplication is a single point of failure and is not a good idea in general. This myth assumes that if one of the deduplication nodes fails, then the company is stuck. Most deduplication solutions don’t protect against failure, as they are not linked into a single clustered system with failover capability.

Reality: On the contrary, high availability allows companies to add storage and back up any data to another node if there is a server or storage failure. A deduplication solution that allows for the linking of multiple nodes eliminates the problem of a single point of failure, because if one node fails then the system automatically fails over to another node. High availability ensures that the data is available at any time, dramatically reducing recovery time and recovery point objectives in the event of an IT failure or disaster. Additionally, advanced deduplication solutions provide high availability backup nodes that scale independently of high availability cluster nodes. This allows companies to handle large data sets or meet more severe backup windows.

Myth 6: Deduplication will not write data to tape, and I still use and need my tapes. IT administrators feel that implementing deduplication will force a change in the backup procedure because it can’t be integrated into the backup procedures and tape archival part of the process.

Reality: Deduplication does not require a rip-and-replace approach for target backup processes. By using a virtual tape user interface, the deduplication appliance replaces tape with disk, but with no backup process change. Many corporations require tape backup to meet archival and legal data retention requirements. IT admins need advanced automated tape management capabilities within their backup and deduplication environments to simplify their operations, decrease media consumption and reduce tape handling costs. The deduplication solution must integrate smoothly into the tape archival system, providing IT admins with the ability to deduplicate data before it is archived.

With disk-based and flexible deduplication systems, companies have quicker restore times following IT issues or disasters. The data set can be stored locally for quick recovery, as well as exported to tape. Deduplication does not slow down a firm’s recovery efforts; rather it enhances the overall backup process by easily integrating into existing tape management and backup procedures, avoiding the challenge of creating new backup processes.

Additionally, today’s high availability, global deduplication solutions protect business-critical data and can scale to meet growth requirements without requiring equipment upgrades. IT admins are charged with protecting business-critical data and need the straight facts on how this technology will strengthen the overall data protection plan. They cannot afford to base their decisions on false data and preconceived notions. In busting these deduplication myths, it is clear that deduplication is vital to a company’s data protection plan.

Industry Perspectives is a content channel at Data Center Knowledge highlighting thought leadership in the data center arena. See our guidelines and submission process for information on participating. View previously published Industry Perspectives in our Knowledge Library.
