Marc Crespi is the Vice President of Product Management for ExaGrid Systems, Inc., where he is responsible for managing product operations.
Much confusion in the world of disk-based backup can be prevented if customers understand that data deduplication does not come in just one form. Big differences exist between the two architectural approaches: post-process deduplication with a grid architecture and inline deduplication with a fixed controller/disk shelf architecture.
Differences to Consider When Comparing Disk Backup
When IT organizations evaluate disk backup with deduplication, many are surprised to learn that architectural differences between appliances can make a big difference in the economics of backup.
Though many factors should be measured during system evaluation, three main considerations highlight the architectural differences:
- Highest performance for the shortest backup window and fastest full system restores
- Fast backup performance and short backup windows maintained as data grows
- Cost-effective modular scalability as data grows
Highest Performance for Shortest Backup Window
Post-process deduplication in a grid with full servers offers the fastest backups because the system deduplicates data after it has landed on disk and because full servers bring CPU, memory, disk and Gigabit Ethernet. Post-process also enables the fastest restores because the disk backup system keeps a full copy of the most recent backup available in high-speed cache for immediate recovery. In contrast, with inline deduplication, the disk backup system performs the dedupe process before data is fully protected on disk, and for a full system restore the data must first be “rehydrated.”
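The difference in ordering can be illustrated with a toy model. The sketch below is not any vendor's implementation; the class names, the fixed 4-byte chunk size, and the in-memory "landing zone" are all assumptions chosen to keep the example small. It shows only the sequence of steps: the inline appliance deduplicates before anything is stored and must reassemble chunks on restore, while the post-process appliance lands the full backup first and can return that copy directly.

```python
import hashlib

CHUNK = 4  # hypothetical chunk size in bytes, tiny for illustration


class ChunkStore:
    """Toy deduplicated store: unique chunks keyed by their SHA-256 digest."""

    def __init__(self):
        self.blocks = {}

    def put(self, data: bytes) -> list:
        """Store data as chunks and return the recipe (ordered digests)."""
        recipe = []
        for i in range(0, len(data), CHUNK):
            chunk = data[i:i + CHUNK]
            digest = hashlib.sha256(chunk).hexdigest()
            self.blocks.setdefault(digest, chunk)  # keep one copy per unique chunk
            recipe.append(digest)
        return recipe

    def rehydrate(self, recipe: list) -> bytes:
        """Reassemble the original data from its recipe."""
        return b"".join(self.blocks[d] for d in recipe)


class InlineAppliance:
    """Deduplicates during the backup, before data is stored."""

    def __init__(self):
        self.store = ChunkStore()
        self.recipes = []

    def backup(self, data: bytes) -> None:
        self.recipes.append(self.store.put(data))

    def restore_latest(self) -> bytes:
        # Every restore has to be rebuilt from chunks.
        return self.store.rehydrate(self.recipes[-1])


class PostProcessAppliance:
    """Lands the full backup first, then deduplicates afterwards."""

    def __init__(self):
        self.store = ChunkStore()
        self.recipes = []
        self.landing_zone = None  # most recent backup, kept whole

    def backup(self, data: bytes) -> None:
        self.landing_zone = data  # no dedupe work in the backup path

    def deduplicate(self) -> None:
        # Runs after the backup has completed.
        self.recipes.append(self.store.put(self.landing_zone))

    def restore_latest(self) -> bytes:
        # The latest backup is returned as-is, with no rehydration step.
        return self.landing_zone
```

In this model both appliances end up with the same deduplicated chunks; what differs is where the deduplication work sits relative to the backup and restore paths.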
Performance Maintained as Data Grows
Grid architecture solutions maintain high performance as the disk backup system scales because you add full appliances including processor power, memory, bandwidth and disk matched to the amount of backup data. When the system needs to expand, additional full appliance nodes are attached to the grid, thereby maintaining all aspects of performance as data grows. With the inline (controller/disk shelf) model, all of the processing power, memory, and bandwidth are contained in the controller, so when data increases and IT staff expands the system by adding only disk shelves, backup performance degrades.
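A rough way to see the scaling argument is to model the backup window as data volume divided by ingest throughput. The sketch below is a simplification under stated assumptions: throughput is taken to scale linearly with the number of grid nodes, the controller's throughput is fixed no matter how many shelves are attached, and every figure (capacities, rates) is a made-up illustration, not a measurement of any product.

```python
import math


def backup_window_hours(data_tb: float, throughput_tb_per_hr: float) -> float:
    """Time to ingest a full backup at a given aggregate throughput."""
    return data_tb / throughput_tb_per_hr


def grid_window(data_tb: float, node_capacity_tb: float,
                node_throughput_tb_per_hr: float) -> float:
    """Grid model: each added node brings capacity AND throughput."""
    nodes = math.ceil(data_tb / node_capacity_tb)
    return backup_window_hours(data_tb, nodes * node_throughput_tb_per_hr)


def controller_window(data_tb: float,
                      controller_throughput_tb_per_hr: float) -> float:
    """Controller/disk shelf model: shelves add capacity only,
    so throughput stays at the controller's fixed rate."""
    return backup_window_hours(data_tb, controller_throughput_tb_per_hr)


if __name__ == "__main__":
    # Hypothetical figures: 10 TB per node, 2 TB/hr per node or controller.
    for data_tb in (10, 20, 40):
        print(data_tb,
              grid_window(data_tb, 10, 2),
              controller_window(data_tb, 2))
```

With these illustrative inputs the grid window stays flat as data quadruples, while the controller window grows in proportion to the data, which is the degradation the paragraph above describes.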
Control Costs At Scale
Disk backup systems with deduplication that are based on a grid architecture are the most cost-effective to scale because, as data grows, full servers can be seamlessly added to the grid in modular increments as needed, without replacing existing nodes. Grid capacity is typically load-balanced automatically, which maintains a virtual pool of storage shared across all nodes. This contrasts with the controller/disk shelf model, which adds disk to a fixed-capacity controller as data grows, resulting in longer backup windows. In this scenario, the controller must eventually be replaced with the next larger controller in a costly forklift upgrade.
Deduplication Defined; Choose Your Data Center Path
In summary, it is essential to identify the key architectural differences between systems and to understand that post-process/grid and inline/fixed controller architectures are not interchangeable. When determining what is right for your data center, keep in mind the performance needed to meet your business's needs given the size of your backup data today, the need to maintain fast backups as data grows, and the total cost of ownership of the solution as it expands over its expected lifetime.
Industry Perspectives is a content channel at Data Center Knowledge highlighting thought leadership in the data center arena.