The cloud brings a technological shift to IT landscapes and their backup strategies and implementations. Relying on well-established on-prem backup rules and patterns is convenient but risky. These backup strategies became popular because they solve critical challenges in the on-prem world efficiently. However, there is no guarantee that these solutions make sense, fit, and are cost-effective when deployed unchanged in a cloud setup.
Let’s look at two widely used on-prem backup strategies – the grandfather-father-son pattern and the 3-2-1 pattern – and explore their relevance in the world of public clouds.
The 3-2-1 Backup Strategy
One of the best-known strategies in the backup area, the 3-2-1 pattern, consists of three elements:
- Keep (at least) three versions of your data.
- Store backups on (at least) two types of backup media.
- Keep (at least) one copy offsite.
Rather than just reimplementing this strategy in a cloud design, architects should ask themselves: Why were the pattern’s rules crucial in the on-prem world? What did engineers and architects try to achieve with them?
Figure 1 reflects these aspects: The left column contains the pattern’s rules. The middle column highlights the needs and requirements that led to the pattern’s rules. Finally, the right column matters for the cloud age: It highlights the changes in public clouds such as Amazon's AWS, Google's GCP, or Microsoft's Azure.
For decades, service providers and enterprises have hosted their servers in data centers built at carefully selected locations. They want to reduce the risk of environmental hazards, such as floods or landslides. In contrast, many small and medium-sized organizations, public or private, might still place their servers in the basement of their office building.
A fire, an earthquake, or a truck running into the building – all of these events potentially destroy the complete building, including the server room. Then, the recommended “one offsite backup copy” becomes the only lifesaver. But how relevant are such backups in the cloud age?
The concept of “onsite” versus “offsite” changes in the cloud age, but one risk remains the same: No matter how carefully you select the location and how seriously you take physical security, a plane can crash into the building, or a fire (and the subsequent firefighting) might destroy the data center. Having your production workload in one data center and at least one backup in another still makes sense. And features such as AWS Cross-Region Replication or geo-redundant storage in Azure enable even small businesses to prepare for regional or country-wide disasters.
The need for the “two types of backup media” rule is less obvious. But let’s remember the times when companies stored their backups on tapes or optical disks and look at what could go wrong. First, an engineer might decide, for convenience, to keep all backups on one medium. If this medium has a defect or is lost, all backups are gone. One medium is obviously too risky, but what is the risk when using, e.g., three tapes, each produced to work for years or decades?
For explanation purposes, we define the probability of losing one backup tape in a year as 0.1% and assume we have three backups. Then, the likelihood of losing all three is 0.1% × 0.1% × 0.1% = 0.0000001%, or one in a billion. If that is too high, add three more backups. The probability of losing all six within a year is 0.0000000000000001%. In comparison, the likelihood of lightning striking you once in your life is 0.0065%. Given these probabilities, would you invest in reducing the risk of losing three or six tapes at the same time?
It is a trick question because it builds on a common mistake: treating events as independent when they are not (dependent versus independent variables in statistics). If one tape fails due to a material defect, all tapes produced on the same machine in the same factory probably have the same defect. If you order three together, they probably all have the same flaw. Thus, requiring two types of backup media is a convenient and easy-to-implement way in the on-prem world to reduce the risk of all media becoming corrupted simultaneously. But how does this rule help with cloud backups?
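To make the gap between independent and correlated failures concrete, here is a minimal Python sketch. The 0.1% per-tape probability is the figure from the example; the common-cause probability `p_batch_defect` (an entire production batch being flawed) is a made-up illustrative value, not a measured one, and the model itself is a deliberate simplification.

```python
def p_all_lost_independent(p_single: float, copies: int) -> float:
    """Probability that every copy fails, assuming failures are independent."""
    return p_single ** copies


def p_all_lost_with_common_cause(p_single: float, copies: int,
                                 p_batch_defect: float) -> float:
    """Probability that every copy fails when a shared batch defect exists.

    Simplified model (assumption): either the batch is defective and all
    copies are lost together, or it is not and the copies fail independently.
    """
    return p_batch_defect + (1 - p_batch_defect) * p_single ** copies


p_tape = 0.001  # 0.1% per tape and year, as in the example
independent = p_all_lost_independent(p_tape, 3)
# p_batch_defect=0.0005 is a hypothetical value chosen for illustration only.
correlated = p_all_lost_with_common_cause(p_tape, 3, p_batch_defect=0.0005)

print(f"independent failures: {independent:.10%}")
print(f"with common cause:    {correlated:.10%}")
```

Under these assumed numbers, the shared defect dominates the result. Adding more tapes from the same batch barely lowers the risk, whereas a second type of media removes the common cause.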
The answer is: It’s complicated. AWS states that S3 object storage is designed for 99.999999999% durability over a year. But, as with any commitment in a service-level agreement (SLA), you have to trust the service provider, because you cannot validate the technological design and implementation yourself. If cloud providers miss their commitments, your company might go bust, but not the cloud provider. The cloud provider might offer you a small discount on your cloud bill, but this won’t save you if your data is gone. My personal opinion: the “two media” rule is nearly impossible to implement in the cloud, but it isn’t quite as necessary if you can trust the provider’s commitments.
The “3” of the “3-2-1” relates to keeping three versions of your data, for example, the production data plus two backups from different points in time. A popular implementation variant overfulfilling this requirement is the second well-known backup strategy: the grandfather-father-son pattern.
Understanding the Grandfather-Father-Son Strategy
A typical implementation of this data backup pattern:
- Performs daily backups (the son)
- Makes one backup per week (the father)
- Keeps one backup per month (the grandfather)
Figure 2 illustrates how such a schedule could look with the weekly “father” and monthly “grandfather” data backups on Monday night, as well as the daily “son” backups.
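Such a schedule can be sketched in a few lines of Python. The rule that the first Monday of each month becomes the “grandfather” is an assumption made for this example (the pattern only requires one monthly backup), and the function name `backup_tier` is hypothetical.

```python
from datetime import date, timedelta


def backup_tier(day: date) -> str:
    """Classify the backup taken on `day` under an assumed GFS schedule."""
    if day.weekday() != 0:      # Monday is 0 in Python's date.weekday()
        return "son"            # daily backup
    if day.day <= 7:            # first Monday of the month (assumed rule)
        return "grandfather"    # monthly backup
    return "father"             # weekly backup


start = date(2024, 1, 1)  # arbitrary example start date, a Monday
for offset in range(14):
    day = start + timedelta(days=offset)
    print(day.isoformat(), day.strftime("%a"), backup_tier(day))
```

A real backup tool would also attach a retention period to each tier; this sketch only covers the classification step.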
The grandfather-father-son backup strategy is a sophisticated concept covering various scenarios in which companies need backups. The first scenario is about operational mistakes. In theory, admins never delete production databases by mistake.
Similarly, engineers should carefully test configuration changes on test systems before applying them to production systems. But if reality beats theory, engineers have to roll back servers and components quickly to a state before the mess happened, and the solution is to restore a recent backup, such as the one from the day before. The “son” backup meets this requirement.
When ransomware gangs or state-sponsored actors infiltrate an IT landscape, they might remain undetected for days or weeks. So, yesterday’s backup doesn’t help as much. Instead, engineers have to roll back configurations and applications (not business data) to a state from weeks ago, before the infiltration. For this purpose, the “grandfather” backup is ideal.
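Choosing the restore point in such a case boils down to finding the newest backup that predates the intrusion. The following Python sketch illustrates this; the function name, the backup dates, and the assumed compromise date are all hypothetical.

```python
from datetime import datetime
from typing import Optional, Sequence


def latest_clean_backup(backup_times: Sequence[datetime],
                        compromised_since: datetime) -> Optional[datetime]:
    """Return the newest backup taken strictly before the assumed compromise."""
    clean = [t for t in backup_times if t < compromised_since]
    return max(clean) if clean else None


backups = [
    datetime(2024, 3, 4),   # monthly "grandfather"
    datetime(2024, 3, 18),  # weekly "father"
    datetime(2024, 3, 25),  # weekly "father"
    datetime(2024, 3, 27),  # daily "son"
]

# Hypothetical forensic finding: attackers were present from 10 March on.
restore_point = latest_clean_backup(backups, datetime(2024, 3, 10))
print(restore_point)
```

In this example, only the monthly backup from 4 March predates the intrusion, so the daily and weekly backups are of no use for restoring a clean configuration.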
These two sample scenarios are essential for the cloud world as well. What changes is the set of backup features available in the cloud, such as object versioning or point-in-time recovery.
Point-in-time recovery (PITR) has already been available for various databases for a while, but public clouds accelerate its adoption and integration into corporate backup strategies. PITR is a time-travel tool allowing admins and engineers to roll back a database state to any given time, e.g., within the last week or month (Figure 3, left). If available and active, PITR makes daily and weekly backups obsolete. Depending on the concrete needs, risk appetite, and features of a specific cloud service, companies may still need the “grandfather” backup in addition to PITR.
Object versioning is another concept, helpful especially for object storage. When users, scripts, or applications replace an object with a newer version, the cloud keeps the old version for a defined period (Figure 3, right). During this period, there is no need for additional backups (e.g., daily or weekly), though architects might have to configure long-term backups extra.
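As one possible illustration, the following Python sketch enables versioning on an Amazon S3 bucket with the boto3 library and limits how long replaced versions are kept. The bucket name, the 30-day retention, and the rule ID are placeholders, the helper function names are hypothetical, and the sketch has not been run against a real account; other clouds offer comparable settings under different names.

```python
def noncurrent_version_lifecycle(retention_days: int) -> dict:
    """Build an S3 lifecycle configuration that expires old object versions."""
    return {
        "Rules": [
            {
                "ID": "expire-noncurrent-versions",  # illustrative rule name
                "Status": "Enabled",
                "Filter": {"Prefix": ""},            # apply to all objects
                "NoncurrentVersionExpiration": {
                    "NoncurrentDays": retention_days,
                },
            }
        ]
    }


def enable_versioning_with_retention(bucket: str, retention_days: int) -> None:
    """Turn on versioning and limit how long replaced versions are kept."""
    import boto3  # imported here so the config builder works without boto3

    s3 = boto3.client("s3")
    s3.put_bucket_versioning(
        Bucket=bucket,
        VersioningConfiguration={"Status": "Enabled"},
    )
    s3.put_bucket_lifecycle_configuration(
        Bucket=bucket,
        LifecycleConfiguration=noncurrent_version_lifecycle(retention_days),
    )


# Inspect the configuration without touching any cloud account.
print(noncurrent_version_lifecycle(30))
# With credentials configured, a call could look like this (placeholder name):
# enable_versioning_with_retention("example-backup-bucket", 30)
```

The retention period is the cost lever here: every replaced version is stored and billed until the lifecycle rule removes it.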
Traditional Backup Rules Still Applicable Today
It was never easier to configure backups than in the cloud. A few clicks and you’re done. And yet, enormous bills for storage consumption might ruin your business case. To prevent costs from skyrocketing, architects must consider the cost-benefit relation when defining the cloud backup strategy. Keeping your monthly backups for a year means you have twelve versions. Keeping the weekly backups for a month adds four more versions. Plus, if you perform full daily backups and store them for a week, these are seven additional versions. So, the backup storage volume is 23 times the production volume.
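The tally of retained versions can be sketched in Python as follows. The sketch assumes that every backup is a full copy without compression or deduplication; the 2 TB production volume is a hypothetical figure, and both function names are illustrative.

```python
def retained_full_copies(monthly_kept: int, weekly_kept: int,
                         daily_kept: int) -> int:
    """Number of full backup copies held at any time (no deduplication)."""
    return monthly_kept + weekly_kept + daily_kept


def backup_storage_tb(production_tb: float, copies: int) -> float:
    """Backup storage volume if each copy is as large as production."""
    return production_tb * copies


copies = retained_full_copies(monthly_kept=12, weekly_kept=4, daily_kept=7)
print(copies)                          # 23
print(backup_storage_tb(2.0, copies))  # 46.0 (TB, for a hypothetical 2 TB system)
```

Incremental backups or shorter retention periods would reduce the multiplier, which is why the retention settings deserve the same scrutiny as the backup schedule itself.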
Established, traditional backup rules such as 3-2-1 and grandfather-father-son are still relevant for today’s cloud architectures. Their relevance comes more from the requirements behind the strategies than from their exact rules and technologies. However, clouds also come with new features, including redundancy and cloud backup features most medium and small companies would never have dreamed of some years ago.
Thus, the task for cloud engineers is not to implement the patterns as before. They should take the still-valid old requirements and address them with today’s cloud features. Thereby, IT departments can define and implement cost-effective and forward-looking cloud backup strategies.