Cold Storage in the Cloud: Comparing AWS, Google, Microsoft

As the volume of data that companies generate and must retain balloons, the top cloud providers have come up with a type of cloud service that may replace at least some portion of the market for traditional backup products and services. Cold storage delivered as a cloud service is changing the way organizations store and deliver vast amounts of information. The big question is whether cold storage can provide better backup economics.

Amazon Web Services, Google Cloud Platform, and, since April, Microsoft Azure all offer cloud cold storage services. Each takes a different approach, so how do they stack up against each other?

Addressing the Data Deluge

Virtually all analysts predict that the cloud services market will continue to grow, and to grow quickly. Gartner said recently that cloud will constitute the bulk of new IT spend this year. This will be a defining year for the space, as private cloud begins to give way to hybrid cloud, and nearly half of large enterprises will have hybrid cloud deployments by the end of 2017.

So how much data are we creating? Cisco estimates that global data center traffic is firmly in the zettabyte era and will go from 3.4ZB in 2014 to 10.4ZB in 2019. A rapidly growing segment of data center traffic is cloud traffic, which in 2019 will account for 8.6ZB of that projected 10.4ZB.

With Google and Amazon already in the cold storage market, Microsoft decided to join the game as well. In April, Microsoft announced the general availability of Cool Blob Storage – low-cost storage for cool object data.

What is It For?

When Microsoft announced its Cool Blob Storage in April, it listed example use cases such as backup, media content, scientific data, compliance, and archival data. Essentially, any data that is seldom accessed is a good candidate for cool (or cold) storage: legal data, tertiary copies of information, data that must be retained for long periods for compliance reasons, and archival information are all good examples. So what sets cold storage apart from more traditional storage options?

Let’s start with a definition:

Cold storage is defined as an operational mode and storage system for inactive data. It has explicit trade-offs when compared to other storage solutions. When deploying cold storage, expect data retrieval times to be beyond what may be considered normally acceptable for online or production applications. This is done in order to achieve capital and operational savings.

Ultimately, it means working with the right kind of cold storage backup solution that specifically fits your business and workload needs. The reality is that not all cold storage architectures are built the same. Keeping this in mind, let’s examine the three big ones.

Google Nearline: Google announced its Nearline archival storage product in 2015, and it was quickly seen as a disruptive solution in the market. Why? It came with the direct promise of a very quick retrieval time of only a few seconds, which is fast compared with market leader AWS Glacier. According to Google, Nearline offers slightly lower availability and slightly higher latency than the company’s standard storage product, but at a lower cost. Nearline’s “time to first byte” is between 2 and 5 seconds, which, next to other solutions, can be seen as a real game-changer. However, there are some issues.

One is that Google Nearline limits data retrieval to 4 MB/s for every TB stored. This throughput scales linearly with storage consumption, so 10TB stored yields 40 MB/s. If you find yourself needing to download massive amounts of data, you might need to wait around a bit. Still, a feature called On-Demand I/O allows you to increase your throughput in situations where you need to retrieve content from a Google Cloud Storage Nearline bucket faster than the default provisioned 4 MB/s per TB. Two things to keep in mind:

On-Demand I/O is turned off by default.
On-Demand I/O applies only to Nearline Storage and has no effect on Standard Storage or Durable Reduced Availability Storage I/O.
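
The linear scaling of the default throughput can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not Google's API: the function names are hypothetical, decimal units (1 TB = 1,000,000 MB) are an assumption, and On-Demand I/O is left out.

```python
# Default Nearline read throughput, per the figure quoted in the article:
# 4 MB/s for every TB stored, scaling linearly.
MB_PER_S_PER_TB = 4.0
MB_PER_TB = 1_000_000  # assumption: decimal units (1 TB = 1,000,000 MB)


def provisioned_throughput_mb_s(stored_tb: float) -> float:
    """Default read throughput in MB/s for a given amount of stored data."""
    return MB_PER_S_PER_TB * stored_tb


def hours_to_retrieve(retrieve_tb: float, stored_tb: float) -> float:
    """Hours needed to read `retrieve_tb` at the default provisioned rate."""
    seconds = retrieve_tb * MB_PER_TB / provisioned_throughput_mb_s(stored_tb)
    return seconds / 3600


if __name__ == "__main__":
    for stored in (1, 10, 100):
        rate = provisioned_throughput_mb_s(stored)
        hours = hours_to_retrieve(1, stored)
        print(f"{stored:>4} TB stored: {rate:.0f} MB/s, 1 TB restore in {hours:.1f} h")
```

One consequence worth noting: because throughput grows with the amount stored, restoring everything you have stored takes roughly the same time at the default rate regardless of how much that is, while restoring a small slice of a large bucket is comparatively quick.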

Overall, Google promises a low-cost, highly durable and highly available storage service for data archiving, online backup and disaster recovery. Data is available within seconds, not hours or days. With an average response time of approximately three seconds and pricing of 1 cent per GB per month, Nearline gives you solid performance at a low cost. Furthermore, it lets you store “limitless” data and access it through the Google Cloud Platform Storage APIs.

Finally, the supporting features are worth a look. Aside from On-Demand I/O, you also have transfer services, which allow you to schedule data imports from places like Amazon S3, HTTP/HTTPS sites, and on-premises locations. This process can be automated for complete lifecycle management.

AWS Glacier: As one of the first and leading cold storage solutions, Glacier was built as a secure and extremely low-cost storage service for data archiving and online backup. Customers can store large or small amounts of data. According to Amazon, pricing can start at as little as $0.01 per gigabyte per month, a significant savings compared to on-premises solutions. To keep costs low, Glacier is optimized for infrequently accessed data where retrieval times of several hours are acceptable. Retrieving and delivering, say, 1TB would be a different experience on Glacier than on Nearline. Glacier would have that storage object available in approximately three to five hours. Four hours into the download, a Google Nearline customer retrieving at the default 4 MB/s would be only about 5 to 6 percent of the way through their 1TB of data, with the full transfer taking approximately 69 hours.
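
The Nearline side of that 1TB comparison is simple arithmetic, sketched below. The constants come from the figures quoted in this article; the function names are hypothetical, and decimal units (1 TB = 1,000,000 MB) are an assumption. The Glacier window covers only the wait until the object becomes available, not the download that follows.

```python
NEARLINE_MB_PER_S = 4.0      # default rate with 1 TB stored, per the article
OBJECT_MB = 1_000_000        # assumption: 1 TB = 1,000,000 MB (decimal)
GLACIER_WAIT_HOURS = (3, 5)  # time until the object is available, per the article


def nearline_total_hours(object_mb: float = OBJECT_MB) -> float:
    """Total hours to download the object at the default Nearline rate."""
    return object_mb / NEARLINE_MB_PER_S / 3600


def nearline_progress(elapsed_hours: float, object_mb: float = OBJECT_MB) -> float:
    """Fraction of the object downloaded after `elapsed_hours`, capped at 1.0."""
    downloaded_mb = NEARLINE_MB_PER_S * elapsed_hours * 3600
    return min(downloaded_mb / object_mb, 1.0)


if __name__ == "__main__":
    low, high = GLACIER_WAIT_HOURS
    print(f"Glacier: object available after roughly {low} to {high} hours")
    print(f"Nearline after 4 hours: {nearline_progress(4):.1%} downloaded")
    print(f"Nearline total: {nearline_total_hours():.1f} hours")
```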

Within the Glacier environment, data is stored in "archives." An archive can be any data, such as photos, video, or documents. You can upload a single file as an archive or aggregate multiple files into a TAR or ZIP file and upload as one archive.
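
Aggregating several files into one TAR before upload can be done with Python's standard library. The sketch below covers only the bundling step; the upload to Glacier itself is not shown, and the function name and file names are hypothetical.

```python
import tarfile
import tempfile
from pathlib import Path
from typing import Iterable


def bundle_for_archive(files: Iterable[Path], bundle_path: Path) -> Path:
    """Pack several files into one gzip-compressed TAR, to be uploaded as a single archive."""
    with tarfile.open(bundle_path, "w:gz") as tar:
        for f in files:
            # arcname keeps only the file name, so the local directory
            # layout is not recorded inside the bundle.
            tar.add(f, arcname=f.name)
    return bundle_path


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        tmp_dir = Path(tmp)
        photo = tmp_dir / "photo.jpg"
        report = tmp_dir / "report.txt"
        photo.write_bytes(b"\xff\xd8 placeholder bytes")
        report.write_text("quarterly figures")

        bundle = bundle_for_archive([photo, report], tmp_dir / "bundle.tar.gz")
        with tarfile.open(bundle) as tar:
            print(sorted(tar.getnames()))
```

Because archives are immutable once created, it helps to keep your own index of what each bundle contains. Note that this sketch stores files by name only, so two inputs with the same file name from different directories would collide inside the bundle.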

A single archive can be as large as 40TB. You can store an unlimited number of archives and an unlimited amount of data in Amazon Glacier. Each archive is assigned a unique archive ID at the time of creation, and the content of the archive is immutable, meaning that after an archive is created it cannot be updated.

From there, Amazon Glacier uses "vaults" as containers to store archives. You can view a list of your vaults in the AWS Management Console and use the AWS SDKs to perform a variety of vault operations, such as creating, deleting, and locking vaults, listing vault metadata, retrieving vault inventory, tagging vaults for filtering, and configuring vault notifications. You can also set access policies for each vault to grant or deny specific activities to users. Under a single AWS account, you can have up to 1,000 vaults per region.

Once your data is in the vault, administrators get the chance to use some granular control features including:

Inventory
Access controls
Access policies
Vault locking (write once, read many controls, for example)
Audit logging
Integrated lifecycle management
High-level and low-level AWS API integration
Data Protection
Data Reliability

Microsoft Cool Blob Storage: The launch of the Cool Blob Storage service in April was a catch-up move by Microsoft.

The Azure cool storage tier is optimized for storing data that is infrequently accessed and long-lived. Costs for Cool Blob Storage range from $0.01 to $0.048 per GB per month, depending on the region and the total volume of data stored. The comparable range for the “Hot” Blob storage tier, which is for frequently accessed data, is $0.0223 to $0.061 per GB. Under some circumstances, the savings from storing some data in the Cool tier could be more than 50 percent.
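
The "more than 50 percent" figure follows from the quoted price ranges, as the short calculation below illustrates. Pairing the low end of each range with the low end of the other (and high with high) is an assumption made for illustration; actual prices depend on region and volume, and this sketch counts storage charges only, not access or tier-change charges.

```python
def savings_percent(hot_price: float, cool_price: float) -> float:
    """Percentage saved per GB per month by storing in Cool instead of Hot."""
    return (hot_price - cool_price) / hot_price * 100


# Per-GB monthly prices quoted in the article: (low end, high end) of each range.
COOL = (0.01, 0.048)
HOT = (0.0223, 0.061)

if __name__ == "__main__":
    # Assumption: the low ends correspond to each other, as do the high ends.
    print(f"Low end of ranges:  {savings_percent(HOT[0], COOL[0]):.1f}% saved")
    print(f"High end of ranges: {savings_percent(HOT[1], COOL[1]):.1f}% saved")
```

Under that pairing, the savings are roughly 55 percent at the low end of the ranges and roughly 21 percent at the high end, which is why the benefit depends heavily on where and how much you store.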

Here’s an important note: Keep an eye on charges and billing; things may still be changing. In a blog post, Microsoft points out that in order to allow users to try out the new storage tiers and validate functionality post launch, the charge for changing the access tier from cool to hot will be waived until June 30th 2016. Starting July 1st 2016, the charge will be applied to all transitions from cool to hot.

Microsoft highlighted that you will be able to choose between Hot and Cool access tiers to store object data based on its access pattern. Some capabilities to keep an eye on:

API integration (but only with other existing Blob storage offerings)
Security
Scalability
Multi-region distribution
99% availability (the Hot tier offers 99.9%)
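
To put the availability figures in perspective, they can be converted into the unavailability each would permit over a year. This is a back-of-the-envelope sketch: the function name is hypothetical, and how availability is actually measured and credited is not covered in this article.

```python
HOURS_PER_YEAR = 365 * 24  # 8,760 hours; leap years ignored


def max_unavailable_hours(availability_percent: float) -> float:
    """Hours per year a service may be unavailable at a given availability level."""
    return HOURS_PER_YEAR * (1 - availability_percent / 100)


if __name__ == "__main__":
    for tier, pct in (("Cool", 99.0), ("Hot", 99.9)):
        print(f"{tier} tier at {pct}%: up to {max_unavailable_hours(pct):.1f} hours per year")
```

The gap is a factor of ten: about 87.6 hours per year at 99 percent versus about 8.8 hours at 99.9 percent.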

A Few More Words of Caution

Nearline, Cool Blob Storage, and Glacier may be powerful and affordable, but end-to-end integration and management can still be a challenge. Management capabilities around backup and storage will be critical.

AWS Glacier, for example, allows customers to set policies that limit how much data users can retrieve per day. Its users can also set a retrieval policy that keeps them within the free tier. Google's Nearline, by comparison, seems to lack the same sort of granularity. As for Microsoft, Cool Blob Storage is great as long as you have your data stored in Microsoft's cloud to begin with.

There’s no clear winner here. It will depend on your specific use case. As you build out your own cold storage architecture, make sure to create an environment based on integration best practices. This means understanding what kind of data you’ll be storing, retention policies, pricing, and of course how quickly you’ll be needing the information during a restore.
