One of the key benefits of cloud computing is that it offers data storage architectures that have the potential to be very high in flexibility and very low in cost.
But the keyword there is potential: Whether the way you store data in the cloud actually saves money and provides flexibility depends, to a large extent, on important data architecture decisions that you make early in your cloud migration journey.
To prove the point, this article unpacks the cost implications of different data architectures in the cloud and offers guidance on which cloud-based data storage, management and processing strategies yield the best ROI.
What is data architecture?
Data architecture is a catch-all term that describes the various ways an organization stores, manages, processes and analyzes data. It encompasses practices such as (but not necessarily limited to) the following:
- Data storage.
- Data transformation.
- Data analytics.
- Data quality management.
- Data retention and rotation.
- Data backup and recovery.
The techniques and tools you use to handle these requirements, as well as how you integrate those techniques and tools, form the basis for your data architecture.
Data architecture's role in cloud computing costs
The way you architect your data will affect your bottom line in any type of IT environment, since the hardware and software required to store, process and manage data come at a cost.
However, in the cloud, data architecture has special implications for overall cost, due to the unique nature of cloud services and billing models. The additional costs for cloud-based data architecture include:
- Egress fees, which are charges cloud providers impose when data moves out of the cloud. (Most providers don't charge for data moving into their cloud platforms.)
- Data request fees, which you have to pay when interacting with data in some cases. For example, Amazon's S3 data storage service charges per-request fees for operations such as copying or listing objects.
- Early deletion costs, which apply to some cloud storage services under certain configurations if you delete data before an agreed-upon period. For instance, the Glacier tier of Amazon S3 storage has early deletion fees.
- Data processing and analytics costs that accrue when using cloud-based services. Usually, these costs are determined based on how much data you ingest into the tools.
For the most part, these fees don't apply, at least not directly, in an on-premises environment. On-prem, you don't have to pay for data egress or early deletion, for example.
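To see how these fee categories combine into a single bill, here is a minimal sketch in Python. Everything in it is an assumption made for illustration: the `FeeSchedule` and `estimate_monthly_cost` names are hypothetical, and the rates are made-up placeholders rather than any provider's published prices. Early deletion fees are left out because they depend on storage class and timing.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FeeSchedule:
    """Illustrative rates only; substitute your provider's published prices."""
    storage_per_gb: float     # per GB stored per month
    egress_per_gb: float      # per GB moved out of the cloud
    per_1000_requests: float  # per 1,000 data requests
    analytics_per_gb: float   # per GB ingested into analytics tools


def estimate_monthly_cost(fees, stored_gb, egress_gb, requests, ingested_gb):
    """Return a per-category cost breakdown plus a total for one month."""
    breakdown = {
        "storage": stored_gb * fees.storage_per_gb,
        "egress": egress_gb * fees.egress_per_gb,
        "requests": requests / 1000 * fees.per_1000_requests,
        "analytics": ingested_gb * fees.analytics_per_gb,
    }
    breakdown["total"] = sum(breakdown.values())
    return breakdown


# Placeholder rates and volumes, chosen only to make the arithmetic visible.
example_fees = FeeSchedule(
    storage_per_gb=0.02,
    egress_per_gb=0.10,
    per_1000_requests=0.005,
    analytics_per_gb=0.50,
)
bill = estimate_monthly_cost(
    example_fees,
    stored_gb=10_000,
    egress_gb=2_000,
    requests=5_000_000,
    ingested_gb=500,
)
for category, amount in bill.items():
    print(f"{category:>10}: ${amount:,.2f}")
```

Breaking the estimate out by category, rather than computing one lump sum, makes it easier to spot which part of the architecture is driving the bill.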
Cost-optimized cloud data architectures
It would be great if there were a simple set of rules to follow when designing a cloud data architecture that ensured you never pay more for data storage, processing, and management than necessary. Unfortunately, given the widely varying data requirements of different organizations, there is no one-size-fits-all approach to cost-optimizing your cloud data architecture.
But there are some general best practices that can help guide organizations toward lower-cost cloud data management:
- Understand cloud data fees: Perhaps the single most important step toward reducing data costs in the cloud is to understand the complex fee structures that apply to the data storage and analytics services you use. It can be easy to overlook or underestimate fees like egress costs, since on a per-gigabyte basis, the fees are quite low. But they can add up, and you want to know what you'll pay before your bill arrives.
- Consolidate data: In general, the less data you store and process, the lower your costs will be, since most data fees in the cloud are based on volume. Merging distinct data sets and removing redundant data can therefore help you to save money because it reduces the overall size of your data.
- Set data retention policies: The low cost of cloud data storage services can make it tempting to retain data in the cloud forever, or to perform only periodic, one-off deletions. But that's a mistake; most data should be systematically deleted when it's no longer needed. To ensure that you never pay for data storage longer than necessary, establish data retention policies that define how long each type of data must be retained. You can also use tools like S3 Lifecycle policies to remove data automatically after its retention period has ended.
- Compare cloud data analytics services: The cloud-based data analytics market has burgeoned over the past decade. Not only does each major public cloud now offer a lineup of different analytics tools, but various third-party providers can process and analyze your data, too. Before defaulting to whichever data analytics service is the most readily accessible, compare the features and prices of different vendors to make sure you're getting the best deal. And don't be afraid to go multicloud if you store data in one cloud but can get a better data analytics service from a different cloud. (Remember, though, that you'll be paying egress fees to move data between clouds.)
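As one way to make the retention advice concrete, the sketch below turns a retention policy into an S3 Lifecycle configuration. The prefixes, retention periods and the `build_expiration_rule` helper are hypothetical examples, not recommendations. The code only builds and prints the configuration, so it runs without AWS credentials.

```python
import json


def build_expiration_rule(rule_id, prefix, retention_days):
    """Build one lifecycle rule that expires objects under `prefix`."""
    if retention_days <= 0:
        raise ValueError("retention_days must be positive")
    return {
        "ID": rule_id,
        "Filter": {"Prefix": prefix},
        "Status": "Enabled",
        "Expiration": {"Days": retention_days},
    }


# Hypothetical retention policy: object key prefix -> days to keep the data.
retention_policy = {"logs/": 90, "tmp/": 7, "reports/": 365}

lifecycle_configuration = {
    "Rules": [
        build_expiration_rule(f"expire-{prefix.strip('/')}", prefix, days)
        for prefix, days in retention_policy.items()
    ]
}

print(json.dumps(lifecycle_configuration, indent=2))
```

A dictionary of this shape is what boto3's `put_bucket_lifecycle_configuration` call accepts as its `LifecycleConfiguration` argument. Before applying rules like these to data in an archival tier, check that the retention period is not shorter than the tier's minimum storage duration, or the expiration itself may trigger early deletion fees.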
Everyone wants to minimize the cost of working with data. But the complexity of cloud tools and billing models, along with the many different services available for working with data in the cloud, often makes it hard to determine which cloud-based data architecture will yield the best performance at the lowest cost. Still, by carefully assessing factors like easily overlooked data processing fees, and by taking a systematic, automated approach to cloud data management, you can work toward making your data architecture as cost-efficient as possible.