
Data Tiering: Achieving Sustainable Observability Without Compromising Effectiveness

Multi-vendor data tiering enables IT managers to keep hold of their valuable observability data while reducing excessive storage costs.

Observability makes a wealth of application and system health data actionable. It allows enterprises to accurately diagnose and fix performance-impacting issues arising in the data center, and ideally to identify and resolve them before they have a significant impact on the business.

With the average cost of downtime running at approximately $5,600 per minute, it’s no wonder many organizations are investing heavily in observability practices. According to recent estimates, organizations with advanced observability practices are four times more likely to resolve instances of unplanned downtime or serious service issues in just minutes, versus hours or days.

Additionally, 90% of IT professionals believe observability is important and strategic to their business, while 91% see observability as vital at every stage of the software lifecycle. Moreover, organizations that have mastered observability produce an average of 60% more products or revenue streams from development teams than observability beginners.

Too Much of a Good Thing?

Despite the benefits of observability, one challenge is growing more onerous every day. The proliferation of microservices and digital-first customer experiences is driving massive growth in observability data, including logs, metrics, and traces. In many ways, the situation is like an overflowing bathtub: too much data creates uncontrollable chaos, with the paradoxical effect of overwhelming DevOps and site reliability engineering (SRE) professionals instead of helping them.

For the typical data center, log data volumes have increased five-fold over the past three years. As a result, many observability practices are becoming prohibitively expensive. According to one recent survey of DevOps and SRE professionals, more than 90% expect their observability initiatives to come under more intense C-level scrutiny within the next year.

A Game of Russian Roulette?

DevOps and SRE teams who depend on observability initiatives to keep their data center-based apps and systems running fast and reliably feel as if they’re facing an impossible conundrum. If they keep all their data – the vast majority of which is never needed or used – they will blow through their storage and analytics budgets. Indeed, over 90% of DevOps and SRE professionals have reportedly experienced overages or unexpected spikes in observability costs at least a few times per quarter, if not more.

To combat these issues, nearly all DevOps and SRE teams regularly limit log ingestion – often indiscriminately and randomly – as a mechanism for reducing costs. The danger with this approach is that teams may be discarding the very data needed to fix a problem, or valuable data that could flag a growing hot spot and provide a window for proactive remediation. Remember: data grows exponentially richer the more of it you have.
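To make the risk concrete, here is a minimal sketch (in Python) of what indiscriminate, rate-based log sampling looks like. The drop rate, function name, and sample log lines are hypothetical illustrations, not taken from any particular vendor's pipeline.

import random

# Hypothetical illustration: rate-based sampling that drops a fixed fraction
# of log lines regardless of content. A critical error line is just as likely
# to be discarded as a routine health check.
DROP_RATE = 0.7  # keep only ~30% of logs to stay under an ingestion quota

def should_keep(log_line: str) -> bool:
    """Indiscriminate sampling: the decision ignores what the line says."""
    return random.random() > DROP_RATE

logs = [
    "2024-05-01T12:00:01 INFO  health-check ok",
    "2024-05-01T12:00:02 ERROR payment-service connection pool exhausted",
    "2024-05-01T12:00:03 INFO  health-check ok",
]

kept = [line for line in logs if should_keep(line)]
# The ERROR line that would explain an outage may or may not survive --
# exactly the "data down the drain" problem described below.
print(kept)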

The consequences of this “data down the drain” approach can be severe: increased risk and compliance exposure, lost insight and analytics, and failure to detect a production issue or outage. The approach can also be contentious, giving rise to conflict within DevOps and SRE teams over which datasets should be kept and which can be discarded.

Data Tiering: A Happy Medium

Fortunately, there is a technique that allows DevOps and SRE teams to keep all their valuable data while also keeping a lid on excessive storage costs: multi-vendor data tiering.

Given the explosion in observability data volumes, it’s simply not sustainable to store all data in an expensive premium storage tier, like a log data index. Instead, a combination of best-of-breed tools can streamline and move data to its optimal storage destination based on its use case (e.g., real-time analytics, ad hoc queries, compliance) and desired cost efficiencies (e.g., hot and searchable index, object storage).
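As a minimal, vendor-neutral sketch of how such routing might work, the Python snippet below classifies each log record by use case and sends it to a hot index, a warm store, or object storage. The tier names, matching rules, and record fields are illustrative assumptions, not a description of any specific product.

from dataclasses import dataclass

# Illustrative, vendor-neutral sketch of a tiering decision. The tiers,
# matching rules, and destinations below are assumptions for the example,
# not a real product API.

@dataclass
class LogRecord:
    service: str
    level: str
    message: str

def choose_tier(record: LogRecord) -> str:
    """Route each record to the cheapest tier that still serves its use case."""
    if record.level in ("ERROR", "CRITICAL"):
        return "hot-index"       # real-time analytics and alerting
    if record.service in ("payments", "auth"):
        return "warm-store"      # ad hoc queries during investigations
    return "object-storage"      # low-cost retention for compliance

records = [
    LogRecord("payments", "ERROR", "connection pool exhausted"),
    LogRecord("frontend", "INFO", "GET /health 200"),
    LogRecord("auth", "INFO", "token refreshed"),
]

for record in records:
    print(choose_tier(record), "<-", record.service, record.level)

In practice, rules like these would typically live in an observability pipeline or collector sitting in front of the storage back ends, rather than in application code.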

In this way, DevOps and SRE teams can keep all their data at their disposal without spending inordinate amounts on storage and analytics. The painful keep-or-discard compromise no longer has to be made.

Reducing Data Costs

Today’s data center and application estates are growing increasingly vast and complex, generating more data than most DevOps and SRE teams can handle. Not all data is of equal value, however, and teams shouldn’t be expected to treat (and pay for) all of it in the same way. Reducing data ingestion and leveraging best-of-breed tools to properly classify data can drive huge cost savings.

Over the next year, we expect data tiering to be widely adopted, with buy-in at every level, from engineers to executives. We also expect an uptick in demand for usage reporting as DevOps and SRE teams get the most from their observability data, using what is most valuable while moving the rest to lower-cost storage tiers.


Ozan Unlu is CEO of Edge Delta.
