The performance of a data center is judged by the stability, integrity, and ease of use of its applications. People tend to say the heart of a data center is data. But that’s a bit like saying the heart of a person is, for instance, blood — rather than, much more obviously, the heart. You can’t really judge whether your data is doing its job when your applications are performing poorly.
As data center architectures, server technologies, and application design continue to evolve — in many ways, independently from one another — a comprehensive approach to managing cloud application performance could easily drift further out of reach. Here’s what’s happening:
· Enterprise customers are redistributing their IT assets, bringing their data lakes and streams back on-premises, delegating archival and relational database functions to cloud services, and deploying real-time and critical tasks to a new class of edge data centers.
· Servers are becoming networked systems unto themselves, adding multi-pipeline co-processors in the form of FPGA and GPGPU accelerators, and greatly expanding memory and solid-state storage for data processing tasks that used to consume entire racks.
· Software is metamorphosing from monolithic programs into swarms of functions, each of which is scalable unto itself, and all of which are marshaled by a new generation of task orchestrators. Each of these orchestrators manages multiple server nodes as single compute clusters.
These three trends could easily continue without any help from one another. What’s required at this point is some sort of virtual platform that offers a semblance of contiguity and continuity for the people tasked with managing performance.
Accomplishing this means settling on a common definition of “performance.” Specifically, data center managers, IT operators, and software developers need to see pertinent and meaningful metrics that they can actually use. A tremendous number of factors contribute to the final measurement of page load time, and that final number cannot be mathematically factored into a neat, tidy list of common factors.
Put another way, the people who will get the credit and/or blame for performance need to see metrics that pertain to values they can directly change or influence with the work they do.
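To make that concrete, consider a minimal sketch of how a single raw measurement might be broken into role-specific components. The component names, roles, and values below are illustrative assumptions, not a real measurement taxonomy:

```python
# Sketch: split one page-load-time sample into components that map to the
# roles that can actually act on them. Names and values are hypothetical.
from dataclasses import dataclass

@dataclass
class PageLoadSample:
    dns_ms: float       # network team's domain
    server_ms: float    # IT operations' domain
    db_query_ms: float  # database administrators' domain
    render_ms: float    # front-end developers' domain

# Which components each role can directly influence (an assumed mapping).
ROLE_COMPONENTS = {
    "network": ["dns_ms"],
    "operations": ["server_ms"],
    "dba": ["db_query_ms"],
    "developers": ["render_ms"],
}

def metrics_for_role(sample: PageLoadSample, role: str) -> dict:
    """Return only the components the given role can act on."""
    return {name: getattr(sample, name) for name in ROLE_COMPONENTS[role]}

sample = PageLoadSample(dns_ms=12.0, server_ms=85.0,
                        db_query_ms=140.0, render_ms=60.0)
print(metrics_for_role(sample, "dba"))  # {'db_query_ms': 140.0}
```

The point of the sketch is the filtering step: no one sees the undifferentiated total; each role sees a number its work can move.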
A cloud-based service management platform may be the ultimate goal here: a way to coalesce all the metrics from even the most widely distributed segments of the data center into a common framework residing on infrastructure independent from customer ownership. (So if the customer’s own assets go down, the customer will have a better idea of why.)
We’re starting to see the first agreements between stakeholders in the performance space that could push trends in the right direction for a cloud SMP:
· New Relic, maker of an application performance monitoring (APM) platform, developed an integration that enables applications and services deployed partly on Amazon AWS and partly on customer premises to be monitored on a single plane of reference, with AWS billing information built in.
· Microsoft recently made headway with its overhaul of role-based access control (RBAC) for Azure. Under the new system, resources provisioned to the cloud are given “owners,” and the metrics that apply to performance for those resources are automatically attributed to those owners. It’s a vast simplification of the Windows Server-based approach that had already been cobbled together. And with customers deploying Azure Stack on-premises, the system can conceivably measure performance for customer-owned resources as well.
· Open source engineers, led by Vapor IO, have coalesced around a novel infrastructure monitoring solution based on ultra-low-cost, dedicated hardware. OpenDCRE (Data Center Runtime Environment) runs in a Linux container deployed on a Raspberry Pi box attached to a server rack. It communicates with servers by means of a direct connection to the servers’ power line through their bus bar. While today the framework is being used to transmit hardware performance data to a cloud-based application, it’s not inconceivable that it could be expanded to include deeper-level metrics on applications and services collected by server-based agents.
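The owner-attribution idea behind the Azure RBAC overhaul can be sketched in a few lines: each provisioned resource carries an owner, and incoming metrics are grouped by that owner automatically. The resource names, owners, and metric values here are hypothetical:

```python
# Sketch: attribute performance metrics to resource "owners," in the spirit
# of owner-based attribution described above. All data is illustrative.
from collections import defaultdict

# Owner assigned when each resource was provisioned (assumed registry).
owners = {
    "vm-web-01": "alice",
    "db-orders": "bob",
}

# A stream of (resource, metric, value) samples from distributed segments.
metric_stream = [
    ("vm-web-01", "cpu_pct", 72.0),
    ("db-orders", "query_ms", 140.0),
    ("vm-web-01", "cpu_pct", 68.0),
]

def attribute(stream, owner_registry):
    """Group incoming metrics by the owner of the resource that emitted them."""
    by_owner = defaultdict(list)
    for resource, metric, value in stream:
        by_owner[owner_registry[resource]].append((resource, metric, value))
    return dict(by_owner)

report = attribute(metric_stream, owners)
# alice sees both cpu_pct samples; bob sees the query_ms sample
```

Nothing in the sketch depends on where the resources live — on-premises, in the cloud, or at the edge — which is exactly what a common framework for coalesced metrics would require.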
If projects like these were to come together, the product could be a kind of open source framework for monitoring the health of both systems and their applications in real time. Colo providers could offer customer portals that collect these metrics together and provide live analytics.
What such a coalition needs to come to fruition is incentive. If these three data center transition trends continue, that incentive could come from the realization that coalition may be far more difficult tomorrow than it would be today.