Software-Defined Power: The Path to Ultimate Reliability

Clemens Pfeiffer is the CTO of Power Assure and is a 25-year veteran of the software industry, where he has held leadership roles in process modeling and automation, software architecture and database design, and data center management and optimization technologies.

Clemens Pfeiffer
Power Assure

About half of all service outages in data centers today are caused by power problems, and that share is expected to grow as an aging electric grid struggles to meet rising demand. Part of the reason for this shift is that hardware has become remarkably reliable, and the virtualization of servers, storage and network components, the so-called “Software-Defined Data Center,” has made applications largely immune to single points of failure in IT equipment. Power problems, by contrast, are only partially addressed by the uninterruptible power supply (UPS) and backup generator.

To enhance their business continuity and disaster recovery strategies, most organizations now operate multiple, geographically dispersed data centers. While this investment is made primarily to protect against catastrophic events caused by major natural disasters, the arrangement can also afford greater immunity from power problems, whether caused by weather or disruptions on the grid.

What is Software-Defined Power?

Software-Defined Power is emerging as the solution to application-level reliability issues caused by power problems. Software-Defined Power, like the broader Software-Defined Data Center (SDDC), is about creating a layer of abstraction that makes it easier to continuously match resources with changing needs. For SDDC, the resources are the servers, storage and networking equipment, and the need is application service levels. For Software-Defined Power, the resource is the electricity required to power (and cool) all of that equipment, and the need is exactly the same: application service levels.

With Software-Defined Power, overall reliability is improved by shifting the applications to the data center with the most dependable, available and cost-efficient power at any given time. Software-Defined Power is implemented using a software system capable of combining IT and facility/building management systems, and automating standard operating procedures, resulting in the holistic allocation of power within and across data centers, as required by the ongoing changes in application load.
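To make the idea of shifting applications toward the best available power more concrete, here is a minimal Python sketch of how such a system might rank data centers. It assumes each site reports a grid-risk estimate, spare power headroom and a current electricity rate; the `SiteStatus` fields, the `choose_site` function and the 0.7/0.3 weighting are all hypothetical illustrations, not Power Assure's product or API.

```python
from dataclasses import dataclass


@dataclass
class SiteStatus:
    """Hypothetical snapshot of one data center's power situation."""
    name: str
    grid_risk: float          # assumed forecast: 0.0 (stable) to 1.0 (outage likely)
    spare_capacity_kw: float  # power headroom available for additional IT load
    price_per_kwh: float      # current electricity rate at this site


def choose_site(sites, required_kw, risk_weight=0.7, price_weight=0.3):
    """Return the site with the lowest weighted risk/cost score that can host the load.

    The weights are illustrative assumptions, not values from the article.
    """
    candidates = [s for s in sites if s.spare_capacity_kw >= required_kw]
    if not candidates:
        raise ValueError("no site has enough spare power for this load")
    max_price = max(s.price_per_kwh for s in candidates)

    def score(site):
        # Normalize price to 0..1 so it is comparable with grid_risk.
        return risk_weight * site.grid_risk + price_weight * site.price_per_kwh / max_price

    return min(candidates, key=score)


sites = [
    SiteStatus("east", grid_risk=0.6, spare_capacity_kw=500, price_per_kwh=0.09),
    SiteStatus("west", grid_risk=0.1, spare_capacity_kw=300, price_per_kwh=0.12),
    SiteStatus("central", grid_risk=0.2, spare_capacity_kw=100, price_per_kwh=0.07),
]
print(choose_site(sites, required_kw=200).name)  # west: central lacks headroom, east is riskier
```

A real system would draw these inputs from the IT and facility/building management systems and would also weigh each application's requirements; a linear weighting is just one simple way to trade risk against cost.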

It’s About the Applications

Once configured with the service level and other requirements for all applications, the Software-Defined Power solution continuously and automatically optimizes the resource allocations as it shifts loads between or among data centers. Adding power to the already existing software-defined computing, storage and network components of an application environment makes it possible to abstract applications fully from an individual data center and its power dependency. This is what enables the shifting and shedding of application capacity across multiple data centers by adjusting the IT equipment and critical facility infrastructure required at each, resulting in the maximum possible application-level reliability at the lowest operating cost.
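The allocation step can be illustrated with a simple greedy placement. The sketch below assumes each application has a priority tier and a power demand, and each site has available capacity and an outage-risk estimate; the `allocate` function and its tier convention (1 = most critical) are hypothetical, and greedy placement is only one of many possible strategies.

```python
def allocate(apps, site_capacity_kw, site_risk):
    """Greedy placement: most critical applications first, onto the lowest-risk site with room.

    apps: list of (name, tier, demand_kw) tuples; tier 1 = most critical (assumed convention).
    site_capacity_kw: dict mapping site -> available kW.
    site_risk: dict mapping site -> outage-risk estimate (lower is better).
    Returns (placement, shed): placement maps app -> site; shed lists apps that did not fit.
    """
    remaining = dict(site_capacity_kw)  # copy so the caller's dict is untouched
    placement, shed = {}, []
    for name, _tier, demand_kw in sorted(apps, key=lambda app: app[1]):
        for site in sorted(remaining, key=lambda s: site_risk[s]):
            if remaining[site] >= demand_kw:
                remaining[site] -= demand_kw
                placement[name] = site
                break
        else:
            # No site had room: this application's capacity is shed.
            shed.append(name)
    return placement, shed


apps = [("batch", 3, 200), ("erp", 1, 300), ("web", 2, 250)]
placement, shed = allocate(apps, {"a": 400, "b": 300}, {"a": 0.1, "b": 0.3})
print(placement, shed)  # {'erp': 'a', 'web': 'b'} ['batch']
```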

Not only does shifting loads between data centers help increase reliability by affording greater immunity from the power problems that cause unplanned downtime, it also creates wider windows for the planned downtime required for routine maintenance and upgrades within each data center. This makes it easier to operate applications 24×7 with no adverse impact on either availability or performance from power-related issues.

Follow-the-Moon Strategies

In addition to increasing reliability, Software-Defined Power also pays for itself by minimizing energy spend and enabling participation in lucrative demand response programs. Power is most dependable and available at night, which is also when electricity rates are normally lowest. Shifting the load to “follow the moon” can therefore afford considerable savings.
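A follow-the-moon policy starts by identifying which sites are currently in their night hours. The sketch below assumes each site is described by a fixed UTC offset (ignoring daylight saving time for simplicity) and treats 22:00 to 06:00 local time as a stand-in for off-peak hours; actual off-peak windows are set by each utility's tariff. The `night_sites` function is a hypothetical helper.

```python
from datetime import datetime, timedelta, timezone


def night_sites(site_utc_offsets, now_utc, night_start=22, night_end=6):
    """Return the sites whose local time falls in the assumed night window.

    site_utc_offsets: dict mapping site -> fixed UTC offset in hours (DST ignored).
    night_start/night_end: assumed off-peak window, 22:00 to 06:00 local time.
    """
    result = []
    for site, offset_hours in site_utc_offsets.items():
        local = now_utc.astimezone(timezone(timedelta(hours=offset_hours)))
        if local.hour >= night_start or local.hour < night_end:
            result.append(site)
    return result


now = datetime(2013, 6, 1, 4, 0, tzinfo=timezone.utc)  # 04:00 UTC
offsets = {"virginia": -4, "frankfurt": 2, "singapore": 8}
# Virginia is at 00:00 local time; Frankfurt is at 06:00 and Singapore at 12:00.
print(night_sites(offsets, now))  # ['virginia']
```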

Shifting load to a distant data center also enables shedding that load locally. A best practice in Software-Defined Power, therefore, is to power down the idle servers until they are needed again. The same ability to deactivate and reactivate servers can also be used to dynamically match capacity to load within a single data center, either on a regular schedule or in response to changing application demand.
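Matching capacity to load reduces, at its simplest, to deciding how many servers to keep powered on. The following sketch assumes load is measured in requests per second, that per-server throughput is known, and that a fixed headroom buffer (20% by default) is kept; `servers_needed` and `rebalance` are hypothetical helpers, and a real deployment would also need to drain workloads before powering a server off.

```python
import math


def servers_needed(load_rps, capacity_per_server_rps, headroom=0.2, min_servers=1):
    """Servers to keep powered on for the current load, plus a safety buffer.

    headroom is an assumed spare-capacity fraction (0.2 = 20% extra).
    """
    if capacity_per_server_rps <= 0:
        raise ValueError("capacity_per_server_rps must be positive")
    needed = math.ceil(load_rps * (1 + headroom) / capacity_per_server_rps)
    return max(min_servers, needed)


def rebalance(active_servers, load_rps, capacity_per_server_rps, **kwargs):
    """Positive result: servers to power up; negative: servers that can be powered down."""
    return servers_needed(load_rps, capacity_per_server_rps, **kwargs) - active_servers


# Night-time load of 1,000 requests/s, 100 requests/s per server, 25% headroom:
print(rebalance(20, 1000, 100, headroom=0.25))  # -7: power down 7 of the 20 servers
```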

Because utilities pay exorbitant rates for wholesale energy during periods of peak demand, they are willing to pay commercial and industrial customers handsomely to reduce usage during these peaks. Software-Defined Power enables data centers to participate in these demand response programs without adversely affecting application service levels. Organizations can even go one step further: by knowing about potential grid issues in advance, IT and facility managers can take preventive action, shifting applications to another data center before any power problem occurs.
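During a demand response event, or when a grid problem is anticipated, the shiftable load at the affected site has to go somewhere. The sketch below assumes load is measured in kW and that the other sites report their spare headroom; it fills the sites with the most headroom first. The `respond_to_event` function and its `shiftable_fraction` parameter are hypothetical.

```python
def respond_to_event(event_site, site_load_kw, site_headroom_kw, shiftable_fraction=1.0):
    """Shift load away from a site facing a demand response event or grid warning.

    shiftable_fraction: assumed share of the site's load that can migrate.
    Fills the other sites with the most spare headroom first.
    Returns (moves, unmoved_kw): moves maps destination site -> kW moved.
    """
    to_move = site_load_kw[event_site] * shiftable_fraction
    moves = {}
    candidates = sorted(
        (site for site in site_headroom_kw if site != event_site),
        key=lambda site: site_headroom_kw[site],
        reverse=True,
    )
    for site in candidates:
        if to_move <= 0:
            break
        amount = min(site_headroom_kw[site], to_move)
        if amount > 0:
            moves[site] = amount
            to_move -= amount
    return moves, to_move


moves, unmoved = respond_to_event(
    "a", {"a": 400, "b": 200, "c": 100}, {"a": 50, "b": 150, "c": 300}
)
# Moves 300 kW to site c and 100 kW to site b, leaving nothing unmoved.
print(moves, unmoved)
```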

The combination of paying less for energy and wasting less to power (and cool) idle servers (including during demand response events) can result in savings of over 50 percent. And considering that the operational expenditure for energy alone exceeds the capital expenditure for the average server today, the electric bill for a full rack of servers can be cut by as much as $25,000 every year.
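The two effects can together exceed 50 percent because they multiply: a lower average rate applies to a smaller number of kilowatt-hours. The sketch below computes the annual bill before and after with hypothetical inputs (average IT load, rate, and two reduction ratios); it ignores cooling overhead and demand charges, and the figures are illustrative rather than the article's.

```python
HOURS_PER_YEAR = 8760


def annual_bill(avg_kw, rate_per_kwh):
    """Yearly energy cost for a constant average load (cooling overhead ignored)."""
    return avg_kw * HOURS_PER_YEAR * rate_per_kwh


def savings(avg_kw, rate_per_kwh, energy_ratio, price_ratio):
    """Return (dollars saved per year, fractional savings).

    energy_ratio: kWh consumed after / before (idle servers powered down).
    price_ratio: average effective rate paid after / before (off-peak shifting).
    """
    before = annual_bill(avg_kw, rate_per_kwh)
    after = annual_bill(avg_kw * energy_ratio, rate_per_kwh * price_ratio)
    return before - after, 1 - after / before


# Hypothetical 10 kW rack at $0.25/kWh, half the energy, 25% lower average rate:
saved, fraction = savings(10, 0.25, energy_ratio=0.5, price_ratio=0.75)
print(f"${saved:,.2f} per year, {fraction:.1%}")  # $13,687.50 per year, 62.5%
```

Here halving the energy consumed while paying 25% less per kWh yields savings of 1 - 0.5 × 0.75 = 62.5%, more than either reduction achieves alone.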

Industry Perspectives is a content channel at Data Center Knowledge highlighting thought leadership in the data center arena. See our guidelines and submission process for information on participating. View previously published Industry Perspectives in our Knowledge Library.

4 Comments

  1. This is an interesting take (though I will admit I kvetched at the introduction of another software-defined term). Do you see this kind of thing evolving where applications are moved indiscriminately? Or is there some kind of abstraction that weights some applications over others in terms of tolerance to moves or criticality? I ask because a lot of the software-defined-whatever talk has been focused on lower-level protocols rather than higher-level abstractions. I find your use case a potentially interesting one for the latter, but wondering if I am reading too much into it or overthinking it. I work on the vendor side and have a predisposition for abstractions, so I admittedly see ghosts sometimes :) -Mike @mbushong

  2. Michael, thank you for your questions. We don’t move applications indiscriminately; we move them across data centers to minimize risk from potential power issues inside and outside of the data center. As you rightly state, application service levels can never be missed, so more important applications have to be given priority and preference. Details on power pricing, forecasts, availability and real-time measurements serve as indicators of potential problems. When there is a serious power failure at a data center, no abstraction layer within that data center can keep the application layer up. Therefore, we help enterprises distribute their application load across data centers continuously based on application requirements and tiers, power availability and reliability, and also power cost where applicable.

  3. Is it not an oxymoron to mention software in the context of reliability? This has been the focus of stringent automotive "functional safety", ISO 26262 and ASIL/4/D processes. There is undoubtedly a huge benefit for optimization and risk mitigation through the intelligent implementation of load balancing towards the lowest-risk computing resource for the most sensitive application. On the power infrastructure side, the industry has recognized that software is also the Achilles heel of a power system: Boeing spent $2B out of $4B on software validation for its 777 program; 30% of unexplained PV solar inverter outages have been chalked up to software (source: a leading solar ESP during a Sandia workshop); and our case study discovered an anomaly in a power inverter, related to a control loop issue and traced to a memory or firmware deficiency. A catastrophic failure eight months later caused $15k in damages. Because power electronics can fail in under a microsecond, condition-based maintenance processes for electronics have to gather and process demanding amounts of health data, which is unrealistic for a server-based application. Our proposal is the integration of an independent real-time processor inside a power supply subsystem as a watchdog, open for licensing.