- Wilson Guo is co-chair of the OCP China Community and senior technology director at Inspur
- Dr. Han Wang is principal architect at Inspur
Digital transformation across industries is driving a cloud data center construction and innovation boom. But as many data center technology innovations focus on the cooling, power supply, and management of hyperscale data centers, not enough attention is being paid to the needs of small and medium-sized data centers supporting private and hybrid cloud users.
The OpenRMC Project within the Open Compute Project (OCP), led by Inspur and Intel with contributions from Microsoft and Wiwynn, introduces a rack management solution that integrates hardware and software to help data centers improve construction efficiency, simplify operations management, and enhance operational efficiency.
Open, Automated Operations and Maintenance Capacity
Operations and maintenance are an integral part of running a data center and are growing increasingly complex. As the intelligent era unfolds, the diversity and complexity of application workloads in data centers are increasing, new technologies such as artificial intelligence and containers are being implemented, and computing resources are becoming heterogeneous and pooled. In addition to traditional CPUs, accelerator computing units such as GPUs and FPGAs are now playing an increasingly important role in server systems.
To improve the reliability and availability of their data centers, users are looking to automate deployment and inspection, conduct in-depth fault diagnosis, and receive intelligent alarms.
Meanwhile, growing CPU and GPU capacity is driving up server energy consumption.
Cooling and power supply energy consumption accounts for a considerable part of data center operational costs, putting enormous cost pressures on companies. Green and energy-efficient design is essential for boosting data center competitiveness and striking a balance between environmental and economic benefits.
But because it is difficult to monitor the performance and power consumption of servers in real time and at fine granularity, traditional data center operations have failed to achieve the desired energy efficiency.
OpenRMC makes monitoring power consumption much easier and more accurate. The aggregated power consumption of all the equipment in a rack can be reported in real time, along with aggregated performance metrics. This data is necessary to measure critical energy consumption accurately and to determine how efficiently computing resources are being used.
Such analysis shows that almost 30 percent of a rack's power capacity is over-reserved for backup. With the accurate telemetry and power control functions of OpenRMC, power utilization and rack density could be improved by 15 to 25 percent.
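The aggregated power reporting described above can be illustrated with a small sketch. It assumes each node's BMC exposes a Redfish-style Power resource whose PowerControl entries carry a PowerConsumedWatts reading. The function names, node names, and sample values are hypothetical, and this is not OpenRMC's actual implementation.

```python
from typing import Dict, List, Optional, Tuple


def node_power_watts(power_resource: dict) -> Optional[float]:
    """Sum PowerConsumedWatts over one node's PowerControl entries.

    Returns None when the node reports no usable reading.
    """
    readings = [
        entry.get("PowerConsumedWatts")
        for entry in power_resource.get("PowerControl", [])
    ]
    readings = [r for r in readings if r is not None]
    return float(sum(readings)) if readings else None


def aggregate_rack_power(
    resources: Dict[str, dict],
) -> Tuple[float, Dict[str, float], List[str]]:
    """Return (rack total in watts, per-node watts, nodes with no reading)."""
    per_node: Dict[str, float] = {}
    missing: List[str] = []
    for node, resource in resources.items():
        watts = node_power_watts(resource)
        if watts is None:
            missing.append(node)  # keep gaps visible instead of counting them as 0 W
        else:
            per_node[node] = watts
    return float(sum(per_node.values())), per_node, missing


if __name__ == "__main__":
    # Hypothetical payloads, shaped like Redfish Power resources.
    sample = {
        "node-1": {"PowerControl": [{"PowerConsumedWatts": 310}]},
        "node-2": {"PowerControl": [{"PowerConsumedWatts": 295}]},
        "node-3": {"PowerControl": [{"PowerConsumedWatts": None}]},
    }
    total, per_node, missing = aggregate_rack_power(sample)
    print(f"rack total: {total} W, nodes without a reading: {missing}")
```

Nodes without a reading are listed separately rather than treated as zero, so a failed sensor does not make the rack look as if it has more power headroom than it really does.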
Automated data center operations are essential to reduce energy consumption and optimize server resource allocation. In recent years, OCP has made major advances in delivering higher computing density per unit space, reducing vendor lock-in through unified specifications, and quickly responding to unexpected application demands.
To achieve this, designing and delivering a flexible, modular rack solution for data centers is key.
Due to limitations in terms of product, technology, and capabilities, the deployment of automated data center operations and energy conservation equipment is still in the early stages.
OpenRMC: A Promising Data Center Rack Management Solution
In response to the growing need for automated data center operations, greater system availability, and better energy efficiency, Inspur initiated and led the establishment of the OpenRMC Project. The project aims to lead the industry in providing a combined software and hardware rack management solution built on open source management features.
A crucial issue addressed by OpenRMC is enhancing openness and usability. In a traditional data center, the server node is the most important unit of management and control. Only when each node works stably and efficiently can the systems in the whole rack be coordinated and used in an orderly way. The BMC on the server node is the key to managing each server. The BMC takes the form of an SoC, and through its abundant I/O it connects to the sensors of the various subsystems and obtains the information needed to control the environment. OpenRMC uses the BMC of each node as the basic unit of management and control, supports the IPMI and Redfish interfaces, and implements management functions such as remote power control, Serial over LAN, monitoring of host node CPU and memory operating status, and hard disk LED on/off.
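As an illustration of remote power control over Redfish, the sketch below builds (but does not send) a request for the standard ComputerSystem.Reset action. The host name, system ID, and helper name are hypothetical, authentication is omitted, and a real client would normally read the action target from the system resource rather than assume the path. It shows the general Redfish pattern, not OpenRMC's own code.

```python
import json
import urllib.request

# A subset of the ResetType values defined by the Redfish ComputerSystem schema.
ALLOWED_RESET_TYPES = {
    "On",
    "ForceOff",
    "GracefulShutdown",
    "GracefulRestart",
    "ForceRestart",
}


def build_reset_request(
    bmc_host: str, system_id: str, reset_type: str
) -> urllib.request.Request:
    """Build a Redfish power-control request for one server node (hypothetical helper)."""
    if reset_type not in ALLOWED_RESET_TYPES:
        raise ValueError(f"unsupported ResetType: {reset_type!r}")
    # Assumed conventional action path; real BMCs advertise it under Actions.
    url = (
        f"https://{bmc_host}/redfish/v1/Systems/{system_id}"
        "/Actions/ComputerSystem.Reset"
    )
    body = json.dumps({"ResetType": reset_type}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={"Content-Type": "application/json"},
    )


if __name__ == "__main__":
    request = build_reset_request("bmc-node1.example.net", "1", "GracefulShutdown")
    print(request.get_method(), request.full_url)
    print(request.data.decode("utf-8"))
```

Building the request separately from sending it keeps the sketch free of network side effects. A rack manager would add session or basic authentication and then pass the request to urllib.request.urlopen or an equivalent HTTP client.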
In terms of software and communication interfaces, in addition to supporting the common IPMI interface standard and different commercial BMCs, such as iLO and DRAC, OpenRMC also supports the OpenBMC open source management software architecture. This architecture uses the Linux kernel to build the SoC system, and its application layer is assembled from similarly modular software packages. As a result, BMC management systems are built on a unified API, and the BMC management functions for a new device can be developed and deployed in a very short time.
Inspur has defined interface specifications for northbound management that targets all the equipment in the rack and contributed them to OCP. The move aims to promote seamless connection and effective communication between northbound presentation and southbound management within the OCP framework.
Intel and Microsoft are also actively promoting the innovation and application of OpenRMC. In 2014 Intel released the Intel RSD (Rack Scale Design), a reference design intended to promote the technology for resource pooling and flexible deployment in data centers to improve resource utilization. As one of the sponsors of the OpenRMC project, Intel has open-sourced the RSD rack management module and management APIs (RSD RMM REST API) and contributed them to the OCP OpenRMC project. It has also provided the reference code and methods for obtaining the parameters of key functions and components, such as chassis, power supply, and cooling.
Microsoft Azure is one of the largest public clouds in the world. As an owner of hyperscale data centers and a provider of cloud computing services, Microsoft has provided open source server standards and its Olympus server specs to the OCP community. It has shared its own experience in data center management with the community and proposed several different RMC hardware implementation methods. The company has also provided suggestions for the modular software design of the OpenRMC firmware, as well as examples of accessing the status of rack-level components and of management and monitoring.
The code and hardware reference designs contributed by OCP Project members have greatly diversified the use cases of OpenRMC and brought innovation to the automated operations ecosystem. The members have also provided an underlying platform and lent credibility to the wide adoption of OpenRMC functions.
With OpenRMC, a rack management system based on open source technologies can be scaled to help large, medium-sized, and small data centers alike integrate heterogeneous equipment and realize automated, fine-grained operations. In this way, data centers can reduce their IT operations costs, simplify management, and improve efficiency.
Opinions expressed in the article above do not necessarily reflect the opinions of Data Center Knowledge and Informa.
Industry Perspectives is a content channel at Data Center Knowledge highlighting thought leadership in the data center arena.