In a large heterogeneous data center, a management application helps you manage data center assets by maintaining inventory, periodically collecting health statistics, and providing incident management methods. One of the major functions of one-to-many (1xN) management applications is collecting the health statistics of multiple servers in the data center. Management applications use SNMP, WS-Man, or REST APIs to collect data from the devices in each server. Commonly monitored devices include sensors, storage devices, power supply units (PSUs), temperature indicators, and cooling fans. iDRAC provides a component-level health status and a cumulative health status called the Rollup status. The Rollup status provides an overview of each subsystem and of the overall system, indicated by the following infographics:
The cumulative health status, that is, the aggregation of the individual component Rollup statuses of all the devices of a server, is represented as the Global Rollup status. In the 14th generation PowerEdge servers, Dell EMC has introduced new methods for health monitoring and reporting. These methods report the individual statuses of devices, their aggregated health, and the reasons for failure. For any health change, iDRAC logs a Lifecycle Controller event and error message. For more information about using event and error messages, see the Event and Error Message Reference Guide for Dell EMC PowerEdge Servers.
The Rollup status is derived from the health statuses of the components in the server under consideration. The most severe status found among the components is assigned as the overall health status of the server. For example, if a server has a PSU in the Warning state and a fan in the Critical state, the Rollup health status of the server is the more severe of the two, which is Critical.
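The "most severe status wins" rule described above can be illustrated with a short Python sketch. This is not iDRAC code: the three-level `Health` ordering, the `rollup` and `global_rollup` helpers, and the subsystem and component names are all hypothetical and chosen only for illustration. Treating an empty subsystem as OK is likewise an assumption, and a real iDRAC may report further states that this sketch does not model.

```python
from enum import IntEnum


class Health(IntEnum):
    # Assumed severity ordering for illustration: a higher value is more severe.
    OK = 0
    WARNING = 1
    CRITICAL = 2


def rollup(statuses):
    """Return the most severe status; an empty input is treated as OK (assumption)."""
    return max(statuses, default=Health.OK)


def global_rollup(subsystems):
    """Roll up each subsystem, then roll the subsystem results up to the server."""
    per_subsystem = {
        name: rollup(components.values())
        for name, components in subsystems.items()
    }
    return per_subsystem, rollup(per_subsystem.values())


# Hypothetical server matching the example in the text:
# one PSU in Warning state and one fan in Critical state.
server = {
    "power": {"PSU1": Health.WARNING, "PSU2": Health.OK},
    "cooling": {"Fan1": Health.CRITICAL, "Fan2": Health.OK},
    "storage": {"Disk0": Health.OK},
}

per_subsystem, overall = global_rollup(server)
print(per_subsystem["power"].name)    # WARNING
print(per_subsystem["cooling"].name)  # CRITICAL
print(overall.name)                   # CRITICAL
```

Modeling the statuses as an ordered enumeration keeps the aggregation to a single `max` call at each level, so the same helper serves both the subsystem rollup and the server-level rollup.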
Global Rollup tree structure
On 14th generation PowerEdge servers, the GUI of the factory-installed iDRAC displays the Rollup health with symbols on the System Summary page.
Figure 1: iDRAC System Health overview