This post was written by Sathish D of the Dell OpenManage Connections team.

OVERVIEW

HP Operations Manager (HPOM) supports scheduled task policies, which you can use to invoke external applications from the HPOM console. The Dell Smart Plug-in (SPI) includes a scheduled task policy that retrieves the device health status and generates the corresponding status messages in the HPOM console.

This post explains periodic health monitoring of Dell servers, Dell Remote Access Controllers (DRACs), and Dell Chassis using the Dell SPI Scheduled Status Poll policies with HP Operations Manager for Windows.

Periodic Health Monitoring: Scheduled Status Poll policies for Dell servers, DRACs, and Dell Chassis in the HPOM console.

PREREQUISITES:

Before scheduling the status poll policies, complete these prerequisites:

  • Configure Simple Network Management Protocol (SNMP) and enable support for Dell servers, DRACs, and Chassis.
  • Configure Web Services-Management (WS-MAN) on ESXi servers.
  • Install OpenManage Server Administrator (OMSA) on Windows and Linux servers, and install the supported Dell OpenManage (OM) bundles on ESXi servers.
  • Schedule the Dell Hardware Auto-grouping policy in the HPOM console to classify Dell servers, DRACs, and Dell Chassis under the Dell groups.

Reference Documents:

  • For SNMP configuration on Windows and Linux Servers,               

<ADMIN NOTE: Broken link has been removed from this post by Dell>

MONITORING DELL SERVERS IN HPOM CONSOLE:

Once the Dell Hardware Auto-grouping policy completes, it identifies the Dell servers that were discovered (in-band, or out-of-band through iDRAC7 with available licenses) and creates hierarchies under both Nodes and Services in the HPOM console.

  • Node Hierarchy: Classification based on the type of Dell server (either Monolithic or Modular), representing the object with the Service Tag of the node.
  • Service Hierarchy: Classification based on the type of operating system installed on the server, representing the object with both the node’s Service Tag and its global health status.

Hierarchical representation of classified Dell Servers in Service Hierarchy:

(Both in-band and out-of-band management through iDRAC7 devices)

 

DEPLOYING THE DELL SERVER SCHEDULED STATUS POLL POLICY:

The scheduled task policy “Dell Server scheduled Status Poll” monitors the overall system health status of Dell servers in the HPOM console, both in-band and out-of-band through iDRAC7 devices, as well as bare metal devices through iDRAC7. By default, the policy runs every day at 2:00 A.M. The interval can be changed to a custom value as required. Policies are deployed via the Management Server.
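To make the default schedule concrete, here is a minimal Python sketch of how a daily 2:00 A.M. run time could be computed. It is only an illustration of the schedule described above, not how HPOM or the Dell SPI evaluates it internally; the function name `next_poll_time` and its parameters are hypothetical.

```python
from datetime import datetime, time, timedelta


def next_poll_time(now: datetime, poll_at: time = time(2, 0)) -> datetime:
    """Return the next daily run at `poll_at` that is strictly after `now`.

    Hypothetical helper: the 2:00 A.M. default comes from the post, the
    rest is an assumption made for illustration.
    """
    candidate = datetime.combine(now.date(), poll_at)
    if candidate <= now:
        # Today's slot has already passed, so the next run is tomorrow.
        candidate += timedelta(days=1)
    return candidate


if __name__ == "__main__":
    print(next_poll_time(datetime.now()))
```

Passing a different `poll_at` value mirrors changing the policy to a custom time of day.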

Once the policy is run, the health status of the device is queried through the communication protocol and its corresponding status with associated severity is displayed in the HPOM console. The health status message is also associated with the device, classified under Dell groups in both the Node and Service hierarchies.

The scheduled task policy acknowledges the existing health message and posts the current health status message for the classified servers and Integrated Dell Remote Access Controller (iDRAC) 7 devices in the Active Message Browser of the HPOM console. As a result, the latest health status of the servers and the iDRAC7 devices is always displayed in the HPOM console.
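The acknowledge-then-post behavior can be pictured with a small Python model. This is a sketch under stated assumptions, not the SPI's implementation: `MessageBrowser`, `post_health`, and the tuple layout are hypothetical names, and the sketch assumes one active health message per node.

```python
from dataclasses import dataclass, field


@dataclass
class MessageBrowser:
    """Toy model of an active message browser (hypothetical, for illustration)."""

    # node name -> the single active health message for that node
    active: dict = field(default_factory=dict)
    # earlier health messages that were acknowledged, oldest first
    acknowledged: list = field(default_factory=list)

    def post_health(self, node: str, severity: str) -> None:
        """Acknowledge the node's previous health message, then post the new one."""
        previous = self.active.pop(node, None)
        if previous is not None:
            self.acknowledged.append(previous)
        self.active[node] = (node, severity)


if __name__ == "__main__":
    browser = MessageBrowser()
    browser.post_health("SVCTAG1", "Warning")  # first poll
    browser.post_health("SVCTAG1", "Normal")   # next poll replaces it
    print(browser.active)
    print(browser.acknowledged)
```

Because each poll replaces the node's active entry, the browser only ever shows the most recent health status for a device.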

The Scheduled Status Poll policy generates health messages with three different severities in the HPOM console.

Message Association in Service Hierarchy:

Once the Dell Servers Scheduled Status Poll policy is run, it retrieves the overall health status of the classified Dell servers and iDRAC7 devices. The retrieved health status is mapped to its corresponding health message (Normal, Warning, or Critical). The health message is associated with the servers and iDRAC7 devices (child node: Global System Status) and is shown in the Active Message Browser of the HPOM console. The message severity of the child node is propagated to the parent node in both the Node and Service hierarchies.

Service Hierarchy: Only the Global System Status message with its corresponding health severity will be associated and updated for the node. The health severity will propagate to the device parent group.

Node Hierarchy: In the Node hierarchy, SNMP trap messages and health messages are associated with the Server node; the worst-case message severity is propagated to its parent Node group.
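As a rough illustration of worst-case severity propagation, the Python sketch below ranks the three severities named in this post and rolls them up from messages to a node, and from nodes to a parent group. The names `SEVERITY_RANK`, `worst_severity`, and `group_severity` are hypothetical, and treating an empty set of messages as Normal is an assumption, not documented SPI behavior.

```python
# Ordering of the three severities mentioned in the post (higher is worse).
SEVERITY_RANK = {"Normal": 0, "Warning": 1, "Critical": 2}


def worst_severity(severities):
    """Return the worst severity in an iterable; 'Normal' if it is empty (assumption)."""
    return max(severities, key=SEVERITY_RANK.__getitem__, default="Normal")


def group_severity(messages_by_node):
    """Roll up per-node message severities to a single parent-group severity."""
    node_levels = (worst_severity(msgs) for msgs in messages_by_node.values())
    return worst_severity(node_levels)


if __name__ == "__main__":
    # Hypothetical node names with the severities of their active messages.
    group = {
        "SVCTAG1": ["Normal", "Warning"],  # health message plus an SNMP trap
        "SVCTAG2": ["Normal"],
    }
    print(group_severity(group))
```

In the Service hierarchy only the Global System Status message contributes, so each node would carry a single severity in this model.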

Health Status Message Association for Server Node in Service Hierarchy:

TROUBLESHOOTING STEPS:

When the device's global health status is “Warning” or “Critical”, follow these steps to troubleshoot the issues:

- Review the outstanding messages for the device in the Active Message Browser. If any issues exist, resolve them as per the instructions in the message browser.

- Launch the 1:1 console for further troubleshooting:

  • Dell Servers (in-band)
    • Dell OpenManage Server Administrator (OMSA) console launch for all Dell Windows and Linux servers (in-band).
    • Dell OpenManage Web Server Administrator console launch for Dell ESXi servers (in-band).
  • Dell Servers (out-of-band)
    • Dell Remote Access Controller (DRAC) console launch for all Dell Windows, Linux, and ESXi servers and iDRAC7 (bare metal) devices.
    • Dell OpenManage Server Administrator (OMSA) console launch from all Dell servers (out-of-band).

- Launch Dell tools for further troubleshooting and to take corrective action:

  • Warranty Report:

    The Warranty Report page is used to retrieve warranty-related information via the Service Tag associated with the system. You can review the warranty details of the system and also renew the warranty.

  • OpenManage Essentials (OME):

    The OME console can be used to verify device and component health. It also provides rich device inventory information. You can launch the OME console to troubleshoot device-specific information further.

MONITORING DELL DRACS AND CHASSIS IN HPOM CONSOLE:

Once the Dell Hardware Auto-grouping policy completes, it identifies the discovered Dell DRACs and Chassis:

  • For Dell DRAC devices, hierarchies are created based on device type (DRAC5 or iDRAC6) and server type (Monolithic or Modular), and the devices are classified under the respective DRAC5 or iDRAC6 groups in both Nodes and Services.
  • For Dell Chassis, hierarchies are created based on device type (CMC or DRAC/MC), and the devices are classified under the respective Dell Chassis (CMC or DRAC/MC) groups in both Nodes and Services.
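The grouping rules above can be sketched as a small Python function. The device and server type names come from this post; the function name `classify_device`, the top-level group labels, and the order of levels in the returned path are hypothetical and may not match the exact group names the SPI creates in HPOM.

```python
from typing import Optional, Tuple

DRAC_TYPES = {"DRAC5", "iDRAC6"}
CHASSIS_TYPES = {"CMC", "DRAC/MC"}
SERVER_TYPES = {"Monolithic", "Modular"}


def classify_device(device_type: str, server_type: Optional[str] = None) -> Tuple[str, ...]:
    """Return an illustrative group path for a DRAC or Chassis device.

    DRACs are grouped by device type and server type; Chassis are grouped
    by device type only. The path labels are assumptions for illustration.
    """
    if device_type in DRAC_TYPES:
        if server_type not in SERVER_TYPES:
            raise ValueError("DRAC devices need a server type: Monolithic or Modular")
        return ("Dell DRAC", device_type, server_type)
    if device_type in CHASSIS_TYPES:
        return ("Dell Chassis", device_type)
    raise ValueError(f"Unknown device type: {device_type}")


if __name__ == "__main__":
    print(classify_device("iDRAC6", "Modular"))
    print(classify_device("CMC"))
```

The same path would be used under both Nodes and Services, since the post describes identical grouping in the two hierarchies.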

Hierarchical Representation of Dell DRAC’s and Chassis in Services: 

                                

DEPLOYING THE DELL DRACS AND CHASSIS STATUS POLL POLICY:

The scheduled task policy “Dell DRAC’s and Chassis scheduled Status Poll” monitors the overall system health status of Dell DRACs (DRAC5 and iDRAC6 devices, both Monolithic and Modular) and Dell Chassis (Chassis Management Controller (CMC) and DRAC/MC devices) in the HPOM console. By default, the policy runs every day at 2:00 A.M. The interval can be changed to a custom value as required. Policies are deployed via the Management Server.

Once the policy is run, the health status of the device is queried through SNMP, and a corresponding message with the associated severity is shown in the HPOM console. The health status message is also associated with the device, classified under Dell groups in both the Node and Service hierarchies.

The scheduled task policy acknowledges the existing health message and posts the current health status message for the classified DRACs and Chassis in the Active Message Browser of the HPOM console. As a result, the latest health status of the DRACs and Chassis is always displayed in the HPOM console.

Message Association in Service Hierarchy:

Once the Dell DRAC’s and Chassis Scheduled Status Poll policy is run, it retrieves the overall health status of the classified Dell DRACs and Chassis. The retrieved health status is mapped to the appropriate health message (Normal, Warning, or Critical). The health message is associated with the DRAC or Chassis child nodes (child node: Global System Status) and is shown in the Active Message Browser of the HPOM console. The message severity for the node (Normal, Warning, or Critical) is propagated to the parent node in both the Node and Service hierarchies.

Service Hierarchy: Only the Global System Status message with its corresponding health severity will be associated and updated for the node. The health severity will propagate to the device parent group.

Node Hierarchy:  In Node hierarchy, SNMP Trap Messages and Health messages are associated with the Server node; the worst case message severity is propagated to its parent Node group.

Health Status Message Association for Chassis in Service Hierarchy:

TROUBLESHOOTING STEPS:

When the device Global health status is “Warning” or “Critical”, follow these steps to troubleshoot the issues:

- Review the outstanding messages for the device in the Active Message Browser. If any issues exist, resolve them as per the instructions in the message browser.

- Launch the 1:1 console for further troubleshooting:

  • Dell DRAC console launch for all DRAC5 and iDRAC6 devices, both Monolithic and Modular.
  • Dell CMC console launch for all Dell CMC and DRAC/MC devices.

- Launch Dell tools for further troubleshooting and to take corrective action:

  • Warranty Report
  • OpenManage Essentials (OME)

APPENDIX:

Refer to the following links: