Troubleshooting is the first step in resolving any server issue. The steps below are generic guidelines that can be applied to any issue and will help you find the root cause.

Hardware-specific guides are also available in this wiki. Use the menu on the right-hand side to navigate to them.

You can also use the Dell Server Troubleshooting guide.

Table of Contents:

  1. Identify the Symptoms
  2. Gather system logs
  3. Run embedded diagnostic tools
  4. Analyse symptoms, logs and tool results
  5. Apply possible fixes
  6. Re-test to confirm resolution

Step 1: Identify the Symptoms

  • Error messages on the machine (LED quad pack, LCD panel)
  • Lights on a component
  • Change in behavior (rebooted, unstable)
  • Since when / what changed (move, update, new program)
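
The observations above can be captured in a small structured record so that nothing is missed before moving on to log collection. The sketch below is only an illustration: the `SymptomReport` class, its field names and the example values are hypothetical and are not part of any Dell tool.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class SymptomReport:
    """Hypothetical record of the four observations listed above."""
    error_messages: List[str] = field(default_factory=list)    # LED quad pack, LCD panel
    component_lights: List[str] = field(default_factory=list)  # lights on a component
    behavior_changes: List[str] = field(default_factory=list)  # rebooted, unstable
    first_seen: str = ""                                        # since when
    recent_changes: List[str] = field(default_factory=list)    # move, update, new program

    def missing_fields(self):
        """Return the names of the observations that are still empty."""
        return [name for name, value in vars(self).items() if not value]


# Example values are invented for illustration only.
report = SymptomReport(
    behavior_changes=["unexpected reboot"],
    first_seen="after the last update",
)
print(report.missing_fields())
```

Here `missing_fields()` reports that error messages, component lights and recent changes have not been recorded yet.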

Step 2: Gather system logs

There are several tools available to gather the system logs on your server. The table below shows which tool is best suited to each component:

 

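Component coverage by tool:

Tool                                             | Hard Disk | Memory | CPU | OS  | Blade Chassis
-------------------------------------------------+-----------+--------+-----+-----+--------------
DSET (Dell Server eSupport Tool)                 | Yes       | Yes    | Yes | Yes | No
OMSA (OpenManage Server Administrator)           | Yes       | Yes    | Yes | No  | No
iDRAC (Integrated Dell Remote Access Controller) | Yes       | Yes    | Yes | No  | No
CMC Logs (Chassis Management Controller)         | No        | No     | No  | No  | Yes
PERC logs export using software tools            | Yes       | No     | No  | No  | No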
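
The table above can also be read as a lookup from component to suitable tools. The following minimal sketch transcribes the table into a Python dictionary. The names `TOOL_COVERAGE` and `tools_for` are hypothetical and exist only for this illustration.

```python
# Coverage matrix transcribed from the table above:
# tool name -> set of components it can gather logs for.
TOOL_COVERAGE = {
    "DSET": {"Hard Disk", "Memory", "CPU", "OS"},
    "OMSA": {"Hard Disk", "Memory", "CPU"},
    "iDRAC": {"Hard Disk", "Memory", "CPU"},
    "CMC Logs": {"Blade Chassis"},
    "PERC logs export": {"Hard Disk"},
}


def tools_for(component):
    """Return the tools that cover the given component, in table order."""
    return [tool for tool, covered in TOOL_COVERAGE.items() if component in covered]


print(tools_for("Hard Disk"))      # ['DSET', 'OMSA', 'iDRAC', 'PERC logs export']
print(tools_for("Blade Chassis"))  # ['CMC Logs']
```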

Step 3: Run diagnostic tools or diagnostic steps

Dell Hardware Diagnostics are embedded tools installed in the pre-OS environment of your server. They have a physical view of the attached hardware and can identify hardware problems that the operating system and other online tools cannot.

Instructions on how to access and use the tools are available in Dell article SLN283546, "How to Run Hardware Diagnostics on your PowerEdge Server".

Step 4: Analyse symptoms, logs and tool results

This is where all the previous steps come together to give an overall picture of the issue. That picture should enable you to decide whether a component is at fault or whether the issue is linked to a recent change, a software problem or another cause.

Step 5: Apply possible fixes

For all types of problem, applying the latest updates and fixes is always recommended, both to fix and to prevent issues. More details are available on the server update how-to page.

Depending on the results of your analysis, different fixes could be available to resolve the issue (swapping components, updating firmware, restart/reboot). It is important to apply one fix at a time and test for resolution after each one. Otherwise you will not be able to tell which fix resolved the issue.
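
The "one fix at a time" rule can be pictured as a simple loop that re-tests after every change. The sketch below is only an illustration under stated assumptions: `apply_fixes_one_at_a_time`, the fix callables and the `issue_resolved` check are hypothetical stand-ins for manual actions, not real Dell commands.

```python
def apply_fixes_one_at_a_time(fixes, issue_resolved):
    """Apply candidate fixes in order, re-testing after each one.

    fixes: list of (name, apply_fn) pairs; apply_fn stands in for a manual action.
    issue_resolved: callable returning True once the symptoms are gone.
    Returns (name of the fix that resolved the issue or None, names applied so far).
    """
    applied = []
    for name, apply_fn in fixes:
        apply_fn()
        applied.append(name)
        if issue_resolved():
            # Stop here: the last change is the one that made the difference.
            return name, applied
    return None, applied


# Simulated server state, for illustration only.
state = {"firmware": "old"}

def swap_component():
    pass  # in this simulation the swap does not change the outcome

def update_firmware():
    state["firmware"] = "new"

fixes = [("swap component", swap_component), ("update firmware", update_firmware)]
resolved_by, applied = apply_fixes_one_at_a_time(
    fixes, lambda: state["firmware"] == "new"
)
print(resolved_by)  # update firmware
print(applied)      # ['swap component', 'update firmware']
```

Because the loop stops at the first successful re-test, the fix that resolved the issue is unambiguous.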

Step 6: Re-test to confirm resolution

Confirming resolution is an important step that is often forgotten or ignored, but it provides valuable feedback if the issue recurs.

To confirm resolution:

  1. Ensure ALL the symptoms stay gone over the same time frame in which they previously occurred.
  2. Clear all the system logs.
  3. Re-run the diagnostic tools.
  4. Gather a new set of system logs. They will serve as a baseline for the machine.

If all goes well during those steps, the issue can be considered resolved.
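
If the issue comes back later, the baseline logs gathered in the last step make it easy to see what is new. The sketch below assumes the logs have been exported as lists of text lines and normalised so that unchanged entries compare equal (for example, with collection timestamps stripped). The function name and the placeholder lines are hypothetical and do not represent real log output.

```python
def entries_not_in_baseline(baseline, current):
    """Return the lines of `current` that are absent from `baseline`, in order."""
    known = set(baseline)
    return [line for line in current if line not in known]


# Placeholder lines, not real log output.
baseline = ["placeholder entry A", "placeholder entry B"]
current = ["placeholder entry A", "placeholder entry B", "placeholder entry C"]

print(entries_not_in_baseline(baseline, current))  # ['placeholder entry C']
```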

For more technical content, you can refer to our online Enterprise Knowledge base.