Unsolved

1 Rookie

 • 

5 Posts

March 7th, 2024 03:20

PowerEdge 720xd HD/RAID problems

Hello,

We recently had an unexpected power issue with our UPS which resulted in a loss of power to our servers. After power was restored, there were two older 720xd servers that would not load their virtual disks.

Server one: This server shows all disks as missing when the VDs are being loaded. The PERC configuration utility shows all the drives as failed. There are nine 3.5" drives in front (comprising one VD) and two 2.5" drives in the rear (comprising another VD). If I re-seat the drives, the PERC configuration utility shows them as foreign. I can import the foreign config, and then both virtual disks are recognized in an optimal state. At that point I need to restart the server, and after the reboot all disks are again listed as missing. I've upgraded the PERC, iDRAC and BIOS to the latest revisions, but the problem remains.

Server two: This server (similarly configured) stalls for a long time when initializing the iDRAC. When it finally gets past that point, it states that the Lifecycle Controller is disabled. When it tries to load the VDs, it stops and says there are missing or offline drives. The PERC utility shows that one of the VDs is missing 4 drives. The missing drives do not show up at all in the PD management screen. I've drained the flea power from this server, but the iDRAC and associated issues remain.

Since this happened during a power event, there may have been some damage done to them. I'm guessing the system boards probably need to be replaced in both? For server one, I doubt it's a backplane or cable issue, since it affects all drives and both the front and rear backplanes. For server two, I'm guessing a system board replacement will be necessary for the iDRAC issues alone. Does that sound reasonable? Are there any other areas that I should be looking at?

I have recovered the data from both machines, so it's not a critical situation. I'd like to try to resurrect them just to see if I can get access to the data and then repurpose them for something else.

Thanks.

Moderator

 • 

3.2K Posts

March 7th, 2024 14:05

Hi,

While a power loss can definitely cause damage, replacing the system boards should not necessarily be the first step for either server. Here are some suggestions to try before resorting to hardware replacements:

Server One:

  • Power Drain: The drives showing up as foreign after re-seating, and the imported configuration disappearing again after a reboot, suggest that the configuration is not being retained across restarts. Try performing a thorough power drain by:

    • Shutting down the server and removing all power cables.
    • Holding the power button for 30 seconds to discharge any residual power.
    • Leaving the server unplugged for at least 30 minutes to fully drain the flea power.
    • Reconnecting the power cables and restarting the server to see if the configuration persists.
  • Check BIOS RAID settings: Ensure the BIOS RAID settings are not set to "off" or "AHCI" which might interfere with the RAID controller.

  • Update Firmware: Although you mentioned updating the firmware, double-check the Dell website for the latest versions of the BIOS, iDRAC, and specifically the PERC H710 firmware (assuming that is the RAID controller used). Firmware updates sometimes include compatibility or bug fixes related to power loss scenarios.
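One way to confirm whether the imported configuration actually survives the power drain is to query the controller from a booted OS or live environment. The sketch below is only an illustration: it assumes Dell's perccli utility is installed under the name `perccli`, that it supports the controller in this server, and that the controller is `/c0`. None of that is confirmed in this thread. The commands only read state and change nothing.

```python
import shutil
import subprocess

# Assumption: the Dell PERC CLI is installed under this name; some
# packages ship it as "perccli64" instead.
PERCCLI = "perccli"


def state_check_commands(controller=0):
    """Return read-only perccli commands for controller /cN."""
    c = f"/c{controller}"
    return [
        [PERCCLI, c, "show"],                 # controller summary
        [PERCCLI, f"{c}/eall/sall", "show"],  # state of every physical drive
        [PERCCLI, f"{c}/vall", "show"],       # state of every virtual disk
        [PERCCLI, f"{c}/fall", "show"],       # foreign configurations, if any
    ]


def run_all(commands):
    """Run each command and print its output, or say why it was skipped."""
    for cmd in commands:
        if shutil.which(cmd[0]) is None:
            print("not installed, would run:", " ".join(cmd))
            continue
        result = subprocess.run(cmd, capture_output=True, text=True, check=False)
        print(result.stdout)


if __name__ == "__main__":
    run_all(state_check_commands(0))
```

Running this once right after importing the foreign configuration and once after the next reboot gives two snapshots to compare. If the foreign view lists a configuration again after the reboot, `perccli /c0/fall import` is the command-line counterpart of the import done in the configuration utility.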

Server Two:

  • Reset the iDRAC: Try resetting the iDRAC to its factory defaults. You can usually do this through the iDRAC interface or by using the dedicated physical button on the server (if available). Refer to the Dell manuals for specific instructions for your model.

  • Check PERC Logs: The PERC utility might have logs containing additional details about the missing drives. These logs can help diagnose the issue further.

  • Inspect Cables and Backplane: While you mentioned it's unlikely, visually inspecting the cables and backplane for any physical damage wouldn't hurt. Look for any loose connections or signs of burning.
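If the server can be booted into an OS or live environment, the iDRAC reset and the log collection can also be attempted from the command line. The following is a rough sketch under several assumptions: that Dell's `racadm` and `perccli` tools are installed locally, that the iDRAC responds to local racadm at all (it may not, given how it stalls at boot), and that the controller is `/c0`. It defaults to a dry run that only prints the commands.

```python
import shutil
import subprocess


def idrac_commands(factory_reset=False):
    """Return racadm commands: gather info first, then reset the iDRAC."""
    # racreset restarts the iDRAC and keeps its settings;
    # racresetcfg returns the iDRAC configuration to factory defaults.
    reset = "racresetcfg" if factory_reset else "racreset"
    return [
        ["racadm", "getsysinfo"],  # iDRAC and system summary
        ["racadm", "getsel"],      # System Event Log entries
        ["racadm", reset],
    ]


def perc_log_commands(controller=0):
    """Return perccli commands that dump the controller logs."""
    c = f"/c{controller}"
    return [
        ["perccli", c, "show", "events"],   # controller event log
        ["perccli", c, "show", "termlog"],  # controller firmware terminal log
    ]


def run_all(commands, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for cmd in commands:
        print("$", " ".join(cmd))
        if dry_run or shutil.which(cmd[0]) is None:
            continue
        result = subprocess.run(cmd, capture_output=True, text=True, check=False)
        print(result.stdout)


if __name__ == "__main__":
    # Collect the PERC logs before touching the iDRAC.
    run_all(perc_log_commands(0) + idrac_commands(factory_reset=False))
```

The logs are listed first so that they are captured before any reset. Passing `factory_reset=True` swaps the plain restart for the factory-default reset suggested above, which also clears the iDRAC network settings, so it is worth trying the plain restart first.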

Alternative Solutions:

  • Data migration: If your primary concern is data access and repurposing the servers, consider migrating the recovered data to a different server and utilizing the existing servers for non-critical storage purposes instead of aiming for full functionality restoration.

If none of this helps, the hardware will probably need to be replaced.

3 Apprentice

 • 

406 Posts

March 12th, 2024 04:06

It seems this is RAID corruption or a punctured RAID, so to be safe you need to back up the data and reconfigure the RAID. I have faced the same problem, and this was the only solution I found.
