PE 1900 w/PERC 5i SAS RAID Will not boot.


This question is answered

I'm working on a client's PowerEdge 1900 that suddenly and without warning stopped serving about 3 days ago. The client reported LAN and WAN problems Friday but did not request an onsite visit until Saturday. She said she had someone in her office power cycle the server Friday, but that person did not select the correct server on the KVM switch, so I have no record of any error messages. When I arrived Saturday (yesterday), I found the server powered on but stalled, still displaying the BIOS messages.

I tried several troubleshooting steps, which isolated the problem to the PERC 5/i SAS controller. The RAID 5 logical drive is still found on the controller, and when I checked the three-drive RAID configuration it reported the array in Optimal condition, with no errors. If I unplug the RAID controller from the system board, the machine boots past the BIOS messages but then stops while looking for a hard drive. If I plug in a bootable USB thumb drive with Ubuntu Linux, I can boot to it with or without the PERC controller plugged in, but if the PERC controller is plugged in, I first have to press F11 and specify booting from the USB drive. The boot menu does offer the PERC as a device to boot from, but if I select it the machine does not boot; it just stalls.

I know that the PERC card has a DIMM and an external battery. I've been reading other posts, and it appears that neither the DIMM nor the external battery would cause the situation I have described. I figure that if the controller were bad, it would not report the logical drive and would not be manageable, but a bad controller is the only thought I have.

I tried to upload a photo of the BIOS messages, but the site would not accept it.

Thanks for your time,

Tom

Verified Answer
  • Just to close out this post with my final results.

    As it turns out, the PE 1900 that I was working on had a double failure. While I was still attempting to recover data from the RAID 5 partitions (booted via USB with Ubuntu Linux), I tried to wake the display from sleep after the machine had been running for 3 days; the monitor turned on but nothing would display. I then ran the machine (under Linux) overnight to run a Deep Scan, and the next morning I reset the lost partitions' partition type, which requires a reboot. This time, upon reboot, a hardware error appeared. The front display began flashing orange and reporting:

    W1228 ROMB Batt < 24hr

    Which according to the owner’s manual means:

    Warns predictively that the RAID battery has less than 24 hours of charge left. Replace RAID battery. See "Replacing the SAS RAID Controller Daughter Card Battery"

    So, this could explain the crash and the resulting corruption of the drive partitions. At this point the machine would not let me boot to my USB thumb drive; instead it ran Dell Diagnostics, which confirmed the displayed error with codes 2900:0221 and 2900:0325.

    I ordered and replaced the battery. I again attempted to recover the partition data but eventually gave up and started a system rebuild, which is when the second failure occurred.

    The second failure was one of the three hard drives that made up the RAID 5. I isolated and confirmed which drive was failing. While the drive itself may still be good, its SMART circuitry was not reporting, according to Western Digital's diagnostics. I purchased and installed a replacement drive, rebuilt the OS, and have had the machine back in service for the last 2 weeks.

    Thank you to all who tried to help and hopefully this post may help someone else who experiences the same or similar combinations of strange events.
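
For anyone repeating the drive-isolation step from a Linux live USB, here is a minimal Python sketch of a SMART health check. It assumes smartmontools is installed and that the disks sit behind a MegaRAID-family controller such as the PERC 5/i, which smartctl addresses with `-d megaraid,N`. The function names, the device path, and the disk IDs are illustrative assumptions and not something from the original troubleshooting; the two verdict strings it looks for should be checked against the output of the installed smartctl version.

```python
import subprocess
from typing import List, Optional


def build_smartctl_command(device: str, megaraid_id: Optional[int] = None) -> List[str]:
    """Build a 'smartctl -H' command line (hypothetical helper).

    megaraid_id is the controller's ID for a disk behind the RAID card;
    leave it as None for a disk attached directly to the system.
    """
    cmd = ["smartctl", "-H"]
    if megaraid_id is not None:
        cmd += ["-d", f"megaraid,{megaraid_id}"]
    cmd.append(device)
    return cmd


def parse_health(output: str) -> Optional[bool]:
    """Return True/False for a healthy/unhealthy verdict, or None if no verdict line is present."""
    for line in output.splitlines():
        # ATA disks print the first form, SCSI/SAS disks the second (assumed wording).
        if "self-assessment test result:" in line or "SMART Health Status:" in line:
            verdict = line.split(":", 1)[1].strip()
            return verdict in ("PASSED", "OK")
    return None


def check_drive(device: str, megaraid_id: Optional[int] = None) -> Optional[bool]:
    """Run smartctl (normally needs root) and interpret its health verdict."""
    result = subprocess.run(
        build_smartctl_command(device, megaraid_id),
        capture_output=True,
        text=True,
        check=False,  # smartctl uses non-zero exit codes as status bits
    )
    return parse_health(result.stdout)
```

Calling `check_drive("/dev/sda", megaraid_id=0)` and then trying IDs 1 and 2 would walk a three-drive array one disk at a time. A `None` result means no verdict line came back at all, which is roughly the "SMART not reporting" symptom described above.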

All Replies
  • You might also check out this article ... it is also possible that some incompatible software was installed, including virus infections:

    support.microsoft.com/.../156669

    Unfortunately, I do not see any "simple" fixes here.  In your situation, you could probably jump to the "Last Known Good Configuration" section.

  • Ubuntu Linux's disk partition utility is reporting the C drive as bootable and intact. When I run its chkdsk, it comes back error free. When I look at the D drive, it's messed up. I'm seeing two layers for the D drive. One layer reports a strange partition type and is listed as partition 3. The other layer (for the same 477GB space) reports as NTFS, but as partition 5. I'm running a utility called TestDisk to see if I can recover the partition or any data from the damaged partition.

  • I would also remove any external devices not needed for troubleshooting. Plug a keyboard and monitor directly into the server and remove any KVM, external hard drives, etc.

  • In the BIOS, change the boot sequence so that the hard drive is no longer first, then shut the server down with the power button. Restart it and change the BIOS back to the original setting. Hopefully it will then proceed.
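
The TestDisk reply above describes one partition entry with an unexpected type sitting over the same space as an NTFS volume. To see what the partition table itself says, independent of any GUI tool, a minimal Python sketch like the one below can decode the four primary entries of a classic MBR. It assumes an MBR-style (not GPT) disk with 512-byte sectors; `parse_mbr`, `read_first_sector`, and the small type table are illustrative names chosen for this example. It only reads, never writes, and it does not follow the extended-partition chain where logical partitions (numbered from 5 under Linux) are described.

```python
import struct
from dataclasses import dataclass
from typing import List

SECTOR_SIZE = 512
PARTITION_TABLE_OFFSET = 446   # four 16-byte entries start here
BOOT_SIGNATURE = b"\x55\xaa"   # last two bytes of a valid MBR sector

# A few common type bytes; anything else is shown as raw hex.
KNOWN_TYPES = {
    0x05: "extended (CHS)",
    0x07: "NTFS/exFAT/HPFS",
    0x0F: "extended (LBA)",
    0x83: "Linux",
    0xDE: "Dell utility",
}


@dataclass
class PartitionEntry:
    number: int        # 1-4, position in the primary table
    bootable: bool     # status byte has the 0x80 "active" flag
    type_code: int
    start_lba: int
    sector_count: int

    def describe(self) -> str:
        name = KNOWN_TYPES.get(self.type_code, "unknown")
        flag = "*" if self.bootable else " "
        size_gb = self.sector_count * SECTOR_SIZE / 1e9
        return (f"{self.number}{flag} type=0x{self.type_code:02X} ({name}) "
                f"start={self.start_lba} size={size_gb:.1f}GB")


def parse_mbr(sector: bytes) -> List[PartitionEntry]:
    """Decode the non-empty primary partition entries from a 512-byte MBR sector."""
    if len(sector) < SECTOR_SIZE or sector[510:512] != BOOT_SIGNATURE:
        raise ValueError("not a valid MBR sector (missing 0x55AA signature)")
    entries = []
    for i in range(4):
        start = PARTITION_TABLE_OFFSET + 16 * i
        # status, CHS start, type, CHS end, first LBA, sector count (little-endian)
        status, _, type_code, _, lba, count = struct.unpack(
            "<B3sB3sII", sector[start:start + 16])
        if type_code != 0x00:  # type 0 marks an unused slot
            entries.append(PartitionEntry(i + 1, status == 0x80, type_code, lba, count))
    return entries


def read_first_sector(path: str) -> bytes:
    """Read sector 0 from a disk device or image file, read-only."""
    with open(path, "rb") as disk:
        return disk.read(SECTOR_SIZE)
```

From a live USB session, something like `for p in parse_mbr(read_first_sector("/dev/sda")): print(p.describe())` (run as root, with the device path adjusted to the RAID volume) would list each primary entry with its raw type byte, which makes an odd or overlapping entry easier to spot before letting a recovery tool rewrite anything.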
