RAID 1 array degraded; no option to rebuild

This question is answered

I have a PowerEdge T110 server with a PERC S100 RAID controller (firmware/"fake" RAID).  Unfortunately, I had a small accident tonight while I was working on that machine.  While attempting to clean up a cable mess, I accidentally yanked on the wrong cable and unplugged the power for the server.

While the server was running through POST, it paused and reported that the RAID 1 array was degraded.  The S100 configuration utility shows drive 0 as degraded and the other drive as fine.  The server boots just fine.

I opened the OpenManage utility which displays the status of the virtual disk to be degraded as well.  However, despite displaying the status as being degraded, absolutely nowhere does it list any option to rebuild the array.  I can't even check for the consistency.  Under the drop-down list for "tasks," the only available options are:

  1. Delete
  2. Assign/Unassign Dedicated Hot Spare
  3. Rename

That's all.  Obviously, I don't want to do any of those.  The configuration utility in the BIOS also seems useless.  Although it lists the specific physical disk that is degraded, I cannot see any option for rebuilding the array consistency.  Since the physical disk has not failed and is not actually damaged, I don't need or want to replace it.

So, why isn't any rebuild option listed, and how can I rebuild this array without deleting the entire volume and restoring from a backup?

All Replies
  • The only way to get the drive back into the array is to select "assign hot spare." If that screen lists the disk you disconnected, set it as the hot spare and reboot. It will start to rebuild from the other disk.

  • Thanks for your suggestion, and sorry for the late response.  Unfortunately, when I go to assign a hot spare to the array, OpenManage says, "There are currently no disks that are available, large enough, or of the correct type to be used as a hot spare for this virtual disk."

    The degraded drive is still in the system, and it's visible under the "physical disks" menu.  Is there any other way to restore redundancy?  I'm shocked at how complicated this is for a server.  Even Intel's fake RAID for desktops is easier and can rebuild without ever having to restart.

  • Sounds like you will have to offline the disk, then online it, then set it as a hot spare. In OpenManage, go to the physical view and select disk 0. Under "available tasks," select offline and apply. Wait for this to complete. In the same menu, select online. The disk state should change to ready. Now you should be able to select it as a hot spare.

    The normal caveats apply: make sure you have a good backup before trying this.

  • Under tasks for the physical disk, there are actually no options available at all.  Even my third internal disk (which is just a backup storage drive, not part of any RAID volume) has no tasks available.

  • OK, time to get down and dirty :-). Make sure you have a backup...

    Power down, disconnect drive 0, and power on. Press F2 to enter setup; this is just to stop the server booting into the OS. The RAID controller will register the missing disk. You can press Ctrl-R and confirm this in the RAID BIOS. Once past the RAID controller and into the system BIOS, power down and reconnect disk 0. Power on and either press Ctrl-R or boot to the OS and use OpenManage. Disk 0 should be listed as online and ready to set as a hot spare.

  • I shut down the system and disconnected the SATA cable from the degraded disk (disk 0).

    Verified the disk was missing from the RAID BIOS.

    Shut down again and reconnected the SATA cable to disk 0.

    In the RAID BIOS, I verified again that it was there, but the status was the same as before it was removed (red text, degraded, online).  The option in the lower left of the screen for setting a hot spare was grayed out / unselectable.

    After that, I continued to boot into Windows.  In OpenManage, the physical disk was reading as online with no available tasks (same as before).

    In the virtual disks section, when I went to assign a hot spare to the volume as you instructed me to do a few posts up, I got the same error stating that no disks are available for use as a hot spare. :(

  • Next step is to swap the cables on the disks: make disk 1 disk 0 and disk 0 disk 1. Then power on. The controller supports drive roaming, so it will pick up the new disk 0 as a boot device and should make disk 1 available to set as the spare. I have done this before on other controllers when they've been stubborn.

    You could also remove the disk, put it in a desktop, and use DiskPart to clean it, removing all the config info. Then put the disk back in the server.

  • I tried swapping the cables, and while it was smart enough to still realize the disk was just moved, it did not affect the degraded disk.

    I decided to go for broke and just take the degraded disk out and "clean" it using DiskPart.  That worked fine, and Windows 7 displayed the disk as "not initialized."  I put it back into the server, and it was registered as being NonRaid...didn't even appear as "new."  Basically, it was appearing as its own volume automatically.

    I continued booting into Windows Server to verify that I hadn't screwed anything else up and to put my nerves at ease.  Windows booted, so I went into the virtual disks section of OpenManage and saw that the "new" (rather, old) disk was displayed there.  So, I selected "delete" from the list of available tasks.  It deleted successfully, disappearing from the list of virtual disks.

    At this point, I figured I could go right ahead and assign a dedicated hot spare to the degraded array from the available tasks.  Nope. No disks available. What?

    Freaking out now, I went under the "physical disks" section of OpenManage, and there it was with its state as "ready" (as opposed to "online").  Under its available tasks, it had "assign global hot spare."  I executed that, and the status of the RAID array under "virtual disks" changed to "rebuilding."

    [ADMIN NOTE:  Profanity removed per TOU], why was this so difficult?

    During the time I've spent writing this, its rebuilding progress has reached 4%.  I'll update on the status when it completes.

    EDIT: Everything worked successfully.  The rebuilding completed after a few hours and the RAID volume is reading as healthy again.
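
    For anyone who wants to script the DiskPart step described above, here is a minimal Python sketch. It is not a Dell-supported procedure. It only builds the text of a DiskPart script (select one disk, then clean it) and leaves running it to you. The function name build_clean_script and the protected_disks guard are illustrative assumptions, not something from this thread. Since clean wipes the selected disk's partition and configuration information, confirm the disk number with "list disk" first.

```python
def build_clean_script(disk_number: int,
                       protected_disks: frozenset = frozenset({0})) -> str:
    """Return DiskPart script text that selects one disk and cleans it.

    protected_disks is a hypothetical safety net: disk numbers that must
    never be cleaned (for example, the helper machine's own system disk).
    """
    if not isinstance(disk_number, int) or disk_number < 0:
        raise ValueError("disk_number must be a non-negative integer")
    if disk_number in protected_disks:
        raise ValueError(f"refusing to clean protected disk {disk_number}")
    # DiskPart reads one command per line.
    return f"select disk {disk_number}\nclean\n"


if __name__ == "__main__":
    # Disk 2 is only an example; check "list disk" on your own machine.
    print(build_clean_script(2), end="")
```

    The output can be saved to a text file and passed to DiskPart with its /s switch from an elevated prompt on the helper machine (not on the server itself).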

  • The S100 is basically a software RAID solution. The firmware is on the chipset and the RAID stack is in software, much like the earlier SAS cards. These are the less expensive RAID options, and while they offer the functionality, they don't have the dedicated hardware. You sometimes have to coax them to see replacement disks.

    Swapping cables between your disks probably didn't work because of drive roaming: the controller just picked up the ID info from the disks. Cleaning the disk with DiskPart removes any RAID info, so the controller treats it as a plain disk, allowing you to set it as the spare.

    I have had to do this many times on customer sites and it's still a bit nervy doing it. Glad it worked in the end.
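
    The explanation above (cable swaps fail because the controller identifies disks by their on-disk info, while a clean succeeds because that info is gone) can be pictured with a toy model in Python. This is only a sketch of the reasoning, not the S100's actual logic; the Disk class, the raid_signature field, and the classify rule are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Disk:
    name: str
    raid_signature: Optional[str]  # hypothetical stand-in for on-disk RAID info
    in_sync: bool = True


def classify(disk: Disk, array_signature: str) -> str:
    """Toy rule for how a controller might treat a disk it finds at boot."""
    if disk.raid_signature == array_signature:
        # Recognised by its on-disk info, whichever port it is plugged into,
        # which is why swapping cables changes nothing.
        return "member" if disk.in_sync else "degraded member"
    # No matching info: a plain disk that can be offered as a spare.
    return "spare candidate"


def clean(disk: Disk) -> Disk:
    """Model DiskPart clean: the RAID info is gone afterwards."""
    return Disk(name=disk.name, raid_signature=None, in_sync=False)


if __name__ == "__main__":
    stale = Disk("disk 0", raid_signature="array-A", in_sync=False)
    print(classify(stale, "array-A"))         # degraded member
    print(classify(clean(stale), "array-A"))  # spare candidate
```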

  • any option that I try to run displays the error message

  • What are you trying to do and what error message are you getting?

  • [ADMIN NOTE:  Profanity removed per TOU]  This is so much more complex than it should be.  I have just gone through this fiasco myself and would have had a coronary if I had not had solid backups in place.  Someone really needs to get their kicked for designing such a poor excuse for RAID volume management.  This is completely worthless!  There is my rant.  If you own an S100 (we have MANY), you had better be prepared to monkey around with sensitive drives and connect them to other machines if you want to rebuild an array.  This should never have made it into a PowerEdge server.  NEVER.

    Also, make sure that you delete the 'NonRAID' array (Virtual Disk 2 in my case) and not the RAID array (containing your OS and data).

  • Thanks for posting this. I ran into this last night after upgrading a T310 to Server 2012 R2 Essentials. I was not able to assign a hot spare and rebuild the array using the BIOS (Ctrl-R) and DiskPart clean. I had to install Dell OpenManage Server Administrator Managed Node (Windows, 64-bit). After hooking the drive up externally to another computer and running DiskPart (list disk, select disk 2 (in my case***), clean), I was able to assign the hot spare by clicking the name of the virtual drive ("none" in my case) and then selecting the hot spare disk action. Also, after it is installed, OpenManage opens a web page with a login prompt up top; I had to click cancel on that one and then log in with my server credentials on the black login screen.
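
    Everything in this thread was done through the OpenManage web interface. OpenManage Server Administrator also has a command-line interface (omreport and omconfig), so the "assign global hot spare" step could in principle be scripted. The Python sketch below only builds the argument lists and does not run anything. Whether the S100 accepts these commands, and the controller number and disk ID to use, are assumptions to check on your own system; the helper function names are hypothetical.

```python
import re
from typing import List

# Physical disk IDs in the OMSA CLI take a form such as "0:1" or "0:0:1".
_PDISK_ID = re.compile(r"\d+(:\d+){1,2}")


def report_pdisks_cmd(controller: int) -> List[str]:
    """Arguments for listing the physical disks (and their states) on a controller."""
    return ["omreport", "storage", "pdisk", f"controller={controller}"]


def assign_global_hot_spare_cmd(controller: int, pdisk: str) -> List[str]:
    """Arguments for assigning one physical disk as a global hot spare."""
    if not _PDISK_ID.fullmatch(pdisk):
        raise ValueError(f"unexpected physical disk ID: {pdisk!r}")
    return [
        "omconfig", "storage", "pdisk",
        "action=assignglobalhotspare",
        f"controller={controller}",
        f"pdisk={pdisk}",
        "assign=yes",
    ]


if __name__ == "__main__":
    # Controller 0 and disk "0:0:1" are placeholders, not values from the thread.
    print(" ".join(report_pdisks_cmd(0)))
    print(" ".join(assign_global_hot_spare_cmd(0, "0:0:1")))
```

    Listing the physical disks first is a sensible check: in this thread the disk only became assignable once its state read "ready" rather than "online."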