RAID 5 worries


  • Hi,
     
    I had a problem with a RAID-5 array in a PV220S. There are 13 drives in the array.
     
    Due to a power outage, there was an NVRAM mismatch. I have restored the NVRAM settings and brought the drives online (many would not come online automatically).
     
    The raid-5 volume is now in 'background initialisation' with the system back online.
     
    I'm concerned that this background initialisation process is cleaning the RAID volume (which I don't want).
     
    Is this normal?
     
    Paul
     
  • This is normal, as the configuration was lost and you restored it. The RAID controller is doing a check of the drives.

    It sounds like the controller is doing a verification of the drives, which involves a complete low-level test of all the sectors. If any are bad, these are remapped and the data and parity information is re-striped across the disks. It can take ages, especially if the array is being accessed at the same time.

    The quick answer is - if you can access the data then all should be well. If you can't see anything then the config is gone.
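
A minimal sketch of why a parity rebuild need not touch the data, assuming (as the replies here suggest, not as confirmed controller behaviour) that background initialisation only recomputes parity. The 4-drive stripe, the block contents and the helper name `xor_blocks` are hypothetical and chosen for illustration only.

```python
from functools import reduce

def xor_blocks(blocks):
    """XOR equal-length byte blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

# One stripe on a hypothetical 4-drive RAID 5: three data blocks plus one parity block.
data = [b"AAAA", b"BBBB", b"CCCC"]
stale_parity = b"\x00\x00\x00\x00"  # e.g. parity left inconsistent by a power loss

# A parity rebuild reads the data blocks and rewrites only the parity block.
fresh_parity = xor_blocks(data)

# With consistent parity, any single lost block can be recomputed from the others.
recovered = xor_blocks([data[0], data[2], fresh_parity])
```

The data blocks are only read here; the one thing rewritten is the parity. A manual "initialize", by contrast, overwrites the data itself, which is the distinction the later replies keep stressing.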

     

     

     

  • Hi Tommo,
     
    Thanks for your reply. Under Windows 2003 the drive is listed as 'Failed' & 'Missing'.
     
    I have come across information relating to stopping 'Background Initialisation' and 'Reconstructing a Logical Drive' instead.
     
    Typically, this logical drive contains a lot of information that is not backed up and I really need to preserve/recover this information.
     
    Any advice on my possible options would be greatly appreciated.
     
    Regards,
     
    Paul 
  • I'd go along with Tommo666.
     
    I have had background initialization kick in many times. It is very normal and I never had a problem, though the array is slow until it finishes.
     
    "...'Reconstructing a Logical Drive' instead."
    A couple of times I have had problems with RAID 5 where I recreated the EXACT configuration, and the array resurrected after the background initialization. In those situations I did NOT initialize the drives, because once you "initialize", your array and its data are history. This is where having the array configuration backed up to floppy, or complete documentation, is a great help.
     
    If the array is beyond hope, professional recovery services are an option.
    Another option: there are a couple of software programs out there which will do a sector-by-sector copy of the individual drives in an array, off a standard SCSI adapter. Once the drives are "copied", the programs have modules which can piece the data together. Not for the faint-hearted.
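
A rough sketch of what "piecing the data together" from per-drive copies involves, and why the exact drive order matters. The rotating-parity layout used here (`parity_disk`) is an assumption made for illustration; the real layout used by a given controller may differ, and the function names are hypothetical.

```python
from functools import reduce

def xor_blocks(blocks):
    """XOR equal-length byte blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, col) for col in zip(*blocks))

def parity_disk(stripe, n_disks):
    # Assumed rotation: parity starts on the last disk and moves left each stripe.
    return (n_disks - 1 - stripe) % n_disks

def stripe_out(blocks, n_disks):
    """Lay data blocks out across n_disks; returns one block list per disk image.
    Assumes len(blocks) is a multiple of n_disks - 1."""
    per_stripe = n_disks - 1
    disks = [[] for _ in range(n_disks)]
    for stripe, start in enumerate(range(0, len(blocks), per_stripe)):
        chunk = blocks[start:start + per_stripe]
        p = parity_disk(stripe, n_disks)
        data_iter = iter(chunk)
        for d in range(n_disks):
            disks[d].append(xor_blocks(chunk) if d == p else next(data_iter))
    return disks

def reassemble(disks):
    """Read data blocks back in logical order, skipping the parity block of each stripe."""
    n_disks = len(disks)
    out = []
    for stripe in range(len(disks[0])):
        p = parity_disk(stripe, n_disks)
        out.extend(disks[d][stripe] for d in range(n_disks) if d != p)
    return out

blocks = [b"blk0", b"blk1", b"blk2", b"blk3", b"blk4", b"blk5"]
images = stripe_out(blocks, n_disks=4)

right_order = reassemble(images)
# Same images, but with the first two drives swapped in the assumed configuration.
wrong_order = reassemble([images[1], images[0], images[2], images[3]])
```

With the drives in their original order the logical data comes back intact; with two members swapped, the same untouched images yield scrambled data. Nothing on the disks changed, only the description of the array.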

    Message Edited by pcmeiners on 08-01-2005 10:01 AM

    Message Edited by pcmeiners on 08-01-2005 10:07 AM

  • Thanks for the info.

    My RAID array is currently in background initialization, and it looks like it won't finish for another day or so. I'm hoping that it will resurrect once this is complete.

     

  • No matter what happens just remember, there is hope until you manually initialize an array.
     
  • A power outage can wreak havoc with a RAID controller and its disks. The problem can be corrupt parity info, a corrupt config, or dirty cache.

    One way would be to power everything down, remove the RAID card from the server (observing ESD precautions), pop the cache/battery from the card and leave it for a minute or so. Then put it all back. Power up the array, give it 5 minutes to boot fully and settle down, then boot the server. As the card has been cleared, it should pick up the config from the disks.

                                                                           OR

    Before you take the drastic step of manually initialising the array (all data will be gone), I would wait for the background stuff to finish, then check the state of the array. If you can access your data, BACK IT UP.

    Reboot the server and press Ctrl-M or Ctrl-A to get into the RAID BIOS. If you have an AMI/LSI-based card, select the correct adapter, look in the menu for physical objects and select this. A scan of devices will happen and a screen opens with the attached devices in slot/location order. Here you will see the failed drives; they may show as failed or ready. A good disk shows as online. If you know which disks make up your RAID configs, you can use the up/down arrows to move about; the space bar selects or de-selects a disk. Select a failed/ready disk, then press F2. A sub-menu will appear where you can force the drive online. A warning message will appear. If you are sure, go ahead. Once you have done all you need, run the consistency check on the RAID logical disks. This will take ages, so do it over a weekend, because the server will be stuck in the RAID BIOS until it completes. The consistency check should repair any errors in the RAID disk.

    If it works, you have repaired your RAID; if not, you have made a backup and can re-create your RAID structure and restore your data.
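
A simplified sketch of what a consistency check does on a parity array: recompute parity for each stripe, compare it with what is stored, and optionally rewrite it. The data structure and the function name `consistency_check` are hypothetical and only illustrate the idea; this is not the controller's actual routine.

```python
from functools import reduce

def xor_blocks(blocks):
    """XOR equal-length byte blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, col) for col in zip(*blocks))

def consistency_check(stripes, repair=False):
    """Return the indexes of stripes whose stored parity does not match the data.

    Each stripe is a dict with 'data' (list of blocks) and 'parity' (one block).
    With repair=True the stored parity is overwritten with the recomputed value.
    """
    bad = []
    for i, stripe in enumerate(stripes):
        expected = xor_blocks(stripe["data"])
        if stripe["parity"] != expected:
            bad.append(i)
            if repair:
                stripe["parity"] = expected
    return bad

stripes = [
    {"data": [b"\x01\x02", b"\x03\x04"], "parity": b"\x02\x06"},  # consistent
    {"data": [b"\x10\x20", b"\x30\x40"], "parity": b"\x00\x00"},  # stale parity
]

first_pass = consistency_check(stripes, repair=True)   # finds and fixes stripe 1
second_pass = consistency_check(stripes)               # nothing left to report
```

Note that a repair of this kind trusts the data and rewrites the parity. If the data blocks themselves are wrong, the check makes the stripe consistent without making it correct, which is one more reason to back up first.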

  • Here is the update:

    The background initialisation process has completed, and the volume has a status of 'Ready'.

    In Windows Disk Management there are two entries for this drive.

    1. A new disk with status 'Not Initialized' & 'Unallocated'

    2. A 'missing' volume representing the old volume

    There is another issue that I think I've spotted. The old configuration was an array of 13 disks in RAID-5 with the 14th disk assigned as a global hot spare.

    If I look in Array Manager, there is no disk assigned as a global hot spare and, perhaps more importantly, rather than drives 0-9 & 10-14 being the array drives with drive 15 being the GHS, Array Manager lists drives 1-9 and 10-15 as the array drives.

    Question 1: Can I attempt to reconfigure the array so that the configuration matches the old one? (It looks like this is all wrong.)

    Question 2: If the answer to Question 1 is yes, can I cancel background initialization (if it starts), considering that it already completed safely, so that I don't have to wait 36 hours to see if the volume has been resurrected?

    My only other alternative, I think, is to ship this array, at some expense, to a data recovery service.

    Thanks for your help so far.

    Regards,

    Paul

     

     

     

     

  • "If I look in array manager, there is no disk assigned as a global hot spare, and more importantly, perhaps, rather than drives 0-9 & 10-14 being the array drives with drive 15 being the GHS, In array manager Drives 1-9 and 10-15 are listed as the array drives."

    Question 1: Can I attempt to reconfigure the array so that the configuration matches the old one (it looks like this is all wrong)

    I have resurrected a few RAIDs by duplicating the original setup exactly. After deleting the present setup from the RAID BIOS console, immediately recreate the original without leaving the console. When asked to save the config, choose YES. When asked to initialize, choose NO!!! Then reboot, and the background initialization will start. If you're lucky, you get it back. I am not sure this will work, since the previous background init already took place. All the settings of the original would have to be duplicated exactly. I would get more opinions on this.

    Question 2: If the answer to Question 1 is Yes, can I cancel background initialization (if it starts) considering that it completed safely already so that I don't have to wait 36 hours to see if the volume has been resurrected.

    No, the background init will occur, and it will start over and over again if interrupted.
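
Since the advice above depends on duplicating the original setup exactly, here is a small sketch of keeping a written record of an array configuration and diffing it against what you are about to create. The field names, drive IDs and stripe size are hypothetical and are not read from any real controller.

```python
from dataclasses import dataclass, asdict

@dataclass(frozen=True)
class ArrayConfig:
    raid_level: int
    member_ids: tuple      # physical drive IDs, in array order
    hot_spare_ids: tuple   # global hot spares
    stripe_size_kb: int

def config_diff(recorded, proposed):
    """Return {field: (recorded_value, proposed_value)} for every field that differs."""
    old, new = asdict(recorded), asdict(proposed)
    return {key: (old[key], new[key]) for key in old if old[key] != new[key]}

# Hypothetical documented configuration: four members plus one hot spare.
recorded = ArrayConfig(raid_level=5, member_ids=(0, 1, 2, 3),
                       hot_spare_ids=(4,), stripe_size_kb=64)

# Hypothetical configuration as it appears after a bad restore: the members
# have shifted by one and the hot spare has been absorbed into the array.
proposed = ArrayConfig(raid_level=5, member_ids=(1, 2, 3, 4),
                       hot_spare_ids=(), stripe_size_kb=64)

diff = config_diff(recorded, proposed)
```

An empty diff is the goal before saving a recreated configuration; any difference in membership, order or stripe size is a reason to stop and check the documentation again.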

     

  • Hi,

    Thanks for all your help and advice to date.

    I rebooted the server and entered the RAID BIOS. I reconfigured the array as I remembered it, i.e. drives 0-9 + 10-14 with disk 15 as a global hot spare (the original configuration).

    I then created the logical drive, saved the configuration, rebooted.

    The drive is now back online and the data is viewable. It didn't enter 'Background initialisation' this time around.

    I'm currently running a consistency check on the array, and pending a successful result of that I may actually get some sleep again.

    For others who may read this post in the future, I was careful (with advice given herein) not to perform any operations on the array that would result in physical data changes.

    Regards,

    Paul

     

     

  •  

    Message Edited by pcmeiners on 08-04-2005 03:11 PM

  •  

     

    Message Edited by pcmeiners on 08-04-2005 03:09 PM


  • Nice going!!!
     
    As I read your last post and you said "as I remembered it", I had a sinking feeling, until the next few lines.
     
    Since you booted into Windows you should be OK. Get some rest.
    The last RAID disaster I had (a year ago) was due to a firmware bug on Seagate drives. I lost the array, which was an FSMO and had numerous SQL databases. Nothing beats a 36-hour day. About the 30th hour I thought about finding a semi truck to run me over on the highway; figured it would be less painful.
     
     
    Paul Meiners