After power loss previously working MD3220i no longer comes up - PowerVault Storage Forum - Storage - Dell Community

After power loss previously working MD3220i no longer comes up


  • We were hit by a huge storm Tuesday morning that knocked out power, and unfortunately our backup generator didn't come online, so our entire rack lost power.  Everything seems fine except for our PowerVault MD3220i, which no longer comes online or lights up the drives when powered back on.  I have replaced both power supplies and still cannot get it to work.  This unit has worked flawlessly since 2013.

    I took both controllers out, pulled the batteries out and put them back in.

    I am able to ping the management IP of both controllers, and they are showing activity, but I am not able to connect to either of them in the storage manager application.

    I connected the serial cable and am receiving the following during boot:

    -=<###>=-
    Instantiating /ram as rawFs,  device = 0x1
    Formatting /ram for DOSFS
    Instantiating /ram as rawFs, device = 0x1
    Formatting...Retrieved old volume params with %38 confidence:
    Volume Parameters: FAT type: FAT32, sectors per cluster 0
      0 FAT copies, 0 clusters, 0 sectors per FAT
      Sectors reserved 0, hidden 0, FAT sectors 0
      Root dir entries 0, sysId (null)  , serial number f10000
      Label:"           " ...
    Disk with 1024 sectors of 512 bytes will be formatted with:
    Volume Parameters: FAT type: FAT12, sectors per cluster 1
      2 FAT copies, 1010 clusters, 3 sectors per FAT
      Sectors reserved 1, hidden 0, FAT sectors 6
      Root dir entries 112, sysId VXDOS12 , serial number f10000
      Label:"           " ...

    RTC Error:  Real-time clock device is not working
    OK.

    Adding 14606 symbols for standalone.




    Reset, Power-Up Diagnostics - Loop 1 of 1
    3600 Processor DRAM
         01 Data lines                                                  Passed
         02 Address lines                                               Passed
    3300 NVSRAM
         01 Data lines                                                  Passed
    4410 Ethernet 82574 1
         01 Register read                                               Passed
         02 Register address lines                                      Passed
    6D40 Bobcat
         02 Flash Test                                                  Passed
    3700 PLB SRAM
         01 Data lines                                                  Passed
         02 Address lines                                               Passed
    7000 SE iSCSI BE2 1
         01 Register Read Test                                          Passed
         02 Register Address Lines Test                                 Passed
         03 Register Data Lines Test                                    Passed
    3900 Real-Time Clock
         01 RT Clock Tick                                               Passed
    Diagnostic Manager exited normally.

    Controller has been locked down due to Hardware errors:

    ================= EXCEPTION LOG =================
    Serial number:     29T005W
    Entry count:       8
    Wrap-arounds:      0
    First entry time:
    Current Controller date/time: MAR-09-2017 06:51:58 AM
    Current Local (User) date/time: MAR-09-2017 04:18:23 PM

    ---- Log Entry #0 (Core 0) DEC-11-2012 02:22:20 PM ----

    WARNING: Reset by alternate controller

    ---- Log Entry #1 (Core 0) DEC-11-2012 02:46:05 PM ----

    WARNING: Reset by alternate controller

    ---- Log Entry #2 (Core 0) DEC-11-2012 03:53:04 PM ----

    WARNING: Reset by alternate controller

    ---- Log Entry #3 (Core 0) AUG-06-2013 09:01:28 PM ----

    WARNING: Reset by alternate controller

    ---- Log Entry #4 (Core 0) NOV-15-2013 02:25:04 AM ----
    11/15/13-10:06:49 (tNtbErrPolling): PANIC: PLX NTB Port 4 reg 0x000044a4 changed, original val 0x00000000, current val 0x00000010

    Stack Trace for tNtbErrPolling:
    0x0025ffac vxTaskEntry  +0x5c : vkiTask (0x15000308)
    0x0016844c vkiTask      +0xec : ntbErrPolling ()
    0x00143c20 ntbErrPolling+0x2a0: ntbRegCompare (0x4, 0xac8e10)
    0x00143000 ntbRegCompare+0x100: _vkiCmnErr ()
    0x00163544 _vkiCmnErr   +0x104: 0x00163780 (0x585580, 0x4f8be0, 0xd838e0)
    0x00163b04 vkiLogShow   +0x544: psvJobAdd (0x1648a0, 0xd83a40, 0, 0)
    0x00148c04 psvJobAdd    +0x64 : msgQSend ()
    0x00402714 msgQSend     +0x61c: taskUnlock ()

    ---- Log Entry #5 (Core 0) NOV-15-2013 02:25:38 AM ----

    WARNING: Reset by alternate controller

    ---- Log Entry #6 (Core 0) AUG-04-2014 09:28:29 PM ----

    WARNING: Reset by alternate controller

    ---- Log Entry #7 (Core 0) DEC-15-2014 06:33:01 PM ----
    12/16/14-02:43:26 (tNtbErrPolling): PANIC: PLX NTB Port 0 reg 0x00000364 changed, original val 0x00000000, current val 0x00000020

    Stack Trace for tNtbErrPolling:
    0x0026070c vxTaskEntry  +0x5c : vkiTask (0x15000308)
    0x00168b4c vkiTask      +0xec : ntbErrPolling ()
    0x00144308 ntbErrPolling+0x288: ntbRegCompare (0, 0xebf6a0)
    0x00143700 ntbRegCompare+0x100: _vkiCmnErr ()
    0x00163c44 _vkiCmnErr   +0x104: 0x00163e80 (0x585f20, 0x4f9320, 0xd84290)
    0x00164204 vkiLogShow   +0x544: psvJobAdd (0x164fa0, 0xd843f8, 0, 0)
    0x00149304 psvJobAdd    +0x64 : msgQSend ()
    0x00402e54 msgQSend     +0x61c: taskUnlock ()

    ---- Log Entry #8 (Core 0) DEC-15-2014 06:33:39 PM ----

    WARNING: Reset by alternate controller

    ---- Log Entry #9 (Core 0) DEC-15-2015 06:40:42 AM ----

    Root Complex TLP header[0] 30008000
    Root Complex TLP header[1] 01200033
    Root Complex TLP header[2] 00000000
    Root Complex TLP header[3] 00000000


    PCI SERR Exception
       PLX PCI-E Switch  (Unit 0)
            VID 0x10b5 DID 0x8632 B0:D0:F0
            PCI Status = 0x4010
            Bridge Secondary PCI Status = 0x4000
       PLX PCI-E Bridge to Host Card  (Unit 1)
            VID 0x10b5 DID 0x8632 B1:D4:F0
            PCI Status = 0x4010
            PCI-E Device Status = 0x0005
            PCI-E AER Uncorrectable Status = 0x00040000
              Header Log 0 = 0x00000044
              Header Log 1 = 0x00000044
              Header Log 2 = 0x00000044
              Header Log 3 = 0x20008080
            PCI-E AER Correctable Status = 0x00000040

    ---- Log Entry #10 (Core 0) DEC-15-2015 06:40:45 AM ----

    WARNING: Restart by watchdog time out

    ---- Log Entry #11 (Core 0) DEC-15-2015 06:41:19 AM ----

    WARNING: Reset by alternate controller

    ---- Log Entry #12 (Core 0) AUG-06-2016 03:55:22 PM ----

    WARNING: Reset by alternate controller

    ---- Log Entry #13 (Core 0) MAR-06-2017 09:43:27 PM ----
    ERROR: Port 0 Bad TLP Count 1572864 exceeds threshold 16
    ERROR: Port 0 Bad DLLP Count 1073743408 exceeds threshold 16
    ERROR: Port 4 Bad TLP Count 1572864 exceeds threshold 16
    ERROR: Port 4 Bad DLLP Count 1073743408 exceeds threshold 16
    ERROR: Port 5 Bad TLP Count 1572864 exceeds threshold 16
    ERROR: Port 5 Bad DLLP Count 1073743408 exceeds threshold 16
    ERROR: Port 6 Bad TLP Count 1572864 exceeds threshold 16
    ERROR: Port 6 Bad DLLP Count 1073743408 exceeds threshold 16
    ERROR: Port 2/6 Rx Err Count 24 exceeds threshold 16

    ---- Log Entry #14 (Core 0) MAR-06-2017 09:43:27 PM ----
    ERROR: Type-I Port 0 ECC correctable error threshold exceeded reg 0xf1a val 0x18

    ---- Log Entry #15 (Core 0) MAR-06-2017 09:46:14 PM ----
    ERROR: Port 0 Bad TLP Count 1572864 exceeds threshold 16
    ERROR: Port 0 Bad DLLP Count 1073743408 exceeds threshold 16
    ERROR: Port 4 Bad TLP Count 1572864 exceeds threshold 16
    ERROR: Port 4 Bad DLLP Count 1073743408 exceeds threshold 16
    ERROR: Port 5 Bad TLP Count 1572864 exceeds threshold 16
    ERROR: Port 5 Bad DLLP Count 1073743408 exceeds threshold 16
    ERROR: Port 6 Bad TLP Count 1572864 exceeds threshold 16
    ERROR: Port 6 Bad DLLP Count 1073743408 exceeds threshold 16
    ERROR: Port 2/6 Rx Err Count 24 exceeds threshold 16

    ---- Log Entry #16 (Core 0) MAR-06-2017 09:46:14 PM ----
    ERROR: Type-I Port 0 ECC correctable error threshold exceeded reg 0xf1a val 0x18

    ---- Log Entry #17 (Core 0) MAR-06-2017 09:47:35 PM ----

    Faults are detected on all installed power supplies

    ---- Log Entry #18 (Core 0) MAR-06-2017 09:50:18 PM ----
    ERROR: Port 0 Bad TLP Count 526208 exceeds threshold 16
    ERROR: Port 4 Bad TLP Count 526208 exceeds threshold 16
    ERROR: Port 5 Bad TLP Count 526208 exceeds threshold 16
    ERROR: Port 6 Bad TLP Count 526208 exceeds threshold 16
    ERROR: Port 0/4 Rx Err Count 128 exceeds threshold 16

    ---- Log Entry #19 (Core 0) MAR-06-2017 09:50:18 PM ----
    ERROR: Type-I Port 0 ECC correctable error threshold exceeded reg 0x678 val 0x80

    03/09/17-16:18:23 (tSystem): ERROR: FPGA FW is out of date
    "Rhone03 rev17" currently in use
    "Rhone03 rev20" available for update
    Current date: 03/09/17  time: 16:18:23

    Send <BREAK> for Service Interface or baud rate change
    03/09/17-16:18:27 (tNetCfgInit): NOTE:  eth0: LinkUp event
    03/09/17-16:18:28 (tNetCfgInit): NOTE:  Acquiring network parameters for interface gei0 using DHCP
    03/09/17-16:18:37 (ipdhcpc): NOTE:  netCfgDhcpReplyCallback :: received OFFER on interface gei0, unit 0
    03/09/17-16:18:38 (ipdhcpc): NOTE:   DHCP server: 10.0.0.1
    03/09/17-16:18:38 (ipdhcpc): WARN:   **WARNING** The DHCP Server did not assign a permanent IP for gei0.
    03/09/17-16:18:38 (ipdhcpc): WARN:               Network access to this controller may eventually fail.
    03/09/17-16:18:38 (ipdhcpc): NOTE:   DNS domain name: XXXXXX.com
    03/09/17-16:18:38 (ipdhcpc): NOTE:   DHCP client name: md3220i-mgmt
    03/09/17-16:18:38 (ipdhcpc): NOTE:   Client DNS name servers: 10.0.0.1
    03/09/17-16:18:38 (ipdhcpc): NOTE:   Client IP routers:   10.0.0.1
    03/09/17-16:18:38 (ipdhcpc): NOTE:   Assigned IP address: 10.0.0.122
    03/09/17-16:18:38 (ipdhcpc): NOTE:   Assigned subnet mask: 255.255.255.0
    03/09/17-16:18:38 (tNetReset): NOTE:  Network Ready


    I replaced both of the power supplies, so I'm not sure why the error is still there or how to clear it.  I got into the VxWorks shell, but I am not sure how to clear this so the unit starts again.

    Anyone have an idea how to clear this?
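
For anyone reading through the exception log above, here is a minimal Python sketch that groups the log entries by date and counts the severity prefixes, which makes it easier to see which dates carry the hardware errors. The function name `summarize`, the regular expressions, and the trimmed `LOG_EXCERPT` are assumptions made for illustration; this is not a Dell tool, and it only reads text captured from the serial console.

```python
import re
from collections import Counter

# Trimmed excerpt of the exception log captured from the serial console.
LOG_EXCERPT = """\
---- Log Entry #12 (Core 0) AUG-06-2016 03:55:22 PM ----

WARNING: Reset by alternate controller

---- Log Entry #13 (Core 0) MAR-06-2017 09:43:27 PM ----
ERROR: Port 0 Bad TLP Count 1572864 exceeds threshold 16
ERROR: Port 0 Bad DLLP Count 1073743408 exceeds threshold 16

---- Log Entry #17 (Core 0) MAR-06-2017 09:47:35 PM ----

Faults are detected on all installed power supplies
"""

# Matches headers such as "---- Log Entry #13 (Core 0) MAR-06-2017 09:43:27 PM ----".
ENTRY_HEADER = re.compile(
    r"-{4} Log Entry #(\d+) \(Core (\d+)\) "
    r"(\w{3}-\d{2}-\d{4}) (\d{2}:\d{2}:\d{2} [AP]M) -{4}"
)
# Severity keywords as they appear in the log body.
SEVERITY = re.compile(r"\b(ERROR|WARNING|PANIC):")


def summarize(log_text):
    """Return {date: Counter(severity -> number of lines)} per entry date."""
    summary = {}
    current_date = None
    for raw_line in log_text.splitlines():
        line = raw_line.strip()
        header = ENTRY_HEADER.match(line)
        if header:
            current_date = header.group(3)
            summary.setdefault(current_date, Counter())
            continue
        if current_date is None or not line:
            continue  # skip blank lines and anything before the first entry
        found = SEVERITY.search(line)
        # Lines without a severity keyword (stack traces, plain messages)
        # are counted as "OTHER".
        summary[current_date][found.group(1) if found else "OTHER"] += 1
    return summary


if __name__ == "__main__":
    for date, counts in summarize(LOG_EXCERPT).items():
        print(date, dict(counts))
```

Pasting the full console capture into `LOG_EXCERPT` should work the same way, as long as the entry headers keep the format shown above.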

  • Hello kgroup,

    Is the error that you posted the complete error, and does it start over again when you are connected to the serial interface of controller 0? Also, are any of the virtual disks online, or are they offline as well?

    Please let us know if you have any other questions.

    DELL-Sam L
    Dell | Social Outreach Services - Enterprise
    Download the Dell Quick Resource Locator app today to access PowerEdge support content on your mobile device! (iOS, Android, Windows)

  • That error is from the output of the serial PuTTY session.  If I power down and bring up controller 0 or 1 individually, it continually outputs the same message about the bus errors and doesn't proceed any further.

    I am getting that with both controllers.  I was able to get to the shell, but none of the standard commands (e.g. lemClearLockdown) are recognized.  It is almost as if nothing loaded at all on the controllers, since the shell has no knowledge of any of the commands.

    Forgot to answer your last question.  None of the drive lights even come on.  The controller basically enters a failure state immediately when powered on.
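
As a rough aid for reading the bus errors mentioned above, the register values in Log Entry #9 can be broken into individual bits. The sketch below is only an illustration: the bit names come from the standard PCI and PCI Express register layouts as I understand them, not from Dell documentation, and `decode_bits` is a hypothetical helper name. How the controller firmware itself interprets these registers is not stated anywhere in the log.

```python
# Bit names assumed from the standard PCI / PCI Express register definitions.
# They are NOT taken from Dell or firmware documentation.
PCI_STATUS_BITS = {
    3: "Interrupt Status",
    4: "Capabilities List",
    8: "Master Data Parity Error",
    11: "Signaled Target Abort",
    12: "Received Target Abort",
    13: "Received Master Abort",
    14: "Signaled System Error",
    15: "Detected Parity Error",
}
PCIE_DEVICE_STATUS_BITS = {
    0: "Correctable Error Detected",
    1: "Non-Fatal Error Detected",
    2: "Fatal Error Detected",
    3: "Unsupported Request Detected",
}
AER_UNCORRECTABLE_BITS = {
    4: "Data Link Protocol Error",
    12: "Poisoned TLP",
    13: "Flow Control Protocol Error",
    14: "Completion Timeout",
    15: "Completer Abort",
    16: "Unexpected Completion",
    17: "Receiver Overflow",
    18: "Malformed TLP",
    20: "Unsupported Request",
}
AER_CORRECTABLE_BITS = {
    0: "Receiver Error",
    6: "Bad TLP",
    7: "Bad DLLP",
    8: "REPLAY_NUM Rollover",
    12: "Replay Timer Timeout",
}


def decode_bits(value, names):
    """List the names of all bits set in a 32-bit register value."""
    return [
        names.get(bit, f"bit {bit} (unnamed here)")
        for bit in range(32)
        if (value >> bit) & 1
    ]


if __name__ == "__main__":
    # Values copied from "PLX PCI-E Bridge to Host Card (Unit 1)" in Log Entry #9.
    registers = [
        ("PCI Status", 0x4010, PCI_STATUS_BITS),
        ("PCI-E Device Status", 0x0005, PCIE_DEVICE_STATUS_BITS),
        ("AER Uncorrectable Status", 0x00040000, AER_UNCORRECTABLE_BITS),
        ("AER Correctable Status", 0x00000040, AER_CORRECTABLE_BITS),
    ]
    for label, value, names in registers:
        print(f"{label} = {value:#06x}: {', '.join(decode_bits(value, names))}")
```

Under those assumed bit definitions, the Unit 1 values would read as a signaled system error together with a malformed TLP and a bad TLP, which is at least consistent with the Bad TLP counters in the later entries.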

  • Hello kgroup,

    Since you have tried to boot with just a single controller and the system won’t boot, I would say that your controllers need to be replaced. The controller that the output came from is showing signs that it has failed. If the output is the same for the alternate controller, then both need to be replaced.

    Please let us know if you have any other questions.

    DELL-Sam L
    Dell | Social Outreach Services - Enterprise