CPU Machine Chk: processor sensor, transition to non-recoverable

Servers
Information and ideas on Dell PowerEdge rack, tower and blade server solutions.

  • New PowerEdge 2950 server, freshly installed with all the latest firmware and drivers.
    It rebooted today with the following error in the ESM log:
     
    "CPU Machine Chk: processor sensor, transition to non-recoverable"
     
    Why did this happen, and how can I resolve the issue?
    I have a DSET report available if someone cares to take a look.
     
    Thank you,
    Drazen
  • I have the same problem. Can somebody help us?
    Thanks.
    Majagual
  • So do I. In my office, a PowerEdge 2950 running Windows Server 2003 Standard Edition SP1 BSODs and reboots at least three times a day with the same error message, "E1422 CPU machine Chk", even though the average load is only about 25%. It only serves as a file server, Exchange server, DNS, WINS, DHCP, domain controller and application server.

    The funny thing is that our previous machine, with exactly the same settings, roles and load, was only a "home-made" server built on an old Pentium 4 and an ordinary desktop motherboard, and it never hung, at least not within a week...

  • Has anybody found anything? I have the same issue with a PE2950 with two dual-core CPUs.

    I've just opened a support case, and they asked me to run some hardware tests: basically, boot the machine with just one CPU and then with the other one, to find out which CPU is faulty.

    I'll do it this afternoon. If I have any news, I'll let you know.

  • Has anyone had any luck fixing this? I have a similar issue, with the same error message on the front LCD panel. The machine is currently running just fine, but this message concerns me. I talked to a Dell rep and they just suggested upgrading the BIOS, PERC driver and PERC firmware. Personally, I don't know what this would accomplish, since it's a CPU error... The BIOS maybe, but the RAID controller???

    Anyway, if anyone was able to correct this, could you please let me know what the issue was? Many thanks!

    -Jon

  • Jon,

    My server just started doing this as well. Did you find a solution?

    Chris

  • Chris,

    I'm not 100% sure what ended up fixing it, but what Dell suggested, and what I did, was update the BIOS, BMC, PERC 5 driver and PERC 5 firmware. After these updates I rebooted, pressed CTRL+E and cleared the log there. That is where the error was actually logged, so it's possible that clearing this log was all I needed to do, but since Dell suggested the updates, I figured I might as well do them.


    Here are the links I was given to the updates:

    BIOS:

    http://support.dell.com/support/downloads/download.aspx?c=us&l=en&s=gen&releaseid=R202414&SystemID=PWE_2950&servicetag=FCYPLC1&os=WNET&osl=en&deviceid=11598&devlib=0&typecnt=0&vercnt=11&catid=-1&impid=-1&formatcnt=6&libid=1&fileid=281429

    BMC:

    http://support.dell.com/support/downloads/download.aspx?c=us&l=en&s=gen&releaseid=R202152&SystemID=PWE_2950&servicetag=FCYPLC1&os=WNET&osl=en&deviceid=5814&devlib=0&typecnt=0&vercnt=10&catid=-1&impid=-1&formatcnt=4&libid=29&fileid=280941

    PERC5 driver:

    http://support.dell.com/support/downloads/download.aspx?c=us&l=en&s=gen&releaseid=R194151&SystemID=PWE_2950&servicetag=FCYPLC1&os=WNET&osl=en&deviceid=9182&devlib=0&typecnt=0&vercnt=6&catid=-1&impid=-1&formatcnt=3&libid=46&fileid=268332

    PERC5 Firmware:

    http://support.dell.com/support/downloads/download.aspx?c=us&l=en&s=gen&releaseid=R189337&SystemID=PWE_2950&servicetag=FCYPLC1&os=WNET&osl=en&deviceid=9182&devlib=0&typecnt=0&vercnt=5&catid=-1&impid=-1&formatcnt=6&libid=46&fileid=259796


  • FYI, on a PE2950 III it still took 9 minutes to flash the BMC. I thought it had crashed and restarted the machine, which is bad news. If you make this mistake, just restart and retry, give it all the time it needs, and it should work. Note that during the update it will thrash: the fans will ramp up and down madly and the LCD will go nuts. All SOP (standard operating procedure).

    I'm also getting the same CPU errors as the rest of you, even with the newest firmware.

    - Joe

  • Same issue here. After I contacted Dell support, they told me to run the DSET application and send them the report. They called me back saying the system was fine, that it had no errors, and that it could be a sensor bug, so I was told to update the BIOS and BMC and to run the DSET application again with option 3 (clear ESM logs).

    I did that, the system is running again, and the error code is gone!!! I think simply clearing the logs will remove the error, but it may eventually come back, so it's best to do the firmware upgrades too. This error is very generic, and to find out what the problem is you need to run diagnostics: it can be a problem with any of the hardware in the system, or it may not be a problem at all, just a bug.

  • We, too, have had the E1422 CPU Machine Chk error show up three times on one of our Dell 2950s. We've called Dell and had them come out to replace parts twice. The error still comes up. This is on a SLES 11 box. We've confirmed that the BIOS and BMC are up to date. Still happening.

    RLR:-)

  • We are having the same problem on our PE 1950. Everything is up to date. OMSA does not see anything wrong.

  • We are running OMSA 5.4, and /opt/dell/clearesm.sh fixed the problem.

  • I have the same issue, but I can't run dset to clear the logs, because the CPU machine check error is keeping me from booting into Windows.

  • I had the same error on a PE2950 a couple of months ago, and more recently (Christmas Day!) on a PE1950. Same advice from Dell as always: upgrade the drivers, firmware and BIOS; DSET shows nothing... Since applying the updates after the initial problem, there have been no further occurrences. Still don't know why, though...

  • I have the same problem, but only when I start the server in graphical mode (init 5). The operating systems tested are Red Hat 5.4 (certified for the PE2950) and CentOS 5.5.

    When I start the server in text mode (init 3), everything is OK, but when I switch to init 5 (startx) I still get the errors (E171F PCI Express fatal error, E1422 CPU machine check failure).

    Any ideas or solutions?

    Peter

    P.S. Sorry for my English.
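Several replies in this thread suggest clearing the ESM log (through DSET option 3, CTRL+E at boot, or clearesm.sh) while warning that the error may come back. As a rough sketch only, assuming a Linux host with ipmitool installed and the IPMI driver loaded, the Python snippet below filters `ipmitool sel list` output for processor machine-check entries, so the log can be saved and checked for recurrence before and after clearing. The pipe-separated column layout, the matched phrases, and the helper names `find_cpu_events`, `read_sel` and `clear_sel` are assumptions for illustration; the wording a given BMC reports may differ.

```python
import subprocess
import sys
from typing import List

# Assumed phrases in a processor machine-check SEL entry; the exact wording
# reported by a given BMC may differ, so adjust these as needed.
PATTERNS = ("machine chk", "machine check", "non-recoverable")


def find_cpu_events(sel_text: str) -> List[str]:
    """Return SEL lines that look like processor machine-check events.

    Assumes `ipmitool sel list` style output with pipe-separated fields,
    roughly: id | date | time | sensor | event | direction.
    """
    hits = []
    for line in sel_text.splitlines():
        fields = [f.strip() for f in line.split("|")]
        if len(fields) < 5:
            continue  # skip blank or malformed lines
        sensor = fields[3].lower()
        event = " | ".join(fields[4:]).lower()
        if ("processor" in sensor or "cpu" in sensor) and any(p in event for p in PATTERNS):
            hits.append(line.strip())
    return hits


def read_sel() -> str:
    """Read the SEL from the local BMC (needs ipmitool and root privileges)."""
    result = subprocess.run(["ipmitool", "sel", "list"],
                            capture_output=True, text=True, check=True)
    return result.stdout


def clear_sel() -> None:
    """Clear the SEL; save a copy first, since this discards the evidence."""
    subprocess.run(["ipmitool", "sel", "clear"], check=True)


if __name__ == "__main__":
    # Hypothetical sample lines, not taken from a real PowerEdge log.
    sample = (
        "   1 | 01/10/2008 | 09:12:44 | Processor #0x60 | Transition to Non-recoverable | Asserted\n"
        "   2 | 01/10/2008 | 09:13:02 | Power Supply #0x50 | Presence detected | Asserted\n"
    )
    text = read_sel() if "--live" in sys.argv else sample
    for entry in find_cpu_events(text):
        print(entry)
```

Run without arguments, it only filters the hypothetical sample; with `--live` it reads the real SEL. `clear_sel()` is deliberately never called: saving the output of `read_sel()` to a file before clearing keeps a record to compare against if the error returns, which matters given that several posters saw it recur after clearing.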