Due to some electrical issues lately, we've had periods of fairly dirty power going to some C6105 servers, which get unhappy as a result. It looks very much like the server nodes reboot, but when they come back they can't see the disks at the front of the enclosure and so just hang. When it happens, the symptoms are the same for all four nodes in each affected enclosure. Issuing an `ipmitool chassis power reset` command to each of the nodes is not enough to bring them properly back to life, and it is necessary to physically remove power from both PSUs in the enclosure to clear the error.
Is there some ipmitool raw command that I can issue to cause the C6105 enclosure (or whatever the shared bits are called) to perform a power reset?
From this link: http://ipmitool.sourceforge.net/manpage.html
ipmitool chassis help
Chassis Commands: status, power, identify, policy, restart_cause, poh, bootdev
ipmitool chassis power help
chassis power Commands: status, on, off, cycle, reset, diag, soft
ipmitool mc reset <warm|cold>
Instructs the BMC to perform a warm or cold reset.
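As a rough sketch of how `mc reset cold` could be sent to every node in an enclosure in one go, something like the following might work. The BMC addresses, the credentials passed in, and the choice of the `lanplus` interface are all assumptions made for illustration. Note that this resets each node's BMC rather than the shared enclosure hardware, so whether it clears the disk fault is untested.

```python
import subprocess

# Hypothetical BMC addresses for the four nodes in one enclosure.
NODE_BMCS = ["10.0.0.11", "10.0.0.12", "10.0.0.13", "10.0.0.14"]


def ipmi_command(host, user, password, *args):
    """Build the ipmitool argument list for one node's BMC over the LAN.

    '-I lanplus' is an assumption; some setups need '-I lan' instead.
    """
    return ["ipmitool", "-I", "lanplus", "-H", host,
            "-U", user, "-P", password, *args]


def cold_reset_bmcs(hosts, user, password, run=subprocess.run):
    """Send 'mc reset cold' to each BMC in turn.

    Returns the list of hosts whose ipmitool call exited non-zero.
    """
    failed = []
    for host in hosts:
        result = run(ipmi_command(host, user, password, "mc", "reset", "cold"))
        if result.returncode != 0:
            failed.append(host)
    return failed
```

Calling `cold_reset_bmcs(NODE_BMCS, user, password)` with real credentials would run the four resets. An empty list back means every ipmitool call exited cleanly, not that the disks have reappeared.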
Geoff P
Dell | Social Outreach Services - Enterprise
On the C6105, ipmitool works only for the individual servers and not for the whole enclosure.
Have you tried unplugging one of the servers and plugging it in again after a few seconds? If that helps, it means the problem is not in the enclosure but in the server. In that case, a normal "power off" and "power on" for each server, using ipmitool, will do the job. In addition, I would suggest powering off all four servers together. Hopefully this will cut the power to the disk controller at the front (if the problem resides there) and clear the fault.
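A minimal sketch of the "power off all four together" idea, assuming the nodes' BMCs are reachable over the LAN. The addresses, credentials, `lanplus` interface and 30-second off time are hypothetical placeholders. It powers every node off, confirms each reports "Chassis Power is off", waits, and only then powers them back on.

```python
import subprocess
import time

# Hypothetical BMC addresses for the four nodes in one enclosure.
NODE_BMCS = ["10.0.0.11", "10.0.0.12", "10.0.0.13", "10.0.0.14"]


def ipmi(host, user, password, *args, run=subprocess.run):
    """Run one ipmitool command against a node's BMC and capture its output."""
    cmd = ["ipmitool", "-I", "lanplus", "-H", host,
           "-U", user, "-P", password, *args]
    return run(cmd, capture_output=True, text=True)


def power_cycle_enclosure(hosts, user, password, off_seconds=30,
                          run=subprocess.run, sleep=time.sleep):
    """Power all nodes off together, wait, then power them on again.

    Returns the hosts that did not report being off; in that case the
    nodes are left as they are and nothing is powered back on.
    """
    # Hard power-off for every node before anything is turned back on.
    for host in hosts:
        ipmi(host, user, password, "chassis", "power", "off", run=run)

    # 'chassis power status' prints "Chassis Power is off" or "... is on".
    still_on = []
    for host in hosts:
        status = ipmi(host, user, password, "chassis", "power", "status", run=run)
        if "is off" not in status.stdout.lower():
            still_on.append(host)
    if still_on:
        return still_on

    # Keep the whole enclosure's nodes off for a while (duration is a guess).
    sleep(off_seconds)
    for host in hosts:
        ipmi(host, user, password, "chassis", "power", "on", run=run)
    return []
```

If the disk controllers take their power from the enclosure rather than from the nodes, this will not remove power from them, and pulling the PSU cables or switching a PDU outlet would still be needed.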
Finally, if the only solution is still to remove the cables, then you should consider using a remotely manageable PDU, where you can turn the specific outlets off and on. It means additional hardware, but I'm afraid this is the only solution.
So `ipmitool ... mc reset cold` has effects that more closely imitate removing the power than `ipmitool ... chassis power reset`?
I'll give it a go next time anyway and let you know.
The cabling is a bit fiddly, so pulling one of the nodes out is something the boss isn't keen on. You're right, though - it would be the only way to remove power from a single node, so it might have to be done. I should also try and find out if the disk controllers are powered through their respective nodes or by the enclosure.
Individual 'power off' calls definitely don't do the job. I'll try the four together, and in combination with Geoff P's 'mc cold' suggestion.
Three nights in a row now, so the initial 'No' on the managed PDU strips is starting to waver. There's even mutterings of maybe some clean power from somewhere.
The electrical problem didn't recur last night, so I'm going to have to wait until the next time it happens before I can test.