Posted on behalf of Jordan Hargrave, Linux Software Engineer
(A follow-up on these bugs is provided here: http://en.community.dell.com/techcenter/b/techcenter/archive/2012/09/14/follow-up-ubuntu-on-dell-12g-poweredge-servers.aspx)
Although Dell does not officially support Ubuntu Linux on Dell PowerEdge servers, Ubuntu 12.04 LTS or later kernels will work on PowerEdge systems with a few caveats.
The sb_edac (and i7core_edac) drivers must be blacklisted or patched with an upstream version. Depending on which kernel you have, the sb_edac driver has a bug that may cause a temporary system hang when processing machine check events. This is not a Dell-specific issue. The bug will cause NMI watchdog timeout events to occur within the kernel message log:
[ 94.816130] [<ffffffff81640ac9>] do_nmi+0xf9/0x360
[ 94.816134] [<ffffffff81640130>] nmi+0x20/0x30
[ 94.816150] [<ffffffff8164356d>] notifier_call_chain+0x4d/0x70
[ 94.816153] [<ffffffff816435ca>] atomic_notifier_call_chain+0x1a/0x20
[ 94.816157] [<ffffffff81029b49>] mce_log+0x29/0x180
The acpi_pad and mei drivers must also be blacklisted, as they can cause performance problems on Dell PowerEdge servers. The acpi_pad driver is the ACPI Processor Aggregator driver, used for power-saving features. However, it can interfere with the firmware if power-saving features are enabled in the BIOS. The acpi_pad driver is still under development, so it may be best to blacklist it for now; it is not required for system operation.
The mei driver can cause watchdog timer error messages and can also prevent a proper shutdown.
To blacklist a driver, you can either edit /etc/modprobe.d/blacklist.conf or create your own /etc/modprobe.d/xxxx.conf file and add a "blacklist <module>" line for each driver (sb_edac, i7core_edac, acpi_pad, mei).
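As a rough sketch of what those lines could look like, the snippet below prints one blacklist entry per driver named in this post. The helper name blacklist_lines is made up for illustration, and the module names are taken from the text above; check them against lsmod on your own system before relying on them.

```shell
#!/bin/sh
# Hypothetical helper: print one "blacklist <module>" line per argument,
# in the syntax modprobe expects in /etc/modprobe.d/*.conf files.
blacklist_lines() {
    for mod in "$@"; do
        printf 'blacklist %s\n' "$mod"
    done
}

# Driver names are the ones mentioned in this post.
blacklist_lines sb_edac i7core_edac acpi_pad mei
```

The output can be pasted into /etc/modprobe.d/blacklist.conf or redirected into a file of your own under /etc/modprobe.d/ (writing there requires root).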
kernel patch for sb_edac driver:
ACPI PAD driver bugzilla:
Information on mei driver:
Just an update on this...
The patches to address the sb_edac driver are included in the Ubuntu kernel as of Ubuntu-3.2.0-28.44. The latest kernel version in 12.04 is 3.2.0-30.48, so, as long as you are running the latest kernels, you should be in good shape.
The same goes for the acpi_pad bug. The fixes are part of the 3.2.0-29.46 and higher kernels.
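For anyone who wants to script that check, here is a minimal sketch that compares the running kernel against the 3.2.0-29 figure quoted for the acpi_pad fix. It assumes GNU sort with the -V (version sort) option, and the version_ge helper is a made-up name. Note that uname -r on Ubuntu reports only the ABI part (for example 3.2.0-30-generic), not the upload suffix such as .48, so this is a coarse check rather than an exact one.

```shell
#!/bin/sh
# Hypothetical helper: succeeds when version $1 is the same as or newer
# than version $2. Relies on GNU sort's -V (version sort) option.
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

running="$(uname -r)"   # e.g. 3.2.0-30-generic on Ubuntu 12.04

if version_ge "$running" "3.2.0-29"; then
    echo "$running: at or past the 3.2.0-29 level quoted for the acpi_pad fix"
else
    echo "$running: older than 3.2.0-29, consider updating"
fi
```

The same helper can be pointed at 3.2.0-28 for the sb_edac fix.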
For mei, a patch for this issue is currently in the precise-proposed kernel. So far, the patch is looking good for systems that are affected by this bug, especially with regard to reboot hangs. The fix should be included in the next Stable Release Update for 12.04. The bug can be tracked here:
Guys, thanks for the post. I just bought a couple of R720s with E5-2690 CPUs, running RHEL 5.9. No HBA, nothing in the PCIe slots. Does anyone know why the time from power-on until the Linux kernel is loaded is so long on the R720? It looks like it's stuck, but it's doing something. The memory check is already disabled. Also, the temperature on the dual CPUs stays at 55C.