Please help me solve this problem. Several times over the past few weeks the server has become unavailable with this error:

Memory/Battery problems were detected. The adapter has recovered, but cached data was lost. Press any key to continue.

I ran a memory test using the Memory Diagnostics Tool and no errors were found. The operating system appears to boot and run without errors, and no data loss has been detected. How critical is this error, and is it possible to make the server simply reboot instead of hanging in an inaccessible state?
Sounds like you may need to replace the battery and/or memory for the RAID controller in the server.
You can look up the service tag of your server to find out whether you have the RAID kit or a RAID controller card, and to get the part numbers you need.
Without replacing the parts, the data could be at risk of corruption.
The error is related to the RAID memory/battery, not the system memory/CMOS battery.
Is the LCD panel (that is normally blue, scrolling the model number or custom text) blue or amber? If it is amber, what message is it scrolling? Does it happen on every reboot or cold boot? Has the system been unplugged for any length of time? Do you have OpenManage installed, and have you checked the Hardware Logs? What is your controller firmware at? BIOS? Sorry for all the questions ... this can be several things and can cause several other problems in the process.
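If OpenManage Server Administrator is installed, the hardware log and RAID battery state can also be pulled from the command line with omreport. A minimal sketch, guarded so it still exits cleanly on a machine without OMSA (the exact report names assume a standard OMSA install):

```shell
# Quick hardware checks via Dell OpenManage's omreport CLI.
# Guarded: only runs the reports where OMSA is actually installed.
if command -v omreport >/dev/null 2>&1; then
    omreport system esmlog       # embedded hardware (ESM) log entries
    omreport storage battery     # RAID controller battery status
    status="checked"
else
    status="omreport not installed"
fi
echo "$status"
```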
Most likely, in order of probability ...
1. RAID memory is bad.
2. RAID battery is bad.
3. Riser card is bad.
4. Motherboard is bad.
5. Controller firmware needs to be updated.
The only way to make the system reboot to a functional state (i.e., loading Windows) is to fix the hardware error. Because the fault could potentially bring the server down permanently, the controller stops the system during POST to make sure an actual person is aware that there is a problem that needs to be addressed.
Thank you for your answer.
The LCD panel is blue, with no error message. The server reboots itself, usually at night, but not every night: one or two times per week.
No, the system was not unplugged. OpenManage is installed, and these errors are in the Hardware Logs:
Severity  Date and Time             Description
          Mon Oct 11 20:35:40 2010  Log cleared
          Thu Oct 14 22:05:58 2010  ECC Single Bit Fault detected - Bank 2, DIMM B
          Thu Oct 21 16:25:35 2010  Power Supply 1 power supply sensor Power lost
          Thu Oct 21 16:25:46 2010  Power Supply 1 power supply sensor returned to normal state
But there is nothing about the last two or three reboots, and nothing in the Event Logs except entries about an unexpected shutdown.
Server firmware and BIOS versions:
Phoenix ROM BIOS PLUS Version 1.10 A14
Adaptec SCSI BIOS v3.10.0
PowerEdge Expandable RAID Controller BIOS 1.07 Jul 22, 2004
Embedded server management firmware revision 1.84
Primary backplane firmware revision 1.01
If you get that message after the server has unexpectedly rebooted, it would not be an uncommon error. All it is saying is that there was data in the RAID controller's cache that did not get written; the controller does not know where to send it, so it is going to flush it.

If all the diagnostics pass on the server and the error does not happen during a normal reboot cycle, then it is likely caused by the unexpected shutdowns, which may have something to do with the OS.

However, based on the hardware logs, I would clear them and run a memory test to see whether there is a problem with the module that is reporting an error. You may even want to loop the memory test a few times as well.
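To keep an eye on whether that DIMM keeps logging errors after clearing the logs, a small script can scan exported hardware-log text for ECC faults. This is only a sketch: the line format is assumed from the log excerpt posted above, and `find_ecc_faults` is a hypothetical helper, not part of OpenManage.

```python
import re

# Scan exported hardware-log text for ECC memory faults and report the
# affected DIMM slots (line format assumed from the log excerpt above).
ECC_PATTERN = re.compile(r"ECC (Single|Multi) Bit Fault detected - (Bank \d+, DIMM \w+)")

def find_ecc_faults(log_text):
    """Return a list of (fault type, location) tuples found in the log."""
    return [m.groups() for m in map(ECC_PATTERN.search, log_text.splitlines()) if m]

log = ("Mon Oct 11 20:35:40 2010  Log cleared\n"
       "Thu Oct 14 22:05:58 2010  ECC Single Bit Fault detected - Bank 2, DIMM B\n"
       "Thu Oct 21 16:25:35 2010  Power Supply 1 power supply sensor Power lost")
print(find_ecc_faults(log))  # [('Single', 'Bank 2, DIMM B')]
```

If the same bank/DIMM keeps showing up after a log clear and a looped memory test, that module is the prime suspect.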