I have an issue where I am getting constant event notifications like the one below:
Health of Memory changed from red to green. Sensor name: Memory Device 1B 12: Uncorrectable ECC, Current state: Deassert (raw value).
I have swapped the memory around, but the socket number in the event stayed the same, so I contacted the server manufacturer. They replaced the memory (same event), then replaced the motherboard, but the replacement board was faulty, so they put the original back in.
However, looking at the iDRAC, logs, etc., there are no errors showing on the hardware itself.
I have tried to reset the sensor in the web client, but this has not stopped the event.
Help please?
Thank you
Few things to try;
Hi,
Thanks for your reply.
I ran the command and still have the same 'event'. The iDRAC was also reset before the new motherboard was installed, and it was reconfigured again after the original motherboard was put back.
The memory is fine; I have run extended tests.
We have 6 hosts, and this is the only one the error is showing up on. All are on 6.0 with the same configs, hardware, and BIOS versions.
Thanks again
What other events/errors are you seeing? I'd be pushing for the board to be replaced.
Just that on this system.
I'm waiting on a colleague to upgrade to 6.5. The manufacturer has said it is not a system fault, as there are no error logs on IDRAC TRS, so they won't come out to change the board.
Tried reinstalling ESXi?
Not at this point; I'm waiting for my colleague to update the software.
As an update, my colleague rebooted the vCenter as he 'broke' it.
After doing so, I re-ran the hardware sensor reset option, and the event warnings have stopped.
I still have some other issues, however, this one has gone.
Thanks again for your help
I had a similar situation. The solution proposed by VirtualizedRob worked for me.
The VMware log showed the "uncorrectable ecc" message had been appearing every 3 minutes like clockwork.
The iDRAC on the host was not indicating any error.
a) rebooted vcenter
b) uploaded the latest firmware to the iDRAC and restarted it
c) ran on the host: localcli hardware ipmi sel clear
Now no more "uncorrectable ecc" messages; last one was over 30 min ago.
Did not have to reboot host or guest vm.
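For anyone finding this later, step c) can be wrapped up roughly as below. This is only a sketch, assuming you have ESXi Shell or SSH access to the affected host. The `clear_sel` function and the `SEL_CLI` variable are names made up for illustration; the `sel list` call is added here just to look at the IPMI System Event Log entries before wiping them and was not part of the steps above.

```shell
# Sketch only: intended to be run on the affected ESXi host.
# SEL_CLI is a hypothetical variable so the same steps can be tried
# with esxcli instead of localcli.
SEL_CLI="${SEL_CLI:-localcli}"

clear_sel() {
    # Show the current IPMI System Event Log entries before clearing them.
    "$SEL_CLI" hardware ipmi sel list || return 1
    # Clear the SEL (step c above).
    "$SEL_CLI" hardware ipmi sel clear || return 1
    echo "SEL cleared with $SEL_CLI"
}
```

Call `clear_sel` once on the host, then watch the vCenter events for a while to see whether the "uncorrectable ecc" message comes back. It only covers step c); the vCenter reboot and iDRAC firmware update still have to be done separately.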