VMware Cloud Community
Elentiya
Contributor

Health of Memory changed from red to green. Sensor name: Memory Device 1B 12: Uncorrectable ECC, Current state: Deassert

I have an issue where I am getting constant notifications of the event below:

Health of Memory changed from red to green.  Sensor name: Memory Device 1B 12: Uncorrectable ECC, Current state: Deassert (raw value).

I swapped the memory modules, but the reported socket number stayed the same, so I contacted the server manufacturer. They replaced the memory (same event) and then replaced the motherboard, but the replacement board was faulty, so they put the original back in.

However, looking at the iDRAC and its logs, there are no errors showing on the hardware device.

I have tried to reset the sensor in the Web Client, but this has not fixed the event.

Help please?

Thank you

8 Replies
iopsGent
Enthusiast

A few things to try:

  • Clear the IPMI SEL logs
    • Connect to the ESXi host through SSH.
    • Run this command: localcli hardware ipmi sel clear
  • Reset the iDRAC
  • Run an extended memory test
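
If you would rather keep a copy of the SEL before wiping it, something like the sketch below might help. It is only an illustration: the `sel list` subcommand, the `clear_sel` helper name and the backup path are my assumptions and not part of the steps above, and it assumes a Python interpreter is available on the host.

```python
import subprocess

# "sel clear" is the command from the steps above. "sel list" is an
# assumption added here so the log can be saved before it is wiped.
SEL_LIST = ["localcli", "hardware", "ipmi", "sel", "list"]
SEL_CLEAR = ["localcli", "hardware", "ipmi", "sel", "clear"]


def clear_sel(run=subprocess.check_output,
              backup_path="/tmp/ipmi_sel_backup.txt"):
    """Save the current IPMI SEL to backup_path (hypothetical), then clear it."""
    entries = run(SEL_LIST)           # raw bytes of the current SEL listing
    with open(backup_path, "wb") as handle:
        handle.write(entries)
    run(SEL_CLEAR)                    # only reached if the backup succeeded
    return backup_path
```

Calling `clear_sel()` with no arguments runs the real commands; passing your own `run` function lets you try it without touching the host.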
Please consider marking this answer as "correct" or "helpful" if you think your questions have been answered.
Elentiya
Contributor

Hi,

Thanks for your reply.

I ran the command and still get the same 'event'. The iDRAC was also reset before the new motherboard was installed, and it was reconfigured again after the original motherboard was put back.

The memory is fine; I have run extended tests.

We have 6 hosts, and this is the only one the error is showing up on. All are on 6.0 with the same configs/hardware/BIOS versions.

Thanks again

iopsGent
Enthusiast

What other events/errors are you seeing? I'd be pushing for the board to be replaced.

Elentiya
Contributor

Just that on this system.

I'm waiting on a colleague to upgrade to 6.5. The manufacturer has said it is not a system fault, as there are no error logs in the iDRAC TRS, so they won't come out to change the board.

iopsGent
Enthusiast

Tried reinstalling ESXi?

Elentiya
Contributor

Not at this point; I'm waiting for my colleague to update the software.

Elentiya
Contributor

As an update, my colleague rebooted the vCenter as he 'broke' it.

After doing so, I re-ran the hardware sensor reset option, and the event warnings have stopped.

I still have some other issues, however, this one has gone.

Thanks again for your help

btown21948
Contributor

I had a similar situation. the proposed solution by

The VMware log showed that the "Uncorrectable ECC" message had been appearing every 3 minutes like clockwork.

The iDRAC on the host was not indicating any error.
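
For anyone wanting to check whether their own events recur on a fixed interval, here is a rough sketch of one way to measure the gaps. It assumes each log line starts with an ISO-style timestamp; the `ecc_event_gaps` function name and the sample lines are made up for illustration and are not taken from a real log.

```python
import re
from datetime import datetime

# Assumed line shape: a timestamp such as 2017-05-02T09:00:01 at line start.
TIMESTAMP = re.compile(r"^(\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2})")


def ecc_event_gaps(lines, needle="Uncorrectable ECC"):
    """Return the gaps in seconds between consecutive matching log lines."""
    times = []
    for line in lines:
        if needle.lower() not in line.lower():
            continue  # skip lines that do not mention the event
        match = TIMESTAMP.match(line)
        if match:
            times.append(
                datetime.strptime(match.group(1), "%Y-%m-%dT%H:%M:%S"))
    return [(b - a).total_seconds() for a, b in zip(times, times[1:])]


# Hypothetical sample lines, only to show the idea.
sample = [
    "2017-05-02T09:00:01.120Z hostd: Memory Device 1B 12: Uncorrectable ECC",
    "2017-05-02T09:01:10.000Z hostd: some unrelated line",
    "2017-05-02T09:03:01.450Z hostd: Memory Device 1B 12: Uncorrectable ECC",
    "2017-05-02T09:06:01.300Z hostd: Memory Device 1B 12: Uncorrectable ECC",
]
print(ecc_event_gaps(sample))  # [180.0, 180.0]
```

Identical gaps like these would point to something polling on a timer rather than to genuine memory errors, which tend to arrive irregularly.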