We set up ESXi 4.1 with the latest patches on a brand-new HP DL380 G7 with the latest firmware and the latest ESXi Offline Bundle. It shows the ECC problem you can see in the attached screenshot.
We opened a case with HP, and they told us that none of the HP diagnostics (IML + Survey) show any problems at all. We also swapped the memory modules in bank 8, which didn't change anything. HP said this seems to be a problem of ESXi displaying wrong information.
Is there any known problem with ESXi 4.1 showing invalid information?
Do you have any suggestions?
I believe this can safely be ignored. See the post below:
Document ID: c03478508
Release Date: 2012-09-04
Last Updated: 2012-09-04
Physical memory used in ProLiant server platforms is reported as "Deassert" on the Hardware tab of VMware vCenter. Under the Details column, the memory modules are reported as "Current State: Deassert."
The information to populate the "Current State: Deassert" field is obtained from the standard IPMI Memory sensor supported by ProLiant servers.
The following messages are reported in VMware vCenter.
System Board 8 Memory Status: Uncorrectable ECC Current State:Deassert
System Board 8 Memory Status: Correctable ECC logging limit reached Current State:Deassert
The reporting of these physical memory messages as "Current State Deassert" on the Hardware tab can be safely ignored and does not indicate that there is reason to take any action. When a "Correctable ECC logging limit reached" or an "Uncorrectable ECC" condition occurs on any DIMM in the server, this sensor will report the appropriate sensor as "asserted" and an entry will be logged into the System Event Log (SEL) and the HP ProLiant Integrated Management Log (IML).
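For anyone who wants to filter these messages in a script, here is a minimal Python sketch that separates "Deassert" entries (safe to ignore per the advisory above) from "Assert" entries. It assumes the messages have been copied out of the Hardware tab as plain text lines in the formats quoted in this thread; the function names `parse_sensor_line` and `needs_attention` are made up for illustration and are not part of any VMware or HP tool.

```python
import re

# Matches sensor lines in the formats quoted in this thread, e.g.
#   "System Board 8 Memory Status: Uncorrectable ECC Current State:Deassert"
#   "System Board 8 Memory: Uncorrectable ECC Current State Assert"
# Assumption: the text before the first colon is the sensor name.
SENSOR_LINE = re.compile(
    r"^(?P<sensor>.+?):\s*(?P<condition>.+?)\s+"
    r"Current State:?\s*(?P<state>Assert|Deassert)\s*$",
    re.IGNORECASE,
)


def parse_sensor_line(line):
    """Split one sensor line into sensor, condition, and state.

    Returns None if the line does not look like a sensor status line.
    """
    match = SENSOR_LINE.match(line.strip())
    if match is None:
        return None
    return {
        "sensor": match.group("sensor").strip(),
        "condition": match.group("condition").strip(),
        "state": match.group("state").capitalize(),
    }


def needs_attention(line):
    """True only when the sensor state is Assert (condition is active)."""
    parsed = parse_sensor_line(line)
    return parsed is not None and parsed["state"] == "Assert"


if __name__ == "__main__":
    samples = [
        "System Board 8 Memory Status: Uncorrectable ECC Current State:Deassert",
        "System Board 8 Memory: Uncorrectable ECC Current State Assert",
    ]
    for sample in samples:
        print(needs_attention(sample), "-", sample)
```

An "Assert" hit from a filter like this only says the sensor reports the condition as active. Whether it is a real DIMM fault still has to be checked against the IML and SEL.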
I realize I'm bringing back an old post, but it exactly matches a problem I have.
I understand that HP indicates it is not a problem and VMware says it is safe to ignore. My question is: how do I reset the red indicator? This is like shutting off the check engine light. Sure, I can safely ignore it, but other people also look at Virtual Center and ask, "What's this red indicator doing on? Looks like we have a problem." Then I have to prove to them that it is not a problem.
On the server itself I cleared the IML, and I also reset the sensors in vCenter. Yet VMware still indicates there was a problem.
We are experiencing the same issues.
I upgraded the iLO firmware to 1.70 on the DL380 G7 (iLO 3) and to 1.50 on the DL380 G8 (iLO 4), and after that 3 hosts show memory errors in VMware:
System Board 8 Memory: Uncorrectable ECC Current State Assert
Edit: We use VMware 5.5.0
Same here as well. VMware 5.1.0 (1483097)
I also upgraded iLO 4 the other day from 1.3 to 1.5. I decided today just to check everything and, lo and behold, all my DL380p servers have this same alert.
System Board 8 Memory: Uncorrectable ECC "Alert" Current State:Assert
Mine has also disappeared after showing up constantly for a few days. I wonder if it just needed a little while to notice that the IML had been cleared.
But did it show up in IML?
I still suspect this is a false positive.
Normally, uncorrectable ECC errors are also written persistently to the SPD of the DIMM module.
So if this message appears and is not a false alarm, there should be no way to get rid of it.
In my case there was no hardware error. ESXi 5.5 2456374.
After clearing all logs on iLO and resetting the sensors on the host, I switched the drop-down from Sensors to System event log (SEL) and cleared the SEL too. The yellow bang is now gone.