VMware Cloud Community
goppi
Enthusiast

System Board 8 Memory - Uncorrectable ECC

We set up ESXi 4.1 with the latest patches on a brand-new HP DL380 G7 with the latest firmware and the latest ESXi Offline Bundle. The host shows the ECC problem you can see in the attached screenshot.

We opened a case with HP, and they told us that none of the HP diagnostics (IML and Survey) show any problems at all. We also replaced the memory modules in bank 8, which didn't change anything. HP said this seems to be a case of ESXi displaying wrong information.

Is there any known problem with ESXi 4.1 showing invalid information?

Do you have any suggestions?

Thanks.

AHutch
Contributor

I believe this can safely be ignored; see the HP notice below:

SUPPORT COMMUNICATION - CUSTOMER NOTICE

Document ID: c03478508

Version: 1

Notice: VMware ESXi 5.0 - Physical Memory in ProLiant Server Platforms Is Reported as "Deassert" Within the Hardware Tab of VMware vCenter
NOTICE: The information in this document, including products and software versions, is current as of the Release Date. This document is subject to change without notice.

Release Date: 2012-09-04

Last Updated: 2012-09-04


DESCRIPTION

Physical memory used in ProLiant server platforms is reported as "Deassert" on the Hardware tab of VMware vCenter. Under the Details column, the memory modules are reported as "Current State: Deassert."

DETAILS

The information to populate the "Current State: Deassert" field is obtained from the standard IPMI Memory sensor supported by ProLiant servers.

The following messages are reported in VMware vCenter.

System Board 8 Memory Status: Uncorrectable ECC Current State:Deassert

System Board 8 Memory Status: Correctable ECC logging limit reached Current State:Deassert

The reporting of these physical memory messages as "Current State Deassert" on the Hardware tab can be safely ignored and does not indicate that there is reason to take any action. When a "Correctable ECC logging limit reached" or an "Uncorrectable ECC" condition occurs on any DIMM in the server, this sensor will report the appropriate sensor as "asserted" and an entry will be logged into the System Event Log (SEL) and the HP ProLiant Integrated Management Log (IML).

http://h20000.www2.hp.com/bizsupport/TechSupport/Document.jsp?lang=en&cc=us&taskId=115&prodSeriesId=...
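
To make the Assert/Deassert distinction in the notice concrete, here is a minimal Python sketch (not from HP or VMware) that classifies sensor strings like the ones quoted in this thread. The regular expression and the `parse_reading` and `needs_follow_up` helpers are hypothetical, and the sketch assumes the message text looks exactly like the lines pasted in these posts; it does not query vCenter, IPMI, or the SEL.

```python
import re
from typing import NamedTuple, Optional

# Hypothetical pattern for the vCenter memory sensor lines quoted in this thread, e.g.
#   "System Board 8 Memory Status: Uncorrectable ECC Current State:Deassert"
#   "System Board 8 Memory: Uncorrectable ECC Current State Assert"
# The optional " Status" word and the optional colon after "Current State"
# cover both variants seen in the posts.
SENSOR_RE = re.compile(
    r"^(?P<sensor>System Board \d+ Memory)(?: Status)?:\s*"
    r"(?P<event>.+?)\s+Current State:?\s*(?P<state>Assert|Deassert)\s*$",
    re.IGNORECASE,
)


class SensorReading(NamedTuple):
    sensor: str     # e.g. "System Board 8 Memory"
    event: str      # e.g. "Uncorrectable ECC"
    asserted: bool  # True only if the condition is currently asserted


def parse_reading(line: str) -> Optional[SensorReading]:
    """Parse one sensor line; return None if it does not match the assumed format."""
    m = SENSOR_RE.match(line.strip())
    if m is None:
        return None
    return SensorReading(
        sensor=m.group("sensor"),
        event=m.group("event"),
        asserted=m.group("state").lower() == "assert",
    )


def needs_follow_up(line: str) -> bool:
    """Per the HP notice, a Deassert state alone needs no action; only Assert does."""
    reading = parse_reading(line)
    return reading is not None and reading.asserted
```

Under this reading, an asserted line is only a starting point: per the notice, a genuine event should also have a matching entry in the SEL and the IML, so those logs are the place to confirm it.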

Jeffwitt
Contributor

All,

I realize I'm reviving an old post, but it matches my problem exactly.

I understand that HP indicates it is not a problem and VMware says it is safe to ignore. My issue is: how do I reset the red indicator? It's a bit like turning off the check engine light. Sure, I can safely ignore it, but other people also look at vCenter and ask, "What's this red indicator? Looks like we have a problem." Then I have to prove to them that it is not a problem.

On the server itself I cleared the IML, and I also reset the sensors in vCenter. Yet VMware still indicates there was a problem.

Regards,

Jeff

rbos12
Contributor

We are experiencing the same issues.

I've upgraded the iLO firmware to 1.70 on our DL380 G7 (iLO 3) and to 1.50 on our DL380 G8 (iLO 4), and since then 3 hosts show memory errors within VMware:

System Board 8 Memory: Uncorrectable ECC Current State Assert

Edit: We use VMware 5.5.0.

Howiedog
Enthusiast

Same here. VMware 5.1.0 (1483097).

I also upgraded iLO 4 from 1.3 to 1.5 the other day. Today I decided to check everything and, lo and behold, all my DL380p servers have the same alert:

System Board 8 Memory: Uncorrectable ECC           "Alert"        Current State:Assert

rbos12
Contributor

I migrated all the VMs to another host, rebooted the host and applied other firmware updates. After that, resetting the sensors did the trick; all is fine now.

goppi
Enthusiast

Same here on VMware 5.1.

We are on iLO 1.32 on DL380p Gen8.

Interestingly, nothing shows up in the IML.

Which CIM provider versions are those of you seeing this issue running?

Jeffwitt
Contributor

Mine also disappeared after a few days of constantly showing up. I wonder if it just needed some time to pick up that the IML had been cleared.

goppi
Enthusiast

But did it show up in IML?

I still suspect this is a false positive.

Normally, uncorrectable ECC errors are also written persistently to the SPD of the DIMM.

So if the message is not a false alarm, there should be no way to get rid of it once it appears.

Jeffwitt
Contributor

There were issues with the memory. I replaced the faulty DIMMs before installing ESX, so it was not a false positive; the alert just lingered for a while before clearing up.

dalexiev
Enthusiast

In my case there was no hardware error (ESXi 5.5, build 2456374).

After clearing all logs in iLO and resetting the sensors on the host, I switched the view drop-down from Sensors to System event log (SEL) and cleared the SEL too. The yellow bang is now gone.
