VMware Cloud Community
dgingeri
Enthusiast

False hardware alarms

I have a pair of Cisco UCS C240s as hosts at work, and one of them keeps triggering alarms in vCenter 5.5 Essentials Plus.  Most of the time the alarm says that a fan is not functioning, yet when I go into the hardware status, no fans are even listed.  The hardware status does show 2 warnings under processors: one for processor 1 showing "P_CATERR_N" and one for processor 2 showing "P2_MEM01_MEMHOT".  When I go into the management page for the UCS unit, there are no hardware alarms at all, the "predictive catastrophic error" flag is not triggered, and the memory all shows perfectly normal temps.  I trust Cisco's own page more than I trust vCenter's.  How do I turn off the hardware status alarms in vCenter to make this garbage stop?


Accepted Solutions
rsk007
Enthusiast

1. If you would like to turn off the hardware alarms, you can only do that from the vCenter Server side.

2. "P2_MEM01_MEMHOT"

There is a Cisco bug, CSCve10234, in which C-series servers throw temperature alerts.

https://bst.cloudapps.cisco.com/bugsearch/bug/CSCve10234/?referring_site=bugquickviewredir

Workaround:

The problem is not a temperature issue. Rather, this is a generic message thrown after a CPU hangs due to some sort of interop issue.

To troubleshoot it, check driver and firmware interoperability, the OS, and so on.

If you found my answers helpful please consider marking them as helpful or correct.

Santhosh Ranga
LinkedIn: https://www.linkedin.com/in/santhosh-ranga-43a88b124/

2 Replies
dgingeri
Enthusiast

I got the warnings to go away by clearing the event log on the Cisco management panel.  vCenter was misinterpreting messages about a DIMM that went bad years before I got here.  However, it still keeps throwing a fan status alarm that is totally false, so I turned off that alarm at the vCenter level.  I hadn't thought to look there.  Thanks.  There are only 2 servers on that vCenter, both are identical and have their own hardware monitoring, so turning the alarm off shouldn't be an issue.
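
For anyone who would rather script the "turn it off at the vCenter level" step than click through the client, here is a rough sketch in Python. It is not from this thread and has not been run against a live vCenter. It assumes the pyVmomi SDK, where the alarm manager exposes GetAlarm(entity) and each alarm exposes info and ReconfigureAlarm(spec). The function name disable_alarms and the "fan" name fragment are made up for illustration; check the exact alarm names in your own vCenter first.

```python
def disable_alarms(alarm_manager, entity, name_fragment):
    """Disable enabled alarms defined on `entity` whose name contains `name_fragment`.

    `alarm_manager` is expected to behave like pyVmomi's content.alarmManager
    and `entity` like content.rootFolder (assumption: the built-in hardware
    alarms are defined at the vCenter root). Returns the names that were changed.
    """
    changed = []
    for alarm in alarm_manager.GetAlarm(entity):
        # In the vSphere API, AlarmInfo extends AlarmSpec, so the current
        # definition is reused here as the reconfigure spec (assumption:
        # the server accepts it as-is).
        spec = alarm.info
        if name_fragment.lower() in spec.name.lower() and spec.enabled:
            spec.enabled = False
            alarm.ReconfigureAlarm(spec)
            changed.append(spec.name)
    return changed


# Hypothetical usage against a live vCenter (needs pyVmomi; host and
# credentials are placeholders, and certificate handling is left out):
#   from pyVim.connect import SmartConnect, Disconnect
#   si = SmartConnect(host="vcenter.example.com", user="...", pwd="...")
#   content = si.RetrieveContent()
#   print(disable_alarms(content.alarmManager, content.rootFolder, "fan"))
#   Disconnect(si)
```

Matching on a name fragment keeps the sketch short, but it will hit every alarm containing that text, so print the matches before changing anything. Disabling the definition only silences vCenter; the UCS management page keeps doing its own monitoring.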