snowdog_2112
Enthusiast

Disable/Remove specific hardware fan alarm

Greetings.

I am looking to disable a false-positive alarm on a single host.

I have one host with a constant Alarm state, on a "Hardware Fan" alert for a non-existent fan!

The server itself has 12 fan slots, but only 5 populated.  For some reason, vCenter is alerting on the "missing" fans.

The Alarm is defined at the vCenter level, so I can't disable it on the specific host - nor do I want to disable the entire alarm since I'd then miss an alarm if one of the existing fans really does fail.

Any thoughts on how to address it?

The host in question is an Intel S2600GZ, and the RMM on the host itself is *NOT* complaining of any issues (i.e., it's not just VMware passing on an alarm from the host hardware).

a_nut_in
Expert

Hi Snowdog_2112,

I'm not sure about the following, so I'm making assumptions about the ESX/ESXi build and so on, but I need to know:

  • Exact build of ESX
  • Is this a new ESX host or was it working earlier
  • If this was working earlier, what changed
  • Is the hardware make and model and ESX patch levels same across working/non working hosts?

Try the following and see what works:

  1. Log in to the host and run services.sh restart
  2. If that does not work, disconnect the host from VC (do not remove it) and connect it back
  3. Check the BIOS event log, clear the events, and see whether the alarm returns
  4. Reset the host hardware sensors from the ESX hardware status view and see whether the alarm returns
  5. If nothing else works, try a host reboot and/or a BIOS upgrade to the latest build

Regards

a

Do remember to mark my post as "helpful" or "correct" if I've helped resolve or answer your query!
snowdog_2112
Enthusiast

  • Exact build of ESX - 5.1.0, 799733
  • Is this a new ESX host or was it working earlier - has always reported the alarm
  • If this was working earlier, what changed - n/a
  • Is the hardware make and model and ESX patch levels same across working/non working hosts? No - this is the only Intel S2600GZ host I have.  Most of the other hosts are IBM x3650s and a couple of older Dell PEs in a different vCenter instance (for Dev).

I can clear the alarm and it comes back - not right away, but it always does.

This host has been reformatted and had BIOS updates, and even moved from one vCenter to another vCenter.

admin
Immortal

Do you still see this problem even after moving to a different vCenter server?

a_nut_in
Expert

Hey Snowdog_2112,

This pretty much rules out a configuration issue. The problem lies with the hardware, which is incorrectly reporting fan state to the ESX layer. Does the board documentation require that all fan slots be populated? If not, is there a jumper setting or something similar that needs to be changed? If not, the next logical course of action would be to check with the hardware vendor whether this is normal behaviour when not all fan slots are in use.

Regards

a

vHaridas
Expert

I would suggest checking with the hardware vendor to see if they have a fix for it.

I have seen an issue with the IBM HS22V where ESXi reports an incorrect BIOS battery voltage.

Thanks

Please consider awarding points for "Correct" or "Helpful" replies. Thanks....!!! https://vprhlabs.blogspot.in/
snowdog_2112
Enthusiast

I will check with Intel, but as I mentioned, the board monitors are "all green" when I look at the RMM using the web interface.

I can see in the RMM which fans are present and which are not, and those in the system are all spinning at the same RPM.

Adding to the confusion, fans #1-6 all have a fan-A and fan-B, and all 5 existing fans have only fan-A present.  So again, I am confused as to why VMware would only complain about fan 5, when 1-4 are exactly the same (I can imagine it complaining that fan-B is missing on all 5 slots - that would make sense).

a_nut_in
Expert

ESX essentially takes hardware information from CIM providers, which in turn pull that information from the BMC. So the fault is in the motherboard, the sensors, the BMC, or the CIM providers, and you would need to isolate them in that order.

I don't see anyone else using the same hardware who has run into this with ESX, and I haven't seen related discussions on the Intel forums, ESX or otherwise, which again points to something in this specific device/hardware.

http://download.intel.com/support/motherboards/server/sb/g24881004_s2600gzgl_tps_r1_1.pdf

You could check the diag lights on the system board to see if that gives you a clue
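One way to narrow down which layer in the chain described above is misreporting is to put the BMC's own view of the fan sensors next to what ESX shows and list only the sensors where the two disagree. The following is a minimal sketch in Python, not a tool that talks to either system: the sensor names, the status strings, and the sample readings are all made up for illustration, and you would fill the two dictionaries in by hand from the RMM web interface and the ESX hardware status view.

```python
from typing import Dict, List

# Hypothetical status vocabulary; real BMC and ESX output use their own terms.
OK = "ok"
ABSENT = "absent"
ALERT = "alert"


def find_mismatches(bmc: Dict[str, str], esx: Dict[str, str]) -> List[str]:
    """Return the sensor names whose status differs between the two views.

    A sensor missing from one view is treated as ABSENT in that view.
    """
    mismatches = []
    for name in sorted(set(bmc) | set(esx)):
        if bmc.get(name, ABSENT) != esx.get(name, ABSENT):
            mismatches.append(name)
    return mismatches


# Illustrative data only: five populated slots, each with fan-A present
# and fan-B missing, as the BMC might report them.
bmc_view = {f"Fan {n}A": OK for n in range(1, 6)}
bmc_view.update({f"Fan {n}B": ABSENT for n in range(1, 6)})

# Assume ESX agrees everywhere except one sensor (chosen arbitrarily here).
esx_view = dict(bmc_view)
esx_view["Fan 5B"] = ALERT

print(find_mismatches(bmc_view, esx_view))
```

If the BMC reports a sensor as absent while ESX raises an alert on it, that would point at the CIM provider side of the chain rather than the board or the BMC.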

kjdfhaueiase
Enthusiast

The answer to this problem should be simple. Disable the fan alarm for this particular host.

Since it is not getting detected correctly, the solution is to disable the alarm, or your host will have the "Check Engine Light" on at all times, blinding the administrator to real issues that may have cropped up outside of the bogus fan alert.

I don't see a way to do this for a single host, although you can do it at the vCenter level. The attached screenshot shows how to edit and then disable the alarm. It would probably not be wise to disable this particular alarm for all the hosts in the vCenter.

In our case we have a bad sensor, so I have to live with the check engine light, unfortunately. I am guessing I could probably write a script to constantly look for and disable the warning.

CheckEngineLightDisable.JPG
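The script idea mentioned above could be structured roughly like this. It is only a sketch of the filtering logic, under several assumptions: it acknowledges the triggered alarm rather than disabling the alarm definition, the TriggeredAlarm record and both function names are hypothetical, and the host and alarm names are placeholders. How the records are fetched from a real vCenter and how the acknowledgement is sent back is deliberately left out and passed in as a callback.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass
class TriggeredAlarm:
    """Hypothetical record for one triggered alarm on one host."""
    host: str
    alarm_name: str
    acknowledged: bool = False


def select_bogus(alarms: Iterable[TriggeredAlarm], host: str,
                 alarm_name: str) -> List[TriggeredAlarm]:
    """Pick unacknowledged alarms matching exactly one host and one alarm."""
    return [a for a in alarms
            if a.host == host
            and a.alarm_name == alarm_name
            and not a.acknowledged]


def acknowledge_bogus(alarms: Iterable[TriggeredAlarm], host: str,
                      alarm_name: str,
                      acknowledge: Callable[[TriggeredAlarm], None]) -> int:
    """Acknowledge the matching alarms and return how many were handled."""
    hits = select_bogus(alarms, host, alarm_name)
    for alarm in hits:
        acknowledge(alarm)
    return len(hits)


def mark_acknowledged(alarm: TriggeredAlarm) -> None:
    # Stand-in for the real call to vCenter; here it only flips a flag.
    alarm.acknowledged = True


# Placeholder names; substitute the real host and alarm definition name.
sample = [
    TriggeredAlarm("esx-intel-01", "Host hardware fan status"),
    TriggeredAlarm("esx-intel-01", "Host memory status"),
    TriggeredAlarm("esx-ibm-02", "Host hardware fan status"),
]
handled = acknowledge_bogus(sample, "esx-intel-01",
                            "Host hardware fan status", mark_acknowledged)
print(handled)
```

Matching on both host and alarm name leaves the same alarm untouched on every other host, but a loop built on this would also acknowledge a genuine fan failure on the affected host.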
