Disable/Remove specific hardware fan alarm
Greetings.
I am looking to disable a false-positive alarm on a single host.
I have one host with a constant Alarm state, on a "Hardware Fan" alert for a non-existent fan!
The server itself has 12 fan slots, but only 5 are populated. For some reason, vCenter is alerting on the "missing" fans.
The Alarm is defined at the vCenter level, so I can't disable it on the specific host - nor do I want to disable the entire alarm, since I'd then miss an alarm if one of the existing fans really does fail.
Any thoughts on how to address it?
The host in question is an Intel S2600GZ, and the RMM on the host itself is *NOT* complaining of any issues (i.e., it's not just VMware passing on an alarm from the host hardware).
Hi Snowdog_2112,
I'm not sure of the following, so I'm making assumptions about the ESX/ESXi build etc., but I need to know:
- Exact build of ESX
- Is this a new ESX host or was it working earlier
- If this was working earlier, what changed
- Are the hardware make and model and ESX patch levels the same across working and non-working hosts?
Try the following and see what works
- Log in to the host and do a services.sh restart
- If that does not work, disconnect the host from VC (do not remove it) and connect it back
- Check the BIOS event log, clear it, and see whether the alarm returns
- Reset the host hardware sensors from the ESX hardware status view and see whether the alarm returns
- If nothing else works, try a host reboot and/or BIOS upgrade to the latest build
Regards
a
- Exact build of ESX: 5.1.0, 799733
- Is this a new ESX host or was it working earlier: it has always reported the alarm
- If this was working earlier, what changed: n/a
- Are the hardware make and model and ESX patch levels the same across working and non-working hosts? No, this is the only Intel S2600GZ host I have. Most of the other hosts are IBM x3650s, plus a couple of older Dell PEs in a different vCenter instance (for Dev).
I can clear the alarm and it comes back - not right away, but it always does.
This host has been reformatted and had BIOS updates, and even moved from one vCenter to another vCenter.
Do you still see this problem even after moving the host to a different vCenter Server?
Hey Snowdog_2112,
This pretty much rules out a configuration issue. It has to do with the hardware incorrectly reporting fan state to the ESX layer. Does the board documentation require that all fan slots be populated? If not, is there a jumper setting or something similar that needs to be changed? If not, the next logical course of action would be to check with the hardware vendor to see whether this is normal behaviour when not all fan slots are in use.
Regards
a
I would suggest checking with the hardware vendor to see if they have a fix for it.
I have seen a similar issue with the IBM HS22V, where ESXi reports an incorrect BIOS battery voltage.
Thanks
I will check with Intel, but as I mentioned, the board monitors are "all green" when I look at the RMM using the web interface.
I can see in the RMM which fans are present and which are not, and those in the system are all spinning at the same RPM.
Adding to the confusion, fans #1-6 all have a fan-A and a fan-B, and all 5 existing fans have only fan-A present. So again, I am confused as to why VMware would only complain about fan 5, when 1-4 are exactly the same (I can imagine it complaining that fan-B is missing on all 5 slots - that would make sense).
ESX essentially takes hardware information from CIM providers, which in turn pull it from the BMC. So the fault lies in the motherboard, the sensors, the BMC, or the CIM providers, and you would need to isolate it in that order.
I don't see anyone else using the same hardware having run into this with ESX, and I haven't seen related discussions on the Intel forums, ESX or otherwise, which again points to something in the specific device/hardware.
http://download.intel.com/support/motherboards/server/sb/g24881004_s2600gzgl_tps_r1_1.pdf
You could check the diag lights on the system board to see if that gives you a clue
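One way to narrow down which layer is at fault is to write down what each layer reports for every fan sensor and compare the two views. Here is a minimal Python sketch, assuming the sensor states have been copied by hand from the RMM web interface and from the vSphere hardware status view. The function name, sensor names, and state strings are made up for illustration and are not taken from this board.

```python
def sensor_mismatches(bmc_view, esx_view):
    """Return sensors whose state differs between the BMC and ESX views.

    Both arguments map a sensor name to a state string. A sensor that is
    missing from one view is reported with the state None for that view.
    """
    mismatches = {}
    for name in sorted(set(bmc_view) | set(esx_view)):
        bmc_state = bmc_view.get(name)
        esx_state = esx_view.get(name)
        if bmc_state != esx_state:
            mismatches[name] = (bmc_state, esx_state)
    return mismatches


# Hypothetical readings, typed in by hand for illustration only.
bmc_view = {"Fan 4A": "ok", "Fan 5A": "ok", "Fan 5B": "absent"}
esx_view = {"Fan 4A": "ok", "Fan 5A": "ok", "Fan 5B": "alert"}

for name, (bmc_state, esx_state) in sensor_mismatches(bmc_view, esx_view).items():
    print(f"{name}: BMC says {bmc_state!r}, ESX says {esx_state!r}")
```

A sensor that the BMC reports as healthy or absent but that ESX reports as alerting would suggest the problem sits in the CIM provider layer rather than on the board itself, though that is an inference and not a confirmed diagnosis.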
The answer to this problem should be simple. Disable the fan alarm for this particular host.
Since it is not getting detected correctly, the solution is to disable the alarm, or your host will have the "Check Engine Light" on at all times, blinding the administrator to real issues that may have cropped up outside of the bogus fan alert.
I see no way to do this for a single host, although you can do it at the vCenter level. The attached screenshot shows how you can edit and then disable the alarm. It would probably not be wise to disable this particular alarm for all the hosts in the vCenter.
In our case we have a bad sensor, so I have to live with the check engine light, unfortunately. I am guessing I could probably write a script to constantly look for and disable the warning.
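A rough sketch of what such a script could look like in Python follows. It is untested against a real vCenter and rests on several assumptions: that the host object exposes `triggeredAlarmState` and the alarm manager exposes `AcknowledgeAlarm(alarm, entity)` (as pyVmomi's `vim.HostSystem` and `content.alarmManager` do), and that the alarm is named "Host hardware fan status" in your vCenter. Acknowledging only marks the alarm as seen; it does not disable the alarm definition, and the host may still show the triggered alarm. The `Fake*` helpers are hypothetical stand-ins so the logic can run without a vCenter connection.

```python
from types import SimpleNamespace

# Assumed alarm name; check the actual name in your vCenter alarm definitions.
FAN_ALARM_NAME = "Host hardware fan status"


def acknowledge_alarms(alarm_manager, host, alarm_name=FAN_ALARM_NAME):
    """Acknowledge unacknowledged triggered alarms on `host` named `alarm_name`.

    Returns the number of alarms acknowledged. Other alarms are left alone,
    so a real failure under a different alarm still stands out.
    """
    count = 0
    for state in host.triggeredAlarmState:
        if state.alarm.info.name != alarm_name or state.acknowledged:
            continue
        alarm_manager.AcknowledgeAlarm(state.alarm, state.entity)
        count += 1
    return count


class FakeAlarmManager:
    """Hypothetical stand-in that records which alarms were acknowledged."""

    def __init__(self):
        self.acknowledged = []

    def AcknowledgeAlarm(self, alarm, entity):
        self.acknowledged.append((alarm.info.name, entity))


def fake_state(name, acknowledged=False):
    """Build a hypothetical triggered-alarm state for the demo below."""
    alarm = SimpleNamespace(info=SimpleNamespace(name=name))
    return SimpleNamespace(alarm=alarm, entity="host-1", acknowledged=acknowledged)


fake_host = SimpleNamespace(
    triggeredAlarmState=[fake_state(FAN_ALARM_NAME), fake_state("Host memory status")]
)
fake_manager = FakeAlarmManager()
print(acknowledge_alarms(fake_manager, fake_host))
```

Against a real vCenter, the same function would be called with the connected session's alarm manager and the affected host, run on a schedule. Because it acknowledges every fan alarm on that host, it would also acknowledge a genuine fan failure there, which is the trade-off of this approach.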