vSAN cluster running 6.7 U1 on rack-mounted Cisco UCS servers.
As the subject of this discussion says: in my environment, the hardware compatibility health check for the controller firmware (FW) is in a green state even though vSAN is unable to verify the running version. I know why vSAN can't detect the running version (the controller utility is missing); my point is that the health check should be yellow, not green. In our case this led to the false assumption of a healthy cluster, which resulted in multiple controller failures, possibly due to incompatible or faulty FW.
PS: I don't recall silencing an alarm for this. If I did, and vSAN would in fact raise an alarm when it can't verify the controller FW running on the host, please ignore this question.
Thanks in advance.
If the health check can't get information from the controller because the necessary third-party tools are missing, then it obviously isn't going to be able to run checks against it (other than maybe a 'controller utility not installed' check).
The fact that the green check listed none of the controllers should have made it fairly clear that they were not included in the check.
I do concur, though, that if storage is configured but zero controllers appear in Health, something should flag this (NVMe devices also show up here, so regardless of implementation type there should be a controller or NVMe device listed).
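As a rough illustration of that idea, here is a minimal Python sketch. It is not how the Health service actually works; `HostReport`, its fields, and `hosts_missing_controllers` are hypothetical names made up for this example. It simply flags any host that contributes storage but reports no controller or NVMe device.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class HostReport:
    """Hypothetical per-host summary; not a real vSAN data structure."""
    name: str
    has_vsan_storage: bool
    controllers_seen: List[str] = field(default_factory=list)


def hosts_missing_controllers(reports: List[HostReport]) -> List[str]:
    """Return names of hosts that have storage configured but list no controller."""
    return [
        r.name
        for r in reports
        if r.has_vsan_storage and not r.controllers_seen
    ]


# Example with made-up host names and device labels.
reports = [
    HostReport("host-a", has_vsan_storage=True, controllers_seen=["controller-0"]),
    HostReport("host-b", has_vsan_storage=True),   # storage, but nothing listed
    HostReport("host-c", has_vsan_storage=False),  # no vSAN storage, nothing to flag
]
flagged = hosts_missing_controllers(reports)
```

A non-empty result would be the cue to show something other than green.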
TheBobkin, I don't know about you, but if I see a green check on a health check, my logic tells me I can trust that the health check has actually verified the installed FW version.
This health check is very straightforward, as its name states: "Controller firmware is VMware certified" (in my case it is green even though it can't show the FW version).
My point is, if you can't verify the firmware for whatever reason, the check should not pass, period. You are not checking whether the tool needed to verify the FW version is installed or not; at least that is not what the name implies.
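To make the point concrete, here is a minimal sketch in Python of the logic I would expect. It is only an illustration under my own assumptions, not the actual vSAN health check code: `CheckStatus`, `evaluate_firmware_check`, and the version strings are all hypothetical, and mapping a known-bad version to red rather than yellow is my own choice.

```python
from enum import Enum
from typing import Optional, Set


class CheckStatus(Enum):
    GREEN = "passed"
    YELLOW = "warning"
    RED = "failed"


def evaluate_firmware_check(detected_fw: Optional[str],
                            certified_fw: Set[str]) -> CheckStatus:
    """Three-way result: 'could not verify' must never collapse into 'passed'."""
    if detected_fw is None:
        # Version could not be read (e.g. vendor utility missing): not a pass.
        return CheckStatus.YELLOW
    if detected_fw in certified_fw:
        return CheckStatus.GREEN
    # Version was read and is not on the certified list.
    return CheckStatus.RED


# Placeholder version strings, purely for illustration.
certified = {"fw-A", "fw-B"}
unknown_result = evaluate_firmware_check(None, certified)
good_result = evaluate_firmware_check("fw-A", certified)
bad_result = evaluate_firmware_check("fw-Z", certified)
```

The only branch that returns green is the one where a version was actually read and matched.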
In my head, and based on actual hardware failures we experienced, possibly due to having the wrong FW installed, I can't trust these health checks now.
A picture is worth a thousand words. To show you what I have been trying to convey: I don't think the highlighted checks should be green (I know for a fact that the hosts are not running the recommended version). It is obvious that VMware was unable to verify the current firmware on the controllers, so why should the check pass?
TheBobkin, if you click on the image I uploaded you will see that the firmware version isn't verified on any host. This section comes up when you click on the health check named "Controller firmware is VMware certified", which as you can see is showing as passed.
Can you tell me why it would be green even though vSAN isn't able to see what is actually running on the controller?
Again, what I am trying to say is that the health check named "Controller firmware is VMware certified" shows as passed even though vSAN can't even see the firmware on the controller.
Unless this checks for something else, it is misleading. By showing it as passed you are telling me my controller firmware IS VMware certified even though you can't see it. If you are telling me that I should have drilled down and checked whether it was showing as N/A, then the green check mark is useless and misleading.
To me it is clear as mud.
Thanks for your quick replies.
I didn't see your full screenshot as I was on mobile, so I removed my comment asking for info you had already shared.
I agree, that shouldn't be green if not validated. I would advise opening a Support Request with vSAN GSS to get this looked at further (I can't do this personally at present as I won't be in the office for 2 weeks).
Opened a case. I explained that the goal was not to install the storage utility to get vSAN to recognize the controller firmware, but to fix a health check giving false positives. The case got escalated, and yesterday I had a call where they acknowledged the issue and suggested I open a feature request case. I was told that they have seen this issue before, and we talked about this KB (https://kb.vmware.com/s/article/2148867), which doesn't apply to me. I never saw such a warning, nor could we find this health check under the monitoring section of either of the 2 vCenter management interfaces. Anyway, is a feature request the way to go? Isn't there another type of case, like a problem report or something similar? This is definitely an issue and not a "feature" request.
At this point I am just curious how many customers might be experiencing these false positives and assuming everything is good when it is not. Based on what I have read, the storage controller utility doesn't even come as part of custom ESXi images (it was not in the Cisco image I used, and I read about another case where a Fujitsu server image didn't have it either).
Confirmed, as I also had to install the perccli vendor tool for Dell to be able to read the firmware version.
Fully agree, if you can’t check something you cannot default to Pass/Green (irrespective of why).