I'm running into a strange issue with VSAN health monitoring and am wondering if anyone else has seen it or has any suggestions before I log a support ticket with VMware...
I've recently deployed a four-node VSAN cluster on VSAN-approved HPE hardware. Each node has a pair of LAGed 10GbE NICs dedicated to VSAN traffic.
The VSAN Health Service is enabled and all health tests pass until I reboot the vCenter appliance. After rebooting and logging into the web client, I see a failed task for each VSAN node named "Enable agent" with the status "Cannot complete the operation. See the event log for more details."
When I check VSAN health under [cluster name] > Monitor > Virtual SAN, I see a failure logged against "ESX Virtual SAN Health service installation" which reports that the agent bundle is inaccessible.
If I set DRS to Fully Automated for the cluster and then click the "Enable" button in the Virtual SAN health console, the agents are enabled successfully and the VSAN health tests all come back clean. The "Enable" task only runs for about 25 seconds, no vMotions happen, and no hosts reboot, which tells me that the agent is already present and functional on each host.
The workaround is fine for now because we are on evaluation licensing during deployment, but it won't be possible once the environment is in production: we're using vSphere Standard licensing, which doesn't include DRS.
Has anyone run into this before? Any suggestions on how I can overcome the issue?
What versions of ESXi and vCenter are you running? I remember seeing this behavior in 6.0. Upgrading to 6.0 U1b (and the corresponding vCenter) fixed it for me, and it also works correctly in 6.0 U2.
We are running ESXi 6.0.0 build 2494585 and VCVA 6.0.10200. I'm going to open a ticket with VMware to get official feedback, but it looks like we'll move to 6.0 U1b and see if that corrects the issue.
Thanks for the response.