VMware Cloud Community
MBrownWFP
Enthusiast

VSAN health tests fail after reboot of VC appliance

I'm running into a strange issue with VSAN health monitoring, wondering if anyone else has seen it or has any suggestions before I log a support ticket with VMware...

I've recently deployed a four-node VSAN cluster on VSAN-certified HPE hardware. Each node has a pair of LAGed 10GbE NICs dedicated to vSAN traffic.

The VSAN Health Service has been enabled and all health tests are successful until I reboot the vCenter appliance. After rebooting and logging into the web console, I see a failed Task entry for each VSAN node named "Enable agent" with status "Cannot complete the operation. See the event log for more details."

VSAN-health_enable-agent-failed.jpg

When I check VSAN health under [cluster name] > Monitor > Virtual SAN, I see a failure logged against "ESX Virtual SAN Health service installation" which reports that the agent bundle is inaccessible.

VSAN-health_monitoring-error.jpg

If I turn on Fully Automated DRS for the cluster and then click the "Enable" button in the Virtual SAN health console, the agents get enabled successfully and all VSAN health tests come back clean. The "Enable" task only runs for about 25 seconds; no vMotions happen and no hosts reboot, which tells me the agent is already present and functional on each host.
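For anyone wanting to confirm the same thing from the CLI, here's a rough sketch of the checks I'd run. This assumes SSH access to the hosts and the VCVA; the exact VIB and service names (`vsanhealth`, `vmware-vsan-health`) are what I believe they are in 6.0 but may differ in your build, so treat them as assumptions:

```shell
# On each ESXi host: confirm the vSAN health agent VIB is installed
# (VIB name "vsanhealth" assumed; verify against your build's VIB list)
esxcli software vib list | grep -i vsanhealth

# On the vCenter appliance: check whether the vSAN health service
# came up after the reboot (service name assumed for VCSA 6.0)
service-control --status vmware-vsan-health

# If it's stopped, try starting it before resorting to the DRS workaround
service-control --start vmware-vsan-health
```

If the VIB is present on every host but the service shows stopped after a VC reboot, that would point at the appliance side rather than the hosts.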


The workaround is functional for now since we are on Eval licensing during deployment, but it won't be possible once the environment is in production: we will be using vSphere Standard licensing, which does not include DRS.

Has anyone run into this before? Any suggestions on how I can overcome the issue?

2 Replies
elerium
Hot Shot

What version of ESXi & vCenter? I remember seeing this behavior in 6.0. Upgrading to 6.0 U1b (and the corresponding vCenter) fixed it for me; it's also working correctly in 6.0 U2.

MBrownWFP
Enthusiast

We are running ESXi 6.0.0 build 2494585 and VCVA 6.0.10200. I'm going to open a ticket with VMware to get official feedback, but it looks like we will move to 6.0 U1b and see if that corrects the issue.

Thanks for the response.
