VMware Cloud Community
pmaduda
Contributor

VSAN clusters shutdown problem

I ran into an error when trying to shut down vSAN clusters in a lab environment (7.0.2).

I have two vSAN clusters: one 3-node cluster and one 2-node cluster with a witness appliance.

I followed this procedure https://docs.vmware.com/en/VMware-vSphere/7.0/com.vmware.vsphere.vsan-monitoring.doc/GUID-31B4F958-3...

However, when I run the /usr/lib/vmware/vsan/bin/reboot_helper.py prepare script, both clusters fail with "unsupported host..."

First cluster fails stating that esxi host(FQDN is written in error) from SECOND cluster is not supported and should be removed before proceeding. I have checked the vSAN cluster members (esxcli vsan cluster get, esxcli vsan cluster unicastagent list) and only the correct members are listed, so I don't understand why there is a mismatch with a host from another cluster.

The second (2-node) cluster fails stating that the other ESXi host (non-witness) is unsupported and should be removed. I have tried running the script from both non-witness nodes and both state that the other one is correct. When checking vSAN health and cluster membership, everything seems to be correct.

I'm quite afraid because this lab was built to simulate behaviour of production VSAN clusters that we would need to shutdown and physically move to another Datacenter.

 

3 Replies
TheBobkin
Champion

@pmaduda 

"both clusters fail with "unsupported host..." " - Is this a nested-ESXi cluster? If so this could be failing due to guardrails preventing running this on Witnesses (e.g. detecting 'VMware' as vendor of storage controllers etc.).

 

"First cluster fails stating that esxi host(FQDN is written in error) from SECOND cluster is not supported and should be removed before proceeding."
Can you please check and share the output of the following, in case there are leftover entries from reusing the same names or moving hosts between clusters?
# cmmds-tool find -t HOSTNAME

 

If nested-ESXi, did you by any chance clone the hosts/VMs or their data and now multiple nodes have the same UUIDs?

"I'm quite afraid because this lab was built to simulate behaviour of production VSAN clusters that we would need to shutdown and physically move to another Datacenter."

 

Please confirm whether these are physical nodes and supported server models for the version of ESXi and vSAN-HCL components. Otherwise, sorry, but this is a fairly moot point, as virtual ESXi doesn't always react the same way a proper cluster does, and this can be comparing apples to oranges in some cases.

pmaduda
Contributor

Both clusters run on physical hosts (except the witness appliance). I've run the ESXi Compatibility Checker and checked the vSAN HCL, and everything seems to be supported.

I've run cmmds-tool find -t HOSTNAME (on each host, to be sure) and there are no leftover entries, only the correct cluster members.

TheBobkin
Champion

@pmaduda, I'm wondering whether anything else here is potentially oddly configured:

 

Are both vSAN clusters (or their Witness networks if using WTS) in the same subnets and/or VLANs as one another?

 

How were the Witnesses deployed? Did you by any chance clone the Witness VM and use a clone for the second cluster?

 

Do the clusters have unique sub-cluster UUIDs? Long shot, but the same goes for the data-node UUIDs (asking as I once saw a bad situation from a guy cloning SD cards, which resulted in any hosts using these having the same UUID 😬)
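
If it helps, the UUID comparison can be scripted rather than eyeballed. The following is only a rough sketch: it assumes you have saved the text output of esxcli vsan cluster get from every host, and that the output contains "Local Node UUID" and "Sub-Cluster UUID" lines in "Key: Value" form. The function names and the host names in the usage note are made up for illustration.

```python
from collections import defaultdict


def parse_cluster_get(text):
    """Parse 'Key: Value' lines (assumed format of `esxcli vsan cluster get`) into a dict."""
    fields = {}
    for line in text.splitlines():
        # Split on the first colon only, so values containing ':' (timestamps) survive.
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def find_uuid_conflicts(outputs_by_host):
    """Group hosts by node UUID and by sub-cluster UUID.

    outputs_by_host maps a host name to the raw command output saved from that host.
    Returns (duplicate_node_uuids, hosts_by_cluster_uuid).
    """
    hosts_by_node_uuid = defaultdict(list)
    hosts_by_cluster_uuid = defaultdict(list)
    for host, text in outputs_by_host.items():
        fields = parse_cluster_get(text)
        node_uuid = fields.get("Local Node UUID")
        cluster_uuid = fields.get("Sub-Cluster UUID")
        if node_uuid:
            hosts_by_node_uuid[node_uuid].append(host)
        if cluster_uuid:
            hosts_by_cluster_uuid[cluster_uuid].append(host)
    # A node UUID reported by more than one host points at cloned hosts or boot media.
    duplicates = {u: h for u, h in hosts_by_node_uuid.items() if len(h) > 1}
    return duplicates, dict(hosts_by_cluster_uuid)
```

Call find_uuid_conflicts({"hostA": output_a, "hostB": output_b, ...}) with the outputs from all hosts of both clusters. Any entry in the first result is a node UUID shared by several hosts, and the second result should show exactly one sub-cluster UUID per cluster, each listing only that cluster's hosts.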

 

What did you mean by "FQDN is written in error"? Can you validate the nslookup for each host and validate that everything is resolved correctly?
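
For the name-resolution check, something like the sketch below can be run from a machine that uses the same DNS servers as the hosts. It is an illustration only: check_dns is a made-up helper, and it simply does a forward lookup followed by a reverse lookup with Python's standard socket module and compares the names.

```python
import socket


def check_dns(fqdn, resolve=socket.gethostbyname, reverse=socket.gethostbyaddr):
    """Forward-resolve fqdn, reverse-resolve the resulting IP, and compare the names.

    Returns (ok, detail). The resolve/reverse parameters exist only so the
    lookups can be swapped out (for example in a test).
    """
    try:
        ip = resolve(fqdn)
        name = reverse(ip)[0]  # gethostbyaddr returns (hostname, aliases, addresses)
    except OSError as exc:  # socket.gaierror and socket.herror are OSError subclasses
        return False, f"lookup failed: {exc}"
    # Compare case-insensitively and ignore a trailing dot.
    if name.rstrip(".").lower() != fqdn.rstrip(".").lower():
        return False, f"{fqdn} -> {ip} -> {name} (mismatch)"
    return True, f"{fqdn} -> {ip} -> {name}"
```

Run check_dns once for the FQDN of every host and witness in both clusters. A mismatch or a failed lookup for the host named in the error would be worth chasing before looking any further.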
