VMware Cloud Community
time81
Contributor

vSAN 6.6: insufficient vSphere HA failover resources

Hi,

We are testing a brand-new vSAN 6.6 cluster, but it instantly reports "insufficient vSphere HA failover resources",

even though all 3 nodes (Dell R730xd with 2x 10-core CPUs) are empty.

"Reconfigure for HA" didn't help.

Turning HA off and on again didn't help.

Admission control is turned off, but even if I set it to tolerate 1 host failure, or to reserve 25% of CPU resources, it still gives the error?

Any suggestions?

9 Replies
jameseydoyle
VMware Employee

Hi,

It would be important to verify that the HA cluster is forming correctly. Can you check Monitor > vSphere HA > Summary and ensure that you have one master with the other two hosts connected to it? Is your vSAN network properly configured? In clusters where vSAN is enabled, HA uses the vSAN-enabled VMkernel port for its traffic, so ensure there are no issues with the vSAN network configuration. Check Monitor > vSAN > Health and ensure that all Network tests are passing.
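As a quick CLI cross-check of the network side, `esxcli vsan network list` on each host shows which VMkernel interface is tagged for vSAN traffic. The snippet below only illustrates the output shape with an abridged, hypothetical sample; on a real host, pipe the actual command's output instead of `$sample`.

```shell
# Abridged, hypothetical `esxcli vsan network list` output
# (the real output carries more fields per interface).
sample='Interface
   VmkNic Name: vmk1
   Traffic Type: vsan'

# Pull out the vmknic carrying vSAN (and therefore HA) traffic.
echo "$sample" | grep -o 'vmk[0-9]*'
```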

TheBobkin
Champion

Hello time,

Check the current cluster membership from a host and make sure all nodes are correctly clustered:

# esxcli vsan cluster get

Check that no hosts are in vSAN Maintenance Mode (they should all show "decomState": 0):

# cmmds-tool find -t NODE_DECOM_STATE -f json
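The decom-state check can be narrowed with a grep so any non-zero state stands out immediately. The sketch below runs against a hypothetical, abridged sample of the JSON output (the real entries carry more fields); on a host, pipe `cmmds-tool find -t NODE_DECOM_STATE -f json` into the same filter instead.

```shell
# Hypothetical abridged NODE_DECOM_STATE output for two hosts.
sample='{"entries":[
 {"uuid":"597f05ca-7472-cc68-de1c-a0369fd8e08c","content":{"decomState":0}},
 {"uuid":"597f1c09-d372-daff-b609-a0369fcc4db4","content":{"decomState":0}}]}'

# If every node is out of maintenance mode, the only distinct value is 0;
# any other value means a node is (partially) decommissioned.
echo "$sample" | grep -o '"decomState":[0-9]*' | sort -u
```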

Bob

time81
Contributor

All decomState 0.

Cluster Information

   Enabled: true

   Current Local Time: 2017-08-02T14:02:27Z

   Local Node UUID: 597f1c09-d372-daff-b609-a0369fcc4db4

   Local Node Type: NORMAL

   Local Node State: MASTER

   Local Node Health State: HEALTHY

   Sub-Cluster Master UUID: 597f1c09-d372-daff-b609-a0369fcc4db4

   Sub-Cluster Backup UUID: 597f05ca-7472-cc68-de1c-a0369fd8e08c

   Sub-Cluster UUID: 52c9da94-4b46-c911-8976-e2888f0c1bde

   Sub-Cluster Membership Entry Revision: 2

   Sub-Cluster Member Count: 4

   Sub-Cluster Member UUIDs: 597f05ca-7472-cc68-de1c-a0369fd8e08c, 597f1c09-d372-daff-b609-a0369fcc4db4, 597f23c4-8c11-3784-1cc7-a0369fcc5184, 59805405-6568-a8c0-5277-000c299d2d36

   Sub-Cluster Membership UUID: c9718059-3d6e-187e-d9cd-a0369fcc4db4

   Unicast Mode Enabled: true

   Maintenance Mode State: OFF

   Config Generation: 11fcd4eb-d081-48e9-8158-157ab295dc95 5 2017-08-01T11:51:46.716

TheBobkin
Champion

Hello time,

I thought this was a 3-node cluster?

Sub-Cluster Member Count: 4

Bob

jameseydoyle
VMware Employee

Can you send a summary of the HA cluster status?

You could also send the contents of the /opt/vmware/fdm/fdm/hostlist file from any of the hosts.

time81
Contributor

Well, it is 3! See the pic below. Does it have to do with the witness host? (VMware Witness Appliance 6.5)

We had a 2-node cluster before, with the 3rd physical host selected as witness, but we destroyed it, and I had to delete all the partitions and reboot the ESXi host.

I can't find the hostlist file:

[root@:/opt/vmware/fdm/fdm] ls -la

total 25380

drwxr-xr-x    1 root     root           512 Aug  1 10:02 .

drwxr-xr-x    1 root     root           512 Aug  1 10:02 ..

-r-xr-xr-x    1 root     root      22846120 Jul  7 14:47 fdm

-r-xr-xr-x    1 root     root           649 Jul  7 14:50 fdm-dump.sh

-r--r--r--    1 root     root       2174892 Jul  7 14:50 libcrypto.so.1.0.2

-r--r--r--    1 root     root        398060 Jul  7 14:50 libssl.so.1.0.2

-r-xr-xr-x    1 root     root           963 Jul  7 14:50 prettyPrint.sh

-r-xr-xr-x    1 root     root        502488 Jul  7 14:47 readCompressed

-r-xr-xr-x    1 root     root         40057 Jul  7 14:50 vpxResultFilter.xml

-r-xr-xr-x    1 root     root          1544 Jul  7 14:50 xmlpp.py

jameseydoyle
VMware Employee

I apologise, I gave you the wrong file path:

/etc/opt/vmware/fdm/hostlist

is the correct location.

However, it does appear that the cluster is expecting 4 hosts, as vSAN lists 4 members. This means your cluster is always one host down!

The cleanest way out of this scenario would be to create a new cluster: remove all the disk groups from the hosts and add the hosts to the new cluster.

Next time you need to remove a witness host, be sure to disable the stretched-cluster configuration first and remove the host cleanly.

TheBobkin
Champion

Hello time,

You can identify 'who' it thinks the 4 cluster members are from any host using these commands (I populated them from your Sub-Cluster Member UUIDs):

# cmmds-tool find -t HOSTNAME -f json -u 597f05ca-7472-cc68-de1c-a0369fd8e08c |grep content

# cmmds-tool find -t HOSTNAME -f json -u 597f1c09-d372-daff-b609-a0369fcc4db4 |grep content

# cmmds-tool find -t HOSTNAME -f json -u 597f23c4-8c11-3784-1cc7-a0369fcc5184 |grep content

# cmmds-tool find -t HOSTNAME -f json -u 59805405-6568-a8c0-5277-000c299d2d36 |grep content
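The four lookups can also be rolled into one loop. `cmmds-tool` only exists on ESXi, so the sketch below stubs it with a placeholder function (hypothetical output shape) so the loop's structure is runnable anywhere; on a real host, drop the stub and call `cmmds-tool` directly as shown in the comment.

```shell
# Stub standing in for the ESXi-only cmmds-tool binary; the output shape
# is hypothetical, for illustration only.
cmmds_hostname() { echo "\"content\": {\"hostname\": \"host-for-$1\"}"; }

# On a real host, replace the cmmds_hostname call with:
#   cmmds-tool find -t HOSTNAME -f json -u "$u" | grep content
for u in \
  597f05ca-7472-cc68-de1c-a0369fd8e08c \
  597f1c09-d372-daff-b609-a0369fcc4db4 \
  597f23c4-8c11-3784-1cc7-a0369fcc5184 \
  59805405-6568-a8c0-5277-000c299d2d36
do
  cmmds_hostname "$u"
done
```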

Once you have identified the member that should not be present, try removing it from the cluster. If that member is the witness that is already gone, then it was not decommissioned properly; in that case (if removing it is not possible), creating a new cluster and adding the nodes to it is the best option. How simple this will be depends on whether you have data on the cluster and/or can move it all off or do a backup and restore.

Bob

time81
Contributor

Thanks guys.

Re-created it with 3 hosts in one fault domain, without a witness. It's showing 3 members now and the error is gone.
