Contributor

Cluster membership lost when adding storage

I have 3 ESXi 6.7.0u3 hosts and am trying to build a vSAN cluster. After configuring the networking, I started setting up the cluster on the first host:

esxcli vsan cluster new

Result:

[root@srv1:~] esxcli vsan cluster get

Cluster Information

   Enabled: true

   Current Local Time: 2020-09-02T11:46:22Z

   Local Node UUID: 5f4630de-4f35-4760-fd8f-9440c92e6eb8

   Local Node Type: NORMAL

   Local Node State: MASTER

   Local Node Health State: HEALTHY

   Sub-Cluster Master UUID: 5f4630de-4f35-4760-fd8f-9440c92e6eb8

   Sub-Cluster Backup UUID:

   Sub-Cluster UUID: 52cffc95-a360-d292-7279-b6304d366ae5

   Sub-Cluster Membership Entry Revision: 0

   Sub-Cluster Member Count: 1

on the other 2 hosts:

esxcli vsan cluster join -u 52cffc95-a360-d292-7279-b6304d366ae5

Result: the cluster now has 3 members:

[root@srv1:~] esxcli vsan cluster get

Cluster Information

   Enabled: true

   Current Local Time: 2020-09-02T11:46:49Z

   Local Node UUID: 5f4630de-4f35-4760-fd8f-9440c92e6eb8

   Local Node Type: NORMAL

   Local Node State: MASTER

   Local Node Health State: HEALTHY

   Sub-Cluster Master UUID: 5f4630de-4f35-4760-fd8f-9440c92e6eb8

   Sub-Cluster Backup UUID: 5f44f8e2-02bd-46f4-1db0-9440c92e6e78

   Sub-Cluster UUID: 52cffc95-a360-d292-7279-b6304d366ae5

   Sub-Cluster Membership Entry Revision: 2

   Sub-Cluster Member Count: 3

   Sub-Cluster Member UUIDs: 5f4630de-4f35-4760-fd8f-9440c92e6eb8, 5f44f8e2-02bd-46f4-1db0-9440c92e6e78, 5f44fa76-a1d9-efe6-d13d-9440c931d754

As soon as I add storage (it doesn't matter on which of the 3 hosts), that host is no longer in the cluster:

[root@srv1:~] esxcli vsan storage add -s mpx.vmhba0:C0:T64:L0 -d mpx.vmhba0:C0:T66:L0 -d mpx.vmhba0:C0:T67:L0 -d mpx.vmhba0:C0:T68:L0 -d mpx.vmhba0:C0:T69:L0

[root@srv1:~] esxcli vsan cluster get

Cluster Information

   Enabled: true

   Current Local Time: 2020-09-02T11:49:22Z

   Local Node UUID: 5f4630de-4f35-4760-fd8f-9440c92e6eb8

   Local Node Type: NORMAL

   Local Node State: MASTER

   Local Node Health State: HEALTHY

   Sub-Cluster Master UUID: 5f4630de-4f35-4760-fd8f-9440c92e6eb8

   Sub-Cluster Backup UUID:

   Sub-Cluster UUID: 52cffc95-a360-d292-7279-b6304d366ae5

   Sub-Cluster Membership Entry Revision: 0

   Sub-Cluster Member Count: 1

   Sub-Cluster Member UUIDs: 5f4630de-4f35-4760-fd8f-9440c92e6eb8

   Sub-Cluster Member HostNames: srv1

   Sub-Cluster Membership UUID: bb864f5f-20eb-225c-f795-9440c92e6eb8

   Unicast Mode Enabled: true

   Maintenance Mode State: OFF

   Config Generation: None 0 0.0

Status of the other 2 remaining hosts:

[root@srv2:~] esxcli vsan cluster get

Cluster Information

   Enabled: true

   Current Local Time: 2020-09-02T12:01:39Z

   Local Node UUID: 5f44f8e2-02bd-46f4-1db0-9440c92e6e78

   Local Node Type: NORMAL

   Local Node State: MASTER

   Local Node Health State: HEALTHY

   Sub-Cluster Master UUID: 5f44f8e2-02bd-46f4-1db0-9440c92e6e78

   Sub-Cluster Backup UUID: 5f44fa76-a1d9-efe6-d13d-9440c931d754

   Sub-Cluster UUID: 52cffc95-a360-d292-7279-b6304d366ae5

   Sub-Cluster Membership Entry Revision: 2

   Sub-Cluster Member Count: 2

   Sub-Cluster Member UUIDs: 5f44f8e2-02bd-46f4-1db0-9440c92e6e78, 5f44fa76-a1d9-efe6-d13d-9440c931d754

   Sub-Cluster Member HostNames: srv2, srv3

   Sub-Cluster Membership UUID: 26864f5f-a0bd-8377-dbbb-9440c92e6e78

   Unicast Mode Enabled: false

   Maintenance Mode State: OFF

   Config Generation: None 0 0.0

Any clues as to what causes this, and how to solve it?


4 Replies
Commander

Hey jkjr,

I think you are facing a vSAN partition issue. Could you please verify that all the hosts can reach each other using vmkping, specifying the vSAN VMkernel interface?
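For example (the interface name and addresses here are placeholders, not taken from the thread; substitute your actual vSAN vmkernel port and host IPs):

```shell
# On srv1, ping srv2's and srv3's vSAN vmkernel IPs through the vSAN vmk port.
# vmk1 and the 192.168.100.x addresses are assumptions - adjust to your setup.
vmkping -I vmk1 192.168.100.2
vmkping -I vmk1 192.168.100.3

# Optionally test with a large payload and don't-fragment set, to catch MTU
# mismatches (1472 = 1500-byte MTU minus 28 bytes of IP/ICMP headers):
vmkping -I vmk1 -s 1472 -d 192.168.100.2
```

If the large-payload ping fails while the plain one succeeds, the vSAN network likely has an MTU mismatch rather than a total connectivity loss.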

VMware Employee

Hello jkjr,

Welcome to Communities (and vSAN).

Lalegre, yes, the cluster is partitioned, but not due to a lack of network communication, as the OP has already shown that the cluster was fully formed (which wouldn't be possible if there were no network communication).

jkjr, is there a specific reason you are trying to configure the vSAN cluster via the CLI, as opposed to the proper (and easier) way of doing it via vCenter?

The likely reason it is becoming partitioned when you add storage to any node is that (based on the default on-disk format) it detects that the node should be using Unicast-mode communication, but you haven't configured unicast addresses, so it treats the other nodes as being in Multicast mode:

VMware Knowledge Base

   Sub-Cluster Member Count: 1

   Unicast Mode Enabled: true

   vs

   Sub-Cluster Member Count: 2

   Unicast Mode Enabled: false

This is simply remediated by populating the unicastagent list on each node, using the -U 1 flag (Unicast=True), e.g. on srv1:

# esxcli vsan cluster unicastagent add -u 5f44f8e2-02bd-46f4-1db0-9440c92e6e78 -a <vSANvmkIPOfsrv2> -U 1

# esxcli vsan cluster unicastagent add -u 5f44fa76-a1d9-efe6-d13d-9440c931d754 -a <vSANvmkIPOfsrv3> -U 1

VMware Knowledge Base

Then do the same on the other 2 nodes. NOTE: each node should only have unicastagent entries for all the OTHER nodes in the cluster, never for itself.
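For completeness, a sketch of the equivalent commands on srv2 (the UUIDs are taken from the cluster outputs above; the vSAN vmk IPs of srv1 and srv3 are placeholders):

```shell
# On srv2: add unicastagent entries for srv1 and srv3 - never for srv2 itself.
esxcli vsan cluster unicastagent add -u 5f4630de-4f35-4760-fd8f-9440c92e6eb8 -a <vSANvmkIPOfsrv1> -U 1
esxcli vsan cluster unicastagent add -u 5f44fa76-a1d9-efe6-d13d-9440c931d754 -a <vSANvmkIPOfsrv3> -U 1
```

srv3 would get the mirror-image pair of entries, pointing at srv1 and srv2.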

Or you could just create the cluster via vCenter. If your reason for not doing this is that you want to place vCenter on the vsanDatastore, this can be done with even just a single node and an FTT=0 deployment.

Bob


Contributor

Thanks TheBobkin, populating the unicastagent list was indeed the solution.

One addition to the command was needed: the -t node switch. Without it, the command returned this error:

vSAN does not support connecting to more than one unicast witness, and there is already a unicast witness configured

So the full solution to the problem was (example for srv1):

esxcli vsan cluster unicastagent add -u 5f44f8e2-02bd-46f4-1db0-9440c92e6e78 -a 192.168.100.2 -U 1 -t node

esxcli vsan cluster unicastagent add -u 5f44fa76-a1d9-efe6-d13d-9440c931d754 -a 192.168.100.3 -U 1 -t node
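To confirm the fix took effect, you can then list the configured unicast agents and re-check membership on each host (a sketch; the exact output will vary per node):

```shell
# Show the unicastagent entries configured on this host - expect one entry
# per OTHER cluster node, with the Unicast flag set.
esxcli vsan cluster unicastagent list

# Re-check membership: Sub-Cluster Member Count should now read 3 on every
# node, with all three UUIDs listed under Sub-Cluster Member UUIDs.
esxcli vsan cluster get
```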

VMware Employee

Hello jkjr,

Yes, sorry, I have a bad habit of forgetting the -t switch (until I look at the Witness=1 flag on the list afterwards... shake my head in shame, sigh, remove the entry and add it correctly :smileyangry:).

Glad I could help you get it sorted though.

Bob