I have 3 ESXi 6.7.0u3 hosts and am trying to build a vSAN cluster. After configuring networking, I started setting up the cluster on the first host:
esxcli vsan cluster new
result:
[root@srv1:~] esxcli vsan cluster get
Cluster Information
Enabled: true
Current Local Time: 2020-09-02T11:46:22Z
Local Node UUID: 5f4630de-4f35-4760-fd8f-9440c92e6eb8
Local Node Type: NORMAL
Local Node State: MASTER
Local Node Health State: HEALTHY
Sub-Cluster Master UUID: 5f4630de-4f35-4760-fd8f-9440c92e6eb8
Sub-Cluster Backup UUID:
Sub-Cluster UUID: 52cffc95-a360-d292-7279-b6304d366ae5
Sub-Cluster Membership Entry Revision: 0
Sub-Cluster Member Count: 1
On the other 2 hosts:
esxcli vsan cluster join -u 52cffc95-a360-d292-7279-b6304d366ae5
Result: the cluster has 3 members:
[root@srv1:~] esxcli vsan cluster get
Cluster Information
Enabled: true
Current Local Time: 2020-09-02T11:46:49Z
Local Node UUID: 5f4630de-4f35-4760-fd8f-9440c92e6eb8
Local Node Type: NORMAL
Local Node State: MASTER
Local Node Health State: HEALTHY
Sub-Cluster Master UUID: 5f4630de-4f35-4760-fd8f-9440c92e6eb8
Sub-Cluster Backup UUID: 5f44f8e2-02bd-46f4-1db0-9440c92e6e78
Sub-Cluster UUID: 52cffc95-a360-d292-7279-b6304d366ae5
Sub-Cluster Membership Entry Revision: 2
Sub-Cluster Member Count: 3
Sub-Cluster Member UUIDs: 5f4630de-4f35-4760-fd8f-9440c92e6eb8, 5f44f8e2-02bd-46f4-1db0-9440c92e6e78, 5f44fa76-a1d9-efe6-d13d-9440c931d754
As soon as I add storage, no matter on which of the 3 hosts, that host is no longer in the cluster:
[root@srv1:~] esxcli vsan storage add -s mpx.vmhba0:C0:T64:L0 -d mpx.vmhba0:C0:T66:L0 -d mpx.vmhba0:C0:T67:L0 -d mpx.vmhba0:C0:T68:L0 -d mpx.vmhba0:C0:T69:L0
[root@srv1:~] esxcli vsan cluster get
Cluster Information
Enabled: true
Current Local Time: 2020-09-02T11:49:22Z
Local Node UUID: 5f4630de-4f35-4760-fd8f-9440c92e6eb8
Local Node Type: NORMAL
Local Node State: MASTER
Local Node Health State: HEALTHY
Sub-Cluster Master UUID: 5f4630de-4f35-4760-fd8f-9440c92e6eb8
Sub-Cluster Backup UUID:
Sub-Cluster UUID: 52cffc95-a360-d292-7279-b6304d366ae5
Sub-Cluster Membership Entry Revision: 0
Sub-Cluster Member Count: 1
Sub-Cluster Member UUIDs: 5f4630de-4f35-4760-fd8f-9440c92e6eb8
Sub-Cluster Member HostNames: srv1
Sub-Cluster Membership UUID: bb864f5f-20eb-225c-f795-9440c92e6eb8
Unicast Mode Enabled: true
Maintenance Mode State: OFF
Config Generation: None 0 0.0
Status of the 2 remaining hosts:
[root@srv2:~] esxcli vsan cluster get
Cluster Information
Enabled: true
Current Local Time: 2020-09-02T12:01:39Z
Local Node UUID: 5f44f8e2-02bd-46f4-1db0-9440c92e6e78
Local Node Type: NORMAL
Local Node State: MASTER
Local Node Health State: HEALTHY
Sub-Cluster Master UUID: 5f44f8e2-02bd-46f4-1db0-9440c92e6e78
Sub-Cluster Backup UUID: 5f44fa76-a1d9-efe6-d13d-9440c931d754
Sub-Cluster UUID: 52cffc95-a360-d292-7279-b6304d366ae5
Sub-Cluster Membership Entry Revision: 2
Sub-Cluster Member Count: 2
Sub-Cluster Member UUIDs: 5f44f8e2-02bd-46f4-1db0-9440c92e6e78, 5f44fa76-a1d9-efe6-d13d-9440c931d754
Sub-Cluster Member HostNames: srv2, srv3
Sub-Cluster Membership UUID: 26864f5f-a0bd-8377-dbbb-9440c92e6e78
Unicast Mode Enabled: false
Maintenance Mode State: OFF
Config Generation: None 0 0.0
Any clues as to what causes this, and how to solve it?
Hey jkjr,
I think you are facing a vSAN partition issue. Could you please verify that all the hosts can reach each other using vmkping and specifying the vSAN VMkernel_
Hello jkjr,
Welcome to Communities (and vSAN).
Lalegre, yes, the cluster is partitioned, but not due to network communication: the OP has already shown that the cluster was fully formed (that wouldn't be possible if there were no network communication).
jkjr, is there a specific reason you are trying to configure the vSAN cluster via the CLI as opposed to the proper (and easier) way of doing it via vCenter?
The likely reason it becomes partitioned when you add storage to any node is that (based on the default on-disk format) the node detects it should be using Unicast mode communication, and since you haven't configured unicast addresses it thinks the other nodes are in Multicast mode:
Sub-Cluster Member Count: 1
Unicast Mode Enabled: true
vs
Sub-Cluster Member Count: 2
Unicast Mode Enabled: false
This is simply remediated by populating the unicastagent list on each node, using the -U 1 flag (Unicast=True), e.g. on srv1:
# esxcli vsan cluster unicastagent add -u 5f44f8e2-02bd-46f4-1db0-9440c92e6e78 -a <vSANvmkIPOfsrv2> -U 1
# esxcli vsan cluster unicastagent add -u 5f44fa76-a1d9-efe6-d13d-9440c931d754 -a <vSANvmkIPOfsrv3> -U 1
Do the same on the other 2 nodes - NOTE: each node should only have unicastagent entries for all the OTHER nodes in the cluster and never for itself.
Or you could just create the cluster via vCenter - if your reason for not doing this is that you want to place the vCenter on the vsanDatastore, then this can be done with even just a single node and an FTT=0 deployment.
Bob
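The two fields compared above can be pulled out of the `esxcli vsan cluster get` output on each host to spot a unicast/multicast mismatch quickly. This is only a minimal sketch: the helper name `vsan_mode_summary` is hypothetical, and the demo input is a hand-typed excerpt of the srv1 output from this thread rather than a live call.

```shell
#!/bin/sh
# Hypothetical helper: reads `esxcli vsan cluster get` output on stdin and
# keeps only the member count and the unicast mode lines.
vsan_mode_summary() {
    grep -E 'Sub-Cluster Member Count|Unicast Mode Enabled'
}

# On a real host you would run:
#   esxcli vsan cluster get | vsan_mode_summary
# Demo using the srv1 values quoted earlier in the thread:
printf '%s\n' \
    'Local Node State: MASTER' \
    'Sub-Cluster Member Count: 1' \
    'Unicast Mode Enabled: true' | vsan_mode_summary
```

If one host reports `true` and the others `false`, that matches the partition described here.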
Thanks TheBobkin, populating the unicast list was indeed the solution.
One addition to the command was -t node; without it I got this error:
vSAN does not support connecting to more than one unicast witness, and there is already a unicast witness configured
So the full solution to the problem was (example for host1):
esxcli vsan cluster unicastagent add -u 5f44f8e2-02bd-46f4-1db0-9440c92e6e78 -a 192.168.100.2 -U 1 -t node
esxcli vsan cluster unicastagent add -u 5f44fa76-a1d9-efe6-d13d-9440c931d754 -a 192.168.100.3 -U 1 -t node
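Since every host needs entries for the other two and never for itself, the commands for all three hosts could be generated along these lines. This is a dry-run sketch that only prints the commands. The UUIDs and the srv2/srv3 addresses come from this thread; the srv1 address 192.168.100.1 is an assumption, and the function name `print_unicast_cmds` is hypothetical.

```shell
#!/bin/sh
# name,node UUID,vSAN vmk IP (srv1's IP is assumed, not given in the thread)
NODES="srv1,5f4630de-4f35-4760-fd8f-9440c92e6eb8,192.168.100.1
srv2,5f44f8e2-02bd-46f4-1db0-9440c92e6e78,192.168.100.2
srv3,5f44fa76-a1d9-efe6-d13d-9440c931d754,192.168.100.3"

# Print (do not execute) the unicastagent commands to run on the given host.
print_unicast_cmds() {
    local_host="$1"
    echo "$NODES" | while IFS=, read -r name uuid ip; do
        # A node must never list itself as a unicast agent.
        [ "$name" = "$local_host" ] && continue
        echo "esxcli vsan cluster unicastagent add -u $uuid -a $ip -U 1 -t node"
    done
}

# Show the commands for the host named in $1 (defaults to srv1).
print_unicast_cmds "${1:-srv1}"
```

Run it once per host name and paste the printed lines into that host's shell after checking the addresses.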
Hello jkjr,
Yes, sorry, I have a bad habit of forgetting the -t switch (until I look at the Witness=1 flag in the list afterwards... shake my head in shame, sigh, remove the entry and add it correctly :smileyangry: )
Glad I could help you get it sorted though.
Bob
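The remove-and-re-add fix mentioned above might look roughly like this. It is a sketch under assumptions: `fix_unicast_entry` is a hypothetical helper, the `remove -a <IP>` form is how I understand the esxcli syntax on 6.7 (check `esxcli vsan cluster unicastagent remove --help` on your build), and by default the script only prints the commands instead of running them.

```shell
#!/bin/sh
# Dry run by default: each command is printed, not executed.
# Set ESXCLI=esxcli on a real host to actually run them.
ESXCLI="${ESXCLI:-echo esxcli}"

# Replace an entry that was added without -t node (and so shows as a witness).
fix_unicast_entry() {
    uuid="$1"
    ip="$2"
    $ESXCLI vsan cluster unicastagent remove -a "$ip"
    $ESXCLI vsan cluster unicastagent add -u "$uuid" -a "$ip" -U 1 -t node
    # List the entries again to confirm the witness flag is no longer set.
    $ESXCLI vsan cluster unicastagent list
}

# Example with the srv2 values from this thread:
fix_unicast_entry 5f44f8e2-02bd-46f4-1db0-9440c92e6e78 192.168.100.2
```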