When the number of hosts in the stretched cluster reaches 8, a network partition occurs.
All nested hosts run ESXi 6.5 GA build 5310538 (vSAN 6.6).
I deployed 7 nested ESXi hosts and configured them as a 3 + 3 + 1 vSAN stretched cluster (Site A: 3 hosts, Site B: 3 hosts, plus 1 witness appliance). No network partition problems occurred.
The yellow exclamation mark on the hosts is only there because I enabled SSH.
Once the vSAN cluster grows to 8 hosts (Site A: 4 hosts, Site B: 4 hosts, plus 1 witness appliance), a network partition problem appears.
Lab environment description:
Site A Hosts
esxi01-a.corp.demo.com
MGMT VMK=192.168.50.1/24 GW=192.168.50.254
VSAN VMK=172.30.0.11/24 GW=172.30.0.1
esxi02-a.corp.demo.com
MGMT VMK=192.168.50.2/24 GW=192.168.50.254
VSAN VMK=172.30.0.12/24 GW=172.30.0.1
esxi03-a.corp.demo.com
MGMT VMK=192.168.50.3/24 GW=192.168.50.254
VSAN VMK=172.30.0.13/24 GW=172.30.0.1
esxi04-a.corp.demo.com
MGMT VMK=192.168.50.4/24 GW=192.168.50.254
VSAN VMK=172.30.0.14/24 GW=172.30.0.1
Before enabling vSAN, configure the gateway route on hosts esxi01-a through esxi04-a:
esxcli network ip route ipv4 add -n 147.80.0.0/24 -g 172.30.0.1
Site B Hosts
esxi01-b.corp.demo.com
MGMT VMK=192.168.50.6/24 GW=192.168.50.254
VSAN VMK=172.30.0.21/24 GW=172.30.0.254
esxi02-b.corp.demo.com
MGMT VMK=192.168.50.7/24 GW=192.168.50.254
VSAN VMK=172.30.0.22/24 GW=172.30.0.254
esxi03-b.corp.demo.com
MGMT VMK=192.168.50.8/24 GW=192.168.50.254
VSAN VMK=172.30.0.23/24 GW=172.30.0.254
esxi04-b.corp.demo.com
MGMT VMK=192.168.50.9/24 GW=192.168.50.254
VSAN VMK=172.30.0.24/24 GW=172.30.0.254
Before enabling vSAN, configure the gateway route on hosts esxi01-b through esxi04-b:
esxcli network ip route ipv4 add -n 147.80.0.0/24 -g 172.30.0.254
Site C Hosts
witness.corp.demo.com
MGMT VMK=192.168.50.250/24 GW=192.168.50.254
VSAN VMK=147.80.0.15/24 GW=147.80.0.1 or 147.80.0.254
Before enabling vSAN, configure the gateway routes on the witness host:
esxcli network ip route ipv4 add -n 172.30.0.11/32 -g 147.80.0.1
esxcli network ip route ipv4 add -n 172.30.0.12/32 -g 147.80.0.1
esxcli network ip route ipv4 add -n 172.30.0.13/32 -g 147.80.0.1
esxcli network ip route ipv4 add -n 172.30.0.14/32 -g 147.80.0.1
esxcli network ip route ipv4 add -n 172.30.0.21/32 -g 147.80.0.254
esxcli network ip route ipv4 add -n 172.30.0.22/32 -g 147.80.0.254
esxcli network ip route ipv4 add -n 172.30.0.23/32 -g 147.80.0.254
esxcli network ip route ipv4 add -n 172.30.0.24/32 -g 147.80.0.254
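To sanity-check these routes before enabling vSAN, I can verify them from the witness host like this (vmk1 is my witness vSAN VMkernel interface; adjust for your setup):

```shell
# Confirm the static routes are installed on the witness host
esxcli network ip route ipv4 list

# Test vSAN VMkernel reachability out of vmk1
vmkping -I vmk1 172.30.0.11   # Site-A host, via the 147.80.0.1 gateway
vmkping -I vmk1 172.30.0.21   # Site-B host, via the 147.80.0.254 gateway
```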
That is my lab topology.
I use 4 Linux virtual machines as static routers and 3 vSwitches.
I ran the following tests from the witness host:
Trace to all Site-A hosts' vSAN VMKs (traffic should go via the 147.80.0.1 gateway):
[root@witness:~] traceroute -i vmk1 172.30.0.11
traceroute to 172.30.0.11 (172.30.0.11), 30 hops max, 40 byte packets
1 147.80.0.1 (147.80.0.1) 0.174 ms 0.128 ms 0.154 ms
2 10.10.0.1 (10.10.0.1) 0.275 ms 0.241 ms 0.162 ms
3 172.30.0.11 (172.30.0.11) 0.420 ms 0.549 ms 0.280 ms
[root@witness:~] traceroute -i vmk1 172.30.0.12
traceroute to 172.30.0.12 (172.30.0.12), 30 hops max, 40 byte packets
1 147.80.0.1 (147.80.0.1) 0.129 ms 0.077 ms 0.102 ms
2 10.10.0.1 (10.10.0.1) 0.181 ms 0.100 ms 0.094 ms
3 172.30.0.12 (172.30.0.12) 0.324 ms 0.391 ms 0.343 ms
[root@witness:~] traceroute -i vmk1 172.30.0.13
traceroute to 172.30.0.13 (172.30.0.13), 30 hops max, 40 byte packets
1 * 147.80.0.1 (147.80.0.1) 0.250 ms 0.149 ms
2 10.10.0.1 (10.10.0.1) 0.166 ms 0.163 ms 0.124 ms
3 172.30.0.13 (172.30.0.13) 0.577 ms 0.808 ms 0.344 ms
[root@witness:~] traceroute -i vmk1 172.30.0.14
traceroute to 172.30.0.14 (172.30.0.14), 30 hops max, 40 byte packets
1 147.80.0.1 (147.80.0.1) 0.179 ms 0.068 ms 0.149 ms
2 10.10.0.1 (10.10.0.1) 0.168 ms 0.199 ms *
3 172.30.0.14 (172.30.0.14) 0.493 ms 0.378 ms 0.310 ms
Trace to all Site-B hosts' vSAN VMKs (traffic should go via the 147.80.0.254 gateway):
[root@witness:~] traceroute -i vmk1 172.30.0.21
traceroute to 172.30.0.21 (172.30.0.21), 30 hops max, 40 byte packets
1 147.80.0.254 (147.80.0.254) 0.197 ms 0.151 ms 0.075 ms
2 10.10.0.254 (10.10.0.254) 0.241 ms 0.248 ms 0.147 ms
3 172.30.0.21 (172.30.0.21) 0.433 ms 0.489 ms 0.305 ms
[root@witness:~] traceroute -i vmk1 172.30.0.22
traceroute to 172.30.0.22 (172.30.0.22), 30 hops max, 40 byte packets
1 147.80.0.254 (147.80.0.254) 0.129 ms 0.118 ms 0.080 ms
2 10.10.0.254 (10.10.0.254) 0.163 ms 0.130 ms 0.119 ms
3 172.30.0.22 (172.30.0.22) 0.388 ms 0.467 ms 0.323 ms
[root@witness:~] traceroute -i vmk1 172.30.0.23
traceroute to 172.30.0.23 (172.30.0.23), 30 hops max, 40 byte packets
1 * 147.80.0.254 (147.80.0.254) 0.200 ms 0.075 ms
2 10.10.0.254 (10.10.0.254) 0.194 ms 0.170 ms 0.131 ms
3 172.30.0.23 (172.30.0.23) 0.359 ms 0.623 ms 0.420 ms
[root@witness:~] traceroute -i vmk1 172.30.0.24
traceroute to 172.30.0.24 (172.30.0.24), 30 hops max, 40 byte packets
1 147.80.0.254 (147.80.0.254) 0.227 ms 0.163 ms 0.151 ms
2 10.10.0.254 (10.10.0.254) 0.203 ms 0.185 ms 0.190 ms
3 172.30.0.24 (172.30.0.24) 0.384 ms 0.363 ms 0.291 ms
[root@witness:~]
I do not know how to solve this network partitioning problem.
If it cannot be resolved, I will not be able to test the vSAN RAID-5/6 policy.
Thanks for any help.
During network partition, components in the active site appear to be absent.
During a network partition in a vSAN 2 host or stretched cluster, the vSphere Web Client might display a view of the cluster from the perspective of the non-active site. You might see active components in the primary site displayed as absent.
Workaround: Use RVC commands to query the state of objects in the cluster.
I don't understand why host esxi04-b took the master role.
/localhost/Datacenter/computers> vsan.cluster_info 0
2017-07-15 00:45:58 +0800: Fetching host info from esxi01-a.corp.demo.com (may take a moment) ...
2017-07-15 00:45:58 +0800: Fetching host info from esxi02-b.corp.demo.com (may take a moment) ...
2017-07-15 00:45:58 +0800: Fetching host info from esxi03-b.corp.demo.com (may take a moment) ...
2017-07-15 00:45:58 +0800: Fetching host info from esxi04-b.corp.demo.com (may take a moment) ...
2017-07-15 00:45:58 +0800: Fetching host info from esxi03-a.corp.demo.com (may take a moment) ...
2017-07-15 00:45:58 +0800: Fetching host info from esxi01-b.corp.demo.com (may take a moment) ...
2017-07-15 00:45:58 +0800: Fetching host info from witness.corp.demo.com (may take a moment) ...
2017-07-15 00:45:58 +0800: Fetching host info from esxi04-a.corp.demo.com (may take a moment) ...
2017-07-15 00:45:58 +0800: Fetching host info from esxi02-a.corp.demo.com (may take a moment) ...
Host: esxi01-a.corp.demo.com
Product: VMware ESXi 6.5.0 build-5310538
vSAN enabled: yes
Cluster info:
Cluster role: master
Cluster UUID: 52576d7e-8094-d951-745b-885a36c778dd
Node UUID: 596302bf-1a29-4839-97e4-000c29c1488d
Member UUIDs: ["596302c3-1d16-741b-cf50-000c2902b1ce", "596302ce-5c6c-2e35-df9b-000c299b5777", "596302bf-1a29-4839-97e4-000c29c1488d", "596302cb-5fd0-745f-9d9b-000c299654c0", "596302c9-137c-5bca-e942-000c2992f9eb", "596302ce-7cb1-b0fd-3d36-000c29
ef1351", "596302d5-90f8-92e6-be9b-000c296de5aa", "596302ce-dfe2-ddb3-1768-000c29ddb5e5"] (8)
Node evacuated: no
Storage info:
Auto claim: no
Disk Mappings:
SSD: Local VMware Disk (mpx.vmhba1:C0:T1:L0) - 40 GB, v5
MD: Local VMware Disk (mpx.vmhba1:C0:T2:L0) - 60 GB, v5
MD: Local VMware Disk (mpx.vmhba1:C0:T4:L0) - 60 GB, v5
MD: Local VMware Disk (mpx.vmhba1:C0:T3:L0) - 60 GB, v5
FaultDomainInfo:
Preferred
NetworkInfo:
Adapter: vmk1 (172.30.0.11)
Host: esxi02-a.corp.demo.com
Product: VMware ESXi 6.5.0 build-5310538
vSAN enabled: yes
Cluster info:
Cluster role: agent
Cluster UUID: 52576d7e-8094-d951-745b-885a36c778dd
Node UUID: 596302c9-137c-5bca-e942-000c2992f9eb
Member UUIDs: ["596302c3-1d16-741b-cf50-000c2902b1ce", "596302ce-5c6c-2e35-df9b-000c299b5777", "596302bf-1a29-4839-97e4-000c29c1488d", "596302cb-5fd0-745f-9d9b-000c299654c0", "596302c9-137c-5bca-e942-000c2992f9eb", "596302ce-7cb1-b0fd-3d36-000c29
ef1351", "596302d5-90f8-92e6-be9b-000c296de5aa", "596302ce-dfe2-ddb3-1768-000c29ddb5e5"] (8)
Node evacuated: no
Storage info:
Auto claim: no
Disk Mappings:
SSD: Local VMware Disk (mpx.vmhba1:C0:T1:L0) - 40 GB, v5
MD: Local VMware Disk (mpx.vmhba1:C0:T2:L0) - 60 GB, v5
MD: Local VMware Disk (mpx.vmhba1:C0:T4:L0) - 60 GB, v5
MD: Local VMware Disk (mpx.vmhba1:C0:T3:L0) - 60 GB, v5
FaultDomainInfo:
Preferred
NetworkInfo:
Adapter: vmk1 (172.30.0.12)
Host: esxi03-a.corp.demo.com
Product: VMware ESXi 6.5.0 build-5310538
vSAN enabled: yes
Cluster info:
Cluster role: agent
Cluster UUID: 52576d7e-8094-d951-745b-885a36c778dd
Node UUID: 596302ce-5c6c-2e35-df9b-000c299b5777
Member UUIDs: ["596302c3-1d16-741b-cf50-000c2902b1ce", "596302ce-5c6c-2e35-df9b-000c299b5777", "596302bf-1a29-4839-97e4-000c29c1488d", "596302cb-5fd0-745f-9d9b-000c299654c0", "596302c9-137c-5bca-e942-000c2992f9eb", "596302ce-7cb1-b0fd-3d36-000c29
ef1351", "596302d5-90f8-92e6-be9b-000c296de5aa", "596302ce-dfe2-ddb3-1768-000c29ddb5e5"] (8)
Node evacuated: no
Storage info:
Auto claim: no
Disk Mappings:
SSD: Local VMware Disk (mpx.vmhba1:C0:T1:L0) - 40 GB, v5
MD: Local VMware Disk (mpx.vmhba1:C0:T2:L0) - 60 GB, v5
MD: Local VMware Disk (mpx.vmhba1:C0:T4:L0) - 60 GB, v5
MD: Local VMware Disk (mpx.vmhba1:C0:T3:L0) - 60 GB, v5
FaultDomainInfo:
Preferred
NetworkInfo:
Adapter: vmk1 (172.30.0.13)
Host: esxi01-b.corp.demo.com
Product: VMware ESXi 6.5.0 build-5310538
vSAN enabled: yes
Cluster info:
Cluster role: backup
Cluster UUID: 52576d7e-8094-d951-745b-885a36c778dd
Node UUID: 596302c3-1d16-741b-cf50-000c2902b1ce
Member UUIDs: ["596302c3-1d16-741b-cf50-000c2902b1ce", "596302ce-5c6c-2e35-df9b-000c299b5777", "596302bf-1a29-4839-97e4-000c29c1488d", "596302cb-5fd0-745f-9d9b-000c299654c0", "596302c9-137c-5bca-e942-000c2992f9eb", "596302ce-7cb1-b0fd-3d36-000c29
ef1351", "596302d5-90f8-92e6-be9b-000c296de5aa", "596302ce-dfe2-ddb3-1768-000c29ddb5e5"] (8)
Node evacuated: no
Storage info:
Auto claim: no
Disk Mappings:
SSD: Local VMware Disk (mpx.vmhba1:C0:T1:L0) - 40 GB, v5
MD: Local VMware Disk (mpx.vmhba1:C0:T2:L0) - 60 GB, v5
MD: Local VMware Disk (mpx.vmhba1:C0:T4:L0) - 60 GB, v5
MD: Local VMware Disk (mpx.vmhba1:C0:T3:L0) - 60 GB, v5
FaultDomainInfo:
Secondary
NetworkInfo:
Adapter: vmk1 (172.30.0.21)
Host: esxi02-b.corp.demo.com
Product: VMware ESXi 6.5.0 build-5310538
vSAN enabled: yes
Cluster info:
Cluster role: agent
Cluster UUID: 52576d7e-8094-d951-745b-885a36c778dd
Node UUID: 596302cb-5fd0-745f-9d9b-000c299654c0
Member UUIDs: ["596302c3-1d16-741b-cf50-000c2902b1ce", "596302ce-5c6c-2e35-df9b-000c299b5777", "596302bf-1a29-4839-97e4-000c29c1488d", "596302cb-5fd0-745f-9d9b-000c299654c0", "596302c9-137c-5bca-e942-000c2992f9eb", "596302ce-7cb1-b0fd-3d36-000c29
ef1351", "596302d5-90f8-92e6-be9b-000c296de5aa", "596302ce-dfe2-ddb3-1768-000c29ddb5e5"] (8)
Node evacuated: no
Storage info:
Auto claim: no
Disk Mappings:
SSD: Local VMware Disk (mpx.vmhba1:C0:T1:L0) - 40 GB, v5
MD: Local VMware Disk (mpx.vmhba1:C0:T2:L0) - 60 GB, v5
MD: Local VMware Disk (mpx.vmhba1:C0:T4:L0) - 60 GB, v5
MD: Local VMware Disk (mpx.vmhba1:C0:T3:L0) - 60 GB, v5
FaultDomainInfo:
Secondary
NetworkInfo:
Adapter: vmk1 (172.30.0.22)
Host: esxi03-b.corp.demo.com
Product: VMware ESXi 6.5.0 build-5310538
vSAN enabled: yes
Cluster info:
Cluster role: agent
Cluster UUID: 52576d7e-8094-d951-745b-885a36c778dd
Node UUID: 596302ce-7cb1-b0fd-3d36-000c29ef1351
Member UUIDs: ["596302c3-1d16-741b-cf50-000c2902b1ce", "596302ce-5c6c-2e35-df9b-000c299b5777", "596302bf-1a29-4839-97e4-000c29c1488d", "596302cb-5fd0-745f-9d9b-000c299654c0", "596302c9-137c-5bca-e942-000c2992f9eb", "596302ce-7cb1-b0fd-3d36-000c29
ef1351", "596302d5-90f8-92e6-be9b-000c296de5aa", "596302ce-dfe2-ddb3-1768-000c29ddb5e5"] (8)
Node evacuated: no
Storage info:
Auto claim: no
Disk Mappings:
SSD: Local VMware Disk (mpx.vmhba1:C0:T1:L0) - 40 GB, v5
MD: Local VMware Disk (mpx.vmhba1:C0:T2:L0) - 60 GB, v5
MD: Local VMware Disk (mpx.vmhba1:C0:T4:L0) - 60 GB, v5
MD: Local VMware Disk (mpx.vmhba1:C0:T3:L0) - 60 GB, v5
FaultDomainInfo:
Secondary
NetworkInfo:
Adapter: vmk1 (172.30.0.23)
Host: witness.corp.demo.com
Product: VMware ESXi 6.5.0 build-5310538
vSAN enabled: yes
Cluster info:
Cluster role: agent
Cluster UUID: 52576d7e-8094-d951-745b-885a36c778dd
Node UUID: 596302d5-90f8-92e6-be9b-000c296de5aa
Node Type: Witness
Member UUIDs: ["596302c3-1d16-741b-cf50-000c2902b1ce", "596302ce-5c6c-2e35-df9b-000c299b5777", "596302bf-1a29-4839-97e4-000c29c1488d", "596302cb-5fd0-745f-9d9b-000c299654c0", "596302c9-137c-5bca-e942-000c2992f9eb", "596302ce-7cb1-b0fd-3d36-000c29
ef1351", "596302d5-90f8-92e6-be9b-000c296de5aa", "596302ce-dfe2-ddb3-1768-000c29ddb5e5"] (8)
Node evacuated: no
Storage info:
Auto claim: no
Disk Mappings:
SSD: Local VMware Disk (mpx.vmhba1:C0:T1:L0) - 40 GB, v5
MD: Local VMware Disk (mpx.vmhba1:C0:T2:L0) - 60 GB, v5
MD: Local VMware Disk (mpx.vmhba1:C0:T4:L0) - 60 GB, v5
MD: Local VMware Disk (mpx.vmhba1:C0:T3:L0) - 60 GB, v5
FaultDomainInfo:
Not configured
NetworkInfo:
Adapter: vmk1 (147.80.0.10)
Host: esxi04-a.corp.demo.com
Product: VMware ESXi 6.5.0 build-5310538
vSAN enabled: yes
Cluster info:
Cluster role: agent
Cluster UUID: 52576d7e-8094-d951-745b-885a36c778dd
Node UUID: 596302ce-dfe2-ddb3-1768-000c29ddb5e5
Member UUIDs: ["596302c3-1d16-741b-cf50-000c2902b1ce", "596302ce-5c6c-2e35-df9b-000c299b5777", "596302bf-1a29-4839-97e4-000c29c1488d", "596302cb-5fd0-745f-9d9b-000c299654c0", "596302c9-137c-5bca-e942-000c2992f9eb", "596302ce-7cb1-b0fd-3d36-000c29
ef1351", "596302d5-90f8-92e6-be9b-000c296de5aa", "596302ce-dfe2-ddb3-1768-000c29ddb5e5"] (8)
Node evacuated: no
Storage info:
Auto claim: no
Disk Mappings:
SSD: Local VMware Disk (mpx.vmhba1:C0:T1:L0) - 40 GB, v5
MD: Local VMware Disk (mpx.vmhba1:C0:T2:L0) - 60 GB, v5
MD: Local VMware Disk (mpx.vmhba1:C0:T4:L0) - 60 GB, v5
MD: Local VMware Disk (mpx.vmhba1:C0:T3:L0) - 60 GB, v5
FaultDomainInfo:
Not configured
NetworkInfo:
Adapter: vmk1 (172.30.0.14)
Host: esxi04-b.corp.demo.com
Product: VMware ESXi 6.5.0 build-5310538
vSAN enabled: yes
Cluster info:
Cluster role: master
Cluster UUID: 52576d7e-8094-d951-745b-885a36c778dd
Node UUID: 596302ce-5a99-6494-aa70-000c29302a2c
Member UUIDs: ["596302ce-5a99-6494-aa70-000c29302a2c"] (1)
Node evacuated: no
Storage info:
Auto claim: no
Disk Mappings:
SSD: Local VMware Disk (mpx.vmhba1:C0:T1:L0) - 40 GB, v5
MD: Local VMware Disk (mpx.vmhba1:C0:T2:L0) - 60 GB, v5
MD: Local VMware Disk (mpx.vmhba1:C0:T4:L0) - 60 GB, v5
MD: Local VMware Disk (mpx.vmhba1:C0:T3:L0) - 60 GB, v5
FaultDomainInfo:
Not configured
NetworkInfo:
Adapter: vmk1 (172.30.0.24)
Cluster has fault domains configured:
Preferred: esxi01-a.corp.demo.com, esxi02-a.corp.demo.com, esxi03-a.corp.demo.com
Secondary: esxi01-b.corp.demo.com, esxi02-b.corp.demo.com, esxi03-b.corp.demo.com
Not Configured: esxi04-a.corp.demo.com, esxi04-b.corp.demo.com
Preferred fault domain: Preferred
Preferred fault domain UUID: a054ccb4-ff68-4c73-cbc2-d272d45e32df
/localhost/Datacenter/computers>
Hello YanSi,
It is Master because it is isolated from the rest of the cluster and thus it is the only cluster-member that it knows of.
Can you show the output of 'esxcli vsan cluster get' on a host from the main partition and the isolated one?
Also show the output from 'esxcli vsan cluster unicastagent list' from both.
If the unicastagent list is blank or has null entries, then you should just need to add the unicastagent addresses for all the other nodes on the isolated node:
e.g. esxcli vsan cluster unicastagent add -i vmk1 -a 172.30.0.11 -t node -U 1
(sometimes you also need to specify -u <Local node UUID> of the other hosts)
http://pubs.vmware.com/vsphere-6-0/index.jsp?topic=%2Fcom.vmware.vcli.ref.doc%2Fesxcli_vsan.html
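So for instance, using the UUIDs and IPs from your vsan.cluster_info output, the commands on the isolated node would look roughly like this (a sketch, I have not verified it on your exact build):

```shell
# Add a unicastagent entry for a missing remote node on the isolated host
# (the UUID/IP pairing here is esxi01-a from the output above)
esxcli vsan cluster unicastagent add -t node -U 1 \
    -u 596302bf-1a29-4839-97e4-000c29c1488d -a 172.30.0.11

# Confirm the new entry shows "Supports Unicast" = true
esxcli vsan cluster unicastagent list
```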
Let me know if this is the case or if something else is at fault here.
Bob
Hi Bob
The host in the main partition shows:
[root@esxi01-a:~] esxcli vsan cluster get
Cluster Information
Enabled: true
Current Local Time: 2017-07-16T16:27:00Z
Local Node UUID: 596302bf-1a29-4839-97e4-000c29c1488d
Local Node Type: NORMAL
Local Node State: MASTER
Local Node Health State: HEALTHY
Sub-Cluster Master UUID: 596302bf-1a29-4839-97e4-000c29c1488d
Sub-Cluster Backup UUID: 596302c3-1d16-741b-cf50-000c2902b1ce
Sub-Cluster UUID: 5272ac4f-8b03-fe07-8f56-813ebab4d8c5
Sub-Cluster Membership Entry Revision: 7
Sub-Cluster Member Count: 8
Sub-Cluster Member UUIDs: 596302c3-1d16-741b-cf50-000c2902b1ce, 596302bf-1a29-4839-97e4-000c29c1488d, 596302cb-5fd0-745f-9d9b-000c299654c0, 596302ce-5a99-6494-aa70-000c29302a2c, 596302ce-dfe2-ddb3-1768-000c29ddb5e5, 596302c9-137c-5bca-e942-000c2992f9eb, 596302ce-7cb1-b0fd-3d36-000c29ef1351, 596302d5-90f8-92e6-be9b-000c296de5aa
Sub-Cluster Membership UUID: a68c6b59-0e79-f9a6-bbc0-000c29c1488d
Unicast Mode Enabled: true
Maintenance Mode State: OFF
[root@esxi01-a:~] esxcli vsan cluster unicastagent list
NodeUuid IsWitness Supports Unicast IP Address Port Iface Name
------------------------------------ --------- ---------------- ----------- ----- ----------
00000000-0000-0000-0000-000000000000 1 true 147.80.0.10 12321
596302ce-5c6c-2e35-df9b-000c299b5777 0 true 172.30.0.13 12321
596302c9-137c-5bca-e942-000c2992f9eb 0 true 172.30.0.12 12321
596302cb-5fd0-745f-9d9b-000c299654c0 0 true 172.30.0.22 12321
596302c3-1d16-741b-cf50-000c2902b1ce 0 true 172.30.0.21 12321
596302ce-5a99-6494-aa70-000c29302a2c 0 true 172.30.0.24 12321
596302ce-dfe2-ddb3-1768-000c29ddb5e5 0 true 172.30.0.14 12321
596302ce-7cb1-b0fd-3d36-000c29ef1351 0 true 172.30.0.23 12321
[root@esxi01-a:~]
The host in the isolated partition shows:
[root@esxi03-a:~] esxcli vsan cluster get
Cluster Information
Enabled: true
Current Local Time: 2017-07-16T16:27:04Z
Local Node UUID: 596302ce-5c6c-2e35-df9b-000c299b5777
Local Node Type: NORMAL
Local Node State: MASTER
Local Node Health State: HEALTHY
Sub-Cluster Master UUID: 596302ce-5c6c-2e35-df9b-000c299b5777
Sub-Cluster Backup UUID:
Sub-Cluster UUID: 5272ac4f-8b03-fe07-8f56-813ebab4d8c5
Sub-Cluster Membership Entry Revision: 2
Sub-Cluster Member Count: 1
Sub-Cluster Member UUIDs: 596302ce-5c6c-2e35-df9b-000c299b5777
Sub-Cluster Membership UUID: 69936b59-fd9b-5775-475d-000c299b5777
Unicast Mode Enabled: true
Maintenance Mode State: OFF
[root@esxi03-a:~] esxcli vsan cluster unicastagent list
NodeUuid IsWitness Supports Unicast IP Address Port Iface Name
------------------------------------ --------- ---------------- ----------- ----- ----------
00000000-0000-0000-0000-000000000000 1 true 147.80.0.10 12321
596302c3-1d16-741b-cf50-000c2902b1ce 0 true 172.30.0.21 12321
596302c9-137c-5bca-e942-000c2992f9eb 0 true 172.30.0.12 12321
596302cb-5fd0-745f-9d9b-000c299654c0 0 true 172.30.0.22 12321
596302ce-5a99-6494-aa70-000c29302a2c 0 true 172.30.0.24 12321
596302ce-dfe2-ddb3-1768-000c29ddb5e5 0 true 172.30.0.14 12321
596302ce-7cb1-b0fd-3d36-000c29ef1351 0 true 172.30.0.23 12321
[root@esxi03-a:~]
I do not understand how to fix it.
[root@esxi03-a:~] esxcli vsan cluster unicastagent list
NodeUuid IsWitness Supports Unicast IP Address Port Iface Name
------------------------------------ --------- ---------------- ----------- ----- ----------
00000000-0000-0000-0000-000000000000 1 true 147.80.0.10 12321
596302c3-1d16-741b-cf50-000c2902b1ce 0 true 172.30.0.21 12321
596302c9-137c-5bca-e942-000c2992f9eb 0 true 172.30.0.12 12321
596302cb-5fd0-745f-9d9b-000c299654c0 0 true 172.30.0.22 12321
596302ce-5a99-6494-aa70-000c29302a2c 0 true 172.30.0.24 12321
596302ce-dfe2-ddb3-1768-000c29ddb5e5 0 true 172.30.0.14 12321
596302ce-7cb1-b0fd-3d36-000c29ef1351 0 true 172.30.0.23 12321
[root@esxi03-a:~] esxcli vsan cluster unicastagent add -a 172.30.0.11 -t node -U 0 -u 596302bf-1a29-4839-97e4-000c29c1488d
[root@esxi03-a:~] esxcli vsan cluster unicastagent list
NodeUuid IsWitness Supports Unicast IP Address Port Iface Name
------------------------------------ --------- ---------------- ----------- ----- ----------
00000000-0000-0000-0000-000000000000 1 true 147.80.0.10 12321
596302c3-1d16-741b-cf50-000c2902b1ce 0 true 172.30.0.21 12321
596302c9-137c-5bca-e942-000c2992f9eb 0 true 172.30.0.12 12321
596302cb-5fd0-745f-9d9b-000c299654c0 0 true 172.30.0.22 12321
596302ce-5a99-6494-aa70-000c29302a2c 0 true 172.30.0.24 12321
596302ce-dfe2-ddb3-1768-000c29ddb5e5 0 true 172.30.0.14 12321
596302ce-7cb1-b0fd-3d36-000c29ef1351 0 true 172.30.0.23 12321
596302bf-1a29-4839-97e4-000c29c1488d 0 false 172.30.0.11 12321
[root@esxi03-a:~]
What does that mean?
Thanks for your help.
Hello,
This is esxi03-a; it was esxi04-b that was isolated before. Did you get that host back into the cluster normally?
Anyway, you added esxi01-a (172.30.0.11) incorrectly: I see you specified -U 0, which means Unicast = false.
Remove the current entry for 172.30.0.11 and add it back with -U 1 (which sets Unicast = true).
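i.e. something like this on esxi03-a (double-check the UUID against esxi01-a before running it):

```shell
# Drop the entry that was added with -U 0 (Supports Unicast = false)
esxcli vsan cluster unicastagent remove -a 172.30.0.11

# Re-add it with -U 1 so the node is treated as a unicast-capable member
esxcli vsan cluster unicastagent add -t node -U 1 \
    -u 596302bf-1a29-4839-97e4-000c29c1488d -a 172.30.0.11
```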
Bob
[root@esxi03-a:~] esxcli vsan cluster get
Cluster Information
Enabled: true
Current Local Time: 2017-07-16T19:47:26Z
Local Node UUID: 596302ce-5c6c-2e35-df9b-000c299b5777
Local Node Type: NORMAL
Local Node State: MASTER
Local Node Health State: HEALTHY
Sub-Cluster Master UUID: 596302ce-5c6c-2e35-df9b-000c299b5777
Sub-Cluster Backup UUID:
Sub-Cluster UUID: 52b0d964-ca50-1969-02ca-0e527e56508b
Sub-Cluster Membership Entry Revision: 2
Sub-Cluster Member Count: 1
Sub-Cluster Member UUIDs: 596302ce-5c6c-2e35-df9b-000c299b5777
Sub-Cluster Membership UUID: f2c16b59-b9a0-1d63-05e2-000c299b5777
Unicast Mode Enabled: true
Maintenance Mode State: OFF
[root@esxi03-a:~] esxcli vsan cluster unicastagent list
NodeUuid IsWitness Supports Unicast IP Address Port Iface Name
------------------------------------ --------- ---------------- ----------- ----- ----------
00000000-0000-0000-0000-000000000000 1 true 147.80.0.10 12321
596302c9-137c-5bca-e942-000c2992f9eb 0 true 172.30.0.12 12321
596302c3-1d16-741b-cf50-000c2902b1ce 0 true 172.30.0.21 12321
596302ce-dfe2-ddb3-1768-000c29ddb5e5 0 true 172.30.0.14 12321
596302cb-5fd0-745f-9d9b-000c299654c0 0 true 172.30.0.22 12321
596302ce-7cb1-b0fd-3d36-000c29ef1351 0 true 172.30.0.23 12321
596302ce-5a99-6494-aa70-000c29302a2c 0 true 172.30.0.24 12321
596302bf-1a29-4839-97e4-000c29c1488d 0 true 172.30.0.11 12321
596302ce-5c6c-2e35-df9b-000c299b5777 0 true 172.30.0.13 12321
[root@esxi03-a:~]
But the role did not change.
Hello,
Hosts do not need an entry for the local node (themselves) in the unicastagent list, so remove that entry. Then try a leave and join; it looks like you have the wrong Sub-Cluster UUID:
# esxcli vsan cluster leave
# esxcli vsan cluster join -u 5272ac4f-8b03-fe07-8f56-813ebab4d8c5 (double check that this is the current Sub-cluster UUID of the non-isolated hosts)
Bob
I think this is caused by the cluster's partition communication mechanism: as soon as the eighth host joins, one host is randomly isolated.
Perhaps text alone is not intuitive, so I will show screenshots.
This screenshot shows 7 hosts; there is no network partition problem.
When the eighth host is added, a network partition occurs.
esxi03-a is now the master of the main partition.
esxi04-b is now the master of the isolated partition.
The Sub-Cluster UUID is the same on both:
Sub-Cluster UUID: 52dcf5b7-fb2c-803b-9af9-cbf8e900cc4f
So this issue is very strange.
I believe I have the same issue.
I just configured a stretched cluster composed of 8 hosts (4 per site) + 1 witness.
One host keeps getting partitioned.
If I remove any host from the cluster it is fine, but as soon as I add it back, so the count is 8 again, it fails.
I also noticed that the Sub-Cluster UUID on the partitioned host is the same as in the rest of the cluster.
Seems very strange.
Update:
I just noticed that 8 seems to be the magic number...
When the cluster is configured as stretched, the witness is of course a member, so "esxcli vsan cluster get" shows 8 members. It is when I add the 9th that it breaks.
I tried recreating the cluster as a regular one without fault domains and without the witness.
I could add 8 regular hosts, but when I added the 9th it again became partitioned...
Strange. I am rolling out a vSAN 6.6 lab myself to see what happens when I do this with 10 hosts.
Do you see any dropped packets in your network?
Aside from misconfigurations, you can get network partitioning if the network is overloaded; vSAN tolerates only a small number of dropped packets on the vSAN network.
You can use esxtop, press n (network view), and look at the %DRPRX column.
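For a non-interactive check, you can also capture a few esxtop batch-mode samples and grep the drop-related counters out of the CSV (batch-mode field names differ from the interactive view and vary by build, so treat this as a sketch):

```shell
# Capture 3 samples, 5 seconds apart, in batch (CSV) mode
esxtop -b -n 3 -d 5 > /tmp/esxtop-net.csv

# List the network counter names related to dropped packets
# (exact field names depend on the ESXi build)
head -1 /tmp/esxtop-net.csv | tr ',' '\n' | grep -i drop | head
```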
I just tested it with vSAN 6.6 (vSphere 6.5 U1) and it worked fine for me going from 8 nodes to 10. I would recommend phoning support and letting them take a look.
I have exactly the same issue: I cannot configure more than 8 vSAN members in my cluster.
I increased the RAM from 6 to 8 GB and tested with encryption and dedupe/compression; nothing worked.
@Duncan
How is your configuration different from ours?
Problem is fixed: each nested ESXi host needs 16 GB of RAM.....
I have now configured my stretched 15+15+1 nested cluster, including dedupe/compression and encryption.