Good afternoon, colleagues.
I have a 2-node direct-connect cluster (the nodes are connected with a direct-attach cable) and a witness host located at another POP.
All nodes and the VCSA have the latest updates from the 6.5 line:
Hypervisors and witness - VMware ESXi, 6.5.0, 11925212
VCSA - 6.5.0.23100
At some point, the error "Witness host not found" appeared.
I checked connectivity with vmkping; all nodes can reach each other.
I tried disabling the stretched cluster, but with no luck.
Also, the cluster is running in "Multicast" network mode.
In Configuration Assist, the ping test to the witness fails:
[Screenshot: 2019-03-05 10.10.51.png]
But from the SSH shell, everything looks fine:
[root@esx1:~] vmkping -I vmk0 w.energytel.net.ua
PING w.energytel.net.ua (213.133.160.226): 56 data bytes
64 bytes from 213.133.160.226: icmp_seq=0 ttl=58 time=13.825 ms
64 bytes from 213.133.160.226: icmp_seq=1 ttl=58 time=13.867 ms
64 bytes from 213.133.160.226: icmp_seq=2 ttl=58 time=13.709 ms
Any ideas what the problem could be?
Thanks!
Hello Yokodzun,
[root@esx1:~] esxcli vsan cluster unicastagent list
NodeUuid IsWitness Supports Unicast IP Address Port Iface Name
------------------------------------ --------- ---------------- --------------- ----- ----------
5a8e8bc2-dc85-7220-6989-248a076c7550 0 true 10.10.10.20 12321
5a8e8bc2-dc85-7220-6989-248a076c7550 0 true 185.176.120.5 12321
00000000-0000-0000-0000-000000000000 1 true 213.133.160.226 12321
[root@esx2:~] esxcli vsan cluster unicastagent list
NodeUuid IsWitness Supports Unicast IP Address Port Iface Name
------------------------------------ --------- ---------------- --------------- ----- ----------
5a8e825d-973e-6374-3f40-248a076c7580 0 true 10.10.10.10 12321
5a8e825d-973e-6374-3f40-248a076c7580 0 true 185.176.120.4 12321
00000000-0000-0000-0000-000000000000 1 true 213.133.160.226 12321
You appear to have extraneous unicastagent entries there: in a 1+1+1 cluster, each host should have exactly 2 unicast entries, namely the other data node and the witness.
Even if you have multiple VMKs configured for vSAN traffic, only one should be used for unicastagent communication, so please identify which one you should be using (10.10.10.x or 185.176.120.x) and remove the other entry.
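A minimal Python sketch of the rule above, assuming you copy the text output of `esxcli vsan cluster unicastagent list` from a data node and that its columns are whitespace-separated in the order shown (the empty Iface Name column simply disappears). The helper names `parse_unicastagent_list` and `check_unicastagent_list` are hypothetical, not part of any VMware tool. The check flags a data node listed more than once and verifies there is exactly one peer data node and one witness.

```python
from collections import Counter


def parse_unicastagent_list(text):
    """Parse `esxcli vsan cluster unicastagent list` output into dicts.

    Assumes whitespace-separated columns: NodeUuid, IsWitness,
    Supports Unicast, IP Address, Port (Iface Name may be empty).
    """
    entries = []
    for line in text.splitlines():
        fields = line.split()
        # Skip blank lines, the header row and the dashed separator row.
        if len(fields) < 5 or fields[0] == "NodeUuid" or fields[0].startswith("---"):
            continue
        entries.append({
            "uuid": fields[0],
            "is_witness": fields[1] == "1",
            "ip": fields[3],
            "port": int(fields[4]),
        })
    return entries


def check_unicastagent_list(text):
    """Return a list of problems for a 1+1+1 (2-node + witness) cluster."""
    entries = parse_unicastagent_list(text)
    data = [e for e in entries if not e["is_witness"]]
    witnesses = [e for e in entries if e["is_witness"]]
    problems = []
    # The same data node should not appear under two different IPs.
    for uuid, count in Counter(e["uuid"] for e in data).items():
        if count > 1:
            ips = ", ".join(e["ip"] for e in data if e["uuid"] == uuid)
            problems.append(f"node {uuid} listed {count} times ({ips})")
    # Exactly one peer data node ...
    if len(data) != 1:
        problems.append(f"expected 1 data-node entry, found {len(data)}")
    # ... and exactly one witness (its UUID may be all zeros in some builds).
    if len(witnesses) != 1:
        problems.append(f"expected 1 witness entry, found {len(witnesses)}")
    return problems
```

Fed the esx1 output above, it reports that node 5a8e8bc2-dc85-7220-6989-248a076c7550 is listed 2 times (10.10.10.20, 185.176.120.5) and that 2 data-node entries were found instead of 1. The actual cleanup is done on the host with the `remove` subcommand of the same `esxcli vsan cluster unicastagent` namespace; check its `--help` for the exact options on your build.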
Another point is to make sure the traffic types are configured correctly: the data nodes should have type 'vsan' for the inter-node vSAN traffic and type 'witness' for traffic to the witness (if using Witness Traffic Separation), and the witness should have only type 'vsan'. This command should show them:
# esxcli vsan network list
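A minimal Python sketch of that traffic-type rule. It assumes that each interface block in the `esxcli vsan network list` output contains a `VmkNic Name: ...` line followed by a `Traffic Type: ...` line, and ignores every other field; verify this against the output on your own build. The helper names are hypothetical.

```python
def parse_vsan_network_list(text):
    """Map VMkernel NIC -> traffic type from `esxcli vsan network list`.

    Assumes 'VmkNic Name: ...' precedes 'Traffic Type: ...' in each block.
    """
    result = {}
    current = None
    for line in text.splitlines():
        key, sep, value = line.strip().partition(":")
        if not sep:
            continue  # e.g. the bare 'Interface' heading
        key, value = key.strip(), value.strip()
        if key == "VmkNic Name":
            current = value
        elif key == "Traffic Type" and current is not None:
            result[current] = value
    return result


def check_traffic_types(types, is_witness_host):
    """Data nodes: at least one 'vsan' VMK, plus 'witness' only with WTS.
    Witness host: every tagged VMK must be 'vsan'."""
    if not types:
        return ["no VMkernel NIC is tagged for vSAN"]
    problems = []
    if is_witness_host:
        wrong = {nic: t for nic, t in types.items() if t != "vsan"}
        if wrong:
            problems.append(f"witness host: only 'vsan' expected, found {wrong}")
    else:
        if "vsan" not in types.values():
            problems.append("data node: no 'vsan' VMK for inter-node traffic")
        unknown = {nic: t for nic, t in types.items() if t not in ("vsan", "witness")}
        if unknown:
            problems.append(f"data node: unexpected traffic types {unknown}")
    return problems
```

Running the check separately on each data node and on the witness makes a stray 'witness' tag on the witness host, or a missing 'vsan' tag on a data node, stand out.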
Bob
Hello Yokodzun
Can you check this from the health UI and retest? Cluster > Monitor > Health > Network test
If it cannot communicate with the witness, then only 2 nodes will be listed as members in the output of # esxcli vsan cluster get.
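A minimal Python sketch of that membership check, assuming the one-`Key: Value`-per-line layout of `esxcli vsan cluster get` that appears later in this thread. In a 2-node cluster with a witness, a fully formed sub-cluster should report 3 members (two data nodes plus the witness). The helper names are hypothetical.

```python
def parse_cluster_get(text):
    """Turn `esxcli vsan cluster get` output into a {field: value} dict.

    Lines without ': ' (such as the 'Cluster Information' title) are skipped.
    Splitting on the first ': ' keeps timestamps like 12:27:48Z intact.
    """
    info = {}
    for line in text.splitlines():
        key, sep, value = line.strip().partition(": ")
        if sep:
            info[key] = value
    return info


def witness_joined(info, expected_members=3):
    """True if the sub-cluster has two data nodes plus the witness."""
    return int(info["Sub-Cluster Member Count"]) == expected_members
```

Fed the esx2 output posted later in this thread (Sub-Cluster Member Count: 2), `witness_joined` returns `False`, which matches the "Witness host not found" symptom even though Unicast Mode Enabled is true.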
If it is truly partitioned then I would advise checking the unicast traffic on port 12321 to and from the Witness vSAN address:
# nc -uz <DestIP> 12321
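Keep in mind that UDP is connectionless, so a "succeeded" from `nc -uz` generally means only that no ICMP port-unreachable came back, not that the vSAN agent on 12321 actually answered. The minimal Python sketch below makes the possible outcomes explicit; `udp_probe` is a hypothetical helper, not a VMware tool, and it assumes a Linux-like stack where an ICMP port-unreachable surfaces as `ConnectionRefusedError` on a connected UDP socket.

```python
import socket


def udp_probe(host, port, timeout=1.0):
    """Send one UDP datagram and report what can actually be inferred.

    "reply"        - something answered on that port
    "rejected"     - an ICMP port-unreachable came back (nothing listening)
    "no-rejection" - silence until the timeout; the port may be open, or a
                     firewall may be silently dropping the datagram
    """
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as s:
        s.settimeout(timeout)
        # connect() on UDP only fixes the peer, so ICMP errors reach this socket.
        s.connect((host, port))
        try:
            s.send(b"probe")
            s.recv(1)
            return "reply"
        except ConnectionRefusedError:
            return "rejected"
        except socket.timeout:
            return "no-rejection"
```

A firewall that silently drops the datagram produces the same "no-rejection" result as a healthy listener that never replies, which is why this kind of test cannot confirm that vSAN traffic is actually being exchanged.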
Check that the unicastagent lists on the data nodes contain the correct address, membership type (IsWitness=1) and UUID for the witness (an all-zeros UUID is normal in some builds):
#esxcli vsan cluster unicastagent list
Bob
Can you check this from the health UI and retest? Cluster > Monitor > Health > Network test
The error still reproduces after retesting:
[Screenshot: 2019-03-05 13.50.09.png]
If it is truly partitioned then I would advise checking the unicast traffic on port 12321 to and from the Witness vSAN address:
From the SSH shell, everything looks fine. From the witness to the data nodes:
[root@w:~] nc -uz esx1.energytel.net.ua 12321
Connection to esx1.energytel.net.ua 12321 port [udp/*] succeeded!
[root@w:~] nc -uz esx2.energytel.net.ua 12321
Connection to esx2.energytel.net.ua 12321 port [udp/*] succeeded!
[root@w:~]
From the data nodes to the witness:
[root@esx1:~] nc -uz w.energytel.net.ua 12321
Connection to w.energytel.net.ua 12321 port [udp/*] succeeded!
[root@esx2:~] nc -uz w.energytel.net.ua 12321
Connection to w.energytel.net.ua 12321 port [udp/*] succeeded!
Each data node's unicastagent list contains entries for the other node and the witness:
[root@esx1:~] esxcli vsan cluster unicastagent list
NodeUuid IsWitness Supports Unicast IP Address Port Iface Name
------------------------------------ --------- ---------------- --------------- ----- ----------
5a8e8bc2-dc85-7220-6989-248a076c7550 0 true 10.10.10.20 12321
5a8e8bc2-dc85-7220-6989-248a076c7550 0 true 185.176.120.5 12321
00000000-0000-0000-0000-000000000000 1 true 213.133.160.226 12321
[root@esx2:~] esxcli vsan cluster unicastagent list
NodeUuid IsWitness Supports Unicast IP Address Port Iface Name
------------------------------------ --------- ---------------- --------------- ----- ----------
5a8e825d-973e-6374-3f40-248a076c7580 0 true 10.10.10.10 12321
5a8e825d-973e-6374-3f40-248a076c7580 0 true 185.176.120.4 12321
00000000-0000-0000-0000-000000000000 1 true 213.133.160.226 12321
Both nodes report that they are running in unicast mode:
[root@esx2:~] esxcli vsan cluster get
Cluster Information
Enabled: true
Current Local Time: 2019-03-05T12:27:48Z
Local Node UUID: 5a8e8bc2-dc85-7220-6989-248a076c7550
Local Node Type: NORMAL
Local Node State: MASTER
Local Node Health State: HEALTHY
Sub-Cluster Master UUID: 5a8e8bc2-dc85-7220-6989-248a076c7550
Sub-Cluster Backup UUID: 5a8e825d-973e-6374-3f40-248a076c7580
Sub-Cluster UUID: 52707a2b-b6cd-1e15-7fff-6ab6d0dd3466
Sub-Cluster Membership Entry Revision: 7
Sub-Cluster Member Count: 2
Sub-Cluster Member UUIDs: 5a8e8bc2-dc85-7220-6989-248a076c7550, 5a8e825d-973e-6374-3f40-248a076c7580
Sub-Cluster Membership UUID: 0bc4765c-a062-7e00-d056-248a076c7550
Unicast Mode Enabled: true
Maintenance Mode State: OFF
Config Generation: f7abdef2-bf70-45d9-8498-3b24337d0f69 2 2019-03-04T17:01:40.991
But the VCSA shows "Multicast", and the network checks fail.
Another point would be to ensure you have traffic types configured correctly - data-nodes should have 'vsan' for the inter-node vSAN traffic and 'witness' to the witness (if using Witness Traffic Separation), witness should have only type 'vsan', this should show these:
Thank you, that was the cause of my problem. I removed the unnecessary types of traffic from the ports, and the problem went away.