Yokodzun
Enthusiast

Witness host not found on 2node cluster.


Good afternoon, colleagues.

I have a 2-node direct-connect cluster (the nodes are connected with a direct-attach cable) and a witness host located at another POP.

All nodes and the VCSA have the latest updates from the 6.5 line.

Hypervisors and witness - VMware ESXi, 6.5.0, 11925212

VCSA - 6.5.0.23100

At some point, the error "Witness host not found" occurred.

I checked connectivity with vmkping, and all nodes can see each other.

I tried disabling "stretched cluster", but with no luck.

The cluster is also shown as working in "Multicast" network mode.

In "Configuration Assist", the ping to the witness fails:

Dropbox - Screenshot 2019-03-05 10.10.51.png

But from the SSH shell, everything looks good:

[root@esx1:~] vmkping -I vmk0 w.energytel.net.ua
PING w.energytel.net.ua (213.133.160.226): 56 data bytes
64 bytes from 213.133.160.226: icmp_seq=0 ttl=58 time=13.825 ms
64 bytes from 213.133.160.226: icmp_seq=1 ttl=58 time=13.867 ms
64 bytes from 213.133.160.226: icmp_seq=2 ttl=58 time=13.709 ms

Does anyone have an idea what the problem could be?

Thanks!

TheBobkin
VMware Employee

Hello Yokodzun

Can you check this from the health UI and retest? Cluster > Monitor > Health > Network test

If the data nodes cannot communicate with the Witness, then only 2 nodes will be shown in the membership in the output of #esxcli vsan cluster get.

If it is truly partitioned then I would advise checking the unicast traffic on port 12321 to and from the Witness vSAN address:

# nc -uz <DestIP> 12321

Check that the unicastagent lists on the data-nodes contain the correct address, membership type (witness=1) and UUID for the witness (a UUID of all 0's is normal in some builds):

#esxcli vsan cluster unicastagent list
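
As a minimal sketch of the membership check mentioned above: the helper below reads the output of esxcli vsan cluster get on stdin and compares the "Sub-Cluster Member Count" line with the expected number of members (3 for two data nodes plus a witness). The name check_member_count is hypothetical and not part of any VMware tooling.

```shell
# Hypothetical helper: compare the member count reported by
# `esxcli vsan cluster get` (read from stdin) with the expected count.
check_member_count() {
    expected="$1"
    # Take the value from the "Sub-Cluster Member Count: N" line and
    # keep only its digits, so stray whitespace does not matter.
    actual=$(awk -F': *' '/Sub-Cluster Member Count/ {
        gsub(/[^0-9]/, "", $2); print $2; exit
    }')
    if [ "$actual" = "$expected" ]; then
        echo "OK: $actual members"
    else
        echo "PARTITIONED?: $actual of $expected members"
        return 1
    fi
}
```

On a host this could be run as: esxcli vsan cluster get | check_member_count 3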

Bob

Yokodzun
Enthusiast

Can you check this from the health UI and retest? Cluster > Monitor > Health > Network test

The error is still reproduced after the retest:

Dropbox - Screenshot 2019-03-05 13.50.09.png

If it is truly partitioned then I would advise checking the unicast traffic on port 12321 to and from the Witness vSAN address:

From the SSH shell, everything looks good. From the witness to the data nodes:

[root@w:~] nc -uz esx1.energytel.net.ua 12321
Connection to esx1.energytel.net.ua 12321 port [udp/*] succeeded!
[root@w:~] nc -uz esx2.energytel.net.ua 12321
Connection to esx2.energytel.net.ua 12321 port [udp/*] succeeded!
[root@w:~]

From the data nodes to the witness:

[root@esx1:~] nc -uz w.energytel.net.ua 12321
Connection to w.energytel.net.ua 12321 port [udp/*] succeeded!
[root@esx2:~] nc -uz w.energytel.net.ua 12321
Connection to w.energytel.net.ua 12321 port [udp/*] succeeded!

Each data node has entries for the other node and the witness:

[root@esx1:~] esxcli vsan cluster unicastagent list
NodeUuid                              IsWitness  Supports Unicast  IP Address        Port  Iface Name
------------------------------------  ---------  ----------------  ---------------  -----  ----------
5a8e8bc2-dc85-7220-6989-248a076c7550          0              true  10.10.10.20      12321
5a8e8bc2-dc85-7220-6989-248a076c7550          0              true  185.176.120.5    12321
00000000-0000-0000-0000-000000000000          1              true  213.133.160.226  12321

[root@esx2:~] esxcli vsan cluster unicastagent list
NodeUuid                              IsWitness  Supports Unicast  IP Address        Port  Iface Name
------------------------------------  ---------  ----------------  ---------------  -----  ----------
5a8e825d-973e-6374-3f40-248a076c7580          0              true  10.10.10.10      12321
5a8e825d-973e-6374-3f40-248a076c7580          0              true  185.176.120.4    12321
00000000-0000-0000-0000-000000000000          1              true  213.133.160.226  12321

Both nodes show that they are working in unicast mode:

[root@esx2:~] esxcli vsan cluster get
Cluster Information
   Enabled: true
   Current Local Time: 2019-03-05T12:27:48Z
   Local Node UUID: 5a8e8bc2-dc85-7220-6989-248a076c7550
   Local Node Type: NORMAL
   Local Node State: MASTER
   Local Node Health State: HEALTHY
   Sub-Cluster Master UUID: 5a8e8bc2-dc85-7220-6989-248a076c7550
   Sub-Cluster Backup UUID: 5a8e825d-973e-6374-3f40-248a076c7580
   Sub-Cluster UUID: 52707a2b-b6cd-1e15-7fff-6ab6d0dd3466
   Sub-Cluster Membership Entry Revision: 7
   Sub-Cluster Member Count: 2
   Sub-Cluster Member UUIDs: 5a8e8bc2-dc85-7220-6989-248a076c7550, 5a8e825d-973e-6374-3f40-248a076c7580
   Sub-Cluster Membership UUID: 0bc4765c-a062-7e00-d056-248a076c7550
   Unicast Mode Enabled: true
   Maintenance Mode State: OFF
   Config Generation: f7abdef2-bf70-45d9-8498-3b24337d0f69 2 2019-03-04T17:01:40.991

But the VCSA shows "Multicast", and the network checks fail.

TheBobkin
VMware Employee

Hello Yokodzun,

[root@esx1:~] esxcli vsan cluster unicastagent list
NodeUuid                              IsWitness  Supports Unicast  IP Address        Port  Iface Name
------------------------------------  ---------  ----------------  ---------------  -----  ----------
5a8e8bc2-dc85-7220-6989-248a076c7550          0              true  10.10.10.20      12321
5a8e8bc2-dc85-7220-6989-248a076c7550          0              true  185.176.120.5    12321
00000000-0000-0000-0000-000000000000          1              true  213.133.160.226  12321

[root@esx2:~] esxcli vsan cluster unicastagent list
NodeUuid                              IsWitness  Supports Unicast  IP Address        Port  Iface Name
------------------------------------  ---------  ----------------  ---------------  -----  ----------
5a8e825d-973e-6374-3f40-248a076c7580          0              true  10.10.10.10      12321
5a8e825d-973e-6374-3f40-248a076c7580          0              true  185.176.120.4    12321
00000000-0000-0000-0000-000000000000          1              true  213.133.160.226  12321

You appear to have extraneous unicastagent entries there - in a 1+1+1 cluster you should have exactly 2 unicast entries on each host: namely the other data-node + witness.

Even if you have multiple VMKs configured for vSAN traffic, only one should be used for unicastagent communication, so please identify which one you should be using (10.10.10.x or 185.176.120.x) and remove the other one.
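
A small sketch of how the duplicate entries described above could be spotted automatically: it reads the output of esxcli vsan cluster unicastagent list on stdin and prints every NodeUuid that appears more than once, together with its addresses. The helper name duplicate_agents is hypothetical, and the column order (UUID first, IP address fourth) is taken from the listings in this thread.

```shell
# Hypothetical helper: list NodeUuids that occur more than once in
# `esxcli vsan cluster unicastagent list` output read from stdin.
duplicate_agents() {
    awk '
        # Data rows start with a UUID (five hex groups joined by dashes);
        # the header and the dashed separator line do not match.
        $1 ~ /^[0-9a-fA-F]+-[0-9a-fA-F]+-[0-9a-fA-F]+-[0-9a-fA-F]+-[0-9a-fA-F]+$/ {
            count[$1]++
            ips[$1] = ips[$1] " " $4   # column 4 is the IP address
        }
        END {
            for (u in count)
                if (count[u] > 1)
                    print u ":" ips[u]
        }
    '
}
```

On a host this could be run as: esxcli vsan cluster unicastagent list | duplicate_agents. No output means each node is listed only once.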

Another point would be to ensure you have the traffic types configured correctly: data-nodes should have 'vsan' for the inter-node vSAN traffic and 'witness' for traffic to the witness (if using Witness Traffic Separation), and the witness should have only type 'vsan'. This should show these:

#esxcli vsan network list
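
To make the traffic-type check above easier to read at a glance, here is a rough sketch that condenses the esxcli vsan network list output to one "interface type" pair per line. It assumes the output contains "VmkNic Name:" and "Traffic Type:" fields for each interface; those field names are an assumption and may differ between builds, and traffic_types is a hypothetical helper name.

```shell
# Hypothetical helper: print "<vmk> <traffic type>" for each interface in
# `esxcli vsan network list` output read from stdin.
# Assumes "VmkNic Name: ..." and "Traffic Type: ..." lines per interface.
traffic_types() {
    awk -F': *' '
        /VmkNic Name/  { nic = $2 }        # remember the current interface
        /Traffic Type/ { print nic, $2 }   # report its traffic type
    '
}
```

On a host this could be run as: esxcli vsan network list | traffic_types. On a data node with Witness Traffic Separation you would expect one interface tagged vsan and one tagged witness.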

Bob


Yokodzun
Enthusiast

Another point would be to ensure you have the traffic types configured correctly: data-nodes should have 'vsan' for the inter-node vSAN traffic and 'witness' for traffic to the witness (if using Witness Traffic Separation), and the witness should have only type 'vsan'. This should show these:

Thank you, that was the cause of my problem. I removed the unnecessary traffic types from the ports, and the problem went away.
