VMware Cloud Community
BiswarajPattan
Contributor

vSAN datastore

Hi team,

I have configured a three-node vSAN cluster in hybrid mode. The vSAN datastore is not showing on the host1 ESXi box, but it is showing on the host2 and host3 ESXi boxes. I have tried to figure it out but failed. Please find the attached screenshots and let me know how this three-node vSAN cluster should function.

ESXi version 6.7 U3

vCenter Appliance 6.7 Update 3d

Thanks

1 Solution

Accepted Solutions
TheBobkin
Champion

View solution in original post

12 Replies
SureshKumarMuth
Commander

Can you please provide the output of the following command from host1 and host2 in the cluster:

esxcli vsan cluster get

Also, if you log in to the host client of host1, are you able to see the datastore?
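
For reference, and purely as an illustration (hostnames and values made up; exact fields vary by build), the output of that command on a healthy member of a three-node cluster should look roughly like the below - the key fields to compare between host1 and host2 are Sub-Cluster Member Count and Sub-Cluster Member HostNames:

# esxcli vsan cluster get
   Enabled: true
   Local Node State: AGENT
   Local Node Health State: HEALTHY
   Sub-Cluster Member Count: 3
   Sub-Cluster Member HostNames: esxi01, esxi02, esxi03
   Maintenance Mode State: OFF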

Regards,
Suresh
https://vconnectit.wordpress.com/
TheBobkin
Champion

@BiswarajPattan That node likely doesn't show the vsanDatastore as accessible because it is network-isolated from the cluster.

All nodes need to have a usable vsan-tagged vmknic with unimpeded communication to every other node's vsan-tagged vmknic, e.g. they should all be in the same subnet, be using consistent MTU settings, be configured with vmnic(s) that are connected to shared switch(es), and not have any required ports blocked (UDP 12321 and TCP 2233 being the 2 main ones).
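
As a quick CLI sanity check on the port side: to the best of my knowledge the ESXi firewall rulesets that carry this traffic are 'rdt' (TCP 2233) and 'cmmds' (UDP 12321), so confirming they are enabled on each host looks roughly like:

# esxcli network firewall ruleset list | grep -iE 'rdt|cmmds|vsan'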

What the specific problem is should be relatively obvious from:

Cluster > Monitor > vSAN > Skyline Health

lukaszzasko
Enthusiast

Hi,

Start an SSH session on host1 and try to ping the vSAN vmkernel adapter on host2 or host3 using: vmkping -I vmk1 [vSAN vmk1 IP on host2 or host3]

If you cannot ping hosts 2 and 3, you must verify the network connection because host1 may be in an isolated state.

Next, you can check whether the vSAN service is enabled on the same vmkernel adapter on all hosts.
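
For reference, this can be confirmed from the CLI as well - illustrative, trimmed output from a host where vmk1 carries vSAN traffic:

# esxcli vsan network list
   VmkNic Name: vmk1
   Traffic Type: vsan

If a host returns no interface here, then vSAN traffic is simply not tagged on any vmkernel adapter on that host.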

bartdanby
Contributor

Thank you all for assisting with this issue.

All 3 hosts are able to ping each other on the vSAN vmknic: host1 can ping 2 & 3, host2 can ping 1 & 3, and host3 can ping 1 & 2.

vSAN is enabled; I ran: esxcli vsan network ip add -i vmk#

Thanks

bartdanby
Contributor

Yes, that is correct. All nodes have a vSAN-tagged vmknic in the same subnet. I have checked the firewall and it shows that the required ports are not blocked.

Without vCenter

bartdanby
Contributor

I meant to add that I do not have vCenter available.

BiswarajPattan
Contributor

Hi All,

When I try to deploy a virtual machine in the vSAN cluster, the operation fails with an error that the virtual machine files cannot be created. Screenshots attached.

BiswarajPattan
Contributor

@SureshKumarMuth I am really sorry for the delayed response. Please find the output of host1 and host2. The vSAN setup is now done, but the issue is that I cannot deploy a new VM on the vSAN datastore. Please suggest the next step.

TheBobkin
Champion

@BiswarajPattan , From node2's perspective there are 3 cluster members, but you can see the HostNames don't include node1, which is an indicator of an issue; from node1's perspective, it is isolated from the cluster (member count: 1).

 

So, from this we can determine that there is either 1. inconsistent connectivity/membership between nodes (e.g. node2 can talk to node1 but node3 cannot) OR 2. only one-way communication between node2/node3 and node1.

 

The following should be done to test which of 1./2. is the case and why this is not working:
SSH to all 3 nodes and gather the following information from all nodes:

 

# esxcli vsan cluster unicastagent list
ALL nodes should have 2 unicastagent entries here, which store the UUID and vSAN-IP information they use to communicate with the other nodes - if any node has only 1 entry or no entries, then this is the problem and it should be fixed (ideally via Skyline Health/the vSphere Client, but it can be done via CLI).
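
If it does have to be fixed via CLI, a rough sketch of adding a missing neighbour entry is below - the UUID and IP are placeholders (the missing node's UUID can be read from 'Local Node UUID' in 'esxcli vsan cluster get' on that node), and the exact syntax should be verified against your build before running it:

# esxcli vsan cluster unicastagent add -t node -U true -u <Missing-Node-UUID> -a <Missing-Node-vSAN-IP> -p 12321
# esxcli vsan cluster unicastagent list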

 

List which vmk(s) are tagged for vsan traffic:
# esxcli vsan network list
Confirm what MTU these vmknics are set to use and get the IP of the vsan-tagged vmk:
# esxcfg-vmknic -l
(If using 9000 MTU on the vsan-tagged vmk, you should also validate that the vmnics used are set to this (esxcfg-nics -l) and that the vSwitch used for this is also set to 9000 (esxcfg-vswitch -l).)

 

Now, use this information to test ping from/to each of the nodes with the correct max MTU e.g. from node1:
If using 1500 MTU on the vsan-tagged vmk then (where vmkX = the vsan-tagged vmk on the source node):
# vmkping -I vmkX -s 1472 -d vSAN-IP-of-Node2
# vmkping -I vmkX -s 1472 -d vSAN-IP-of-Node3
If using 9000 MTU on the vsan-tagged vmk then:
# vmkping -I vmkX -s 8972 -d vSAN-IP-of-Node2
# vmkping -I vmkX -s 8972 -d vSAN-IP-of-Node3
Test this in both directions from all nodes, using the MTU of the vsan-tagged vmk - if any are misconfigured and using different MTUs, then this is your problem and it should be corrected.
If (once any MTU issues have been resolved) any nodes cannot communicate with any other nodes then you likely have a virtual/physical configuration issue (e.g. only one vmnic attached to node1 where others have two going to different switches, or a VLAN mistagged or misconfigured or just no physical network route).
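
If a mismatched MTU does turn out to be the problem, a rough sketch of correcting it from the CLI is below (vmk1, vSwitch1 and 1500 are placeholders for your own vsan-tagged vmk, standard vSwitch and intended MTU; if the vmk lives on a Distributed Switch then the switch MTU is changed from vCenter instead):

# esxcli network ip interface set -i vmk1 -m 1500
# esxcli network vswitch standard set -v vSwitch1 -m 1500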

 

If all nodes have correct unicastagent entries AND have network communication over vmkping at the correct (and consistent) MTU settings and still have issues, then that really only leaves required ports being blocked (UDP 12321 and TCP 2233 being the most critical) or a logical reason, e.g. they are on very different versions (e.g. one on 6.7 U3 and the others on 7.0 U1) and have different, incompatible versions of CMMDS in use.
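
If it does come down to ports, the TCP side can be probed from one node to another with nc (a partial check only - UDP 12321 cannot be reliably tested this way, so rely on the firewall ruleset/Skyline Health checks for that):

# nc -z vSAN-IP-of-Node2 2233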

BiswarajPattan
Contributor

@TheBobkin Thanks a lot for your valuable support. As per your instructions, I followed all the steps and found there was an issue with the MTU. I changed the MTU on each ESXi host to 1500 and the magic happened. By executing the command esxcli vsan cluster get, I got the correct member count. Now I am able to deploy a new VM on the vSAN cluster. Please find the attached screenshots. Thanks again, TheBobkin - you resolved the issue.

bartdanby
Contributor

We have tested successfully and recreated the unicast list. This put the hosts back in the cluster.

My issue is that 2 of the 3 hosts show a maintenance mode state of ON when running the command esxcli vsan cluster get. I cannot get it to cancel.

Which may be why those hosts only see their local storage and not the whole vSAN datastore.

 

TheBobkin
Champion

@bartdanby are you a colleague of @BiswarajPattan working on the same cluster, or are you asking questions about your own cluster with similar (or maybe not?) issues here instead of in its own separate thread? (And if so, why?)

 

If nodes are in a vSAN decom state (e.g. 'esxcli vsan cluster get' shows them in Maintenance Mode) but NOT in ESXi Maintenance Mode (confirmable via the vSphere/host client or via 'esxcli system maintenanceMode get'), then these two MM states are out of sync - this can easily be resolved by 1. confirming there are no running VMs on the host (vMotioning them off or powering them off if there are), then 2. placing the host in MM via the vSphere client, host client, or CLI with the 'No Action' option (the precheck will fail; ignore this and proceed), then immediately taking it out of MM.
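
A rough CLI sketch of that sequence, assuming no running VMs are left on the host ('-m noAction' is the vSAN decommission-mode flag as I recall it - verify against your build before using it):

# esxcli system maintenanceMode get
# esxcli system maintenanceMode set -e true -m noAction
# esxcli system maintenanceMode set -e false
# esxcli vsan cluster get | grep -i "Maintenance Mode State"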

 
