VMware Cloud Community
insearchof
Expert

Host cannot communicate with one or more other nodes in the vSAN enabled cluster

VMware ESXi 6.7 U2

I just upgraded my four ESXi hosts to 6.7 U2.

Each host has one SSD flash drive and three HDDs of the same size.

I turned on vSAN, which then showed all the disks. I claimed them all, and it went on to process the vSAN configuration.

One host stopped communicating and the process timed out. That host was unresponsive from the console, although its VMs were still accessible.

I had to power the host off and back on, after which everything came back. The VMs are now all up and running.

On all four hosts I see this message:

Host cannot communicate with one or more other nodes in the vSAN enabled cluster

On the cluster I see these messages:

    vSAN health alarms are suppressed

    vSAN datastore vsanDatastore in cluster TGCSNET-Vcenter2-Cluster in datacenter Datacenter-TGCSNET-2 does not have capacity

When I look at the datastores, the vSAN datastore has no disks.

Any ideas or thoughts on how to get this running?

Thank you

Tom

9 Replies
daphnissov
Immortal

Tom, you really need to spend some time reading the documentation before starting with this product. This isn't something you just "wing." That said, you have a networking issue. Have you created VMkernel ports explicitly for vSAN and tagged them for the vSAN service? This is most likely a basic issue that studying the product before implementing it would have made clear.
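
One way to check the tagging from the ESXi shell is sketched below. It assumes that `esxcli vsan network list` prints one `VmkNic Name:` line and one `Traffic Type:` line per interface, which is the usual layout on 6.x but is not shown anywhere in this thread. The helper name `vsan_vmknics`, the interface `vmk1`, and the address `192.0.2.16` are placeholders for illustration only.

```shell
#!/bin/sh
# Print the VMkernel interfaces tagged for vSAN traffic.
# Reads `esxcli vsan network list` output on standard input.
# Assumption: each interface block has "VmkNic Name:" before "Traffic Type:".
vsan_vmknics() {
    awk '
        /VmkNic Name:/  { nic = $NF }
        /Traffic Type:/ && $NF == "vsan" && nic != "" { print nic; nic = "" }
    '
}

# On each host (not run here):
#   esxcli vsan network list | vsan_vmknics
# Then test reachability of another host's vSAN address over that interface:
#   vmkping -I vmk1 192.0.2.16    # vmk1 and the address are placeholders
```

If the helper prints nothing on a host, no VMkernel port on that host is tagged for vSAN, which would be consistent with the "cannot communicate" message.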

insearchof
Expert

daphnissov,

Sorry I did not get back sooner; the site kept telling me it was in maintenance mode.

I have a network set up just for vSAN.

I ran this on host 1:

[root@TGCSESXI-15:~] esxcli vsan cluster get
Cluster Information
   Enabled: true
   Current Local Time: 2020-03-18T22:34:20Z
   Local Node UUID: 5d38d9dd-590c-374e-20f8-001018f421ec
   Local Node Type: NORMAL
   Local Node State: MASTER
   Local Node Health State: HEALTHY
   Sub-Cluster Master UUID: 5d38d9dd-590c-374e-20f8-001018f421ec
   Sub-Cluster Backup UUID: 5d3bf7da-6354-159a-dbb8-000af7a054c8
   Sub-Cluster UUID: 52be451c-96dc-7a62-3e6f-9adf883c5415
   Sub-Cluster Membership Entry Revision: 4
   Sub-Cluster Member Count: 3
   Sub-Cluster Member UUIDs: 5d38d9dd-590c-374e-20f8-001018f421ec, 5d3bf7da-6354-159a-dbb8-000af7a054c8, 5d3e54c3-dad7-0f94-1d59-001018f511a0
   Sub-Cluster Member HostNames: TGCSESXI-15.our.network.tgcsnet.com, TGCSESXI-17.our.network.tgcsnet.com, TGCSESXI-18.our.network.tgcsnet.com
   Sub-Cluster Membership UUID: 4b74725e-7982-fae5-9e54-001018f421ec
   Unicast Mode Enabled: true
   Maintenance Mode State: OFF
   Config Generation: 4d3e48f2-e56c-4f5f-bb29-c3b16c0be4dd 3 2020-03-18T20:39:00.353

Then on host 2, which is the host that froze:

[root@TGCSESXI-16:~] esxcli vsan cluster get

vSAN Clustering is not enabled on this host
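
The difference between the two outputs above can also be checked mechanically. As a rough sketch, assuming the `Key: value` layout shown in the output from host 15, the helpers below read `esxcli vsan cluster get` output on standard input and report whether the member count matches the number of hosts expected. The function names `vsan_field` and `vsan_check_members` are made up for this illustration and are not part of esxcli.

```shell
#!/bin/sh
# Print the value of one "Key: value" field from
# `esxcli vsan cluster get` output read on standard input.
vsan_field() {
    # $1 = field name exactly as printed, e.g. "Sub-Cluster Member Count"
    awk -v key="$1" '
        { line = $0; sub(/^[ \t]+/, "", line) }
        index(line, key ": ") == 1 { print substr(line, length(key) + 3); exit }
    '
}

# Succeed only if the reported member count equals the expected host count.
vsan_check_members() {
    # $1 = expected number of hosts (4 in this thread)
    count=$(vsan_field "Sub-Cluster Member Count")
    if [ -z "$count" ]; then
        echo "no member count found (vSAN clustering may not be enabled on this host)"
        return 2
    fi
    if [ "$count" -ne "$1" ]; then
        echo "only $count of $1 hosts are cluster members"
        return 1
    fi
    echo "all $1 hosts are cluster members"
}

# On a host (not run here):
#   esxcli vsan cluster get | vsan_check_members 4
```

Fed the output from host 15 above, this reports that only 3 of 4 hosts are members, which matches the missing TGCSESXI-16 entry in the host name list.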

How can I enable vSAN on this one host? The other three look OK.

Once node 2 is configured, I believe vSAN will configure everything else. Or will I need to run a command to start the process again?

Thank you

Tom

Nawals
Expert

Workaround:

Restart the vpxa management agent by running this command:

    /etc/init.d/vpxa restart

Restart the hostd and vsanmgmtd services on the ESXi host:

    /etc/init.d/hostd restart

    /etc/init.d/vsanmgmtd restart

If restarting the services does not resolve the issue, place the host in maintenance mode and reboot it.

VMware Knowledge Base

Accepted Solution
Nawals
Expert

Is host 16 part of the same vSAN cluster? If so, remove it from the cluster and then add it back. My previous answer was based on your first question.

insearchof
Expert

Nawals,

Moving the host out of the cluster and then back in worked; it applied the vSAN configuration.

[root@TGCSESXI-16:~] esxcli vsan cluster get
Cluster Information
   Enabled: true
   Current Local Time: 2020-03-19T12:17:01Z
   Local Node UUID: 5d3b6dc2-19e8-f2da-5ba4-000af77acb40
   Local Node Type: NORMAL
   Local Node State: AGENT
   Local Node Health State: HEALTHY
   Sub-Cluster Master UUID: 5d38d9dd-590c-374e-20f8-001018f421ec
   Sub-Cluster Backup UUID: 5d3bf7da-6354-159a-dbb8-000af7a054c8
   Sub-Cluster UUID: 52be451c-96dc-7a62-3e6f-9adf883c5415
   Sub-Cluster Membership Entry Revision: 5
   Sub-Cluster Member Count: 4
   Sub-Cluster Member UUIDs: 5d38d9dd-590c-374e-20f8-001018f421ec, 5d3bf7da-6354-159a-dbb8-000af7a054c8, 5d3e54c3-dad7-0f94-1d59-001018f511a0, 5d3b6dc2-19e8-f2da-5ba4-000af77acb40
   Sub-Cluster Member HostNames: TGCSESXI-15.our.network.tgcsnet.com, TGCSESXI-17.our.network.tgcsnet.com, TGCSESXI-18.our.network.tgcsnet.com, TGCSESXI-16.our.network.tgcsnet.com
   Sub-Cluster Membership UUID: 4b74725e-7982-fae5-9e54-001018f421ec
   Unicast Mode Enabled: true
   Maintenance Mode State: OFF
   Config Generation: 4d3e48f2-e56c-4f5f-bb29-c3b16c0be4dd 5 2020-03-19T12:15:45.884

Now I just need to get the vSAN datastore to show capacity.

Is there any way to trigger that process again?

Thanks, Tom

insearchof
Expert

I figured it out.

I was able to claim all the SSDs as cache and all the HDDs as capacity.

The vSAN datastore now has capacity, great.
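
For anyone who wants to confirm the claim from the ESXi shell, here is a small sketch. It assumes that `esxcli vsan storage list` prints an `Is Capacity Tier: true` or `Is Capacity Tier: false` line for each claimed disk, which is the usual layout on 6.x but is not shown in this thread. The helper name `vsan_tier_counts` is made up for this illustration.

```shell
#!/bin/sh
# Count claimed disks per tier from `esxcli vsan storage list` output
# read on standard input.
# Assumption: every claimed disk has one "Is Capacity Tier:" line.
vsan_tier_counts() {
    awk '
        /Is Capacity Tier: true/  { capacity++ }
        /Is Capacity Tier: false/ { cache++ }
        END { printf "cache=%d capacity=%d\n", cache, capacity }
    '
}

# On each host (not run here):
#   esxcli vsan storage list | vsan_tier_counts
```

With one SSD and three HDDs per host, as described at the top of the thread, each host would be expected to show one cache disk and three capacity disks.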

When I went to create a folder, I got this message:

This operation requires 2 or more usable fault domains.

I have one fault domain with all my hosts assigned to it.

Any ideas?

Thanks, Tom

insearchof
Expert

Sorry, I figured this out as well.

I do not need a fault domain, since fault domains are meant for grouping hosts that sit in the same rack.

I deleted the fault domain and can now write to the vSAN datastore.

Thanks, Tom

Nawals
Expert

Follow the steps below to get the vSAN datastore to show capacity.

Workaround:

Restart the hostd and vsanmgmtd services on the ESXi host:

    /etc/init.d/hostd restart

    /etc/init.d/vsanmgmtd restart

If restarting the services does not resolve the issue, place the host in maintenance mode and reboot it.

Related VMware KB:

    VSAN cluster would display incorrect capacity information with 0 Bytes used and 0 Bytes free (58928)
    https://kb.vmware.com/s/article/58928

insearchof
Expert

Nawals,

I posted another question.

After several attempts to get vSAN to work, I gave up on it. My ESXi hosts kept freezing after my Veeam replication jobs accessed the vSAN datastore.

I decided to remove vSAN altogether, but now my local drives are read-only and I cannot create a new datastore from them.

I posted the question in the vSAN group, in case you have any ideas on how I can get my drives formatted so that I can use them as local datastores.

Thank you

Tom
