VMware Cloud Community
rom3010
Enthusiast

Another "Host cannot communicate with all other nodes in the vSAN enabled cluster" error

The error "Host cannot communicate with all other nodes in the vSAN enabled cluster" is displayed on all the hosts that are part of the vSAN cluster.

When I run "tcpdump-uw -i vmk_name udp port 23451", I see a lot of lines like the following:

15:21:34.761788 IP truncated-ip - 218 bytes missing! xxx.xxx.xxx.x24.50012 > 224.2.3.4.23451: UDP, length 272
15:21:35.392764 IP truncated-ip - 130 bytes missing! xxx.xxx.xxx.x22.32126 > 224.2.3.4.23451: UDP, length 184
15:21:35.761823 IP truncated-ip - 218 bytes missing! xxx.xxx.xxx.x24.50012 > 224.2.3.4.23451: UDP, length 272
15:21:36.392887 IP truncated-ip - 130 bytes missing! xxx.xxx.xxx.x22.32126 > 224.2.3.4.23451: UDP, length 184
15:21:36.761820 IP truncated-ip - 218 bytes missing! xxx.xxx.xxx.x24.50012 > 224.2.3.4.23451: UDP, length 272
....
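A side note on the "truncated-ip" lines: the numbers look consistent with tcpdump-uw capturing only a short snapshot of each packet, rather than the packets themselves being damaged. A minimal sketch of that arithmetic, assuming a 20-byte IPv4 header without options, an 8-byte UDP header, a 14-byte Ethernet header, and that tcpdump's "length" field is the UDP payload size:

```python
import re

# Two distinct lines copied from the tcpdump-uw capture above.
SAMPLE = """\
15:21:34.761788 IP truncated-ip - 218 bytes missing! xxx.xxx.xxx.x24.50012 > 224.2.3.4.23451: UDP, length 272
15:21:35.392764 IP truncated-ip - 130 bytes missing! xxx.xxx.xxx.x22.32126 > 224.2.3.4.23451: UDP, length 184
"""

# Assumed header sizes: IPv4 without options, UDP, and Ethernet II framing.
IP_HDR, UDP_HDR, ETH_HDR = 20, 8, 14

LINE_RE = re.compile(r"(\d+) bytes missing!.*UDP, length (\d+)")


def captured_bytes(line):
    """Return (IP bytes captured, frame bytes captured) for one truncated-ip line."""
    m = LINE_RE.search(line)
    if m is None:
        return None
    missing, payload = int(m.group(1)), int(m.group(2))
    ip_total = payload + UDP_HDR + IP_HDR  # full IP datagram size on the wire
    ip_captured = ip_total - missing       # portion that reached the capture buffer
    return ip_captured, ip_captured + ETH_HDR


results = [captured_bytes(line) for line in SAMPLE.splitlines()]
for r in results:
    print(r)
```

If those assumptions hold, both packet sizes were cut off at the same 82 bytes of IP (96 bytes of frame), which points to a capture snap length limit. Re-running the capture with a larger snap length through tcpdump's -s option (for example -s 1514) should show the whole packets; that tcpdump-uw behaves this way here is an assumption, not something the capture above proves.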

When I run "esxcli network ip connection list | egrep 224", I get the following output on each host:

Host A)

udp    0  224.1.2.3:12345             0.0.0.0:0                       32792      netCoalesce2World         
udp    0  224.2.3.4:23451             0.0.0.0:0                       32792      netCoalesce2World   

Host B)

udp    0  224.1.2.3:12345             0.0.0.0:0                       33266      vmsyslogd             
udp    0  224.2.3.4:23451             0.0.0.0:0                       33266      vmsyslogd              

Host C)

udp    0  224.1.2.3:12345             0.0.0.0:0                       32849      RCUDeferredCallQueueWorld  
udp    0  224.2.3.4:23451             0.0.0.0:0                       32849      RCUDeferredCallQueueWorld

Host D)

udp    0  224.1.2.3:12345             0.0.0.0:0                       33444      Tcpip4 wtask
udp    0  224.2.3.4:23451             0.0.0.0:0                       33444      Tcpip4 wtask
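Both groups in this output match the vSAN default master group (224.1.2.3:12345) and agent group (224.2.3.4:23451), and every host has both bound; only the world name differs from host to host. A small sketch that checks this from the pasted output, assuming the column order shown in the trimmed lines above (the column meanings are inferred, not taken from esxcli documentation):

```python
# Lines from "esxcli network ip connection list | egrep 224" as posted above
# (column spacing condensed). Assumed columns, inferred from these trimmed
# lines only: protocol, queue, local address, foreign address, world ID, world name.
OUTPUT = {
    "A": ["udp  0  224.1.2.3:12345  0.0.0.0:0  32792  netCoalesce2World",
          "udp  0  224.2.3.4:23451  0.0.0.0:0  32792  netCoalesce2World"],
    "B": ["udp  0  224.1.2.3:12345  0.0.0.0:0  33266  vmsyslogd",
          "udp  0  224.2.3.4:23451  0.0.0.0:0  33266  vmsyslogd"],
    "C": ["udp  0  224.1.2.3:12345  0.0.0.0:0  32849  RCUDeferredCallQueueWorld",
          "udp  0  224.2.3.4:23451  0.0.0.0:0  32849  RCUDeferredCallQueueWorld"],
    "D": ["udp  0  224.1.2.3:12345  0.0.0.0:0  33444  Tcpip4 wtask",
          "udp  0  224.2.3.4:23451  0.0.0.0:0  33444  Tcpip4 wtask"],
}

# vSAN default multicast groups: master group and agent group.
EXPECTED = {"224.1.2.3:12345", "224.2.3.4:23451"}


def parse(lines):
    """Map each bound UDP local address to the world name that owns the socket."""
    sockets = {}
    for line in lines:
        fields = line.split(None, 5)  # maxsplit keeps "Tcpip4 wtask" as one field
        if len(fields) == 6 and fields[0] == "udp":
            sockets[fields[2]] = fields[5].strip()
    return sockets


for host, lines in OUTPUT.items():
    sockets = parse(lines)
    missing = sorted(EXPECTED - set(sockets))
    print(f"Host {host}: missing={missing or 'none'} world={sorted(set(sockets.values()))}")
```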

Apart from that, vSAN seems to work correctly.

The details of my setup are the following:

There aren't any other vSAN clusters on the network.

The VLAN where the vSAN traffic resides is used exclusively for it.

vCenter was last updated with "VMware-VIMSetup-all-5.5.0-2105955-20140901-update02.iso".

Running "C:\Program Files\VMware\Infrastructure\VirtualCenter Server\vpxd.exe" -v from a command prompt displays "VMware VirtualCenter 5.5.0 build-2001466".

Control Panel > Programs and Features displays "VMware vCenter Server 5.5.0.42389".

- In the vSphere Web Client:

    - vSAN enabled Cluster > "Manage" > "Settings" > "Virtual SAN" > "General" displays "Normal" under "Network Status".

    - vSAN enabled Cluster > "Manage" > "Settings" > "Virtual SAN" > "Disk Management" displays "Healthy" for all hosts and their disk groups.

- In RVC:

    - "vsan.check_state" reports nothing is out of sync.

    - "vsan.reapply_vsan_vmknic_config" reapplies the vSAN vmknic configuration without problems, but nothing changes.

- On each ESXi host:

    - Pinging from the vSAN vmkernel port (ping -I vmk_name host_IP) to the vSAN vmkernel port on each of the other hosts works.

    - "esxcli vsan cluster get" returns "Local Node Health State: HEALTHY" for all members of the vSAN enabled cluster.

    - "esxcli vsan network list" returns the same Agent Group Multicast Address and Port and the same Master Group Multicast Address and Port on all members.
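Comparing the "esxcli vsan network list" multicast settings across hosts can be scripted once each host's output has been collected. A minimal sketch, assuming the "Key: Value" layout and the field names listed above; the sample text is hypothetical and uses the vSAN default groups, since the actual per-host output is not included in this post:

```python
# Hypothetical sample output: illustrative only, not copied from any host here.
SAMPLE = """\
Interface
   VmkNic Name: vmk_name
   IP Protocol: IPv4
   Agent Group Multicast Address: 224.2.3.4
   Agent Group Multicast Port: 23451
   Master Group Multicast Address: 224.1.2.3
   Master Group Multicast Port: 12345
"""

FIELDS = ("Agent Group Multicast Address", "Agent Group Multicast Port",
          "Master Group Multicast Address", "Master Group Multicast Port")


def parse_vsan_network(text):
    """Extract the multicast fields from one host's "Key: Value" output."""
    values = {}
    for line in text.splitlines():
        key, sep, value = line.strip().partition(":")
        if sep and key in FIELDS:
            values[key] = value.strip()
    return values


def mismatched_fields(outputs):
    """Return the multicast fields whose values differ between hosts."""
    parsed = [parse_vsan_network(text) for text in outputs.values()]
    return sorted(f for f in FIELDS if len({p.get(f) for p in parsed}) > 1)


consistent = {"A": SAMPLE, "B": SAMPLE, "C": SAMPLE}
drifted = {"A": SAMPLE, "B": SAMPLE, "C": SAMPLE.replace("23451", "23452")}
print(mismatched_fields(consistent))
print(mismatched_fields(drifted))
```

An empty list means every host reports the same groups and ports, which is what the post describes.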

Does anyone have any idea?

Thanks in advance,

Ro

3 Replies
CHogan
VMware Employee

I wonder if this is a vSphere HA issue rather than a VSAN one.

Can you check the state of your HA cluster - are all hosts still in the cluster?

If that doesn't look right, can you try disabling and re-enabling HA on the cluster?

http://cormachogan.com
rom3010
Enthusiast

Hi CHogan,

Thanks for your reply.

Yes, all hosts are still part of the cluster.

Re-enabling vSphere HA didn't help.

rom3010
Enthusiast

Although I didn't see anything related to this problem in the release notes for 2183112-20140901, once I updated from 2105955-20140901 the message changed to something like:

"Found another host" FQDN_of_the_host-1, FQDN_of_the_host-2, FQDN_of_the_host-3 "participating in the VSAN service which is not a member of this host's vCenter Cluster".

So I removed the VSAN service from the vmkernel port that was providing VSAN services on one of the hosts and re-enabled it later. After that, the host was no longer shown in the list above, so I did the same on all of the hosts mentioned in the warning message. After doing this, the message disappeared!

In the previous version (2105955-20140901) I had re-enabled the VSAN service the same way as in this version (uncheck VSAN on the vmkernel NIC and apply, wait a little, then re-enable it and apply), but the warning message remained, so the warning only went away after the version update.
