VMware Cloud Community
vmsysadmin20111
Enthusiast

cannot add a node back into VSAN cluster

Hi all,

I have a 4-node VSAN cluster. One of the nodes (apparently the master node) was improperly removed from the cluster (disconnected, then removed). The three remaining nodes are fine, but I can't get the missing node back into the cluster. When I add it back, it appears to form a separate cluster with itself as the only member. As a result, I can't browse the VSAN datastore in the vCenter client (the VMs are OK, though).

Node 1 before joining the VSAN cluster:

[root@fx2-esxi-01:~] esxcli vsan cluster get
Virtual SAN Clustering is not enabled on this host

Working VSAN cluster before node 1 rejoins:

[root@fx2-esxi-04:~] esxcli vsan cluster get
Cluster Information
   Enabled: true
   Current Local Time: 2017-04-21T16:17:33Z
   Local Node UUID: 58d04aea-1952-3758-4c9d-107d1a8fb9a7
   Local Node Type: NORMAL
   Local Node State: MASTER
   Local Node Health State: HEALTHY
   Sub-Cluster Master UUID: 58d04aea-1952-3758-4c9d-107d1a8fb9a7
   Sub-Cluster Backup UUID: 58cc1241-1e61-a964-ed3f-107d1a8fb3ef
   Sub-Cluster UUID: 52311b70-024e-7173-ac6e-92638c796a1a
   Sub-Cluster Membership Entry Revision: 12
   Sub-Cluster Member Count: 3
   Sub-Cluster Member UUIDs: 58d04aea-1952-3758-4c9d-107d1a8fb9a7, 58cc1241-1e61-a964-ed3f-107d1a8fb3ef, 58cc1fa4-bc1c-71ad-9f0d-107d1a8fb369
   Sub-Cluster Membership UUID: c193f958-30b3-2c5e-833c-107d1a8fb9a7

Node 1 after it is added back into the cluster:

[root@fx2-esxi-01:~] esxcli vsan cluster get
Cluster Information
   Enabled: true
   Current Local Time: 2017-04-21T16:48:43Z
   Local Node UUID: 58cc0c20-eddb-7b02-7e25-107d1a8fb301
   Local Node Type: NORMAL
   Local Node State: MASTER
   Local Node Health State: HEALTHY
   Sub-Cluster Master UUID: 58cc0c20-eddb-7b02-7e25-107d1a8fb301
   Sub-Cluster Backup UUID:
   Sub-Cluster UUID: 52311b70-024e-7173-ac6e-92638c796a1a
   Sub-Cluster Membership Entry Revision: 0
   Sub-Cluster Member Count: 1
   Sub-Cluster Member UUIDs: 58cc0c20-eddb-7b02-7e25-107d1a8fb301
   Sub-Cluster Membership UUID: e837fa58-35df-3663-af01-107d1a8fb301

Note that the Sub-Cluster UUID is the same, but the member count is 1. The VSAN health check does show cluster partitioning and multicast issues:
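The symptom above (same Sub-Cluster UUID, but each side reporting a different member list) can be checked mechanically. A minimal sketch in Python, assuming the `esxcli vsan cluster get` output from each host has been saved as plain text; the helper names are hypothetical and not part of any VMware tool.

```python
# Hypothetical helper: compare `esxcli vsan cluster get` output captured on
# several hosts and report how the hosts are split into partitions.

def parse_cluster_get(text: str) -> dict[str, str]:
    """Turn 'Key: value' lines into a dict; lines without a colon are skipped."""
    fields: dict[str, str] = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip():
            fields[key.strip()] = value.strip()
    return fields


def member_uuids(fields: dict[str, str]) -> frozenset[str]:
    """Set of node UUIDs listed under 'Sub-Cluster Member UUIDs'."""
    raw = fields.get("Sub-Cluster Member UUIDs", "")
    return frozenset(u.strip() for u in raw.split(",") if u.strip())


def find_partitions(outputs: dict[str, str]) -> list[set[str]]:
    """Group hosts that report the same member list; more than one group means a partition."""
    groups: dict[frozenset[str], set[str]] = {}
    for host, text in outputs.items():
        groups.setdefault(member_uuids(parse_cluster_get(text)), set()).add(host)
    return list(groups.values())


# Trimmed output from the post (only the fields this check needs).
esxi04 = """
   Sub-Cluster UUID: 52311b70-024e-7173-ac6e-92638c796a1a
   Sub-Cluster Member UUIDs: 58d04aea-1952-3758-4c9d-107d1a8fb9a7, 58cc1241-1e61-a964-ed3f-107d1a8fb3ef, 58cc1fa4-bc1c-71ad-9f0d-107d1a8fb369
"""
esxi01 = """
   Sub-Cluster UUID: 52311b70-024e-7173-ac6e-92638c796a1a
   Sub-Cluster Member UUIDs: 58cc0c20-eddb-7b02-7e25-107d1a8fb301
"""

outputs = {"fx2-esxi-04": esxi04, "fx2-esxi-01": esxi01}
same_cluster = len({parse_cluster_get(t)["Sub-Cluster UUID"] for t in outputs.values()}) == 1
parts = find_partitions(outputs)
print("same Sub-Cluster UUID:", same_cluster)
print("partitions:", parts)
```

Two groups that share one Sub-Cluster UUID is the "partitioned" case the health check reports, as opposed to the host having joined a different cluster altogether.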

[Screenshot: VSAN health check showing cluster partition and multicast warnings]

I don't think this is really a multicast problem, since everything worked fine before the first node was removed. On the packets tab, heartbeats from the original master (.201) appear to be received by all 4 nodes, but heartbeats from the new master (.204) are received only by the 3 surviving nodes and not by .201, even though it is the same multicast group:
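The heartbeat observations above are asymmetric, and spelling them out as data makes that explicit. A minimal sketch, assuming the packets tab can be read as "which hosts received heartbeats from which source"; only .201 and .204 appear in the post, so .202 and .203 are hypothetical stand-ins for the other two surviving nodes.

```python
# Hypothetical encoding of the packets-tab observations: for each heartbeat
# source, the set of hosts that received it. .202 and .203 are assumed names
# for the two surviving nodes not named in the post.
ALL_HOSTS = {".201", ".202", ".203", ".204"}

received_by = {
    ".201": {".201", ".202", ".203", ".204"},  # original master: seen by all 4
    ".204": {".202", ".203", ".204"},          # new master: .201 never sees it
}


def missing_receivers(observed: dict[str, set[str]], hosts: set[str]) -> dict[str, list[str]]:
    """For each source, the hosts that did not receive its heartbeats."""
    return {src: sorted(hosts - seen) for src, seen in observed.items()}


def receive_only_failures(observed: dict[str, set[str]], hosts: set[str]) -> set[str]:
    """Hosts whose own heartbeats reach everyone but which miss someone else's.

    That pattern points at delivery *to* the host (for example, switch-side
    multicast filtering on its port) rather than at the host's sending path.
    """
    missing = missing_receivers(observed, hosts)
    can_send = {src for src, lost in missing.items() if not lost}
    cannot_receive = {h for lost in missing.values() for h in lost}
    return can_send & cannot_receive


print(missing_receivers(received_by, ALL_HOSTS))
print(receive_only_failures(received_by, ALL_HOSTS))
```

Under these assumptions .201 sends fine but does not receive the group traffic, which is a hint to look at how the switch forwards the group to .201's port rather than at .201 itself.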

[Screenshot: packets tab showing which hosts receive each master's heartbeats]

All nodes are connected through a single 10Gb uplink (the second uplink is in standby) to an internal switch in the Dell FX2 system (a Dell PowerEdge FN 410S IOM in standalone mode):

Dell#sh ip igmp snooping groups detail
Interface             Vlan 227
Group                 224.1.2.3
Uptime                4w2d
Expires               00:02:05
Router mode           EXCLUDE
Last reporter         192.168.227.204
Last reporter mode    EXCLUDE
Last report received  IS_EXCL
Group source list
Source address                   Uptime      Expires

Interface             Vlan 227
Group                 224.2.3.4
Uptime                4w2d
Expires               00:02:05
Router mode           EXCLUDE
Last reporter         192.168.227.204
Last reporter mode    EXCLUDE
Last report received  IS_EXCL
Group source list
Source address                   Uptime      Expires
Dell#

Any thoughts on what else to check? Thanks in advance!

2 Replies
admin
Immortal

Hi,

If your host was moved out of the cluster, I think you need to re-create the disk group (DG):

1. Put the host into maintenance mode (MM).
2. Delete the DG.
3. Remove the host from the cluster.
4. Take the host out of MM.
5. Add a VMkernel adapter for vSAN.
6. Join the cluster and make sure everything works.
7. Re-create the DG.

Before putting the host into MM, make sure your VMs are OK and no resync is in progress.
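The steps in the reply above are presumably done from vCenter. As a rough ESXi-shell equivalent, here is a hypothetical dry run in Python that only prints `esxcli` commands in the same order; nothing is executed. The device names and vmkernel interface are placeholders, "remove host from cluster" is approximated by `esxcli vsan cluster leave`, and the exact subcommands and flags (especially the vSAN network one, which differs between releases) are assumptions to check against `--help` on your build.

```python
# Hypothetical dry run: print esxcli commands that roughly mirror the reply's
# steps from the ESXi shell. Nothing is executed. Device names and the vmkernel
# interface are placeholders; verify each command's --help on your ESXi build.

def rejoin_plan(cache_ssd: str, capacity_disks: list[str], vmk: str,
                cluster_uuid: str, resync_in_progress: bool) -> list[str]:
    # The reply's caveat: don't enter maintenance mode while objects are resyncing.
    if resync_in_progress:
        raise RuntimeError("resync still in progress; wait before entering maintenance mode")
    disks = " ".join(f"--disks {d}" for d in capacity_disks)
    return [
        # 1. Put the host into maintenance mode.
        "esxcli system maintenanceMode set --enable true",
        # 2. Delete the disk group: removing its cache-tier SSD removes the whole group.
        f"esxcli vsan storage remove --ssd {cache_ssd}",
        # 3. Drop the host's stale one-node vSAN cluster membership.
        "esxcli vsan cluster leave",
        # 4. Take the host out of maintenance mode.
        "esxcli system maintenanceMode set --enable false",
        # 5. Tag a vmkernel interface for vSAN traffic (assumed syntax; varies by release).
        f"esxcli vsan network ipv4 add --interface-name {vmk}",
        # 6. Join the existing cluster by its Sub-Cluster UUID.
        f"esxcli vsan cluster join --cluster-uuid {cluster_uuid}",
        # 7. Re-create the disk group.
        f"esxcli vsan storage add --ssd {cache_ssd} {disks}",
    ]


plan = rejoin_plan(
    cache_ssd="naa.CACHE_SSD_PLACEHOLDER",
    capacity_disks=["naa.CAPACITY_1_PLACEHOLDER", "naa.CAPACITY_2_PLACEHOLDER"],
    vmk="vmk2",
    cluster_uuid="52311b70-024e-7173-ac6e-92638c796a1a",  # Sub-Cluster UUID from the post
    resync_in_progress=False,
)
for cmd in plan:
    print(cmd)
```

Keeping it a dry run means the plan can be reviewed before anything touches the host; the resync guard encodes the reply's last caveat.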

vmsysadmin20111
Enthusiast

Thanks. I ended up removing the DG on the failed host and adding it back, but the multicast issues continued. In the end, I had to switch both FX2 FN-410S modules from standalone to PMUX mode, which disables IGMP snooping on the IOMs, and configure IGMP snooping and a querier on the top-of-rack (TOR) Brocade VDX switch. This fixed the multicast issues.
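Why a querier matters for IGMP snooping can be shown with a toy model. This is one plausible explanation, not a diagnosis confirmed in the thread. It assumes a switch that forwards a group only to ports with an unexpired membership report, and hosts that report once on join and afterwards only in answer to a query. The 260-second membership timeout and 125-second query interval are the IGMPv3 defaults from RFC 3376; the port names are hypothetical, and real switches differ (some flood unregistered multicast instead of dropping it).

```python
from dataclasses import dataclass, field

GROUP = "224.2.3.4"                                # group seen in the switch output above
PORTS = ["port-1", "port-2", "port-3", "port-4"]   # hypothetical ports, one per host


@dataclass
class SnoopingTable:
    membership_timeout: int                       # seconds a report keeps a port in a group
    expiry: dict = field(default_factory=dict)    # (group, port) -> time the entry expires

    def report(self, group: str, port: str, now: int) -> None:
        """Record (or refresh) a membership report heard on a port."""
        self.expiry[(group, port)] = now + self.membership_timeout

    def forward_ports(self, group: str, now: int) -> list[str]:
        """Ports the switch would still forward this group's traffic to."""
        return sorted(p for (g, p), t in self.expiry.items() if g == group and t > now)


def simulate(query_interval: int, duration: int, timeout: int = 260) -> list[str]:
    """Return the ports still receiving GROUP at time `duration`.

    query_interval=0 models a VLAN with snooping enabled but no querier.
    """
    table = SnoopingTable(membership_timeout=timeout)
    for port in PORTS:
        table.report(GROUP, port, now=0)          # unsolicited report on join
    if query_interval > 0:
        for t in range(query_interval, duration + 1, query_interval):
            for port in PORTS:
                table.report(GROUP, port, now=t)  # every host answers the general query
    return table.forward_ports(GROUP, now=duration)


print("with querier:   ", simulate(query_interval=125, duration=600))
print("without querier:", simulate(query_interval=0, duration=600))
```

In this model, without a querier the entries age out after the membership timeout and the switch stops delivering the group to any port; with periodic queries the reports keep every port's entry fresh.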
