Hi all,
I have a 4-node VSAN cluster. One of the nodes (apparently it was the master) was improperly removed from the cluster (disconnected, then removed). The three remaining nodes are fine, but I'm now unable to rejoin the missing node to the cluster. When I add it back, it appears to form a separate cluster with itself as the only member, and as a result I'm not able to browse the VSAN datastore in the vCenter client (the VMs are OK, though).
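For reference, here is how I understand a manual rejoin can be attempted from the ESXi shell; this is a sketch, assuming the orphaned node first leaves its stale single-member cluster and that the Sub-Cluster UUID reported by the healthy nodes is used:

```
# On the orphaned node: leave the stale single-member cluster
esxcli vsan cluster leave

# Rejoin using the Sub-Cluster UUID reported by the healthy nodes
esxcli vsan cluster join -u 52311b70-024e-7173-ac6e-92638c796a1a

# Verify membership
esxcli vsan cluster get
```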
Node 1 before joining the VSAN cluster:
[root@fx2-esxi-01:~] esxcli vsan cluster get
Virtual SAN Clustering is not enabled on this host
Working VSAN cluster before node1 is joined:
[root@fx2-esxi-04:~] esxcli vsan cluster get
Cluster Information
Enabled: true
Current Local Time: 2017-04-21T16:17:33Z
Local Node UUID: 58d04aea-1952-3758-4c9d-107d1a8fb9a7
Local Node Type: NORMAL
Local Node State: MASTER
Local Node Health State: HEALTHY
Sub-Cluster Master UUID: 58d04aea-1952-3758-4c9d-107d1a8fb9a7
Sub-Cluster Backup UUID: 58cc1241-1e61-a964-ed3f-107d1a8fb3ef
Sub-Cluster UUID: 52311b70-024e-7173-ac6e-92638c796a1a
Sub-Cluster Membership Entry Revision: 12
Sub-Cluster Member Count: 3
Sub-Cluster Member UUIDs: 58d04aea-1952-3758-4c9d-107d1a8fb9a7, 58cc1241-1e61-a964-ed3f-107d1a8fb3ef, 58cc1fa4-bc1c-71ad-9f0d-107d1a8fb369
Sub-Cluster Membership UUID: c193f958-30b3-2c5e-833c-107d1a8fb9a7
Node 1 after it is added back into the cluster:
[root@fx2-esxi-01:~] esxcli vsan cluster get
Cluster Information
Enabled: true
Current Local Time: 2017-04-21T16:48:43Z
Local Node UUID: 58cc0c20-eddb-7b02-7e25-107d1a8fb301
Local Node Type: NORMAL
Local Node State: MASTER
Local Node Health State: HEALTHY
Sub-Cluster Master UUID: 58cc0c20-eddb-7b02-7e25-107d1a8fb301
Sub-Cluster Backup UUID:
Sub-Cluster UUID: 52311b70-024e-7173-ac6e-92638c796a1a
Sub-Cluster Membership Entry Revision: 0
Sub-Cluster Member Count: 1
Sub-Cluster Member UUIDs: 58cc0c20-eddb-7b02-7e25-107d1a8fb301
Sub-Cluster Membership UUID: e837fa58-35df-3663-af01-107d1a8fb301
Note that the Sub-Cluster UUID is the same, but the cluster member count is 1. The VSAN health check does show cluster partitioning and multicast issues:
I don't think these are genuine multicast issues, since everything was working fine before the first node was removed. Looking at the packets tab, heartbeats from the original master (.201) are received by all 4 nodes, but heartbeats from the new master (.204) are received only by the 3 surviving nodes and not by .201 (same multicast group, though):
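To double-check the heartbeat traffic at the host level, multicast can be captured directly on the vSAN vmkernel interface; a sketch, assuming vmk1 is the vSAN-tagged interface (check the first command for the actual name):

```
# List the vSAN-tagged vmkernel interface and its multicast groups
esxcli vsan network list

# Capture master-group heartbeats (224.1.2.3) on the vSAN vmk
# (vmk1 is an assumption -- substitute the interface from the command above)
tcpdump-uw -i vmk1 -n host 224.1.2.3

# Capture agent-group traffic (224.2.3.4)
tcpdump-uw -i vmk1 -n host 224.2.3.4
```

If the capture on .201 never shows packets from .204's address, the drop is happening in the switching layer rather than on the hosts.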
All nodes are connected via a single 10GbE uplink (the second one is in standby) to an internal switch in the Dell FX2 chassis (Dell PowerEdge FN 410S IOM in standalone mode).
Dell#sh ip igmp snooping groups detail
Interface Vlan 227
Group 224.1.2.3
Uptime 4w2d
Expires 00:02:05
Router mode EXCLUDE
Last reporter 192.168.227.204
Last reporter mode EXCLUDE
Last report received IS_EXCL
Group source list
Source address Uptime Expires
Interface Vlan 227
Group 224.2.3.4
Uptime 4w2d
Expires 00:02:05
Router mode EXCLUDE
Last reporter 192.168.227.204
Last reporter mode EXCLUDE
Last report received IS_EXCL
Group source list
Source address Uptime Expires
Dell#
Any thoughts on what else to check? Thanks in advance!
Hi,
If your host has been moved out of the cluster, I think you need to re-create the disk group (DG):
put the host into maintenance mode (MM)
delete the DG
remove the host from the cluster
exit MM
add a vmkernel adapter for vSAN
join the cluster and ensure everything works fine
re-create the DG
Make sure your VMs are OK and nothing is resyncing before putting the host into MM.
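A rough esxcli equivalent of those steps, as a sketch for ESXi 6.x; the disk device name and vmk interface below are placeholders, and the cluster UUID is the one from the working nodes:

```
# Enter maintenance mode without evacuating vSAN data
esxcli system maintenanceMode set -e true -m noAction

# Delete the disk group by removing its cache-tier SSD
# (naa.xxxxxxxxxxxxxxxx is a placeholder device name)
esxcli vsan storage remove -s naa.xxxxxxxxxxxxxxxx

# Leave the cluster, then exit maintenance mode
esxcli vsan cluster leave
esxcli system maintenanceMode set -e false

# Tag a vmkernel port for vSAN traffic (vmk1 is a placeholder)
esxcli vsan network ipv4 add -i vmk1

# Rejoin using the existing cluster's Sub-Cluster UUID
esxcli vsan cluster join -u 52311b70-024e-7173-ac6e-92638c796a1a
```

Re-creating the disk group afterwards can be done from the vSphere client, or with esxcli vsan storage add once the host is back in the cluster.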
Thanks. I ended up removing the DG on the failed host and then adding it back, but still continued to have multicast issues. In the end, I had to switch both FX2 FN 410S modules from standalone to PMUX mode, which disables IGMP snooping on the IOMs, and configure snooping and a querier on the ToR Brocade VDX switch. This fixed the multicast issues.
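For anyone landing here later, the ToR side looked roughly like the following; this is approximate Brocade Network OS syntax and should be treated as an assumption (the exact commands vary by NOS release), not a verified config:

```
! On the Brocade VDX (Network OS) -- syntax is approximate
configure terminal
 ip igmp snooping enable
 interface Vlan 227
  ip igmp snooping enable
  ip igmp snooping querier enable
```

The key point is that once snooping is enabled on the ToR, exactly one querier must exist on the vSAN VLAN so that group membership keeps being refreshed for all four hosts.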