VMware Cloud Community
him_ng
Contributor

VSAN "misconfiguration detected"

Hi,

We are trying to set up VSAN, but it fails and keeps showing "misconfiguration detected":

"One or more hosts cannot communicate with the VSAN datastore.

Each host requires at least one vmkernel adapter with VSAN service enabled. All those adapters need to be connected to the same physical network to ensure correct communication with the VSAN datastore.

To view the network partition groups, check the respective column in the Disk Management grid."

We confirmed that IGMP snooping is enabled on our Cisco switch:

Global IGMP Snooping configuration:

-------------------------------------------

IGMP snooping                : Enabled

IGMPv3 snooping (minimal)    : Enabled

Report suppression           : Enabled

TCN solicit query            : Disabled

TCN flood query count        : 2

Robustness variable          : 2

Last member query count      : 2

Last member query interval   : 1000

Any ideas on how to troubleshoot this?

Thanks.

arvinnd
VMware Employee

Try the following for each host in the VSAN cluster:

------------------------------------------------------------


Select the host.

Go to Host -> Manage -> Networking -> VMkernel adapters.

Select a VMkernel device.

Drag the horizontal scroll bar all the way to the right and look for the "Virtual SAN Traffic" column. Check whether the service is "Disabled". If so:

Edit the adapter; the "Edit Settings" popup will show up.

Select "Virtual SAN Traffic".

Once you have repeated this for all hosts, refresh and verify the configuration.

depping
Leadership

If IGMP Snooping is enabled you will also need to have a querier configured. Is that the case?

him_ng
Contributor

Virtual SAN Traffic is already enabled.

Hi depping,

Could you advise more about the querier?

Thanks.

depping
Leadership

Yeah, no problem. When you enable IGMP Snooping you also need to create a querier. I am not the networking expert, so don't shoot me if I get this wrong, but the querier basically takes care of routing the multicast traffic from VSAN. It keeps tables that say what goes where. Without the querier those tables do not exist and traffic will not flow. So you have two options:

1) enable a querier

2) disable snooping

The preferred option is most definitely 1! Kamau has a great explanation of it here: http://www.borgcube.com/blogs/2012/02/vshpere-multicast-support/

More details from Cisco: Cisco Nexus 5000 Series NX-OS Software Configuration Guide - Configuring IGMP Snooping [Cisco Nexus ...
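
For a slightly more detailed picture of why the querier matters, here is a toy Python model. It is only a sketch under simplified assumptions, not real switch behavior: a snooping switch forwards a multicast group only to ports where it has recently seen an IGMP membership report, and hosts re-send those reports when they receive a query. The class name `SnoopingSwitch`, the host names, the timeout and the group address are all made up for illustration.

```python
# Toy model of IGMP snooping; all names and timings are illustrative assumptions.
class SnoopingSwitch:
    def __init__(self, timeout=3):
        self.timeout = timeout   # ticks before a membership entry ages out
        self.members = {}        # (group, port) -> tick of the last report seen

    def report(self, group, port, now):
        """A host on `port` sent an IGMP membership report for `group`."""
        self.members[(group, port)] = now

    def ports_for(self, group, now):
        """Ports that still have a fresh membership entry for `group`."""
        return sorted(
            port for (g, port), seen in self.members.items()
            if g == group and now - seen < self.timeout
        )


def simulate(querier_enabled, ticks=10):
    switch = SnoopingSwitch()
    hosts = ["esx1", "esx2", "esx3"]
    group = "224.2.3.4"  # example group address
    # Every host sends one unsolicited report when it joins.
    for host in hosts:
        switch.report(group, host, now=0)
    for now in range(1, ticks + 1):
        if querier_enabled:
            # The querier sends a general query; each host answers with a report.
            for host in hosts:
                switch.report(group, host, now)
    return switch.ports_for(group, now=ticks)


print("with querier:   ", simulate(True))
print("without querier:", simulate(False))
```

With the querier the membership entries keep being refreshed, so all three ports stay in the table. Without it the entries age out and the switch has nowhere to forward the group.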

depping
Leadership

Can you give the output of:

show ip igmp snooping


and


show ip igmp snooping vlan <vsan vlan id>

him_ng
Contributor

Here you are.

Global IGMP Snooping configuration:

-------------------------------------------

IGMP snooping                : Enabled

IGMPv3 snooping (minimal)    : Enabled

Report suppression           : Enabled

TCN solicit query            : Disabled

TCN flood query count        : 2

Robustness variable          : 2

Last member query count      : 2

Last member query interval   : 1000

Vlan 101:

--------

IGMP snooping                       : Enabled

IGMPv2 immediate leave              : Disabled

Multicast router learning mode      : pim-dvmrp

CGMP interoperability mode          : IGMP_ONLY

Robustness variable                 : 2

Last member query count             : 2

Last member query interval          : 1000

depping
Leadership

That looks good to me. Maybe I should have started with this:

  • How many vmkernels do you have with VSAN enabled?
  • How many physical NICs are they attached to?
  • How many physical switches are used for VSAN traffic?

I am assuming, by the way, that there is a router configured for multicast as well?

depping
Leadership

Did this get resolved, him_ng?

joergriether
Hot Shot

him_ng, if you have no luck, please try disabling IGMP snooping entirely; that will show whether it is a multicast issue. If it works after that change AND you have other multicast traffic and therefore may need IGMP snooping, you can enable IGMP snooping again, but you have to make 100% sure you enable an IGMP querier, too. Every good switch can do that for you 😉 but the querier needs to be in place, otherwise the switch doesn't know where to send the multicast traffic or whom to ask where it needs to go. Like a router 😉 In fact, networking people very often call the querier a "router", and the querier does in a sense "route" multicast traffic. Not in the classic way, but something like it.

With IGMP snooping disabled completely, multicast traffic will always work out of the box, BUT it will be flooded to all devices.

Best regards,
Joerg

afarrior
Contributor

Would someone share a working configuration for a Cisco Nexus 5000 that's used with vSAN?

Does the VLAN used for vSAN traffic need to have a layer 3 interface defined?

thx,

andy

JumpyJW
Enthusiast

I'm not a Cisco SME, but here's a sanitized snippet of the running config for my VSAN 5.5 setup on a Nexus 5K that I got from my network admin:

== Nexus 5K Switch 1/2 ==

vlan configuration 302

ip igmp snooping querier 192.168.6.10

 

5KSW03# show ip igmp snooping vlan 302

IGMP Snooping information for vlan 302

  IGMP snooping enabled

  Optimised Multicast Flood (OMF) disabled

  IGMP querier present, address: 192.168.6.10, version: 2, i/f Po73

  Switch-querier disabled

  IGMPv3 Explicit tracking enabled

  IGMPv2 Fast leave disabled

  IGMPv1/v2 Report suppression enabled

  IGMPv3 Report suppression disabled

  Link Local Groups suppression enabled

  Router port detection using PIM Hellos, IGMP Queries

  Number of router-ports: 2

  Number of groups: 2

  VLAN vPC function enabled

5KSW03# show ip igmp snooping groups vlan 302

Type: S - Static, D - Dynamic, R - Router port, F - Fabricpath core port

Vlan  Group Address      Ver  Type  Port list
302   */*                -    R     Po73 Po76
302   224.1.2.3          v2   D     Po242 Po241
302   224.2.3.4          v2   D     Po241 Po242 Po240
                                    Po243 Po244 Po245 Po246
                                    Po247
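
If you want to check this kind of output on several VLANs or switches, a small script can look for the querier line. The following Python sketch is only an illustration: the function name `find_querier` is made up, and the pattern is based solely on the wording of the output pasted above, so it may not match other platforms or software versions.

```python
import re


def find_querier(show_output):
    """Return the querier address found in the pasted output, or None."""
    # Pattern assumed from the sample output above; other platforms may differ.
    match = re.search(r"IGMP querier present, address:\s*([0-9.]+)", show_output)
    return match.group(1) if match else None


sample = """IGMP Snooping information for vlan 302
  IGMP snooping enabled
  IGMP querier present, address: 192.168.6.10, version: 2, i/f Po73
"""

print(find_querier(sample))
print(find_querier("IGMP snooping enabled"))
```

A result of `None` means no querier line was found in the text, which is the situation to look into when snooping is enabled.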

JumpyJW
Enthusiast

I have found mixed results between a Nexus 5K and a Catalyst 6500 series, but came to the conclusion that the best config is to have IGMP snooping ON with a querier.

See my blog on this at JumpyJ.com – Adventures of Most Things IT

Happy to be corrected.

jackchentoronto
Enthusiast

I am also having problems setting up vSAN 6.0.

I have three ESXi servers, and the network seems "not stable". By "not stable" I mean that when I run tcpdump-uw I can see the multicast packets, but sometimes I see packets from all three hosts, sometimes from only two, and it keeps changing after a while. As a result, sometimes all three servers are in the same network partition group 1; sometimes server1 is in group 1 and server2 and server3 are in group 2; sometimes server1 and server2 are in group 1 and server3 is in group 2; and sometimes they show up in groups 1, 2 and 3. It is really driving me crazy.

We have a dedicated Dell N4032 as the vSAN switch with no other traffic on it, so I disabled IGMP snooping on the VLAN, but it didn't help; the result is still the same.

I would like to try an IGMP querier, but one thing I am not sure about is:

IGMP querier present, address: 192.168.6.10



What address should I use for the querier? I couldn't find any documentation on what it should be.

I tried putting the N4032's interface IP there, but it didn't work; it always shows:


Operational State.............................. Disabled



jkoebrunner
Enthusiast

This may not help, but have you checked whether ESXi ARP resolution can resolve the VSAN IPs of all the other ESXi hosts?

esxcli network ip neighbor list

I had the same "misconfiguration" issue and noticed that ARP resolution and ping did not work for one host, even though the network was configured correctly (the ARP entry was "incomplete").

I had to set up the ESXi host again, which fixed the problem.

Johannes Köbrunner IT Solutions Architect Virtualization, Network and Storage Systems Frequentis AG VTSP, VCP, VCAP-DCD

jackchentoronto
Enthusiast

Finally got my vSAN working.

In my case it had nothing to do with IGMP snooping; the default configuration seems to be fine for it. It was caused by an MTU mismatch.

My mistake was that I only changed the vmkernel MTU to 9000 and the N4032's physical MTU to 9216, but forgot to change the vSAN-vswitch MTU to 9000.

Once I changed the vswitch MTU to 9000 and confirmed that all ESXi hosts could ping each other's vmkernel interfaces with "ping -d -s 8972", vSAN was happy and is working fine now.
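
For anyone wondering where 8972 comes from: with the don't-fragment flag set, the ICMP payload has to fit into the MTU together with the IPv4 header (20 bytes without options) and the ICMP echo header (8 bytes). A minimal Python sketch of that arithmetic follows; the function name `max_ping_payload` is just an illustrative choice.

```python
IPV4_HEADER = 20  # bytes, IPv4 header without options
ICMP_HEADER = 8   # bytes, ICMP echo request header


def max_ping_payload(mtu):
    """Largest ICMP payload that fits into one unfragmented IPv4 packet."""
    return mtu - IPV4_HEADER - ICMP_HEADER


print(max_ping_payload(9000))  # 8972
print(max_ping_payload(1500))  # 1472
```

If a ping with that payload size and the don't-fragment flag fails while a smaller one succeeds, some hop on the path has a smaller MTU than expected.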

This article helped me nail down the problem:

VSAN and Jumbo Frames | Welcome to Florida Cloud Labs

Bleeder
Hot Shot

Isn't the MTU difference something that the VSAN health check plugin detects?

jackchentoronto
Enthusiast

Hi Bleeder, I didn't realize there was such a plugin, thanks for the information. I am installing it now; it seems it will take a while because it needs to reboot each ESXi host.

joergriether
Hot Shot

MTU, MTU and again MTU 😉

I'd like to share a few thoughts about that.

I understand that my opinion on the use of jumbo frames may not be everyone's, and many of you might not agree. But since this is up for discussion, I'd like to share a few thoughts I recently discussed in another forum (Pure Storage):

Jumbo frames are a mystery, often not understood and often interpreted the wrong way. Some years ago there was something of a hype around enabling jumbo frames, and some vendors were even insisting on it. The point is (and very few people know this): jumbo frames are not a standard at all. Search any IEEE Ethernet standard you come across; you will not find them. As a result, today's market has very different implementations of jumbo frames. Vendor A might implement them with an MTU of exactly 9000, vendor B with 9216 and vendor C with 9105. No big deal? Wrong. If any device in the chain (let's say a switch) implements a lower value than another device (let's say the initiator or the target), jabber can occur in some scenarios. Please read http://en.wikipedia.org/wiki/Maximum_transmission_unit . Now, if any of you have had to troubleshoot a jabber problem in the past, you know how nasty that can be, especially locating and isolating the cause.
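
A rough way to reason about such a chain is to compare every hop's MTU against the frame size you intend to send. The Python sketch below is only an illustration; the function name, the hop names and the MTU values are assumptions made up for the example, not values taken from any real configuration.

```python
def undersized_hops(path, wanted_mtu):
    """Return the names of hops whose MTU is smaller than `wanted_mtu`."""
    return [name for name, mtu in path if mtu < wanted_mtu]


# Hypothetical path; names and values are illustrative assumptions.
path = [
    ("vmkernel adapter", 9000),
    ("virtual switch", 1500),
    ("physical switch", 9216),
]

print(undersized_hops(path, 9000))  # ['virtual switch']
```

Any hop returned here is a candidate for the device that silently breaks large frames, even when both endpoints are configured correctly.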

Now a word regarding performance. It is true that an IP-based transport, take iSCSI for example, has by the nature of TCP/IP a less favorable ratio between overhead and payload than, for example, Fibre Channel. In FC the payload of a frame can hold up to 2112 bytes with a combined overhead of 36 bytes, while in iSCSI the payload is 1460 bytes and the overhead 76 bytes. An unmodified Ethernet frame has a maximum size of 1518 bytes. Anything larger is colloquially known as "jumbo frames", which can transport about 9000 bytes per frame.
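
To put those numbers side by side, here is a small Python sketch that computes payload efficiency as payload / (payload + overhead), using the figures quoted above. The jumbo-frame line is an assumption of mine for comparison only: it takes a 9000-byte MTU minus 40 bytes of TCP/IP headers (8960 bytes of payload) and reuses the same 76 bytes of overhead.

```python
def efficiency(payload, overhead):
    """Share of the bytes on the wire that is actual payload."""
    return payload / (payload + overhead)


fc = efficiency(2112, 36)      # Fibre Channel figures from the text
iscsi = efficiency(1460, 76)   # standard-frame iSCSI figures from the text
jumbo = efficiency(8960, 76)   # assumption: 9000-byte MTU, same 76-byte overhead

print(f"FC:            {fc:.1%}")     # 98.3%
print(f"iSCSI (1500):  {iscsi:.1%}")  # 95.1%
print(f"iSCSI (jumbo): {jumbo:.1%}")  # 99.2%
```

On paper that is a gain of roughly four percentage points in wire efficiency, which says nothing yet about what a real workload will see.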

So, is there really a performance gain? Yes, in very special scenarios there is indeed. For example, with large 1:1 file transfers with high or variable send and receive windows you will definitely see a boost of about 10%.

But in real-life scenarios, in most database scenarios I know, you see nothing. Sometimes you even see a performance decrease. And regarding VSAN: personally (in my test lab, with real-life simulations), I see no difference, to be honest.

Of course that was not always so. In the past, CPU utilization played an important role, which made jumbo frames almost always faster. Today, however, this is obsolete thanks to modern multi-core technology.

I know there are probably some jumbo frame lovers who totally disagree. But I'm very interested in the opinion of the community, and also in the community's experience.

Joerg

zdickinson
Expert

If I can summarize: Don't use jumbo frames just because. Do testing, and if there is a performance gain, use them; if not, then don't. No need to introduce complexity if there is no reason.

With regard to technology, I have found it a good idea never to say "always do X in every situation". Replace X with jumbo frames in this case.

In our case we didn't even bother testing because performance was better than what we expected/needed.

Thank you, Zach.
