WDGNet
Enthusiast

vMotion Ping Tests FAILED in vSAN Configuration Assistant Health Checks - new vSAN setup

Hi All-

So I have a new vSAN setup that I'm trying to configure.  Here's a basic rundown of design:

1.   4 Hosts

2.   1 dvSwitch

3.   dvPortGroups for vSAN, vMotion, Management and Production VMs

4.   vmKernels configured for vSAN, vMotion and Management

5.   8 uplinks, 8 NICs per host

In Configuration Assistant, there are 2 network related errors:

vMotion: MTU check (ping with large packet size) - FAILED

and

vMotion: Basic (unicast) connectivity check - FAILED

I have SSH'd into each box to test connectivity, and I am able to ping every vmk IP address from any host to any other host.  The MTU on the port groups is set to 1500.  I did not configure the physical switches, but I am assuming they are also set to 1500 MTU.
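For reference, here is how I've been testing from the shell; vmk2 and the target IP are just placeholders for my actual vMotion vmk and peer hosts.  The -d flag sets the don't-fragment bit, so the large ping actually exercises the path MTU the way the health check does:

    vmkping -I vmk2 10.0.0.12                  # basic small ping
    vmkping -I vmk2 -d -s 1472 10.0.0.12       # 1472 bytes payload + 28 bytes ICMP/IP headers = 1500 MTU
    vmkping -I vmk2 -d -s 8972 10.0.0.12       # jumbo-frame equivalent, if the fabric runs 9000 MTU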

Is this something that can be disregarded since I am able to ping as noted?  Or is this a more serious issue that needs digging into?  I'm not exactly sure where to go aside from verifying the physical switch port config.  But then, if it were just an MTU mismatch on the physical switches, why would the basic unicast check fail too?  That's a tiny ping, unlike the MTU check.

Any help is appreciated!!

6 Replies
TheBobkin
Champion

Hello WDGNet,

Are you using a non-default network stack on that vMotion vmk?

This can cause this test to fail:

virten.net/2017/04/vsan-6-6-vmotion-basic-unicast-connectivity-check-fails/

yellow-bricks.com/2017/04/21/vsan-health-check-fails-vmotion/

Try changing the stack type as per the virten article above, or suppress the alert if that is not an option.
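To confirm which stack each vmk is on from the host shell (vmk names will vary per environment):

    esxcli network ip netstack list      # lists all TCP/IP stacks on the host
    esxcli network ip interface list     # the 'Netstack Instance' field shows each vmk's stack

Note that if the vMotion vmk is on the dedicated vmotion stack, a plain vmkping won't reach it - you need something like vmkping -S vmotion -I vmk2 <target IP>.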

If the above does not fit the situation, let us know.

Bob

WDGNet
Enthusiast

Hey Bob-

I saw that article, but all of the vmks are using the default network stack.  I know it's possible to disable these alerts, but I'd really like to understand why this is happening.  This is a preproduction environment right now, and I'd like to get it 100% functional before I move forward with migrating off our legacy infrastructure.

I appreciate the advice.

WDGNet
Enthusiast

Hey All -

I have some more info that I hope can get this resolved from somebody on this forum.

Looking into the failures, I can see that two of my hosts can communicate with each other and have passed the ping tests: Host 1 and Host 4.  I have SSH'd into Host 1 and verified with vmkping that I can ping the vMotion vmk IP address of Host 4, and vice versa from Host 4 to Host 1.  However, when I try to vmkping the other hosts' vMotion vmk IP addresses, they don't respond.  Here is an excerpt from vsanmgmt.log:

2017-09-16T13:46:38Z VSANMGMTSVC: ERROR vsanperfsvc[745bd808-9ae5-11e7] [VsanHealthPing::PingTest] Pinger: Send ping error, target:x.x.x.x, size:9000, pingSeq:3
Traceback (most recent call last):
  File "/build/mts/release/bora-5912974/bora/build/vsan/release/vsanhealth/usr/lib/vmware/vsan/perfsvc/VsanHealthPing.py", line 237, in PingTest
OSError: [Errno 112] Host is down

For some reason, the other hosts are unable to ping on the vMotion network; only Hosts 1 and 4 can successfully ping between each other's vmks.  All hosts have their vmk set up identically aside from the IP (same subnet, of course), and all use the same distributed port group on the same vDS.
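In case someone spots something I'm missing, here is what I've been using to compare the vmk configuration across the hosts (run on each host over SSH):

    esxcli network ip interface list         # port group, MTU, and netstack per vmk
    esxcli network ip interface ipv4 get     # IP address and netmask per vmk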

Anybody out there with some tips?! 

WDGNet
Enthusiast

More info----

I have been at this for a while now and am ready to take a break.  Here's what I have done so far:

Verified on each host that vmnic0 is mapped to Uplink4, and vmnic4 is mapped to Uplink3.  There is a vMotion port group with these two uplinks teamed - they are both in the active uplinks list.  All 4 hosts have a vmkernel for vMotion mapped to the vMotion network. 
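To double-check the vmnic-to-uplink mapping from the shell rather than the web client, I believe this lists the dvSwitch with each vmnic's uplink assignment:

    esxcfg-vswitch -l     # lists vSwitches/dvSwitches with MTU and vmnic-to-uplink assignments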

When both uplinks are in the Active Uplinks list, only two hosts can ping each other's vmk (Hosts 2 and 4).  When I move Uplink3 into Standby, leaving Uplink4 Active, Hosts 1, 3 and 4 can't ping each other's vmk IP addresses but the others can.  When I move Uplink3 into Active and Uplink4 into Standby, the only hosts that can ping each other's vmk IP addresses are Hosts 2 and 4 - the same pair that can communicate when both uplinks are placed in Active Uplinks.

The vMotion port group load balancing is currently set to "Route based on originating virtual port".  Also, in the dvPort settings under the Advanced menu, everything is disabled except for Block Ports.  I haven't messed with these settings yet.
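One way I know of to see which physical NIC a vmk is actually using at a given moment (as opposed to what the teaming policy should choose) is esxtop's network view:

    esxtop     # press 'n' for the network view; the TEAM-PNIC column shows the uplink each port is actually using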

Anyway, could really use a fresh take on this.  Thanks.

jameseydoyle
VMware Employee

Hi WDGNet,

Could you send a sample configuration from one of the hosts (assuming they are all configured with similar settings), covering ALL of the VMkernel ports, with the following details:

VMkernel Port (vmk)     IP Address     Uplink used     Physical Switch port (CDP)

I can gather most of the information about this from the snippets you have provided, but I am missing the IP Addresses in particular.

WDGNet
Enthusiast

This issue was resolved after speaking with my network engineer.  The two physical switch ports were channeled together.  Once I changed the load balancing policy to "Route based on IP hash", everything worked.  Thank you to everybody who responded.
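For anyone who finds this later: from what I understand, when the physical switch ports are bundled into a static EtherChannel, "Route based on IP hash" is the only teaming policy vSphere supports on that port group, which would explain the inconsistent reachability described above.  If CDP is enabled on the switch, the attached port's details can also be checked from the host side (vmnic0 below is just a placeholder):

    vim-cmd hostsvc/net/query_networkhint --pnic-name=vmnic0     # shows CDP data from the attached switch port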