VMware Cloud Community
Chekan
Enthusiast

vCenter 6.5u1 vMotion error - Failed waiting for data error

Hello! I have three ESXi 6.5 hosts. vCenter build: 7515524. Host build: 7388607.

Each host has a dedicated 10Gb port for vMotion with MTU 9000 enabled. All three ports are connected to an isolated VLAN on a D-Link DXS-3600 switch (jumbo frames are enabled, obviously).

But when I try to vMotion, I get the following error (from any host to any host):

vmotion error.JPG

I totally don't get why.

Forgot to mention: vmkping -s 9000 works fine from any host to any host.

Message edited by: Ivan Karmyshin

20 Replies
Finikiez
Champion

Hi!

You need to add the '-d' option to the vmkping command; it sets the DF (Don't Fragment) bit on the IPv4 packet.

You definitely need to double-check the MTU size on all hops.

Chekan
Enthusiast

If I set the MTU to 8000 on the hosts, vMotion works fine. The reason for this post is more educational: I want to understand why 9000 didn't work. The hosts are connected only to the switch, and everything is 100% configured for MTU 9000, so I don't understand why. The maximum size that passes unfragmented is 8958.

Finikiez
Champion

What MTU size is configured on the network switch?

Chekan
Enthusiast

9000 (I also tried 10000 and even 12000)

Finikiez
Champion

From my perspective, it's an issue with jumbo frames on the network switch.

To check what's going on, you can capture network traffic on the ESXi hosts during a vMotion attempt with pktcap-uw (see VMware Knowledge Base) and read the dump with Wireshark.

Also check that you have the latest driver and firmware installed for your network card.

rajen450m
Hot Shot

Hi Chekan,

There was a similar issue and error in our environment. It was caused by a jumbo frame size mismatch between the vmkernel and the physical router (gateway).

Please check the vMotion network and try to vmkping between the vMotion vmknic IP addresses of the source and destination hosts.

Verify that the security settings/policies on the vSwitch/port groups (such as promiscuous mode) are identical between the two hosts.

Please check these knowledge base articles, which relate to similar errors:

VMware Knowledge Base

VMware Knowledge Base

Regards, Raj

Raj M Please mark helpful or correct if my answer resolved your issue. Visit www.hypervmwarecloud.com for my blog posts, step-by-step procedures etc.,
Chekan
Enthusiast

You know what is super strange? Yesterday, using vmkping -d, I found out that the maximum size of a non-fragmented packet is 8958. So I set the MTU on ESXi to 8958 (on the physical switch I left it at 9000) and vMotion started to work. Today I tried to vMotion again and got the error... I ran vmkping again, and now the maximum size has changed to 8930. How is this possible??? No changes were made since yesterday. The hosts are connected directly to the switch, the simplest possible setup.

Chekan
Enthusiast

I just realized and double-checked: if I reduce the MTU on the hosts (both vSwitch and vmkernel), the actual maximum packet size reduces by 28 bytes. For example, if I set the MTU to 8930 on the host, the max vmkping -d size will be 8902. So vMotion doesn't work at all... I am totally confused.

Finikiez
Champion

I guess we can only help you if we see your jumbo frame configs on the D-Link and on the vmkernel/vSwitch.

Also, different NICs behave differently when jumbo frames don't work on the physical switch. Some drivers can split a big frame into smaller ones when it doesn't fit, but most drivers can't.

Again, collect a dump when you get the error. This will help a lot.

Chekan
Enthusiast

Excuse me, what do you mean by "collect dump"?

MBreidenbach0
Hot Shot

The packet size that you specify for vmkping is the size of the data portion. The actual packet size is 28 bytes bigger (20-byte IPv4 header plus 8-byte ICMP header), so to test MTU 9000 you specify vmkping -d -s 8972

See also VMware Knowledge Base
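
The 28-byte figure is the 20-byte IPv4 header (without options) plus the 8-byte ICMP header. As a minimal sketch of that arithmetic, the Python below computes the largest -s value for a given MTU and builds the matching command line. The function names, the vmk1 interface name and the target address are hypothetical and chosen only for illustration; -d, -s and -I are the vmkping options for Don't Fragment, payload size and outgoing interface.

```python
IPV4_HEADER_BYTES = 20  # IPv4 header without options
ICMP_HEADER_BYTES = 8   # ICMP echo header


def max_vmkping_payload(mtu: int) -> int:
    """Largest ICMP payload (-s value) that fits in one unfragmented packet."""
    payload = mtu - IPV4_HEADER_BYTES - ICMP_HEADER_BYTES
    if payload < 0:
        raise ValueError(f"MTU {mtu} is too small to carry an ICMP echo")
    return payload


def vmkping_command(mtu: int, target_ip: str, interface: str = "vmk1") -> str:
    """Build a vmkping command line that tests the given MTU with DF set."""
    return f"vmkping -I {interface} -d -s {max_vmkping_payload(mtu)} {target_ip}"


if __name__ == "__main__":
    # 192.0.2.10 is a documentation-only address, not one from this thread.
    print(vmkping_command(9000, "192.0.2.10"))
```

For MTU 9000 this yields the same 8972 payload quoted above; for a standard 1500 MTU it would be 1472.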

a_p_
Leadership

the actual maximum packet size reduces by 28 bytes

That's expected; the 28 bytes are the header data added to the vmkping payload (see e.g. Troubleshooting ESXi Jumbo Frames).
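
To make the comparison concrete, here is a rough sketch that adds the 28 bytes back to the largest vmkping -d payloads reported earlier in this thread and compares the result with the MTU configured on the hosts at the time. The function and variable names are hypothetical; the numbers are the ones Chekan posted.

```python
ICMP_OVERHEAD_BYTES = 20 + 8  # IPv4 header (no options) + ICMP echo header


def implied_packet_size(max_payload: int) -> int:
    """IP packet size implied by the largest vmkping -d payload that passed."""
    return max_payload + ICMP_OVERHEAD_BYTES


# (configured host MTU, largest working vmkping -d -s value) from the thread
observations = [(9000, 8958), (8958, 8930), (8930, 8902)]

for configured_mtu, max_payload in observations:
    implied = implied_packet_size(max_payload)
    shortfall = configured_mtu - implied
    print(f"MTU {configured_mtu}: payload {max_payload} -> packet {implied}, shortfall {shortfall}")
```

With the host MTU at 8958 or 8930 the numbers line up exactly. Only the 9000 case leaves a gap (14 bytes), which suggests that something on the path accepts slightly less than the hosts send. 14 bytes is also the size of an untagged Ethernet header, though nothing in this thread confirms that as the cause.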

What should actually work is to set everything to MTU 9000. Physical switch/ports, vSwitch, and VMKernel port group.

I'm not familiar with your physical switch. However a quick search showed that it supports a maximum MTU of 9216, so if 9000 doesn't work you may try 9216.

André

Chekan
Enthusiast

The maximum jumbo frame size of my physical switch (DXS-3600-32S) is 12288. I set both the vmkernel and the vSwitch to MTU 9000 and the ports on the physical switch to 12288, and I still get a timeout error:

2.JPG

rajen450m
Hot Shot

Hi Chekan,

It's a different error again now.

Disable IPv6 on the vmnic interfaces and check.

I hope you have already verified the security policies on both hosts and on the vSwitch/port groups.

Next, check the network devices connected to the ESXi hosts and the interfaces they are connected to.

If your business allows, please reboot the network switch (the physical DXS-3600-32S) connected to these ESXi hosts.

The issue may be there. It's definitely on the network side.

Chekan
Enthusiast

IPv6 is disabled on all hosts, and the security settings are identical. The switch has been rebooted. No success.

Finikiez
Champion

Can you show the output of

esxcli network nic get -n <vmnic#>

where vmnic# is the port used as the vmkernel's uplink?

Do the hosts have the same NIC or different ones?

Chekan
Enthusiast

All vmnics are the same: AOC-STGN-i2S (Supermicro). Today I replaced the switch with a Mikrotik SFP+ switch and tried to vMotion. What was strange again is that two machines (30 GB and 60 GB in size) migrated fine, but when I tried a bigger one (230 GB), it failed at around 55% with an error like the one in the latest screenshot.

   Advertised Auto Negotiation: false
   Advertised Link Modes: 10000BaseT/Full
   Auto Negotiation: false
   Cable Type: FIBRE
   Current Message Level: 7
   Driver Info:
         Bus Info: 0000:82:00.1
         Driver: ixgbe
         Firmware Version: 0x800006da
         Version: 3.7.13.7.14iov-NAPI
   Link Detected: true
   Link Status: Up
   Name: vmnic3
   PHYAddress: 0
   Pause Autonegotiate: true
   Pause RX: false
   Pause TX: false
   Supported Ports: FIBRE
   Supports Auto Negotiation: false
   Supports Pause: true
   Supports Wakeon: false
   Transceiver: external
   Virtual Address: 00:50:56:53:9d:eb
   Wakeon: None

Finikiez
Champion

When you do a vMotion, you copy only the VM's RAM.

Or do you mean that you tried Storage vMotion?

Finikiez
Champion

Also try updating the ixgbe driver on the hosts:

Download VMware vSphere
