Having a bit of a strange issue with vMotion today. It's been working in my lab for months, and was still working fine after I upgraded to 5.1 last week.
Now, today, for whatever reason, it just refuses to progress beyond 14% and fails with a timeout. Here's the log snippet:
I can vmkping between the relevant interfaces just fine:
# vmkping 10.5.132.61
PING 10.5.132.61 (10.5.132.61): 56 data bytes
64 bytes from 10.5.132.61: icmp_seq=0 ttl=64 time=0.131 ms
64 bytes from 10.5.132.61: icmp_seq=1 ttl=64 time=0.162 ms
64 bytes from 10.5.132.61: icmp_seq=2 ttl=64 time=0.122 ms
--- 10.5.132.61 ping statistics ---
3 packets transmitted, 3 packets received, 0% packet loss
round-trip min/avg/max = 0.122/0.138/0.162 ms
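If you're comparing several host pairs, it can help to turn ping output like the above into numbers. Below is a minimal Python sketch. It assumes BSD-style ping output in exactly the format shown, and the function name `parse_ping` is hypothetical. Keep in mind that a clean result like this only shows basic ICMP reachability at the default packet size. It doesn't by itself prove that vMotion's own traffic will get through.

```python
import re
from statistics import mean


def parse_ping(output: str) -> dict:
    """Summarize BSD-style ping/vmkping output like the transcript above."""
    # Per-reply round-trip times, e.g. "time=0.131 ms".
    times = [float(t) for t in re.findall(r"time=([\d.]+) ms", output)]
    # Summary line, e.g. "3 packets transmitted, 3 packets received".
    stats = re.search(r"(\d+) packets transmitted, (\d+) packets received", output)
    if stats:
        sent, received = int(stats.group(1)), int(stats.group(2))
    else:
        # No summary line: fall back to counting the replies we saw.
        sent = received = len(times)
    loss_pct = 100.0 * (sent - received) / sent if sent else 100.0
    return {
        "sent": sent,
        "received": received,
        "loss_pct": loss_pct,
        "avg_ms": mean(times) if times else None,
    }
```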
I've tried restarting vCenter itself; no help. I've tried multiple different VMs and various combinations of my 4 hosts, and none of them work.
I just don't know where else to go. Ideas?
Edit: Other things I've confirmed: time sync is good, disk free space is good, and forward and reverse name resolution are good.
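For the forward/reverse name resolution check, here's a sketch using Python's standard `socket` module. It verifies that a hostname resolves to an IP and that the IP resolves back to the same name. The function name `fwd_rev_ok` and the short-name matching rule are my own assumptions. The resolver functions can be swapped out, so the logic can be tested without real DNS.

```python
import socket
from typing import Callable, List, Tuple

ForwardFn = Callable[[str], str]
ReverseFn = Callable[[str], Tuple[str, List[str], List[str]]]


def fwd_rev_ok(hostname: str,
               forward: ForwardFn = socket.gethostbyname,
               reverse: ReverseFn = socket.gethostbyaddr) -> bool:
    """True if hostname -> IPv4 address -> name resolves back to the same host."""
    try:
        ip = forward(hostname)          # forward lookup
        name, aliases, _ = reverse(ip)  # reverse lookup
    except OSError:                     # socket.gaierror and socket.herror subclass OSError
        return False
    wanted = hostname.lower().rstrip(".")
    names = [n.lower().rstrip(".") for n in [name, *aliases]]
    if wanted in names:
        return True
    # Assumption: a short name such as "esx01" also matches "esx01.lab.local".
    return "." not in wanted and wanted in [n.split(".")[0] for n in names]
```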
Thank you for this feedback.
1. We have been discussing support for routable vMotion internally, and official product feature requests from customers would definitely help us prioritize this feature correctly. http://frankdenneman.nl/2012/11/12/vmware-feature-request/
2. vMotion checks the path, but not as far as you would like. Checking further could increase the overhead substantially: some customers have extreme network configurations, and if we promise to check the entire path we need to take every possible configuration into account. As with the previous point, please submit a feature request if you want this on our product managers' radar. http://frankdenneman.nl/2012/11/12/vmware-feature-request/
3. Multi-NIC vMotion can split up the packets of a single VM's vMotion operation across the NICs, and it will do so to make use of the available bandwidth. The sooner vMotion copies over the "dirty" pages, the sooner the complete state is copied, and the less overhead we incur from pages that are dirtied again during the copy process.
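Point 3 describes iterative pre-copy, where each pass re-sends the pages the guest dirtied during the previous pass. The sketch below is a deliberately simplified textbook model of that idea, not VMware's actual implementation. All parameter names, the 64 MB switch-over threshold, and the 30-round cap are hypothetical. The model shows why more bandwidth, for example from multiple NICs, helps: the copy converges quickly only when bandwidth comfortably exceeds the page-dirtying rate.

```python
from typing import Tuple


def precopy_rounds(memory_mb: float,
                   dirty_mb_s: float,
                   bandwidth_mb_s: float,
                   switchover_mb: float = 64.0,
                   max_rounds: int = 30) -> Tuple[int, float, bool]:
    """Simplified iterative pre-copy model (not VMware's real algorithm).

    Round 1 copies all memory; each later round re-copies whatever the guest
    dirtied while the previous round was in flight. Returns
    (rounds_used, seconds_spent_copying, converged).
    """
    remaining = memory_mb
    elapsed = 0.0
    for rnd in range(1, max_rounds + 1):
        duration = remaining / bandwidth_mb_s  # time to send this round
        elapsed += duration
        # Pages dirtied meanwhile; can't exceed the VM's total memory.
        remaining = min(memory_mb, dirty_mb_s * duration)
        if remaining <= switchover_mb:  # small enough for a final stop-and-copy
            return rnd, elapsed, True
    return max_rounds, elapsed, False
```

With these made-up numbers, a 4096 MB VM that dirties 100 MB/s never converges within 30 rounds at 110 MB/s (roughly one 1 GbE link), but it converges in 6 rounds at 220 MB/s (roughly two links).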
I had the same problem in my lab environment after updating one host from ESXi 5.0 to ESXi 5.1. I am running a four-host cluster using two distributed switches, and I performed the update using Update Manager. After the host updated, everything looked good except vMotion. After a little digging, I found that my vMotion virtual adapter's IP address had changed. I then updated a second host to 5.1 and had no issues vMotioning VMs onto it immediately. Root cause undetermined, but the fix was simply correcting the IP address on the vmk virtual adapter.
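The symptom here was a vMotion vmk IP that silently changed during the upgrade. One precaution is to record the VMkernel NIC settings before upgrading, for example by saving the output of `esxcfg-vmknic -l`, and compare them afterwards. This minimal Python sketch assumes you've already transcribed that output into dictionaries by hand. The field names (`ip`, `netmask`, `portgroup`) and the helper `diff_vmk` are hypothetical, and parsing the real command output isn't shown.

```python
from typing import Dict, List

# vmk name -> settings; keys are hypothetical labels for values read off
# `esxcfg-vmknic -l` by hand.
VmkConfig = Dict[str, Dict[str, str]]


def diff_vmk(before: VmkConfig, after: VmkConfig) -> List[str]:
    """List every VMkernel NIC setting that differs between two snapshots."""
    changes: List[str] = []
    for vmk in sorted(before.keys() | after.keys()):
        old, new = before.get(vmk), after.get(vmk)
        if old is None:
            changes.append(f"{vmk}: added {new}")
        elif new is None:
            changes.append(f"{vmk}: removed")
        else:
            for key in sorted(old.keys() | new.keys()):
                if old.get(key) != new.get(key):
                    changes.append(f"{vmk}.{key}: {old.get(key)} -> {new.get(key)}")
    return changes
```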
Thanks for your reply dmholmes000. My issue turned out to be the ACL on my NFS export.
Had the same issue here. 2 pNICs per host are dedicated to iSCSI/vMotion: one pNIC per vSwitch, with iSCSI port binding enabled and vMotion enabled. They attach to dedicated iSCSI switches on a separate VLAN and subnet. vMotion failed at 14%, and the logs showed which vmk ports it was trying to use. I disabled vMotion on those particular ports, and vMotion now works using the other designated ports.
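When several VMkernel ports have vMotion enabled, as with the iSCSI-bound ports here, each one is a potential vMotion path, so each has to work end to end. The sketch below takes hand-entered inventories of (vmk name, IP/prefix, vMotion enabled) per host and pairs the vMotion-enabled ports that share a subnet. This rests on the assumption that vMotion traffic in this release isn't routed (see point 1 above about routable vMotion). The names, the 172.16.50.x addresses, and the pairing rule are illustrative only.

```python
import ipaddress
from typing import List, Tuple

# (vmk name, "ip/prefix", vMotion enabled?), entered by hand per host.
Vmk = Tuple[str, str, bool]
Pair = Tuple[str, str, str]


def vmotion_paths(src: List[Vmk], dst: List[Vmk]) -> Tuple[List[Pair], List[str]]:
    """Pair vMotion-enabled vmks on two hosts that share an IP subnet.

    Returns (pairs, unmatched): pairs are (src_vmk, dst_vmk, subnet)
    combinations that each need to work end to end; unmatched are src vmks
    with no same-subnet vMotion peer on dst (assumes no routing).
    """
    pairs: List[Pair] = []
    unmatched: List[str] = []
    for s_name, s_cidr, s_on in src:
        if not s_on:
            continue  # vMotion not enabled on this port
        s_net = ipaddress.ip_interface(s_cidr).network
        peers = [d_name for d_name, d_cidr, d_on in dst
                 if d_on and ipaddress.ip_interface(d_cidr).network == s_net]
        if peers:
            pairs.extend((s_name, d_name, str(s_net)) for d_name in peers)
        else:
            unmatched.append(s_name)
    return pairs, unmatched
```

Each returned pair is one you'd want to test individually. Disabling vMotion on the extra port leaves a single path, which mirrors the fix described above.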
I've added a second vMotion VMkernel port on every ESXi host, with an IP address in the same subnet as the first vMotion VMkernel port. Everything works fine after that change.
thanks community !!
I recently ran into the same issue, but it was isolated to migrations to and from one particular ESXi 5.1 host. After verifying that there were no IP conflicts or other connectivity issues on the vMotion network, I restarted the management services on the host (services.sh restart). This instantly resolved the issue, and it has not returned since.
Would have been nice to know the actual root cause but I was unable to figure that out.
I had a similar problem after upgrading hosts from 5.0 U1 to 5.1.
Each host has a vSwitch0 containing a vMotion port group (Host 1: 192.168.1, Host 2: 192.168.1.2) and a Management port group (Host 1: 10.33.10.16, Host 2: 10.33.10.18).
When I tried to vMotion between the two hosts, the task failed at 14%. I then disabled vMotion on the vMotion port group and enabled it on the Management port group, after which vMotion succeeded.
I hope that helps.
I just had the same problem after adding a new 5.1 U1 host to our cluster. The vnetwork configs were identical and yet I couldn't vmotion to the new host... It worked for 4 VMs and then nothing... They all failed at 14%.
After reading a few posts on here, I asked my networking team to permit all VLANs on the physical switch (dedicated Cisco switches for vMotion in a C7000 enclosure), and it worked right away. They then reconfigured the switches to permit only the correct VLANs (the same config as when it was not working), and it has been working flawlessly since...
Had a similar issue updating to 5.1 U1. I'm running a 6-host cluster in an HP c7000 blade chassis, with all hosts on 5.0. I used Update Manager to update one server to 5.1 U1. The installation completed successfully, and I was able to log into ESX, reconnect to vCenter, the whole nine yards. All my network settings transferred over with no issues, except that vMotion failed at 14%. I logged a support ticket with VMware, and they told me my vMotion NICs were not able to see the VLAN associated with them; however, the engineer put one NIC into standby and vMotion worked... or so I thought. I updated my second host to 5.1 U1 and hit the same vMotion issue, but this time placing one NIC into standby did not resolve the error. At this point the first host I had put on 5.1 U1 started acting really strange: the console was showing fork errors, I couldn't change any settings in vCenter, and reconnecting or restarting the management services didn't help. Only a hard boot corrected those issues, but vMotion still fails at 14%. The strange part is that even though vMotion failed, and everything else I tried failed, when I rebooted the host the guest VMs did transfer over without a reboot of the guest. I placed another call to VMware today, but I'm really feeling a rollback may be in order to maintain some stability in my systems.
I just had the same problem in my c7000 as well... See if you can permit all VLANs on your interconnect switches (we have 3020s) and whether that solves the problem.
I've got all the 3020s set up as trunk ports with all VLANs allowed. The vSwitch network adapter properties are still only showing three VLANs (none of them the vMotion network). What's odd is that the same switches with the same port setups work fine with the 5.0 ESX hosts. I'm wondering what the big change is with 5.1 or U1.
Thanks for saving me the time. This was my issue.
In my case, it was the upstream switch that didn't pass the vMotion VLAN. When one host was using a NIC connected to Virtual Connect 1 and the other host used a NIC connected to Virtual Connect 2, the traffic had to go through the upstream switch, which didn't allow the VLAN.
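The failure pattern described here, where vMotion works between some host pairs but not others, falls out of a simple model. If both hosts' active uplinks sit on the same interconnect, traffic stays there. Otherwise it crosses the upstream switch, and every hop must carry the VLAN. Below is a hedged Python sketch of that model. The switch names, VLAN IDs, and single-upstream-hop topology are assumptions for illustration.

```python
from typing import Dict, List, Set


def path_between(uplink_a: str, uplink_b: str, upstream: str) -> List[str]:
    """Switches traversed between two hosts' active uplinks (assumed topology)."""
    if uplink_a == uplink_b:
        return [uplink_a]                  # stays inside one interconnect
    return [uplink_a, upstream, uplink_b]  # hairpins through the upstream switch


def blocked_hops(path: List[str], allowed: Dict[str, Set[int]], vlan: int) -> List[str]:
    """Switches on the path whose trunk does not carry the given VLAN."""
    return [sw for sw in path if vlan not in allowed.get(sw, set())]
```

If VLAN 30 stands in for the vMotion VLAN and the upstream switch only allows VLAN 20, same-interconnect pairs pass and cross-interconnect pairs fail. Depending on uplink placement, that would look like vMotion working for some hosts or VMs and not others.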
Just encountered the same problem, or at least the same symptoms. Mine were caused by an orphaned vmx-**.vswp file. You'll find my solution here:
Hope it helps someone.