Having a bit of a strange issue with vMotion today. It's been working in my lab for months, and was still working fine after I upgraded to 5.1 last week.
Now, today, for whatever reason, it just refuses to progress beyond 14%, failing with a timeout. Here's the log snippet:
I can vmkping between the relevant interfaces just fine:
# vmkping 10.5.132.61
PING 10.5.132.61 (10.5.132.61): 56 data bytes
64 bytes from 10.5.132.61: icmp_seq=0 ttl=64 time=0.131 ms
64 bytes from 10.5.132.61: icmp_seq=1 ttl=64 time=0.162 ms
64 bytes from 10.5.132.61: icmp_seq=2 ttl=64 time=0.122 ms
--- 10.5.132.61 ping statistics ---
3 packets transmitted, 3 packets received, 0% packet loss
round-trip min/avg/max = 0.122/0.138/0.162 ms
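If MTU is a suspect, a stricter variant of the same test forces the ping out of the specific vMotion vmkernel interface with a full-size, don't-fragment packet. A sketch, assuming the vMotion interface is vmk1 and a standard 1500 MTU (use -s 8972 instead for jumbo frames):

# vmkping -I vmk1 -d -s 1472 10.5.132.61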
I've tried restarting vCenter itself, no help. Tried multiple different VMs. Tried various combinations of my 4 hosts, and none of them work.
I just don't know where else to go. Ideas?
Edit: Other things I've confirmed: time sync is good, free disk space is good, and forward and reverse name resolution both work.
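For anyone wanting to repeat those sanity checks, the stock busybox tools in the ESXi shell are enough; roughly (the hostname is just an example):

# date
# df -h
# nslookup esx01.mylab.local
# nslookup 10.5.132.61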
Has anything changed other than the 5.1 upgrade a few weeks back?
It seems like most of your work has centered around the vSphere equation. How much digging have you done on your network, such as looking at the physical switches (perhaps in debug mode) to find any issues? Any new gear plugged in, IP conflicts, etc.?
Well, with a reboot of all 4 hosts, everything has cleared up.
I don't like this answer, but such is life.
I have the same problem after upgrading to 5.1.
Any solution for this? I have a 5.1 host that also fails vMotions at 14%. Rebooting all my hosts didn't resolve it.
I was able to get this resolved. In my case, the problematic host didn't have proper access to the NFS exports. I use NetApp's VSC to connect the storage, and it only assigned read/write access instead of root on this one host.
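For anyone hitting the same thing: the symptom is visible from the host by listing the NFS mounts and checking the Accessible and Mounted columns for the problem host:

# esxcli storage nfs list

On a plain Linux NFS server, the equivalent of the root access VSC should have assigned is an export with no_root_squash; an illustrative /etc/exports line (path and subnet made up):

/vol/datastore1  10.5.132.0/24(rw,no_root_squash,sync)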
I was having the same issue when I upgraded one of the hosts from ESXi 5.0 Update 1 to 5.1. I originally had the vMotion network set up with only one vmkernel port, so I added an extra vmkernel port to each host as per the guide.
Initially I added an IP address within the same range to the new vmkernel port, e.g.:
Host 1 vMotion: IP1 = 172.19.19.10, IP2 = 172.19.19.11
Host 2 vMotion: IP1 = 172.19.19.12, IP2 = 172.19.19.13
vMotion was still failing at 14%.
I changed the second vmkernel port to use a different subnet:
Host 1 vMotion: IP1 = 172.19.19.10, IP2 = 172.19.20.10
Host 2 vMotion: IP1 = 172.19.19.12, IP2 = 172.19.20.12
After the new subnet was added, vMotion started working again.
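In case it saves someone some clicking, the extra vmkernel port can also be created from the shell. A sketch assuming a standard vSwitch with an existing port group named vMotion-2 (the names and addresses are just my examples):

# esxcli network ip interface add --interface-name=vmk2 --portgroup-name=vMotion-2
# esxcli network ip interface ipv4 set --interface-name=vmk2 --ipv4=172.19.20.10 --netmask=255.255.255.0 --type=static

vMotion still has to be enabled on the new vmknic afterwards, either in the vSphere Client or (if I remember the syntax right) with vim-cmd hostsvc/vmotion/vnic_set vmk2.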
Same situation here. This is the second "major" error since the recent upgrade from 5.0 to 5.1. The first one was solved in this discussion.
In my environment I have an Openfiler, and my 2 ESXi hosts are nested on the same host.
If I tail -f messages.log on the Openfiler, I find this. I will try to get to the bottom of it, but if anyone knows where I should start looking, it would be very much appreciated:
The log goes like this while the migration is stuck at 14%:
kern.info<6>: Nov 20 16:18:08 openfiler kernel: iscsi_trgt: Abort Task (01) issued on tid:1 lun1 by sid:84552445757952 (Unknown LUN)
Then it just repeats these messages over and over (right after the first one):
kern.info<6>: Nov 20 16:18:10 openfiler kernel: last message repeated 26 times
So, from my ignorance, I'm guessing that for some reason after the upgrade the path is pointing to a place that is not right. As stated before, I'll try to find out exactly what's happening, but please, any help is more than welcome!
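If anyone wants to look at the same thing from the ESXi side while I dig, I plan to start with the standard commands (nothing Openfiler-specific) to see whether the iSCSI sessions are up and which paths and LUNs the hosts actually claim after the upgrade:

# esxcli iscsi session list
# esxcli storage core path list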
Similar problem here with multiple vMotion networks.
I have 3 hosts on BL460c Gen8 blades inside one HP C7000 enclosure with VirtualConnect FlexFabric, and the VirtualConnect Ethernet networks are configured in VLAN tunneling.
vmk0 is used for management traffic, while vmk1 to vmk4 should be used for vMotion traffic.
vmk0, vmk1, and vmk2 belong to the first vDS (uplinks vmnic0 and vmnic1), while vmk3 and vmk4 belong to the second vDS (uplinks vmnic2 and vmnic3).
I get the 14% vMotion failure if I enable more than one vmknic for vMotion, while if I keep a single vmknic for vMotion it works perfectly, no matter which vmknic I choose (the same vmknic on all 3 hypervisors, of course).
I found this http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=203748... but I don't think it is my case, since that problem should arise even with only one vmknic, not only with more than one.
My ESXi version is VMware ESXi 5.1.0 build-799733.
Just to complete the scenario: when I get the error with multiple vMotion networks (all of them tagged on different VLANs), it seems that host1 tries to contact host2, for example, from the 10.2.106.0 network to 192.168.12.0, while it should try the connection on 10.102.106.0.
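For anyone who wants to compare the same thing in their environment, the per-vmk addressing is easy to dump on each hypervisor and compare side by side; the Address and Netmask of every vMotion-enabled vmk should line up host to host:

# esxcli network ip interface ipv4 get
# esxcfg-vmknic -l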
Does anyone have suggestions?
Best regards, Roberto Traversi.
I've got the same problem. I've just upgraded one host from 5.0 Update 1 to 5.1 and I can't vMotion.
I've tried deleting and reconfiguring the vmkernel port; it works fine, but if I reboot the host, the problem appears again.
Hello, I can confirm the same problem as you. Suddenly, and without any further action (I had just flagged and unflagged vMotion on the vmk NICs), everything worked fine. But if I reboot the hypervisor, the problem comes back; moreover, vMotion fails towards the rebooted hypervisor while it works fine among the hypervisors that were not rebooted.
I opened a support case; I'll keep you posted with updates.
Best regards, Roberto.
I've tried removing this host from the cluster and adding it back, and vMotion works.
I'm installing this patch too: http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=203454...
PS: the patch didn't work for me; I reverted back to 5.0 Update 1.
Probably totally unrelated, but I got the same error at 14%. After a bit of digging, it turned out I had duplicate IPs on our vMotion VLAN. It may help some of you.
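For what it's worth, one crude way to confirm a duplicate from the ESXi shell: temporarily disable the vmk that owns the address and ping it from another host; if something still answers, another device holds that IP. A sketch (vmk1 and the address are examples):

# esxcli network ip interface set --interface-name=vmk1 --enabled=false
# vmkping 172.19.19.10     (run this from another host)
# esxcli network ip interface set --interface-name=vmk1 --enabled=true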
Multiple vmkernel interfaces with vMotion enabled can cause this issue. I disabled vMotion on all interfaces but one and confirmed it worked.
In our environment, just leaving one vMotion NIC enabled (we tried them all, one by one) solves the issue, so it is not related to a duplicate IP address.
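If it helps anyone script the elimination test, vMotion can also be toggled per vmknic from the shell; if I remember the syntax right, it is:

# vim-cmd hostsvc/vmotion/vnic_unset vmk2
# vim-cmd hostsvc/vmotion/vnic_set vmk1

leaving exactly one interface flagged for vMotion.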
I finally succeeded in uploading the log file to VMware support; I hope to have an answer from them.
Just a quick update: I had an answer from VMware support. Multi-NIC vMotion is supported only on the same subnet and the same VLAN (if VLAN tagging is used). Honestly, reading the documentation I hadn't understood that; I'll have to review it more carefully.
Best regards, Roberto.
From what I see in this discussion and in other places on the web, failures at 14% are solved in different ways, so the error must have several possible causes, most of them in the network or virtual network configuration.
The solutions quoted in this discussion didn't work in my case, but most likely they will work for others.
The good thing is that if community members keep feeding this discussion, we will all have a pretty good document about vMotion failing at 14% and the ways to solve that error!
It would also be good to completely understand the vMotion mechanism, in order to know what vMotion does up until 14% and what it tries to do after. Hopefully someone out there knows the answer and will post it here.
Thank you all for your answers and for making this community such a great tool!
I have written two articles about the correct configuration of the vMotion network; it might be helpful to check your network configuration against them:
http://frankdenneman.nl/vmotion/2879/ (Designing your vMotion network)
http://frankdenneman.nl/vmotion/multi-nic-vmotion-failover-order-configuration/ (Multi-NIC vMotion – failover order configuration)
It's highly unlikely that we are going to publicly share intimate details about the vMotion process. I would rather see everybody file an SR if they experience this problem, as this will provide us a lot of feedback to enhance and improve our vMotion code. After GSS provides them the answer, they can share it with the community on this board.
I've got a similar issue with my lab setup. I started a discussion and then noticed you guys have this one: http://communities.vmware.com/message/2174867#2174867
I've tried a few things myself and was wondering: can anyone do a Storage vMotion while they have the 14% vMotion issue?
I read your article, and I understand that I had imagined multi-NIC vMotion differently.
Although vMotion in ESXi 5 works better than in 4, I think some improvements should be considered (I'll submit them as feature requests, hoping they can be taken into consideration).
Best regards, Roberto Traversi.