I am getting this timeout error in vcenter when doing a vmotion of a vm. any idea what this is?
this is on esxi 5.1 914609
Hi,
Looks like its matching with KB:http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=103026... do confirm the status.
I checked on all of them. Even changed vmk ip addesses. but still same problem. tried vmkping/ping each hosts vmk ips and works. Its random. sometimes vmotion works but most of the time it doesnot and times out.
Tried it again and now it gives this
Failed to initialize migration at source. Error 195887109. Already exists.
vMotion migration Re: Migration to host x.x.x.x failed with error Already disconnected (195887150). The requested scheduler group already exists. Failing migration. A previously attempted migration may still be cleaning up state. Please retry in a few minutes.
The vMotion failed because the destination host did not receive data from the source host on the vMotion network. Please check your vMotion network settings and physical network configuration and ensure they are correct.
Hi,
Can you please post the vmkernel/hostd messages when the vmotion failed.
here are the vmotion logs from the source when it fails
2013-01-20T15:00:37.200Z cpu12:9141)Config: 347: "SIOControlFlag2" = 1, Old Value: 0, (Status: 0x0)
2013-01-20T15:00:37.917Z cpu6:44048)Migrate: vm 44049: 3273: Setting VMOTION info: Source ts = 1358694037317522, src ip = <10.60.1.249> dest ip = <10.60.1.213> Dest wid = 42592 using SHARED swap
2013-01-20T15:00:37.920Z cpu4:44242)MigrateNet: 1165: 1358694037317522 S: Successfully bound connection to vmknic '10.60.1.249'
2013-01-20T15:00:37.920Z cpu6:44048)Tcpip_Vmk: 1213: Affinitizing 10.60.1.249 to world 44242, Success
2013-01-20T15:00:37.920Z cpu6:44048)VMotion: 2688: 1358694037317522 S: Set ip address '10.60.1.249' worldlet affinity to send World ID 44242
2013-01-20T15:00:37.920Z cpu6:44048)Hbr: 3308: Migration start received (worldID=44049) (migrateType=1) (event=0) (isSource=1) (sharedConfig=1)
2013-01-20T15:00:37.920Z cpu14:8796)MigrateNet: vm 8796: 2061: Accepted connection from <10.60.1.213>
2013-01-20T15:00:37.920Z cpu14:8796)MigrateNet: vm 8796: 2131: dataSocket 0x4100264a3a00 receive buffer size is 563724
2013-01-20T15:00:37.921Z cpu4:44242)MigrateNet: 1165: 1358694037317522 S: Successfully bound connection to vmknic '10.60.1.249'
2013-01-20T15:00:37.921Z cpu4:44242)VMotionUtil: 3184: 1358694037317522 S: Stream connection 1 added.
2013-01-20T15:00:37.921Z cpu4:44242)MigrateNet: 1165: 1358694037317522 S: Successfully bound connection to vmknic '10.60.1.250'
2013-01-20T15:00:37.921Z cpu4:44242)VMotionUtil: 3184: 1358694037317522 S: Stream connection 2 added.
2013-01-20T15:00:57.942Z cpu17:44240)WARNING: VMotionUtil: 2098: 1358694037317522 S: Stream write completion 0x4125b00133d0 inactive for at least 20 seconds! Progressed to 60392/527876 bytes.
2013-01-20T15:00:57.942Z cpu17:44240)VMotionUtil: 5119: 1358694037317522 S: Socket 0x410026469400 streamHealth pending: 0/563724 snd 144 rcv
2013-01-20T15:00:57.942Z cpu19:44241)WARNING: VMotionUtil: 2098: 1358694037317522 S: Stream write completion 0x4125b00131b0 inactive for at least 20 seconds! Progressed to 35816/527876 bytes.
2013-01-20T15:00:57.942Z cpu19:44241)VMotionUtil: 5119: 1358694037317522 S: Socket 0x41001b835a00 streamHealth pending: 0/563724 snd 144 rcv
2013-01-20T15:01:17.963Z cpu19:44241)WARNING: VMotionUtil: 2098: 1358694037317522 S: Stream write completion 0x4125b00133d0 inactive for at least 20 seconds! Progressed to 60392/527876 bytes.
2013-01-20T15:01:17.963Z cpu19:44241)VMotionUtil: 5119: 1358694037317522 S: Socket 0x410026469400 streamHealth pending: 0/563724 snd 144 rcv
2013-01-20T15:01:17.963Z cpu17:44240)WARNING: VMotionUtil: 2098: 1358694037317522 S: Stream write completion 0x4125b00131b0 inactive for at least 20 seconds! Progressed to 35816/527876 bytes.
2013-01-20T15:01:17.963Z cpu17:44240)VMotionUtil: 5119: 1358694037317522 S: Socket 0x41001b835a00 streamHealth pending: 0/563724 snd 144 rcv
2013-01-20T15:01:37.983Z cpu19:44241)WARNING: VMotionUtil: 2098: 1358694037317522 S: Stream write completion 0x4125b00131b0 inactive for at least 20 seconds! Progressed to 35816/527876 bytes.
2013-01-20T15:01:37.983Z cpu19:44241)VMotionUtil: 5119: 1358694037317522 S: Socket 0x41001b835a00 streamHealth pending: 0/563724 snd 144 rcv
2013-01-20T15:01:37.983Z cpu17:44240)WARNING: VMotionUtil: 2098: 1358694037317522 S: Stream write completion 0x4125b00133d0 inactive for at least 20 seconds! Progressed to 60392/527876 bytes.
2013-01-20T15:01:37.983Z cpu17:44240)VMotionUtil: 5119: 1358694037317522 S: Socket 0x410026469400 streamHealth pending: 0/563724 snd 144 rcv
2013-01-20T15:01:58.004Z cpu17:44240)WARNING: VMotionUtil: 2098: 1358694037317522 S: Stream write completion 0x4125b00133d0 inactive for at least 20 seconds! Progressed to 60392/527876 bytes.
2013-01-20T15:01:58.004Z cpu17:44240)VMotionUtil: 5119: 1358694037317522 S: Socket 0x410026469400 streamHealth pending: 0/563724 snd 144 rcv
2013-01-20T15:01:58.004Z cpu19:44241)WARNING: VMotionUtil: 2098: 1358694037317522 S: Stream write completion 0x4125b00131b0 inactive for at least 20 seconds! Progressed to 35816/527876 bytes.
2013-01-20T15:01:58.004Z cpu19:44241)VMotionUtil: 5119: 1358694037317522 S: Socket 0x41001b835a00 streamHealth pending: 0/563724 snd 144 rcv
2013-01-20T15:02:01.346Z cpu19:44241)WARNING: VMotionUtil: 4526: 1358694037317522 S: socket connected returned: Already disconnected
2013-01-20T15:02:01.346Z cpu19:44241)WARNING: VMotionUtil: 6990: 1358694037317522 S: failed to flush stream buffer: Already disconnected
2013-01-20T15:02:01.346Z cpu17:44240)WARNING: VMotionUtil: 4526: 1358694037317522 S: socket connected returned: Already disconnected
2013-01-20T15:02:01.346Z cpu19:44241)WARNING: VMotionUtil: 1617: 1358694037317522 S: failed writing stream completion: Already disconnected
2013-01-20T15:02:01.346Z cpu17:44240)WARNING: VMotionUtil: 6990: 1358694037317522 S: failed to flush stream buffer: Already disconnected
2013-01-20T15:02:01.346Z cpu17:44240)WARNING: VMotionUtil: 1617: 1358694037317522 S: failed writing stream completion: Already disconnected
2013-01-20T15:02:01.346Z cpu19:44241)WARNING: Migrate: 269: 1358694037317522 S: Failed: Already disconnected (0xbad002e) @0x0
2013-01-20T15:02:01.346Z cpu19:44241)VMotionUtil: 5119: 1358694037317522 S: Socket 0x410026469400 streamSock pending: 563724/563724 snd 144 rcv
2013-01-20T15:02:01.346Z cpu19:44241)VMotionUtil: 5119: 1358694037317522 S: Socket 0x41001b835a00 streamSock pending: 563724/563724 snd 144 rcv
2013-01-20T15:02:01.422Z cpu0:44059)WARNING: Migrate: 5244: 1358694037317522 S: Migration considered a failure by the VMX. It is most likely a timeout, but check the VMX log for the true error.
2013-01-20T15:02:01.422Z cpu0:44059)Hbr: 3405: Migration end received (worldID=44049) (migrateType=1) (event=1) (isSource=1) (sharedConfig=1)
2013-01-20T15:02:01.422Z cpu12:44243)VMotionUtil: 5119: 1358694037317522 S: Socket 0x4100264a3a00 rcvMigFree pending: 31424/35792 snd 0 rcv
2013-01-20T15:02:01.455Z cpu2:9138)Config: 347: "SIOControlFlag2" = 0, Old Value: 1, (Status: 0x0)
vmkernel logs on the destination vmotion host
2013-01-20T15:10:23.547Z cpu14:9160)Config: 347: "SIOControlFlag2" = 1, Old Value: 0, (Status: 0x0)
2013-01-20T15:10:23.613Z cpu0:43396)MemSched: vm 43396: 7756: extended swap to 8192 pgs
2013-01-20T15:10:23.766Z cpu0:43396)World: vm 43397: 1421: Starting world vmm0:ucstest with flags 8
2013-01-20T15:10:23.766Z cpu0:43396)Sched: vm 43397: 6416: Adding world 'vmm0:ucstest', group 'host/user', cpu: shares=-3 min=0 minLimit=-1 max=-1, mem: shares=-3 min=0 minLimit=-1 max=-1
2013-01-20T15:10:23.766Z cpu0:43396)Sched: vm 43397: 6431: renamed group 63064 to vm.43396
2013-01-20T15:10:23.766Z cpu0:43396)Sched: vm 43397: 6448: group 63064 is located under group 4
2013-01-20T15:10:23.772Z cpu0:43396)MemSched: vm 43396: 7756: extended swap to 27336 pgs
2013-01-20T15:10:23.808Z cpu1:43396)Migrate: vm 43397: 3273: Setting VMOTION info: Dest ts = 1358694623315922, src ip = <10.60.1.249> dest ip = <10.60.1.213> Dest wid = 0 using SHARED swap
2013-01-20T15:10:23.812Z cpu1:43396)Tcpip_Vmk: 1213: Affinitizing 10.60.1.213 to world 43403, Success
2013-01-20T15:10:23.812Z cpu1:43396)VMotion: 2688: 1358694623315922 😧 Set ip address '10.60.1.213' worldlet affinity to recv World ID 43403
2013-01-20T15:10:23.812Z cpu1:43396)Hbr: 3308: Migration start received (worldID=43397) (migrateType=1) (event=0) (isSource=0) (sharedConfig=1)
2013-01-20T15:10:23.931Z cpu6:8796)MigrateNet: vm 8796: 2061: Accepted connection from <10.60.1.249>
2013-01-20T15:10:23.931Z cpu6:8796)MigrateNet: vm 8796: 2131: dataSocket 0x41001b8bc810 receive buffer size is 563724
2013-01-20T15:10:23.931Z cpu9:43402)MigrateNet: 1165: 1358694623315922 😧 Successfully bound connection to vmknic '10.60.1.213'
2013-01-20T15:10:23.931Z cpu6:8796)MigrateNet: vm 8796: 2061: Accepted connection from <10.60.1.249>
2013-01-20T15:10:23.931Z cpu6:8796)MigrateNet: vm 8796: 2131: dataSocket 0x41002709c810 receive buffer size is 563724
2013-01-20T15:10:23.931Z cpu6:8796)VMotionUtil: 3184: 1358694623315922 😧 Stream connection 1 added.
2013-01-20T15:10:23.931Z cpu6:8796)MigrateNet: vm 8796: 2061: Accepted connection from <10.60.1.250>
2013-01-20T15:10:23.931Z cpu6:8796)MigrateNet: vm 8796: 2131: dataSocket 0x41001bac1e10 receive buffer size is 563724
2013-01-20T15:10:23.932Z cpu6:8796)VMotionUtil: 3184: 1358694623315922 😧 Stream connection 2 added.
2013-01-20T15:10:43.934Z cpu9:43402)WARNING: Migrate: 269: 1358694623315922 😧 Failed: Timeout (0xbad0021) @0x41802eabde57
2013-01-20T15:10:43.934Z cpu9:43402)VMotionSend: 3978: 1358694623315922 😧 Waiting for channel references prior to cleanup...
2013-01-20T15:10:43.935Z cpu18:43396)WARNING: Migrate: 5244: 1358694623315922 😧 Migration considered a failure by the VMX. It is most likely a timeout, but check the VMX log for the true error.
2013-01-20T15:10:43.935Z cpu18:43396)Hbr: 3405: Migration end received (worldID=43397) (migrateType=1) (event=1) (isSource=0) (sharedConfig=1)
2013-01-20T15:10:43.935Z cpu0:43401)VMotionUtil: 5119: 1358694623315922 😧 Socket 0x4100270b7a10 sendSocket pending: 563644/563724 snd 0 rcv
2013-01-20T15:10:43.957Z cpu12:20650)Config: 347: "SIOControlFlag2" = 0, Old Value: 1, (Status: 0x0)
Hi,
We need to check Virtual machine logs(Vmware) to figure out the exact error message>"Migration considered a failure by the VMX. It is most likely a timeout, but check the VMX log for the true error".Can you please post the virtual machine logs.
1.Is this the only Virtual machine where in vmotion is failing?
its happening for all vms. not just one. If I waited a bit, then vmotion might go through. Usually a reboot as well will work. but it will fails after a while
Hi,
Thanks for the update.Can you please upload the virtual machine logs(Failed Vmotion).I guess vmotion was working earlier?
Are you by any chance using a Qlogic 81xx CNA Adapter in this configuration?
I'm gettting the same type of errors that you are on a system that was working fine before I upgraded it from ESXi 4.1 to ESXi 5.1 recently.
Scott
It this point, I've got an open ticket with VMware trying to identify the issue.
We are currently focused on trying to rule in/rule out this KB issue: http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=204287...
My VMware support technician is in the process of trying to define what the net packet pools size should be set to in this scenario.
That is all I know at this point. Scott
have you checked all your mtu settings to make sure they match - vmk, vswitch, physical port and physical switches?
Oh Yeah. We've been through the servers with a fine tooth comb.... It is definitely not an obvious problem. I think it is a driver issue, but it could be a throughput/buffer issue as well.
anyone had this problem solved? I have a support ticket open for 40 days but so far nothing.
I can not upgrade the environment to version 5.1 with this problem!
what hardware you running this on? have you tried vmkping -s -s 9000 to the storage?
are you using jumbo frames?
The hosts are running on a PowerEdge R710 with a Qlogic 8152. And we are using jumbo frames.
Access to the storage is OK. The vMotion and Storage Networs are in different subnet's.
and vmkping to the storage is ok?
vmkping -s 9000 storage_ip
Yes. It's OK.
how are you vmkernel interfaces configured? how many nics for vmotion? is vmotion and iscsi vlan separated?