HI
i have upgraded only one host from esxi5.1 to esxi5.5 u 3 in the cluster.but vmotion struct at 14%. from esxi5.1 to esxi5.5.
what could be the reason for this issue.
Please provide some more details.
Does this happen when you vMotion from v5.5 to v5.5, or v5.1 to v5.5, ...?
Are there any additional details about the failed vMotion process in the tasks and events view?
André
same issue I faced. until now no resolution
Hello,
Can you post a snippet of the vmkernel.log during vMotion failure?
The vMotion VMkernel.log should start something like below:
2015-12-25T16:39:25.565Z cpu4:2758489)Migrate: vm 2758492: 3284: Setting VMOTION info: Source ts = 1451061663105920, src ip = <192.168.1.176> dest ip = <192.168.1.177> Dest wid = 1830931 using SHARED swap
When you tail -f vmkernel.log and perform vMotion and when it errors or fails at 14 percent, what is the logging that we see?
Also is vMotion failing to one specific host? Or from one specific host?
Suhas
I found something that might be worth trying. Change all the vMotion IP addresses (you can set them to their original value later on), and make sure vMotion is checked for the vMotion VMKernel port group(s). See my reply at Re: Error when trying to vMotion: failed with error msg.vmk.status.VMK_IS_DISCONNECTED (195887150)
André
VMware KB: Unable to power ON and migrate the virtual machine after upgrade to ESXi 5.5 from ESXi 5.... might be related to this.
André
Hi Maniee,
Please check network configuration. are u using vss or vds?
- Check MTU it should be the same configuration in vmotion port group.
- Check your vlan if settle correctly and check as well vlan traffic.
- Check ip address if configure correctly
- How many host encounter this issue?
- Destination host to Source host issue will happen right? can you try to migrate one vm from Source host to Destination host, if you are able to successfully migrate that VM try to reboot problematic Host faced that issue
- Lastly collect logs and let us see if we can able to help you analyzing it
Hi,
We have the same issue, we are upgrading our system from 5.x to 5.5u3b
So far I could reproduce the issue on 3 different cluster.
Details:
Cluster #1 and cluster #2 - contains 4 node.
3 node have been upgraded to 5.5u3b - build 3343343
4th node is still on 5.1.0 build 1483097
The cluster is using VSS on management and VDS for the VM servers.
vswitch0 - management with 2 network card, explicit failover, one active one standby adapter
vds - for the VM hosts - with 2 network card
On cluster #1, and #2 - we have 1-1 VM server which I can not migrate to the upgraded hosts.
I can migrate VM servers to the old node, and then migrate it back. The problem is specific to the VM server.
The VM server which is affected is a Win 2008R2, VM tool version is on the VM server is: 9.0.5 build 1065307
vm hardware is v9
-----------------------------------------
Cluster #3 have 5 nodes
3 node have been upgraded to 5.5u3b - build 3343343
node 4, 5 is still on 5.0.0 build 623860
The cluster is using VSS on both for management and for the VM servers.
There is two vswitch - one for managemnet one for the VM servers - each with 2 network card, explicit failover, one active one standby adapter
Cluster #3
Here we stil have 2 node on 5.0 The affected VM can be migrated freely between the two 5.0.0 host, but it can not be migrated to any of the upgraded 5.5 hosts.
The VM server which is affected is a Win 2008R2, VM tool version is on the VM server is: 8.6.5 build 621624
vm hardware is v8
-----------------------
MTU size on al system is 1500
On the switches jumbo frame is enabled. That means the MTU is 9216. So this should not be the issue.
On network side the servers are OK, we are using them for years, there was no such issue.
Error during the migration is:
Migration to host xx.XX.XX.XX failed with error Already disconnected (195887150)
vmotion migration [-1062716352:1453282157653493] failed writing stram completion: Already disconnected
vmotion migration [-1062716352:1453282157653493] failed to flush stream buffer: Already disconnected
vmotion migration [-1062716352:1453282157653493] socket connected returned: Already disconnected
This comes up at 14%
Sniplet from two of the affected VM server vmware.log file - This is the log what the system created during the vmotion:
2016-01-19T13:39:44.832Z| vmx| I120: MigratePlatformInitMigration: DiskOp file set to /vmfs/volumes/51b9ba04-be38ab49-ddfc-d89d67136f18/FRJLOLA01/FRJLOLA01-diskOp.tmp
2016-01-19T13:39:44.852Z| vmx| I120: MigrateWaitForData: waiting for data.
2016-01-19T13:39:44.852Z| vmx| I120: MigrateSetState: Transitioning from state 7 to 8.
2016-01-19T13:39:44.891Z| vmx| I120: MigrateRPC_RetrieveMessages: Informed of a new user message, but can't handle messages in state 4. Leaving the message queued.
2016-01-19T13:39:44.998Z| vmx| I120: MigrateSetStateFinished: type=2 new state=11
2016-01-19T13:39:44.998Z| vmx| I120: MigrateSetState: Transitioning from state 8 to 11.
2016-01-19T13:39:44.998Z| vmx| I120: [msg.migrate.waitdata.platform] Failed waiting for data. Error bad0007. Bad parameter.
2016-01-19T13:39:44.998Z| vmx| I120: Migrate: cleaning up migration state.
2016-01-19T13:39:44.999Z| vmx| I120: MigrateSetState: Transitioning from state 11 to 0.
2016-01-19T13:39:44.999Z| vmx| I120: Migrate: Final status reported to VMDB.
2016-01-19T13:39:44.999Z| vmx| I120: Module Migrate power on failed.
2016-01-19T13:39:44.999Z| vmx| I120: VMX_PowerOn: ModuleTable_PowerOn = 0
2016-01-19T13:39:44.999Z| vmx| I120: SVMotion_PowerOff: Not running Storage vMotion. Nothing to do
2016-01-19T13:39:45.000Z| vmx| I120: WORKER: asyncOps=1 maxActiveOps=1 maxPending=0 maxCompleted=0
2016-01-19T13:39:45.000Z| vmx| I120: Vix: [275507 mainDispatch.c:1201]: VMAutomationPowerOff: Powering off.
2016-01-19T13:39:45.001Z| vmx| W110: /vmfs/volumes/51b9ba04-be38ab49-ddfc-d89d67136f18/FRJLOLA01/FRJLOLA01.vmx: Cannot remove symlink /var/run/vmware/root_0/1453210780080218_275507/configFile: No such file or directory
2016-01-19T13:39:45.055Z| vmx| I120: Vix: [275507 mainDispatch.c:3964]: VMAutomation_ReportPowerOpFinished: statevar=1, newAppState=1873, success=1 additionalError=0
2016-01-19T13:39:45.056Z| vmx| I120: Msg_Post: Error
2016-01-19T13:39:45.056Z| vmx| I120: [vob.swap.migrate.invalidindex.mig] The migration swap type is not supported for migration.
2016-01-19T13:39:45.056Z| vmx| I120: [vob.migrate.addpage.swapped.invalidindex] Received invalid swap slot data (0xc000ec8c) for pgNum 0.
2016-01-19T13:39:45.056Z| vmx| I120: [vob.vmotion.addpage.failed.status] vMotion migration [c0a83c40:1453210779366300] failed to add memory page 0 to VM: Bad parameter
2016-01-19T13:39:45.056Z| vmx| I120: [vob.vmotion.stream.completion.complete.fail] vMotion migration [c0a83c40:1453210779366300] failed draining stream completion: Bad parameter
2016-01-19T13:39:45.056Z| vmx| I120: [msg.moduletable.powerOnFailed] Module Migrate power on failed.
2016-01-19T13:39:45.056Z| vmx| I120: [msg.vmx.poweron.failed] Failed to start the virtual machine.
2016-01-19T13:39:45.056Z| vmx| I120: ----------------------------------------
2016-01-19T13:39:45.058Z| vmx| I120: VmdbPipeStreamsOvlError Couldn't read: OVL_STATUS_EOF, (2) No such file or directory.
2016-01-19T13:39:45.058Z| vmx| I120: VmdbCnxDisconnect: Disconnect: closed pipe for pub cnx '/db/connection/#1/' (-32)
2016-01-19T13:39:45.059Z| vmx| I120: VmdbDbRemoveCnx: Removing Cnx from Db for '/db/connection/#1/'
2016-01-19T13:39:45.059Z| vmx| I120: VUIDialogDequeue: found 0 vmdb ui connections; canceling dialogs
2016-01-19T13:39:45.059Z| vmx| I120: Vix: [275507 mainDispatch.c:3964]: VMAutomation_ReportPowerOpFinished: statevar=0, newAppState=1870, success=1 additionalError=0
2016-01-19T13:39:45.059Z| vmx| I120: Transitioned vmx/execState/val to poweredOff
2016-01-19T13:39:45.059Z| vmx| I120: Vix: [275507 mainDispatch.c:3964]: VMAutomation_ReportPowerOpFinished: statevar=0, newAppState=1870, success=0 additionalError=0
2016-01-19T13:39:45.059Z| vmx| I120: Vix: [275507 mainDispatch.c:4003]: Error VIX_E_FAIL in VMAutomation_ReportPowerOpFinished(): Unknown error
2016-01-19T13:39:45.066Z| vmx| I120: Vix: [275507 mainDispatch.c:3964]: VMAutomation_ReportPowerOpFinished: statevar=0, newAppState=1870, success=1 additionalError=0
2016-01-19T13:39:45.066Z| vmx| I120: Transitioned vmx/execState/val to poweredOff
2016-01-19T13:39:45.067Z| vmx| I120: VMIOP: Exit
2016-01-19T13:39:45.069Z| vmx| I120: Vix: [275507 mainDispatch.c:849]: VMAutomation_LateShutdown()
2016-01-19T13:39:45.069Z| vmx| I120: Vix: [275507 mainDispatch.c:799]: VMAutomationCloseListenerSocket. Closing listener socket.
2016-01-20T08:27:44.719Z| vmx| I120: WORKER: Creating new group with numThreads=1 (4)
2016-01-20T08:27:44.720Z| vmx| I120: FTCpt: (0 unk) State transition 0 -> 1
2016-01-20T08:27:44.736Z| vmx| I120: WORKER: Creating new group with numThreads=1 (4)
2016-01-20T08:27:44.736Z| vmx| I120: MigratePlatformInitMigration: DiskOp file set to /vmfs/volumes/54dcb18a-c0a1ec7e-b76e-d89d671472c4/FRHUOPAP01/FRHUOPAP01-diskOp.tmp
2016-01-20T08:27:44.758Z| vmx| I120: MigrateWaitForData: waiting for data.
2016-01-20T08:27:44.758Z| vmx| I120: MigrateSetState: Transitioning from state 7 to 8.
2016-01-20T08:27:46.112Z| vmx| I120: MigrateSetStateFinished: type=2 new state=11
2016-01-20T08:27:46.112Z| vmx| I120: MigrateSetState: Transitioning from state 8 to 11.
2016-01-20T08:27:46.112Z| vmx| I120: [msg.migrate.waitdata.platform] Failed waiting for data. Error bad0007. Bad parameter.
2016-01-20T08:27:46.113Z| vmx| I120: Migrate: cleaning up migration state.
2016-01-20T08:27:46.113Z| vmx| I120: MigrateSetState: Transitioning from state 11 to 0.
2016-01-20T08:27:46.114Z| vmx| I120: Migrate: Final status reported to VMDB.
2016-01-20T08:27:46.114Z| vmx| I120: Module Migrate power on failed.
2016-01-20T08:27:46.114Z| vmx| I120: VMX_PowerOn: ModuleTable_PowerOn = 0
2016-01-20T08:27:46.114Z| vmx| I120: SVMotion_PowerOff: Not running Storage vMotion. Nothing to do
2016-01-20T08:27:46.114Z| vmx| I120: WORKER: asyncOps=0 maxActiveOps=0 maxPending=0 maxCompleted=0
2016-01-20T08:27:46.114Z| vmx| I120: Vix: [240094 mainDispatch.c:1201]: VMAutomationPowerOff: Powering off.
2016-01-20T08:27:46.116Z| vmx| W110: /vmfs/volumes/54dcb18a-c0a1ec7e-b76e-d89d671472c4/FRHUOPAP01/FRHUOPAP01.vmx: Cannot remove symlink /var/run/vmware/root_0/1453278463967538_240094/configFile: No such file or directory
2016-01-20T08:27:46.147Z| vmx| I120: Vix: [240094 mainDispatch.c:3964]: VMAutomation_ReportPowerOpFinished: statevar=1, newAppState=1873, success=1 additionalError=0
2016-01-20T08:27:46.148Z| vmx| I120: Msg_Post: Error
2016-01-20T08:27:46.148Z| vmx| I120: [vob.swap.migrate.invalidindex.mig] The migration swap type is not supported for migration.
2016-01-20T08:27:46.148Z| vmx| I120: [vob.migrate.addpage.swapped.invalidindex] Received invalid swap slot data (0xc0029591) for pgNum 0x1ca1.
2016-01-20T08:27:46.148Z| vmx| I120: [vob.vmotion.addpage.failed.status] vMotion migration [c0a8dce0:1453278463248266] failed to add memory page 0x1ca1 to VM: Bad parameter
2016-01-20T08:27:46.148Z| vmx| I120: [vob.vmotion.stream.completion.complete.fail] vMotion migration [c0a8dce0:1453278463248266] failed draining stream completion: Bad parameter
2016-01-20T08:27:46.148Z| vmx| I120: [msg.moduletable.powerOnFailed] Module Migrate power on failed.
2016-01-20T08:27:46.148Z| vmx| I120: [msg.vmx.poweron.failed] Failed to start the virtual machine.
2016-01-20T08:27:46.148Z| vmx| I120: ----------------------------------------
I tried to change on the 5.5 host the
VMkernel.boot.netPktHeapMaxMBperGB from 6 to 12
and
VMkernel.boot.netPktPoolMaxMBperGB from 75 to 100
on the 5.1 host change the
VMkernel.Boot.netPktPoolMaxSize from 656 to 800.
VMkernel.Boot.netPktHeapMaxSize from 64 to 128
Tried to do an IP address change on the vmkernel interface.
No change. Issue remains.
You need Jumbo frames end-to-end on the whole stack, not just on the switches. It needs to be enabled on the vmnics as well. As far as I know it really only comes into play with iSCSI connections unless someone knows otherwise and can correct me.
Similar issue here. Upgraded one host in a cluster from 5.1U1 to 5.5U3. Cannot VMotion a subset of VMs from 5.1 hosts to the recently upgraded 5.5U3b host with the "Already Disconnected" errors at the 14% mark. The network settings are unchanged from before the upgrade and MTU is set at default 1500.
On a trial set of 10 VMs, 7 were able to VMotion over but 3 would not (the same 3 will continuously fail on multiple attempts).. If I power off the VM and start it back up, it is then able to VMotion around. However, this is not desirable solution for all the affected VMs across clusters.
Log snippets below from the source and destination hosts as well as VMX log.
SOURCE HOST vmkernel:
2016-01-19T17:28:12.164Z cpu0:9209)Config: 347: "SIOControlFlag2" = 1, Old Value: 0, (Status: 0x0)
2016-01-19T17:28:13.001Z cpu16:707103)Migrate: vm 707104: 3273: Setting VMOTION info: Source ts = 1453224491277421, src ip = <10.50.60.47> dest ip = <10.50.60.40> Dest wid = 1542062 using SHARED swap
2016-01-19T17:28:13.005Z cpu16:707103)Tcpip_Vmk: 1213: Affinitizing 10.55.66.47 to world 725253, Success
2016-01-19T17:28:13.005Z cpu16:707103)VMotion: 2688: 1453224491277421 S: Set ip address '10.55.66.47' worldlet affinity to send World ID 725253
2016-01-19T17:28:13.005Z cpu16:707103)Hbr: 3308: Migration start received (worldID=707104) (migrateType=1) (event=0) (isSource=1) (sharedConfig=1)
2016-01-19T17:28:13.005Z cpu22:8816)MigrateNet: vm 8816: 2077: Accepted connection from <10.55.66.40>
2016-01-19T17:28:13.005Z cpu22:8816)MigrateNet: vm 8816: 2147: dataSocket 0x4100172f77d0 receive buffer size is 563272
2016-01-19T17:28:13.005Z cpu29:725253)VMotionUtil: 3184: 1453224491277421 S: Stream connection 1 added.
2016-01-19T17:28:13.005Z cpu29:725253)VMotionUtil: 3184: 1453224491277421 S: Stream connection 2 added.
2016-01-19T17:28:13.010Z cpu6:725252)WARNING: VMotionUtil: 4526: 1453224491277421 S: socket connected returned: Already disconnected
2016-01-19T17:28:13.010Z cpu6:725252)WARNING: VMotionUtil: 6990: 1453224491277421 S: failed to flush stream buffer: Already disconnected
2016-01-19T17:28:13.010Z cpu6:725252)WARNING: VMotionUtil: 1617: 1453224491277421 S: failed writing stream completion: Already disconnected
2016-01-19T17:28:13.010Z cpu6:725252)WARNING: Migrate: 269: 1453224491277421 S: Failed: Already disconnected (0xbad002e) @0x0
2016-01-19T17:28:13.011Z cpu14:725251)WARNING: VMotionUtil: 4526: 1453224491277421 S: socket connected returned: Already disconnected
2016-01-19T17:28:13.011Z cpu14:725251)WARNING: VMotionUtil: 6990: 1453224491277421 S: failed to flush stream buffer: Already disconnected
2016-01-19T17:28:13.011Z cpu14:725251)WARNING: VMotionUtil: 1617: 1453224491277421 S: failed writing stream completion: Already disconnected
2016-01-19T17:28:13.048Z cpu20:707277)WARNING: Migrate: 5244: 1453224491277421 S: Migration considered a failure by the VMX. It is most likely a timeout, but check the VMX log for the true error.
2016-01-19T17:28:13.048Z cpu20:707277)Hbr: 3405: Migration end received (worldID=707104) (migrateType=1) (event=1) (isSource=1) (sharedConfig=1)
2016-01-19T17:28:13.119Z cpu9:10858)Config: 347: "SIOControlFlag2" = 0, Old Value: 1, (Status: 0x0)
DESTINATION HOST vmkernel:
2016-01-19T17:28:11.900Z cpu0:1537840 opID=6a039df0)World: 14369: VC opID C1B95B14-00000717-8c-e8-88 maps to vmkernel opID 6a039df0
2016-01-19T17:28:11.900Z cpu0:1537840 opID=6a039df0)Config: 346: "SIOControlFlag2" = 1, Old Value: 0, (Status: 0x0)
2016-01-19T17:28:11.918Z cpu27:1542061)MemSched: vm 1542061: 8311: extended swap to 8192 pgs
2016-01-19T17:28:12.139Z cpu27:1542061)World: vm 1542062: 1466: Starting world vmm0:linpsvcso04 of type 8
2016-01-19T17:28:12.139Z cpu27:1542061)Sched: vm 1542062: 6425: Adding world 'vmm0:linpsvcso04', group 'host/user', cpu: shares=-3 min=0 minLimit=-1 max=-1, mem: shares=-3 min=0 minLimit=-1 max=-1
2016-01-19T17:28:12.139Z cpu27:1542061)Sched: vm 1542062: 6440: renamed group 2944323 to vm.1542061
2016-01-19T17:28:12.139Z cpu27:1542061)Sched: vm 1542062: 6457: group 2944323 is located under group 4
2016-01-19T17:28:12.140Z cpu27:1542061)MemSched: vm 1542061: 8311: extended swap to 28148 pgs
2016-01-19T17:28:12.145Z cpu27:1542061)World: vm 1542064: 1466: Starting world vmm1:linpsvcso04 of type 8
2016-01-19T17:28:12.233Z cpu20:1542061)Migrate: vm 1542062: 3286: Setting VMOTION info: Dest ts = 1453224491277421, src ip = <10.55.66.47> dest ip = <10.55.66.40> Dest wid = 0 using SHARED swap
2016-01-19T17:28:12.235Z cpu5:1542066)CpuSched: 583: user latency of 1542066 vmotionStreamHelper0-1542062 0 changed by 1542066 vmotionStreamHelper0-1542062 -1
2016-01-19T17:28:12.236Z cpu28:1542067)CpuSched: 583: user latency of 1542067 vmotionStreamHelper1-1542062 0 changed by 1542067 vmotionStreamHelper1-1542062 -1
2016-01-19T17:28:12.236Z cpu18:1542069)CpuSched: 583: user latency of 1542069 vmotionRecvHelper-1542062 0 changed by 1542069 vmotionRecvHelper-1542062 -1
2016-01-19T17:28:12.237Z cpu16:1542068)CpuSched: 583: user latency of 1542068 vmotionSendHelper-1542062 0 changed by 1542068 vmotionSendHelper-1542062 -1
2016-01-19T17:28:12.237Z cpu20:1542061)Tcpip_Vmk: 1288: Affinitizing 10.55.66.40 to world 1542069, Success
2016-01-19T17:28:12.237Z cpu20:1542061)VMotion: 2735: 1453224491277421 😧 Set ip address '10.55.66.40' worldlet affinity to recv World ID 1542069
2016-01-19T17:28:12.237Z cpu20:1542061)Hbr: 3340: Migration start received (worldID=1542062) (migrateType=1) (event=0) (isSource=0) (sharedConfig=1)
2016-01-19T17:28:12.364Z cpu4:33664)MigrateNet: vm 33664: 2096: Accepted connection from <10.55.66.47>
2016-01-19T17:28:12.364Z cpu4:33664)MigrateNet: vm 33664: 2138: data socket size 0 is less than config option 562140
2016-01-19T17:28:12.364Z cpu4:33664)MigrateNet: vm 33664: 2166: dataSocket 0x410b68c14300 receive buffer size is 562140
2016-01-19T17:28:12.365Z cpu4:33664)MigrateNet: vm 33664: 2096: Accepted connection from <10.55.66.47>
2016-01-19T17:28:12.365Z cpu4:33664)MigrateNet: vm 33664: 2138: data socket size 0 is less than config option 562140
2016-01-19T17:28:12.365Z cpu4:33664)MigrateNet: vm 33664: 2166: dataSocket 0x410b68c150e0 receive buffer size is 562140
2016-01-19T17:28:12.365Z cpu4:33664)VMotionUtil: 3396: 1453224491277421 😧 Stream connection 1 added.
2016-01-19T17:28:12.365Z cpu4:33664)MigrateNet: vm 33664: 2096: Accepted connection from <10.55.66.47>
2016-01-19T17:28:12.365Z cpu4:33664)MigrateNet: vm 33664: 2138: data socket size 0 is less than config option 562140
2016-01-19T17:28:12.365Z cpu4:33664)MigrateNet: vm 33664: 2166: dataSocket 0x410b68edd700 receive buffer size is 562140
2016-01-19T17:28:12.365Z cpu4:33664)VMotionUtil: 3396: 1453224491277421 😧 Stream connection 2 added.
2016-01-19T17:28:12.367Z cpu28:1542067)WARNING: Swap: 787: The migration swap type is not supported for migration.
2016-01-19T17:28:12.367Z cpu28:1542067)WARNING: VmMemMigrate: 1976: Received invalid swap slot data (0xc002d177) for pgNum 0x0.
2016-01-19T17:28:12.367Z cpu28:1542067)WARNING: VMotionRecv: 1116: 1453224491277421 😧 failed to add memory page 0x0 to VM: Bad parameter
2016-01-19T17:28:12.367Z cpu28:1542067)WARNING: VMotionUtil: 1476: 1453224491277421 😧 failed draining stream completion: Bad parameter
2016-01-19T17:28:12.367Z cpu28:1542067)WARNING: Migrate: 269: 1453224491277421 😧 Failed: Bad parameter (0xbad0007) @0x418033856f9a
2016-01-19T17:28:12.370Z cpu0:1542066)WARNING: Swap: 787: The migration swap type is not supported for migration.
2016-01-19T17:28:12.370Z cpu0:1542066)WARNING: VmMemMigrate: 1976: Received invalid swap slot data (0xc0017c20) for pgNum 0x481.
2016-01-19T17:28:12.370Z cpu0:1542066)WARNING: VMotionRecv: 1116: 1453224491277421 😧 failed to add memory page 0x481 to VM: Bad parameter
2016-01-19T17:28:12.370Z cpu0:1542066)WARNING: VMotionUtil: 1476: 1453224491277421 😧 failed draining stream completion: Bad parameter
2016-01-19T17:28:12.370Z cpu0:1542066)VMotionUtil: 5333: 1453224491277421 😧 Socket 0x410b68edd700 streamSock pending: 33304/33304 snd 561912 rcv
2016-01-19T17:28:12.370Z cpu0:1542066)VMotionUtil: 5333: 1453224491277421 😧 Socket 0x410b68c150e0 streamSock pending: 33304/33304 snd 162585 rcv
2016-01-19T17:28:12.378Z cpu20:1542061)WARNING: Migrate: 5291: 1453224491277421 😧 Migration considered a failure by the VMX. It is most likely a timeout, but check the VMX log for the true error.
2016-01-19T17:28:12.378Z cpu20:1542061)Hbr: 3434: Migration end received (worldID=1542062) (migrateType=1) (event=1) (isSource=0) (sharedConfig=1)
2016-01-19T17:28:12.413Z cpu17:1535207)Config: 346: "SIOControlFlag2" = 0, Old Value: 1, (Status: 0x0)
VMX_LOG:
2016-01-19T17:28:12.166Z| vmx| I120: VMXVmdbCbVmVmxMigrate: Got SET callback for /vm/#_VMX/vmx/migrateState/cmd/##1_22476/op/=to
2016-01-19T17:28:12.166Z| vmx| I120: Could not identify IP address family of in/srcLogIp:
2016-01-19T17:28:12.166Z| vmx| I120: Could not identify IP address family of in/dstLogIp:
2016-01-19T17:28:12.166Z| vmx| I120: VMXVmdbVmVmxMigrateGetParam: type: 1 srcIp=<10.55.66.47> dstIp=<10.55.66.40> mid=529b33194dc6d uuid=30303536-3734-5355-4533-34364c364c54 priority=high checksumMemory=no maxDowntime=0 encrypted=0 resumeDuringPageIn=no latencyAware=yes diskOpFile=
2016-01-19T17:28:12.166Z| vmx| I120: VMXVmdbVmVmxMigrateGetParam: type 1 unsharedSwap 0 memMinToTransfer 0 cpuMinToTransfer 0 numDisks 0 numStreamIps 1
2016-01-19T17:28:12.166Z| vmx| I120: Received migrate 'to' request for mid id 1453224491277421, src ip <10.55.66.47>, dst ip <10.55.66.40>(invalidate source config).
2016-01-19T17:28:12.167Z| vmx| I120: SVGA: Maximum display topology 2560x1600.
2016-01-19T17:28:12.184Z| vmx| I120: MigrateSetInfo: state=1 srcIp=<10.55.66.47> dstIp=<10.55.66.40> mid=1453224491277421 uuid=30303536-3734-5355-4533-34364c364c54 priority=high
2016-01-19T17:28:12.184Z| vmx| I120: MigrateSetState: Transitioning from state 0 to 1.
2016-01-19T17:28:12.184Z| vmx| I120: VMXVmdb_SetMigrationHostLogState: hostlog state transits to emigrating for migrate 'to' mid 1453224491277421
2016-01-19T17:28:12.902Z| vmx| I120: VMXVmdbCbVmVmxMigrate: Got SET callback for /vm/#_VMX/vmx/migrateState/cmd/##1_22479/op/=start
2016-01-19T17:28:12.902Z| vmx| I120: VMXVmdbVmVmxMigrateGetStartParam: mid=529b33194dc6d dstwid=1542062
2016-01-19T17:28:12.902Z| vmx| I120: Received migrate 'start' request for mig id 1453224491277421, dest world id 1542062.
2016-01-19T17:28:13.001Z| vmx| I120: SVGA: Maximum display topology 2560x1600.
2016-01-19T17:28:13.001Z| vmx| I120: MigratePlatformInitMigration: DiskOp file set to /vmfs/volumes/976d8169-6e977849/linpsvcso04/linpsvcso04-diskOp.tmp
2016-01-19T17:28:13.005Z| vmx| I120: MigrateSetState: Transitioning from state 1 to 2.
2016-01-19T17:28:13.011Z| vcpu-0| I120: VMXVmdb_SetMigrationHostLogState: hostlog state transits to failure for migrate 'to' mid 1453224491277421
2016-01-19T17:28:13.015Z| vcpu-0| I120: MigrateSetStateFinished: type=1 new state=5
2016-01-19T17:28:13.015Z| vcpu-0| I120: MigrateSetState: Transitioning from state 2 to 5.
2016-01-19T17:28:13.015Z| vcpu-0| I120: Migrate_SetFailureMsgList: switching to new log file.
2016-01-19T17:28:13.017Z| vcpu-0| I120: Migrate_SetFailureMsgList: Now in new log file.
2016-01-19T17:28:13.047Z| vcpu-0| I120: [msg.checkpoint.precopyfailure] Migration to host <10.55.66.40> failed with error Already disconnected (0xbad002e).
2016-01-19T17:28:13.048Z| vcpu-0| I120: Migrate: cleaning up migration state.
2016-01-19T17:28:13.048Z| vcpu-0| I120: MigrateSetState: Transitioning from state 5 to 0.
2016-01-19T17:28:13.048Z| vcpu-0| I120: Msg_Post: Error
2016-01-19T17:28:13.048Z| vcpu-0| I120: [vob.vmotion.transmit.vbuf.not.connected] vMotion migration [a010a2f:1453224491277421] socket connected returned: Already disconnected
2016-01-19T17:28:13.048Z| vcpu-0| I120: [vob.vmotion.stream.channel.write.no.wait.flush.failed] vMotion migration [a010a2f:1453224491277421] failed to flush stream buffer: Already disconnected
2016-01-19T17:28:13.049Z| vcpu-0| I120: [vob.vmotion.stream.completion.write.fail] vMotion migration [a010a2f:1453224491277421] failed writing stream completion: Already disconnected
2016-01-19T17:28:13.049Z| vcpu-0| I120: [msg.checkpoint.precopyfailure] Migration to host <10.55.66.40> failed with error Already disconnected (0xbad002e).
2016-01-19T17:28:13.049Z| vcpu-0| I120: ----------------------------------------
I think I found a workaround. I have downgraded one of my node to 5.5.0 build 3029944
So from the 5.0 host I migrated the affected machine to this host, and from here I could move to the to 5.5u3b host.
This would mean that the bug have been introduced somewhere between 3029944 and 3343343
We're having exactly the same symptoms. Cannot vMotion from 5.1 build 3070626 to 5.5 build 3343343.
Will test out migrating to a non-patched 5.5 host and report.
Ok no dice, used a 5.5 host at build 2068190 and the 5.1 host still cannot migrate the same 3 VM's.
I haven't tried rebooting these VM's to see if that will allow them to vMotion but as a previous poster said, this isn't a viable solution for us.
I've got a VMware case open so will report back with their findings.
I opened a case Friday night also.
We didn't upgrade to the 5.5 hosts. They are fresh installs. We are trying to vMotion VMs from a 5.1 2583090 host -> 5.5 3248547. Some will migrate fine, most will not. Very odd:
2016-01-25T15:52:45.591Z cpu7:15398651)VMotionUtil: 3380: 1453737159973591 S: Stream connection 2 added.
2016-01-25T15:52:45.592Z cpu6:15292154)WARNING: VMotionUtil: 681: 1453737159973591 S: failed to read stream keepalive: Connection closed by remote host, possibly due to timeout
2016-01-25T15:52:45.592Z cpu6:15292154)WARNING: Migrate: 269: 1453737159973591 S: Failed: Connection closed by remote host, possibly due to timeout (0xbad003f) @0x0
2016-01-25T15:52:45.592Z cpu10:15283961)WARNING: VMotionUtil: 4723: 1453737159973591 S: socket connected returned: Already disconnected
2016-01-25T15:52:45.592Z cpu10:15283961)WARNING: VMotionUtil: 7187: 1453737159973591 S: failed to flush stream buffer: Already disconnected
2016-01-25T15:52:45.592Z cpu10:15283961)WARNING: VMotionUtil: 1618: 1453737159973591 S: failed writing stream completion: Already disconnected
Ours are fresh installs as well Jbsengineer. I'm still waiting for a call back! I've escalated it to a manager.
I have heard back. According to my support person, the issue we are having is related to the default disabling of SSLv3 on 5.5 U3b. Engineering is aware of the issue. We downgraded to 5.5 U3a build 3316895 and all is well.
Josh
Still doesn't work for us, we tried downgrading a host but we still have the issue. Only workaround is to power the VM off and migrate it.
VMware state the issue has been raised to their engineering team and all we can do is wait So frustrating!
Here is the KB. We downgraded to 5.1 and will wait the fix out. Bad timing, we had 2500 VMs to move...
For me there was a workaround, I have used it at least on 5 different cluster, and worked every time.
For example, I had a 4 node cluster, 3 host was upgraded to 5.5u3b, one was still on 5.1
I reinstalled one host on 5.5 U2, then I upgraded it with patches till Sept. 2015. (Sept. patches should be included!) - this was my intermediate host. (build id: 3029944)
Edit: as I see 3029944 - was the simple 5.5u3 build.
Once this was completed, I could migrate all VM server from 5.1 to the intermediate host. Upgrade now the 5.1 to 5.5u3b
Once that was done, I could move everything from intermediate host to 5.5u3b, and upgrade the intermediate as well.
VMware just came out with VMware KB: VMware ESXi 5.5, Patch ESXi550-201602401-SG: Updates esx-base. I will be trying to setup a change window tonight to test and let you know later.