maniee
Enthusiast
Enthusiast

esxi5.1 to esxi5.5

HI

   i have upgraded only one host from esxi5.1 to esxi5.5 u 3  in the cluster.but vmotion struct at 14%. from esxi5.1 to esxi5.5.

what could be the reason for this issue.

20 Replies
a_p_
Leadership
Leadership

Please provide some more details.

Does this happen when you vMotion from v5.5 to v5.5, or v5.1 to v5.5, ...?

Are there any additional details about the failed vMotion process in the tasks and events view?

André

0 Kudos
sicnarf_latosa
Enthusiast
Enthusiast

same issue I faced. until now no resolution

0 Kudos

Hello,

Can you post a snippet of the vmkernel.log during vMotion failure?

The vMotion VMkernel.log should start something like below:

2015-12-25T16:39:25.565Z cpu4:2758489)Migrate: vm 2758492: 3284: Setting VMOTION info: Source ts = 1451061663105920, src ip = <192.168.1.176> dest ip = <192.168.1.177> Dest wid = 1830931 using SHARED swap

When you tail -f vmkernel.log and perform vMotion and when it errors or fails at 14 percent, what is the logging that we see?

Also is vMotion failing to one specific host? Or from one specific host?

Suhas

If you found this or any other answer useful please consider the use of the Helpful or Correct buttons to award points. Don't Backup. Go Forward! Rubrik Peek into my Website: http://www.virtuallypeculiar.com
0 Kudos
a_p_
Leadership
Leadership

I found something that might be worth trying. Change all the vMotion IP addresses (you can set them to their original value later on), and make sure vMotion is checked for the vMotion VMKernel port group(s). See my reply at  Re: Error when trying to vMotion:   failed with error msg.vmk.status.VMK_IS_DISCONNECTED (195887150)

André

0 Kudos
sicnarf_latosa
Enthusiast
Enthusiast

Hi Maniee,

Please check network configuration. are u using vss or vds?

- Check MTU it should be the same configuration in vmotion port group.

- Check your vlan if settle correctly and check as well vlan traffic.

- Check ip address if configure correctly

- How many host encounter this issue?

- Destination host to Source host issue will happen right? can you try to migrate one vm from Source host to Destination host, if you are able to successfully migrate that VM try to reboot problematic Host faced that issue

- Lastly collect logs and let us see if we can able to help you analyzing it Smiley Happy   

0 Kudos
losisoft
Contributor
Contributor

Hi,


We have the same issue, we are upgrading our system from 5.x to 5.5u3b
So far I could reproduce the issue on 3 different cluster.

Details:
Cluster #1 and cluster #2 - contains 4 node.
3 node have been upgraded to 5.5u3b - build 3343343
4th node is still on 5.1.0 build 1483097
The cluster is using VSS on management and VDS for the VM servers.
vswitch0 - management with 2 network card, explicit failover, one active one standby adapter
vds - for the VM hosts - with 2 network card

On cluster #1, and #2 - we have 1-1 VM server which I can not migrate to the upgraded hosts.
I can migrate VM servers to the old node, and then migrate it back. The problem is specific to the VM server.
The VM server which is affected is a Win 2008R2, VM tool version is on the VM server is: 9.0.5 build 1065307
vm hardware is v9

-----------------------------------------
Cluster #3 have 5 nodes
3 node have been upgraded to 5.5u3b - build 3343343
node 4, 5 is still on 5.0.0 build 623860
The cluster is using VSS on both for management and for the VM servers.
There is two vswitch - one for managemnet one for the VM servers - each with 2 network card, explicit failover, one active one standby adapter


Cluster #3
Here we stil have 2 node on 5.0 The affected VM can be migrated freely between the two 5.0.0 host, but it can not be migrated to any of the upgraded 5.5 hosts.

The VM server which is affected is a Win 2008R2, VM tool version is on the VM server is: 8.6.5 build 621624
vm hardware is v8

-----------------------

MTU size on al system is 1500
On the switches jumbo frame is enabled. That means the MTU is 9216. So this should not be the issue.
On network side the servers are OK, we are using them for years, there was no such issue.


Error during the migration is:
Migration to host xx.XX.XX.XX failed with error Already disconnected (195887150)
vmotion migration [-1062716352:1453282157653493] failed writing stram completion: Already disconnected
vmotion migration [-1062716352:1453282157653493] failed to flush stream buffer: Already disconnected
vmotion migration [-1062716352:1453282157653493] socket connected returned: Already disconnected

This comes up at 14%


Sniplet from two of the affected VM server vmware.log file - This is the log what the system created during the vmotion:

2016-01-19T13:39:44.832Z| vmx| I120: MigratePlatformInitMigration:  DiskOp file set to /vmfs/volumes/51b9ba04-be38ab49-ddfc-d89d67136f18/FRJLOLA01/FRJLOLA01-diskOp.tmp

2016-01-19T13:39:44.852Z| vmx| I120: MigrateWaitForData: waiting for data.

2016-01-19T13:39:44.852Z| vmx| I120: MigrateSetState: Transitioning from state 7 to 8.

2016-01-19T13:39:44.891Z| vmx| I120: MigrateRPC_RetrieveMessages: Informed of a new user message, but can't handle messages in state 4.  Leaving the message queued.

2016-01-19T13:39:44.998Z| vmx| I120: MigrateSetStateFinished: type=2 new state=11

2016-01-19T13:39:44.998Z| vmx| I120: MigrateSetState: Transitioning from state 8 to 11.

2016-01-19T13:39:44.998Z| vmx| I120: [msg.migrate.waitdata.platform] Failed waiting for data.  Error bad0007. Bad parameter.

2016-01-19T13:39:44.998Z| vmx| I120: Migrate: cleaning up migration state.

2016-01-19T13:39:44.999Z| vmx| I120: MigrateSetState: Transitioning from state 11 to 0.

2016-01-19T13:39:44.999Z| vmx| I120: Migrate: Final status reported to VMDB.

2016-01-19T13:39:44.999Z| vmx| I120: Module Migrate power on failed.

2016-01-19T13:39:44.999Z| vmx| I120: VMX_PowerOn: ModuleTable_PowerOn = 0

2016-01-19T13:39:44.999Z| vmx| I120: SVMotion_PowerOff: Not running Storage vMotion. Nothing to do

2016-01-19T13:39:45.000Z| vmx| I120: WORKER: asyncOps=1 maxActiveOps=1 maxPending=0 maxCompleted=0

2016-01-19T13:39:45.000Z| vmx| I120: Vix: [275507 mainDispatch.c:1201]: VMAutomationPowerOff: Powering off.

2016-01-19T13:39:45.001Z| vmx| W110: /vmfs/volumes/51b9ba04-be38ab49-ddfc-d89d67136f18/FRJLOLA01/FRJLOLA01.vmx: Cannot remove symlink /var/run/vmware/root_0/1453210780080218_275507/configFile: No such file or directory

2016-01-19T13:39:45.055Z| vmx| I120: Vix: [275507 mainDispatch.c:3964]: VMAutomation_ReportPowerOpFinished: statevar=1, newAppState=1873, success=1 additionalError=0

2016-01-19T13:39:45.056Z| vmx| I120: Msg_Post: Error

2016-01-19T13:39:45.056Z| vmx| I120: [vob.swap.migrate.invalidindex.mig] The migration swap type is not supported for migration.

2016-01-19T13:39:45.056Z| vmx| I120: [vob.migrate.addpage.swapped.invalidindex] Received invalid swap slot data (0xc000ec8c) for pgNum 0.

2016-01-19T13:39:45.056Z| vmx| I120: [vob.vmotion.addpage.failed.status] vMotion migration [c0a83c40:1453210779366300] failed to add memory page 0 to VM: Bad parameter

2016-01-19T13:39:45.056Z| vmx| I120: [vob.vmotion.stream.completion.complete.fail] vMotion migration [c0a83c40:1453210779366300] failed draining stream completion: Bad parameter

2016-01-19T13:39:45.056Z| vmx| I120: [msg.moduletable.powerOnFailed] Module Migrate power on failed.

2016-01-19T13:39:45.056Z| vmx| I120: [msg.vmx.poweron.failed] Failed to start the virtual machine.
2016-01-19T13:39:45.056Z| vmx| I120: ----------------------------------------
2016-01-19T13:39:45.058Z| vmx| I120: VmdbPipeStreamsOvlError Couldn't read: OVL_STATUS_EOF, (2) No such file or directory.

2016-01-19T13:39:45.058Z| vmx| I120: VmdbCnxDisconnect: Disconnect: closed pipe for pub cnx '/db/connection/#1/' (-32)

2016-01-19T13:39:45.059Z| vmx| I120: VmdbDbRemoveCnx: Removing Cnx from Db for '/db/connection/#1/'

2016-01-19T13:39:45.059Z| vmx| I120: VUIDialogDequeue: found 0 vmdb ui connections; canceling dialogs

2016-01-19T13:39:45.059Z| vmx| I120: Vix: [275507 mainDispatch.c:3964]: VMAutomation_ReportPowerOpFinished: statevar=0, newAppState=1870, success=1 additionalError=0

2016-01-19T13:39:45.059Z| vmx| I120: Transitioned vmx/execState/val to poweredOff

2016-01-19T13:39:45.059Z| vmx| I120: Vix: [275507 mainDispatch.c:3964]: VMAutomation_ReportPowerOpFinished: statevar=0, newAppState=1870, success=0 additionalError=0

2016-01-19T13:39:45.059Z| vmx| I120: Vix: [275507 mainDispatch.c:4003]: Error VIX_E_FAIL in VMAutomation_ReportPowerOpFinished(): Unknown error

2016-01-19T13:39:45.066Z| vmx| I120: Vix: [275507 mainDispatch.c:3964]: VMAutomation_ReportPowerOpFinished: statevar=0, newAppState=1870, success=1 additionalError=0

2016-01-19T13:39:45.066Z| vmx| I120: Transitioned vmx/execState/val to poweredOff

2016-01-19T13:39:45.067Z| vmx| I120: VMIOP: Exit

2016-01-19T13:39:45.069Z| vmx| I120: Vix: [275507 mainDispatch.c:849]: VMAutomation_LateShutdown()
2016-01-19T13:39:45.069Z| vmx| I120: Vix: [275507 mainDispatch.c:799]: VMAutomationCloseListenerSocket. Closing listener socket.


2016-01-20T08:27:44.719Z| vmx| I120: WORKER: Creating new group with numThreads=1 (4)

2016-01-20T08:27:44.720Z| vmx| I120: FTCpt: (0 unk) State transition 0 -> 1

2016-01-20T08:27:44.736Z| vmx| I120: WORKER: Creating new group with numThreads=1 (4)

2016-01-20T08:27:44.736Z| vmx| I120: MigratePlatformInitMigration:  DiskOp file set to /vmfs/volumes/54dcb18a-c0a1ec7e-b76e-d89d671472c4/FRHUOPAP01/FRHUOPAP01-diskOp.tmp

2016-01-20T08:27:44.758Z| vmx| I120: MigrateWaitForData: waiting for data.

2016-01-20T08:27:44.758Z| vmx| I120: MigrateSetState: Transitioning from state 7 to 8.

2016-01-20T08:27:46.112Z| vmx| I120: MigrateSetStateFinished: type=2 new state=11

2016-01-20T08:27:46.112Z| vmx| I120: MigrateSetState: Transitioning from state 8 to 11.

2016-01-20T08:27:46.112Z| vmx| I120: [msg.migrate.waitdata.platform] Failed waiting for data.  Error bad0007. Bad parameter.

2016-01-20T08:27:46.113Z| vmx| I120: Migrate: cleaning up migration state.

2016-01-20T08:27:46.113Z| vmx| I120: MigrateSetState: Transitioning from state 11 to 0.
2016-01-20T08:27:46.114Z| vmx| I120: Migrate: Final status reported to VMDB.

2016-01-20T08:27:46.114Z| vmx| I120: Module Migrate power on failed.

2016-01-20T08:27:46.114Z| vmx| I120: VMX_PowerOn: ModuleTable_PowerOn = 0

2016-01-20T08:27:46.114Z| vmx| I120: SVMotion_PowerOff: Not running Storage vMotion. Nothing to do
2016-01-20T08:27:46.114Z| vmx| I120: WORKER: asyncOps=0 maxActiveOps=0 maxPending=0 maxCompleted=0

2016-01-20T08:27:46.114Z| vmx| I120: Vix: [240094 mainDispatch.c:1201]: VMAutomationPowerOff: Powering off.

2016-01-20T08:27:46.116Z| vmx| W110: /vmfs/volumes/54dcb18a-c0a1ec7e-b76e-d89d671472c4/FRHUOPAP01/FRHUOPAP01.vmx: Cannot remove symlink /var/run/vmware/root_0/1453278463967538_240094/configFile: No such file or directory

2016-01-20T08:27:46.147Z| vmx| I120: Vix: [240094 mainDispatch.c:3964]: VMAutomation_ReportPowerOpFinished: statevar=1, newAppState=1873, success=1 additionalError=0

2016-01-20T08:27:46.148Z| vmx| I120: Msg_Post: Error

2016-01-20T08:27:46.148Z| vmx| I120: [vob.swap.migrate.invalidindex.mig] The migration swap type is not supported for migration.

2016-01-20T08:27:46.148Z| vmx| I120: [vob.migrate.addpage.swapped.invalidindex] Received invalid swap slot data (0xc0029591) for pgNum 0x1ca1.

2016-01-20T08:27:46.148Z| vmx| I120: [vob.vmotion.addpage.failed.status] vMotion migration [c0a8dce0:1453278463248266] failed to add memory page 0x1ca1 to VM: Bad parameter

2016-01-20T08:27:46.148Z| vmx| I120: [vob.vmotion.stream.completion.complete.fail] vMotion migration [c0a8dce0:1453278463248266] failed draining stream completion: Bad parameter

2016-01-20T08:27:46.148Z| vmx| I120: [msg.moduletable.powerOnFailed] Module Migrate power on failed.

2016-01-20T08:27:46.148Z| vmx| I120: [msg.vmx.poweron.failed] Failed to start the virtual machine.

2016-01-20T08:27:46.148Z| vmx| I120: ----------------------------------------


I tried to change on the 5.5 host the
VMkernel.boot.netPktHeapMaxMBperGB from 6 to 12
and
VMkernel.boot.netPktPoolMaxMBperGB from 75 to 100

on the 5.1 host change the
VMkernel.Boot.netPktPoolMaxSize from 656 to 800.
VMkernel.Boot.netPktHeapMaxSize from 64 to 128

Tried to do an IP address change on the vmkernel interface.

No change. Issue remains.

0 Kudos
FJ1200
Enthusiast
Enthusiast

You need Jumbo frames end-to-end on the whole stack, not just on the switches.  It needs to be enabled on the vmnics as well.  As far as I know it really only comes into play with iSCSI connections unless someone knows otherwise and can correct me.

0 Kudos
attvoperations
Contributor
Contributor

Similar issue here.  Upgraded one host in a cluster from 5.1U1 to 5.5U3.  Cannot VMotion a subset of VMs from 5.1 hosts to the recently upgraded 5.5U3b host with the "Already Disconnected" errors at the 14% mark.  The network settings are unchanged from before the upgrade and MTU is set at default 1500.

On a trial set of 10 VMs, 7 were able to VMotion over but 3 would not (the same 3 will continuously fail on multiple attempts)..  If I power off the VM and start it back up, it is then able to VMotion around.  However, this is not desirable solution for all the affected VMs across clusters.

Log snippets below from the source and destination hosts as well as VMX log.

SOURCE HOST vmkernel:


2016-01-19T17:28:12.164Z cpu0:9209)Config: 347: "SIOControlFlag2" = 1, Old Value: 0, (Status: 0x0)

2016-01-19T17:28:13.001Z cpu16:707103)Migrate: vm 707104: 3273: Setting VMOTION info: Source ts = 1453224491277421, src ip = <10.50.60.47> dest ip = <10.50.60.40> Dest wid = 1542062 using SHARED swap


2016-01-19T17:28:13.005Z cpu16:707103)Tcpip_Vmk: 1213: Affinitizing 10.55.66.47 to world 725253, Success

2016-01-19T17:28:13.005Z cpu16:707103)VMotion: 2688: 1453224491277421 S: Set ip address '10.55.66.47' worldlet affinity to send World ID 725253

2016-01-19T17:28:13.005Z cpu16:707103)Hbr: 3308: Migration start received (worldID=707104) (migrateType=1) (event=0) (isSource=1) (sharedConfig=1)

2016-01-19T17:28:13.005Z cpu22:8816)MigrateNet: vm 8816: 2077: Accepted connection from <10.55.66.40>

2016-01-19T17:28:13.005Z cpu22:8816)MigrateNet: vm 8816: 2147: dataSocket 0x4100172f77d0 receive buffer size is 563272

2016-01-19T17:28:13.005Z cpu29:725253)VMotionUtil: 3184: 1453224491277421 S: Stream connection 1 added.

2016-01-19T17:28:13.005Z cpu29:725253)VMotionUtil: 3184: 1453224491277421 S: Stream connection 2 added.

2016-01-19T17:28:13.010Z cpu6:725252)WARNING: VMotionUtil: 4526: 1453224491277421 S: socket connected returned: Already disconnected

2016-01-19T17:28:13.010Z cpu6:725252)WARNING: VMotionUtil: 6990: 1453224491277421 S: failed to flush stream buffer: Already disconnected

2016-01-19T17:28:13.010Z cpu6:725252)WARNING: VMotionUtil: 1617: 1453224491277421 S: failed writing stream completion: Already disconnected

2016-01-19T17:28:13.010Z cpu6:725252)WARNING: Migrate: 269: 1453224491277421 S: Failed: Already disconnected (0xbad002e) @0x0

2016-01-19T17:28:13.011Z cpu14:725251)WARNING: VMotionUtil: 4526: 1453224491277421 S: socket connected returned: Already disconnected

2016-01-19T17:28:13.011Z cpu14:725251)WARNING: VMotionUtil: 6990: 1453224491277421 S: failed to flush stream buffer: Already disconnected

2016-01-19T17:28:13.011Z cpu14:725251)WARNING: VMotionUtil: 1617: 1453224491277421 S: failed writing stream completion: Already disconnected

2016-01-19T17:28:13.048Z cpu20:707277)WARNING: Migrate: 5244: 1453224491277421 S: Migration considered a failure by the VMX.  It is most likely a timeout, but check the VMX log for the true error.

2016-01-19T17:28:13.048Z cpu20:707277)Hbr: 3405: Migration end received (worldID=707104) (migrateType=1) (event=1) (isSource=1) (sharedConfig=1)

2016-01-19T17:28:13.119Z cpu9:10858)Config: 347: "SIOControlFlag2" = 0, Old Value: 1, (Status: 0x0)

DESTINATION HOST vmkernel:

2016-01-19T17:28:11.900Z cpu0:1537840 opID=6a039df0)World: 14369: VC opID C1B95B14-00000717-8c-e8-88 maps to vmkernel opID 6a039df0

2016-01-19T17:28:11.900Z cpu0:1537840 opID=6a039df0)Config: 346: "SIOControlFlag2" = 1, Old Value: 0, (Status: 0x0)

2016-01-19T17:28:11.918Z cpu27:1542061)MemSched: vm 1542061: 8311: extended swap to 8192 pgs

2016-01-19T17:28:12.139Z cpu27:1542061)World: vm 1542062: 1466: Starting world vmm0:linpsvcso04 of type 8

2016-01-19T17:28:12.139Z cpu27:1542061)Sched: vm 1542062: 6425: Adding world 'vmm0:linpsvcso04', group 'host/user', cpu: shares=-3 min=0 minLimit=-1 max=-1, mem: shares=-3 min=0 minLimit=-1 max=-1

2016-01-19T17:28:12.139Z cpu27:1542061)Sched: vm 1542062: 6440: renamed group 2944323 to vm.1542061

2016-01-19T17:28:12.139Z cpu27:1542061)Sched: vm 1542062: 6457: group 2944323 is located under group 4

2016-01-19T17:28:12.140Z cpu27:1542061)MemSched: vm 1542061: 8311: extended swap to 28148 pgs

2016-01-19T17:28:12.145Z cpu27:1542061)World: vm 1542064: 1466: Starting world vmm1:linpsvcso04 of type 8

2016-01-19T17:28:12.233Z cpu20:1542061)Migrate: vm 1542062: 3286: Setting VMOTION info: Dest ts = 1453224491277421, src ip = <10.55.66.47> dest ip = <10.55.66.40> Dest wid = 0 using SHARED swap

2016-01-19T17:28:12.235Z cpu5:1542066)CpuSched: 583: user latency of 1542066 vmotionStreamHelper0-1542062 0 changed by 1542066 vmotionStreamHelper0-1542062 -1

2016-01-19T17:28:12.236Z cpu28:1542067)CpuSched: 583: user latency of 1542067 vmotionStreamHelper1-1542062 0 changed by 1542067 vmotionStreamHelper1-1542062 -1

2016-01-19T17:28:12.236Z cpu18:1542069)CpuSched: 583: user latency of 1542069 vmotionRecvHelper-1542062 0 changed by 1542069 vmotionRecvHelper-1542062 -1

2016-01-19T17:28:12.237Z cpu16:1542068)CpuSched: 583: user latency of 1542068 vmotionSendHelper-1542062 0 changed by 1542068 vmotionSendHelper-1542062 -1

2016-01-19T17:28:12.237Z cpu20:1542061)Tcpip_Vmk: 1288: Affinitizing 10.55.66.40 to world 1542069, Success

2016-01-19T17:28:12.237Z cpu20:1542061)VMotion: 2735: 1453224491277421 😧 Set ip address '10.55.66.40' worldlet affinity to recv World ID 1542069

2016-01-19T17:28:12.237Z cpu20:1542061)Hbr: 3340: Migration start received (worldID=1542062) (migrateType=1) (event=0) (isSource=0) (sharedConfig=1)

2016-01-19T17:28:12.364Z cpu4:33664)MigrateNet: vm 33664: 2096: Accepted connection from <10.55.66.47>

2016-01-19T17:28:12.364Z cpu4:33664)MigrateNet: vm 33664: 2138: data socket size 0 is less than config option 562140

2016-01-19T17:28:12.364Z cpu4:33664)MigrateNet: vm 33664: 2166: dataSocket 0x410b68c14300 receive buffer size is 562140

2016-01-19T17:28:12.365Z cpu4:33664)MigrateNet: vm 33664: 2096: Accepted connection from <10.55.66.47>

2016-01-19T17:28:12.365Z cpu4:33664)MigrateNet: vm 33664: 2138: data socket size 0 is less than config option 562140

2016-01-19T17:28:12.365Z cpu4:33664)MigrateNet: vm 33664: 2166: dataSocket 0x410b68c150e0 receive buffer size is 562140

2016-01-19T17:28:12.365Z cpu4:33664)VMotionUtil: 3396: 1453224491277421 😧 Stream connection 1 added.

2016-01-19T17:28:12.365Z cpu4:33664)MigrateNet: vm 33664: 2096: Accepted connection from <10.55.66.47>

2016-01-19T17:28:12.365Z cpu4:33664)MigrateNet: vm 33664: 2138: data socket size 0 is less than config option 562140

2016-01-19T17:28:12.365Z cpu4:33664)MigrateNet: vm 33664: 2166: dataSocket 0x410b68edd700 receive buffer size is 562140

2016-01-19T17:28:12.365Z cpu4:33664)VMotionUtil: 3396: 1453224491277421 😧 Stream connection 2 added.

2016-01-19T17:28:12.367Z cpu28:1542067)WARNING: Swap: 787: The migration swap type is not supported for migration.

2016-01-19T17:28:12.367Z cpu28:1542067)WARNING: VmMemMigrate: 1976: Received invalid swap slot data (0xc002d177) for pgNum 0x0.

2016-01-19T17:28:12.367Z cpu28:1542067)WARNING: VMotionRecv: 1116: 1453224491277421 😧 failed to add memory page 0x0 to VM: Bad parameter

2016-01-19T17:28:12.367Z cpu28:1542067)WARNING: VMotionUtil: 1476: 1453224491277421 😧 failed draining stream completion: Bad parameter

2016-01-19T17:28:12.367Z cpu28:1542067)WARNING: Migrate: 269: 1453224491277421 😧 Failed: Bad parameter (0xbad0007) @0x418033856f9a

2016-01-19T17:28:12.370Z cpu0:1542066)WARNING: Swap: 787: The migration swap type is not supported for migration.

2016-01-19T17:28:12.370Z cpu0:1542066)WARNING: VmMemMigrate: 1976: Received invalid swap slot data (0xc0017c20) for pgNum 0x481.

2016-01-19T17:28:12.370Z cpu0:1542066)WARNING: VMotionRecv: 1116: 1453224491277421 😧 failed to add memory page 0x481 to VM: Bad parameter

2016-01-19T17:28:12.370Z cpu0:1542066)WARNING: VMotionUtil: 1476: 1453224491277421 😧 failed draining stream completion: Bad parameter

2016-01-19T17:28:12.370Z cpu0:1542066)VMotionUtil: 5333: 1453224491277421 😧 Socket 0x410b68edd700 streamSock pending: 33304/33304 snd 561912 rcv

2016-01-19T17:28:12.370Z cpu0:1542066)VMotionUtil: 5333: 1453224491277421 😧 Socket 0x410b68c150e0 streamSock pending: 33304/33304 snd 162585 rcv

2016-01-19T17:28:12.378Z cpu20:1542061)WARNING: Migrate: 5291: 1453224491277421 😧 Migration considered a failure by the VMX.  It is most likely a timeout, but check the VMX log for the true error.

2016-01-19T17:28:12.378Z cpu20:1542061)Hbr: 3434: Migration end received (worldID=1542062) (migrateType=1) (event=1) (isSource=0) (sharedConfig=1)

2016-01-19T17:28:12.413Z cpu17:1535207)Config: 346: "SIOControlFlag2" = 0, Old Value: 1, (Status: 0x0)

VMX_LOG:

2016-01-19T17:28:12.166Z| vmx| I120: VMXVmdbCbVmVmxMigrate: Got SET callback for /vm/#_VMX/vmx/migrateState/cmd/##1_22476/op/=to

2016-01-19T17:28:12.166Z| vmx| I120: Could not identify IP address family of in/srcLogIp:

2016-01-19T17:28:12.166Z| vmx| I120: Could not identify IP address family of in/dstLogIp:

2016-01-19T17:28:12.166Z| vmx| I120: VMXVmdbVmVmxMigrateGetParam: type: 1 srcIp=<10.55.66.47> dstIp=<10.55.66.40> mid=529b33194dc6d uuid=30303536-3734-5355-4533-34364c364c54 priority=high checksumMemory=no maxDowntime=0 encrypted=0 resumeDuringPageIn=no latencyAware=yes diskOpFile=

2016-01-19T17:28:12.166Z| vmx| I120: VMXVmdbVmVmxMigrateGetParam: type 1 unsharedSwap 0 memMinToTransfer 0 cpuMinToTransfer 0 numDisks 0 numStreamIps 1

2016-01-19T17:28:12.166Z| vmx| I120: Received migrate 'to' request for mid id 1453224491277421, src ip <10.55.66.47>, dst ip <10.55.66.40>(invalidate source config).

2016-01-19T17:28:12.167Z| vmx| I120: SVGA: Maximum display topology 2560x1600.

2016-01-19T17:28:12.184Z| vmx| I120: MigrateSetInfo: state=1 srcIp=<10.55.66.47> dstIp=<10.55.66.40> mid=1453224491277421 uuid=30303536-3734-5355-4533-34364c364c54 priority=high

2016-01-19T17:28:12.184Z| vmx| I120: MigrateSetState: Transitioning from state 0 to 1.

2016-01-19T17:28:12.184Z| vmx| I120: VMXVmdb_SetMigrationHostLogState: hostlog state transits to emigrating for migrate 'to' mid 1453224491277421

2016-01-19T17:28:12.902Z| vmx| I120: VMXVmdbCbVmVmxMigrate: Got SET callback for /vm/#_VMX/vmx/migrateState/cmd/##1_22479/op/=start

2016-01-19T17:28:12.902Z| vmx| I120: VMXVmdbVmVmxMigrateGetStartParam: mid=529b33194dc6d dstwid=1542062

2016-01-19T17:28:12.902Z| vmx| I120: Received migrate 'start' request for mig id 1453224491277421, dest world id 1542062.

2016-01-19T17:28:13.001Z| vmx| I120: SVGA: Maximum display topology 2560x1600.

2016-01-19T17:28:13.001Z| vmx| I120: MigratePlatformInitMigration:  DiskOp file set to /vmfs/volumes/976d8169-6e977849/linpsvcso04/linpsvcso04-diskOp.tmp

2016-01-19T17:28:13.005Z| vmx| I120: MigrateSetState: Transitioning from state 1 to 2.

2016-01-19T17:28:13.011Z| vcpu-0| I120: VMXVmdb_SetMigrationHostLogState: hostlog state transits to failure for migrate 'to' mid 1453224491277421

2016-01-19T17:28:13.015Z| vcpu-0| I120: MigrateSetStateFinished: type=1 new state=5

2016-01-19T17:28:13.015Z| vcpu-0| I120: MigrateSetState: Transitioning from state 2 to 5.

2016-01-19T17:28:13.015Z| vcpu-0| I120: Migrate_SetFailureMsgList: switching to new log file.

2016-01-19T17:28:13.017Z| vcpu-0| I120: Migrate_SetFailureMsgList: Now in new log file.

2016-01-19T17:28:13.047Z| vcpu-0| I120: [msg.checkpoint.precopyfailure] Migration to host <10.55.66.40> failed with error Already disconnected (0xbad002e).

2016-01-19T17:28:13.048Z| vcpu-0| I120: Migrate: cleaning up migration state.

2016-01-19T17:28:13.048Z| vcpu-0| I120: MigrateSetState: Transitioning from state 5 to 0.

2016-01-19T17:28:13.048Z| vcpu-0| I120: Msg_Post: Error

2016-01-19T17:28:13.048Z| vcpu-0| I120: [vob.vmotion.transmit.vbuf.not.connected] vMotion migration [a010a2f:1453224491277421] socket connected returned: Already disconnected

2016-01-19T17:28:13.048Z| vcpu-0| I120: [vob.vmotion.stream.channel.write.no.wait.flush.failed] vMotion migration [a010a2f:1453224491277421] failed to flush stream buffer: Already disconnected

2016-01-19T17:28:13.049Z| vcpu-0| I120: [vob.vmotion.stream.completion.write.fail] vMotion migration [a010a2f:1453224491277421] failed writing stream completion: Already disconnected

2016-01-19T17:28:13.049Z| vcpu-0| I120: [msg.checkpoint.precopyfailure] Migration to host <10.55.66.40> failed with error Already disconnected (0xbad002e).

2016-01-19T17:28:13.049Z| vcpu-0| I120: ----------------------------------------


0 Kudos
losisoft
Contributor
Contributor

I think I found a workaround. I have downgraded one of my node to 5.5.0 build 3029944

So from the 5.0 host I migrated the affected machine to this host, and from here I could move to the to 5.5u3b host.

This would mean that the bug have been introduced somewhere between 3029944 and 3343343

0 Kudos
ODB2001
Enthusiast
Enthusiast

We're having exactly the same symptoms. Cannot vMotion from 5.1 build 3070626 to 5.5 build 3343343.

Will test out migrating to a non-patched 5.5 host and report.

0 Kudos
ODB2001
Enthusiast
Enthusiast

Ok no dice, used a 5.5 host at build 2068190 and the 5.1 host still cannot migrate the same 3 VM's.

I haven't tried rebooting these VM's to see if that will allow them to vMotion but as a previous poster said, this isn't a viable solution for us.

I've got a VMware case open so will report back with their findings.

0 Kudos
jbsengineer
Enthusiast
Enthusiast

I opened a case Friday night also.

We didn't upgrade to the 5.5 hosts.  They are fresh installs.  We are trying to vMotion VMs from a 5.1 2583090 host -> 5.5 3248547.  Some will migrate fine, most will not.  Very odd:

2016-01-25T15:52:45.591Z cpu7:15398651)VMotionUtil: 3380: 1453737159973591 S: Stream connection 2 added.

2016-01-25T15:52:45.592Z cpu6:15292154)WARNING: VMotionUtil: 681: 1453737159973591 S: failed to read stream keepalive: Connection closed by remote host, possibly due to timeout

2016-01-25T15:52:45.592Z cpu6:15292154)WARNING: Migrate: 269: 1453737159973591 S: Failed: Connection closed by remote host, possibly due to timeout (0xbad003f) @0x0

2016-01-25T15:52:45.592Z cpu10:15283961)WARNING: VMotionUtil: 4723: 1453737159973591 S: socket connected returned: Already disconnected

2016-01-25T15:52:45.592Z cpu10:15283961)WARNING: VMotionUtil: 7187: 1453737159973591 S: failed to flush stream buffer: Already disconnected

2016-01-25T15:52:45.592Z cpu10:15283961)WARNING: VMotionUtil: 1618: 1453737159973591 S: failed writing stream completion: Already disconnected

0 Kudos
ODB2001
Enthusiast
Enthusiast

Ours are fresh installs as well Jbsengineer. I'm still waiting for a call back! I've escalated it to a manager.

0 Kudos
jbsengineer
Enthusiast
Enthusiast

I have heard back.  According to my support person, the issue we are having is related to the default disabling of SSLv3 on 5.5 U3b.  Engineering is aware of the issue.  We downgraded to 5.5 U3a build 3316895 and all is well.

Josh

0 Kudos
ODB2001
Enthusiast
Enthusiast

Still doesn't work for us, we tried downgrading a host but we still have the issue. Only workaround is to power the VM off and migrate it.

VMware state the issue has been raised to their engineering team and all we can do is wait Smiley Sad So frustrating!

0 Kudos
jbsengineer
Enthusiast
Enthusiast

VMware KB: After upgrading from ESXi 5.0/5.1 to ESXi 5.5 Update 3 or later, vMotion fails with: fail...

Here is the KB.  We downgraded to 5.1 and will wait the fix out.  Bad timing, we had 2500 VMs to move...

0 Kudos
losisoft
Contributor
Contributor

For me there was a workaround, I have used it at least on 5 different cluster, and worked every time.

For example, I had a 4 node cluster, 3 host was upgraded to 5.5u3b, one was still on 5.1

I reinstalled one host on 5.5 U2, then I upgraded it with patches till Sept. 2015. (Sept. patches should be included!) - this was my intermediate host. (build id: 3029944)

Edit: as I see 3029944 - was the simple 5.5u3 build.

Once this was completed, I could migrate all VM server from 5.1 to the intermediate host. Upgrade now the 5.1 to 5.5u3b

Once that was done, I could move everything from intermediate host to 5.5u3b, and upgrade the intermediate as well.

0 Kudos
jphillips40
Contributor
Contributor

VMware just came out with VMware KB: VMware ESXi 5.5, Patch ESXi550-201602401-SG: Updates esx-base. I will be trying to setup a change window tonight to test and let you know later.

0 Kudos