VMware Cloud Community
vijay_0812
Contributor

vMotion failure

Hello,

I am seeing vMotion failures on one of our hosts running ESXi 6.0 Update 2. Has anyone had this kind of issue? Please see the information below and help me understand the root cause of this failure.

The vmkernel log shows the following during the migration failure:

2017-03-12T15:10:00.658Z cpu40:1522911)Migrate: vm 1522912: 3385: Setting VMOTION info: Source ts = 1489331391021669, src ip = <XXXXXXXX> dest ip = <10.115.135.24> Dest wid = 1559963 using SHARED swap

2017-03-12T15:10:00.659Z cpu40:1522911)Hbr: 3394: Migration start received (worldID=1522912) (migrateType=1) (event=0) (isSource=1) (sharedConfig=1)

2017-03-12T15:10:00.660Z cpu0:1553099)VMotionUtil: 3995: 1489331391021669 S: Stream connection 1 added.

2017-03-12T15:10:00.671Z cpu0:1553097)WARNING: VMotionUtil: 733: 1489331391021669 S: failed to read stream keepalive: Connection closed by remote host, possibly due to timeout

2017-03-12T15:10:00.671Z cpu0:1553097)WARNING: Migrate: 270: 1489331391021669 S: Failed: Connection closed by remote host, possibly due to timeout (0xbad003f) @0x41800da14f5e

2017-03-12T15:10:00.840Z cpu24:1522922)WARNING: Migrate: 5454: 1489331391021669 S: Migration considered a failure by the VMX. It is most likely a timeout, but check the VMX log for the true error.

2017-03-12T15:10:00.840Z cpu24:1522922)Hbr: 3488: Migration end received (worldID=1522912) (migrateType=1) (event=1) (isSource=1) (sharedConfig=1)

2017-03-12T15:10:00.875Z cpu29:1418851)Config: 681: "SIOControlFlag2" = 0, Old Value: 1, (Status: 0x0)
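To make an excerpt like the one above easier to scan, here is a minimal Python sketch that pulls out only the vMotion WARNING lines together with their migration ID. The regular expression is an assumption based purely on the line layout shown in this post (timestamp, `cpuN:world)`, module, line number, migration ID, a one-letter side marker), and the function name `migration_warnings` is made up for illustration. The `S` marker is assumed to mean the source side, which matches `isSource=1` in the same log.

```python
import re

# Assumed layout, taken from the lines pasted in this post:
# <time> cpuN:<world>)WARNING: <module>: <line>: <migration id> <side>: <message>
VMK_WARNING = re.compile(
    r"^(?P<time>\S+)\s+cpu\d+:\d+\)WARNING: (?P<module>\w+): \d+: "
    r"(?P<mig_id>\d+) (?P<side>[A-Z]): (?P<message>.*)$"
)


def migration_warnings(lines):
    """Return one dict per vMotion WARNING line; other lines are skipped."""
    found = []
    for line in lines:
        match = VMK_WARNING.match(line.strip())
        if match:
            found.append(match.groupdict())
    return found


# Three lines copied from the vmkernel excerpt above.
sample = [
    "2017-03-12T15:10:00.660Z cpu0:1553099)VMotionUtil: 3995: 1489331391021669 S: Stream connection 1 added.",
    "2017-03-12T15:10:00.671Z cpu0:1553097)WARNING: VMotionUtil: 733: 1489331391021669 S: failed to read stream keepalive: Connection closed by remote host, possibly due to timeout",
    "2017-03-12T15:10:00.671Z cpu0:1553097)WARNING: Migrate: 270: 1489331391021669 S: Failed: Connection closed by remote host, possibly due to timeout (0xbad003f) @0x41800da14f5e",
]

for warning in migration_warnings(sample):
    print(warning["time"], warning["mig_id"], warning["module"], warning["message"])
```

Running the same filter over the vmkernel log on the destination host for the same migration ID would show what that side logged at the moment the connection closed.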

The vmware.log for that VM shows the following:

2017-03-12T15:29:53.281Z| vcpu-0| I125: VMXVmdb_SetMigrationHostLogState: hostlog state transits to failure for migrate 'to' mid 1489332587942583

2017-03-12T15:29:53.305Z| vcpu-0| I125: MigrateSetStateFinished: type=1 new state=6

2017-03-12T15:29:53.305Z| vcpu-0| I125: MigrateSetState: Transitioning from state 2 to 6.

2017-03-12T15:29:53.305Z| vcpu-0| A100: ConfigDB: Setting config.readOnly = "FALSE"

2017-03-12T15:29:53.305Z| vcpu-0| I125: Migrate_SetFailureMsgList: switching to new log file.

2017-03-12T15:29:53.306Z| vcpu-0| I125: Migrate_SetFailureMsgList: Now in new log file.

2017-03-12T15:29:53.548Z| vcpu-0| I125: Migrate: Caching migration error message list:

2017-03-12T15:29:53.548Z| vcpu-0| I125: [msg.checkpoint.precopyfailure] Migration to host <10.115.135.62> failed with error Connection closed by remote host, possibly due to timeout (0xbad003f).

2017-03-12T15:29:53.548Z| vcpu-0| I125: [vob.vmotion.stream.keepalive.read.fail] vMotion migration [a738729:1489332587942583] failed to read stream keepalive: Connection closed by remote host, possibly due to timeout

2017-03-12T15:29:53.548Z| vcpu-0| I125: Migrate: cleaning up migration state.

2017-03-12T15:29:53.549Z| vcpu-0| I125: VigorTransport_ServerSendResponse opID=58b52a1e-ed-7c-851d seq=19532: Completed Migrate request.

2017-03-12T15:29:53.549Z| vcpu-0| I125: Migrate: Final status reported through Vigor.

2017-03-12T15:29:53.549Z| vcpu-0| I125: MigrateSetState: Transitioning from state 6 to 0.

2017-03-12T15:29:53.549Z| vcpu-0| I125: Migrate: Final status reported through VMDB.

2017-03-12T15:29:53.549Z| vcpu-0| I125: Msg_Post: Error

2017-03-12T15:29:53.549Z| vcpu-0| I125: [vob.vmotion.stream.keepalive.read.fail] vMotion migration [a738729:1489332587942583] failed to read stream keepalive: Connection closed by remote host, possibly due to timeout

2017-03-12T15:29:53.549Z| vcpu-0| I125: [msg.checkpoint.precopyfailure] Migration to host <10.115.135.62> failed with error Connection closed by remote host, possibly due to timeout (0xbad003f).

2017-03-12T15:29:53.549Z| vcpu-0| I125: ----------------------------------------

2017-03-12T15:29:53.582Z| vcpu-0| I125: Vigor_MessageRevoke: message 'msg.checkpoint.precopyfailure' (seq 18864120) is revoked
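The migration IDs in the two excerpts differ (1489331391021669 in the vmkernel log, 1489332587942583 in vmware.log). The vmkernel line labels the value "Source ts", so it appears to be a timestamp in microseconds since the Unix epoch. That is an inference from these logs, not something confirmed in this thread. Under that assumption, a short Python sketch can decode both IDs (the helper name `migration_id_to_utc` is made up for illustration):

```python
from datetime import datetime, timezone


def migration_id_to_utc(mig_id):
    """Interpret a migration ID as microseconds since the Unix epoch (assumption)."""
    seconds, micros = divmod(int(mig_id), 1_000_000)
    return datetime.fromtimestamp(seconds, tz=timezone.utc).replace(microsecond=micros)


# ID from the vmkernel excerpt (dest ip 10.115.135.24)
print(migration_id_to_utc(1489331391021669).isoformat())
# -> 2017-03-12T15:09:51.021669+00:00

# ID from the vmware.log excerpt (migration to host 10.115.135.62)
print(migration_id_to_utc(1489332587942583).isoformat())
# -> 2017-03-12T15:29:47.942583+00:00
```

If the assumption holds, the two excerpts come from two separate attempts about 20 minutes apart, to two different destination hosts, and both ended with the same keepalive error.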

When I run vmkping there are no packet drops, but esxtop shows non-zero values for DRPRX. Is this usual or unusual?

[Screenshot: pastedImage_2.png]

When I ran esxcli network nic stats get -n vmnic0, I saw receive missed errors. Is this usual or unusual?

[Screenshot: pastedImage_7.png]
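A raw error count is hard to judge without the traffic volume next to it. Below is a minimal Python sketch that turns the text output of `esxcli network nic stats get -n vmnic0` into a missed-packet ratio. The counter values in `sample_output` are invented for illustration, since the real numbers are only in the screenshot. The sketch also assumes that missed packets are not already included in "Packets received". The function names are hypothetical.

```python
def parse_nic_stats(text):
    """Parse 'Name: value' lines into a dict of integer counters."""
    stats = {}
    for line in text.splitlines():
        name, sep, value = line.partition(":")
        value = value.strip()
        # Skip the header line and anything that is not a plain integer counter.
        if sep and value.isdigit():
            stats[name.strip()] = int(value)
    return stats


def missed_ratio(stats):
    """Fraction of inbound packets the NIC missed (assumes missed are not in 'received')."""
    received = stats.get("Packets received", 0)
    missed = stats.get("Receive missed errors", 0)
    total = received + missed
    return missed / total if total else 0.0


# Invented numbers, laid out like the esxcli output.
sample_output = """NIC statistics for vmnic0
   Packets received: 1000000
   Receive packets dropped: 0
   Receive missed errors: 250
"""

stats = parse_nic_stats(sample_output)
print(f"missed ratio: {missed_ratio(stats):.6f}")
```

Taking two readings some minutes apart and comparing the difference shows whether the counter is still climbing, which says more than a total accumulated since boot.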

Thank you

VJ

2 Replies
OversizedSpoon
Enthusiast

2017-03-12T15:29:53.548Z| vcpu-0| I125: [msg.checkpoint.precopyfailure] Migration to host <10.115.135.62> failed with error Connection closed by remote host, possibly due to timeout (0xbad003f).

Are you using a dedicated vMotion network?

Can you please confirm that the configuration of the VMkernel ports is correct, and that vMotion isn't enabled on multiple VMkernel ports? Check both the source and the destination host.

vijay_0812
Contributor

Thank you for your response.

We are using two shared NICs for the vMotion, iSCSI, and Management networks, each on its own port group.

Only one VMkernel port has vMotion enabled on each host.
