I have a VM with a large number of VMDKs, each with individual Flash Read Cache (vFRC) settings. When I place its host into maintenance mode, the VM starts to migrate and then fails. The only fix is to restart the vCenter services and hard-bounce the host. I have been able to replicate this issue on a test machine by adding many VMDKs, each with vFRC configured, and doing a drag-and-drop migration in the vCenter thick client. However, when I use the web client and manually select dropping the cache, the migration works without issue.
Log data:
2018-02-01T00:40:48.371Z| Worker#3| I125: Migrate: Remote Log: Destination waited for 132.56 seconds.
2018-02-01T00:40:48.379Z| vcpu-0| I125: VMXVmdb_SetMigrationHostLogState: hostlog state transits to failure for migrate 'to' mid 1517445513843299
2018-02-01T00:40:48.396Z| vcpu-0| I125: MigrateSetStateFinished: type=1 new state=6
2018-02-01T00:40:48.396Z| vcpu-0| I125: MigrateSetState: Transitioning from state 3 to 6.
2018-02-01T00:40:48.396Z| vcpu-0| A100: ConfigDB: Setting config.readOnly = "FALSE"
2018-02-01T00:40:48.396Z| vcpu-0| I125: Migrate_SetFailureMsgList: switching to new log file.
2018-02-01T00:40:48.400Z| vcpu-0| I125: Migrate_SetFailureMsgList: Now in new log file.
2018-02-01T00:40:48.495Z| vcpu-0| I125: Migrate: Caching migration error message list:
2018-02-01T00:40:48.495Z| vcpu-0| I125: [msg.checkpoint.precopyfailure] Migration to host <ip.add.re.ss> failed with error Connection closed by remote host, possibly due to timeout (0xbad003f).
2018-02-01T00:40:48.495Z| vcpu-0| I125: [vob.vmotion.stream.keepalive.read.fail] vMotion migration [a3e007a:1517445513843299] failed to read stream keepalive: Connection closed by remote host, possibly due to timeout
2018-02-01T00:40:48.495Z| vcpu-0| I125: Migrate: cleaning up migration state.
2018-02-01T00:40:48.495Z| vcpu-0| I125: SVMotion: Enter Phase 13
2018-02-01T00:40:48.495Z| vcpu-0| I125: SVMotion_Cleanup: Scheduling cleanup thread.
2018-02-01T00:40:48.495Z| Worker#1| I125: SVMotionCleanupThread: Waiting for SVMotion Bitmap thread to complete.
2018-02-01T00:40:48.495Z| vcpu-0| I125: Msg_Post: Error
2018-02-01T00:40:48.495Z| vcpu-0| I125: [vob.vmotion.stream.keepalive.read.fail] vMotion migration [a3e007a:1517445513843299] failed to read stream keepalive: Connection closed by remote host, possibly due to timeout
2018-02-01T00:40:48.495Z| vcpu-0| I125: [msg.checkpoint.precopyfailure] Migration to host <ip.add.re.ss> failed with error Connection closed by remote host, possibly due to timeout (0xbad003f).
2018-02-01T00:40:48.495Z| vcpu-0| I125: ----------------------------------------
2018-02-01T00:40:48.559Z| vcpu-0| I125: Vigor_MessageRevoke: message 'msg.checkpoint.precopyfailure' (seq 219224) is revoked
2018-02-01T00:40:48.559Z| Worker#1| I125: SVMotionCleanupThread: Waiting for SVMotion thread to complete.
Two questions:
What would be the underlying cause of the failure?
If there is no good reason for the failure, how can I set the default migration behavior to drop the vFRC contents?
First off, check whether you have a sufficient amount of free space in the target host's flash pool.
Double-check your vFRC settings: VMware Knowledge Base
Also, don't forget that DRS will not migrate your vFRC-enabled VMs in most circumstances. DRS only moves such VMs in cases of critical overprovisioning or maintenance mode (though the latter fails in your case).
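As a quick sanity check on the flash-pool question, you can total the per-disk vFRC reservations before migrating; when a vMotion copies the cache contents, the whole reservation has to move with the VM. Here is a minimal sketch. The attribute names (`vFlashCacheConfigInfo`, `reservationInMB`) come from the vSphere API's VirtualDisk object; the connection plumbing in the comments is a placeholder for your environment, not tested code.

```python
# Sketch: total the vFRC reservations across a VM's virtual disks, to
# judge how much cache data a "migrate cache contents" vMotion must copy.

def total_vfrc_reservation_mb(devices):
    """Sum vFRC reservations (in MB) over a list of virtual devices.

    Devices without a vFlashCacheConfigInfo attribute (non-disks, or
    disks without vFRC) are skipped.
    """
    total = 0
    for dev in devices:
        cache = getattr(dev, "vFlashCacheConfigInfo", None)
        if cache is not None and cache.reservationInMB:
            total += cache.reservationInMB
    return total

# Against a live vCenter this would be driven by pyVmomi, e.g.:
#
#   from pyVim.connect import SmartConnect
#   si = SmartConnect(host="vcenter.example.com", user="...", pwd="...")
#   vm = ...  # locate the VM via si.content.searchIndex or a container view
#   print(total_vfrc_reservation_mb(vm.config.hardware.device))
```

If that total approaches the free vFlash capacity on the destination host, the admission check should stop the migration up front; a mid-flight failure points elsewhere.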
Kabanossi, thanks for the insight.
I'm migrating to and from hosts with more than enough available flash resources to handle the VM's migration; if I didn't have enough, the migration would fail before it started. My settings are correct. And DRS will migrate a VM when its host is going into maintenance mode, which I did mention was the case in this issue. However, none of this is relevant to my questions. I probably should have included a little more info in the post; I will update it.
To rephrase the questions:
What would cause a vFRC migration to fail mid-migration and require a restart of vCenter and the host to recover?
How do I make the default vFRC migration drop the cache?
Thanks again for your thoughts; got any more?
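One note on the timeout itself: the log shows the destination waited ~132 seconds before the keepalive read failed, which is in the territory of the per-VM `vmotion.maxSwitchoverSeconds` advanced option (default 100 seconds). Whether raising it actually helps this precopy-phase failure is an assumption to verify in a lab, but advanced options like this are applied through an extraConfig reconfigure. A sketch, with the key/value assembly kept as a plain helper and the pyVmomi call shown as a hedged comment:

```python
# Sketch: build the extraConfig list shape used by a VM reconfigure to
# set per-VM advanced options such as vmotion.maxSwitchoverSeconds.
# The option key is a documented VM advanced setting; whether it governs
# this particular keepalive timeout is an untested assumption.

def build_extra_config(options):
    """Turn {key: value} into a list of key/value entries, values as strings."""
    return [{"key": k, "value": str(v)} for k, v in sorted(options.items())]

# With pyVmomi this would become:
#
#   spec = vim.vm.ConfigSpec(extraConfig=[
#       vim.option.OptionValue(key=o["key"], value=o["value"])
#       for o in build_extra_config({"vmotion.maxSwitchoverSeconds": 300})
#   ])
#   task = vm.ReconfigVM_Task(spec)  # then wait for the task to complete
```

Even if the timeout tweak only masks the symptom, reproducing the failure with a longer switchover window would at least tell you whether the hang is a timeout race or a genuine vFRC cache-copy deadlock.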