VMware Horizon Community
nimdach
Contributor
Contributor
Jump to solution

Error message "VC_FAULT_FATAL: Failed to receive migration" when provisioning Instant Clone after ver 7.5 & 7.6

Check the ESXI vmkernl.log that show below info

================================================

2018-10-22T21:55:36.682Z cpu4:2099462)Config: 703: "SIOControlFlag2" = 3, Old Value: 4, (Status: 0x0)

2018-10-22T21:55:36.841Z cpu1:2128369)FDS: 607: Enabling IO coalescing on driver 'deltadisks' device '3b128e-cp-parent-eb823241-7fef-4e91-a1f8-e46c8c1e899e_11-000001-sesparse.vmdk'

2018-10-22T21:55:36.881Z cpu1:2128369)FDS: 607: Enabling IO coalescing on driver 'deltadisks' device '3f1293-cp-parent-eb823241-7fef-4e91-a1f8-e46c8c1e899e_11-000001-digest-sesparse.vmdk'

2018-10-22T21:55:36.969Z cpu1:2128369)VSCSI: 3810: handle 8222(vscsi0:0):Creating Virtual Device for world 2128359 (FSS handle 4002450) numBlocks=67108864 (bs=512)

2018-10-22T21:55:36.976Z cpu1:2128369)cbrc_filter: CBRC_DigestLoad:3170: Loading digest with ID 201e 201e

2018-10-22T21:55:37.031Z cpu1:2128369)VSCSI: 273: handle 8222(vscsi0:0):Input values: res=0 limit=-2 bw=-1 Shares=1000

2018-10-22T21:55:37.031Z cpu1:2128369)Vmxnet3: 18168: Disable Rx queuing; queue size 512 is larger than Vmxnet3RxQueueLimit limit of 64.

2018-10-22T21:55:37.032Z cpu1:2128369)Vmxnet3: 18310: indLROPktToGuest: 1, vcd->umkShared->vrrsSelected: 3 port 0x500100f

2018-10-22T21:55:37.032Z cpu1:2128369)Vmxnet3: 18552: Using default queue delivery for vmxnet3 for port 0x500100f

2018-10-22T21:55:37.032Z cpu1:2128369)Hbr: 3682: Migration end received (worldID=2128359) (migrateType=7) (event=1) (isSource=1) (sharedConfig=0)

2018-10-22T21:55:37.272Z cpu4:2099019)Config: 703: "SIOControlFlag2" = 2, Old Value: 3, (Status: 0x0)

2018-10-22T21:55:37.701Z cpu5:2099463 opID=d919beb2)World: 11942: VC opID 4514397a-01-a3-d6d2 maps to vmkernel opID d919beb2

2018-10-22T21:55:37.701Z cpu5:2099463 opID=d919beb2)Config: 703: "SIOControlFlag2" = 3, Old Value: 2, (Status: 0x0)

2018-10-22T21:55:37.748Z cpu6:2128356)Migrate: vm 2128359: 4082: Setting INSTANT_CLONE info: Source ts = 7610838939670523627, src ip = <127.0.0.1> dest ip = <127.0.0.1> Dest wid = 2134442 using UNSHARED swap, unencrypted

2018-10-22T21:55:37.748Z cpu6:2128356)InstantClone: 644: 7610838939670523627 S: Starting Instant Clone migration 7610838939670523627 with leader 2128359

2018-10-22T21:55:37.750Z cpu6:2128356)Hbr: 3588: Migration start received (worldID=2128359) (migrateType=7) (event=0) (isSource=1) (sharedConfig=0)

2018-10-22T21:55:37.751Z cpu2:2128369)VSCSI: 6602: handle 8222(vscsi0:0):Destroying Device for world 2128359 (pendCom 0)

2018-10-22T21:55:38.242Z cpu0:2128369)FDS: 607: Enabling IO coalescing on driver 'deltadisks' device '5912a0-cp-parent-eb823241-7fef-4e91-a1f8-e46c8c1e899e_11-000002-digest-sesparse.vmdk'

2018-10-22T21:55:38.533Z cpu1:2128369)InstantClone: 1171: 7610838939670523627 S: Checkpoint finished.

2018-10-22T21:55:38.538Z cpu1:2128369)InstantClone: 1210: 7610838939670523627 S: Ready for destination restore!

2018-10-22T21:55:38.597Z cpu1:2134231)WARNING: Migrate: 6267: 7610838939670523627 😧 Migration considered a failure by the VMX.  It is most likely a timeout, but check the VMX log for the true error.

2018-10-22T21:55:38.597Z cpu1:2134231)WARNING: Migrate: 282: 7610838939670523627 😧 Failed: Migration determined a failure by the VMX (0xbad0092) @0x41800e59ed3e

2018-10-22T21:55:38.597Z cpu1:2134231)WARNING: Migrate: 282: 7610838939670523627 S: Failed: Migration determined a failure by the VMX (0xbad0092) @0x41800f3059b2

2018-10-22T21:55:38.597Z cpu1:2134231)Hbr: 3682: Migration end received (worldID=2134442) (migrateType=7) (event=1) (isSource=0) (sharedConfig=0)

======================================================

Since the target VM be remove immediately after failed occurred. Any help are welcome

Tags (1)
1 Solution

Accepted Solutions
nimdach
Contributor
Contributor
Jump to solution

Due to create VM behavior change after upgraded ESXI host to vSphere 6.7.

Before vSphere 6.7, the configuration parameters "namespaceMgr.dbFile" is default created under Edit Settings -> VM options -> Advanced

After vSphere 6.7, there is no "namespaceMgr.dbFile" by default settings.

Case 1:

If user migration master image from vSphere 6.5 to vSphere 6.7 need to double checking the namespaceMgr.dbFile be migrated as well.

Case 2:

Rename the master image after created the VDI snapshot. it always got the issue due to namespaceMgr.dbFile inconsistent

here is vmware-0.log from cp-parent-xxx

------------------------------------------------------------------

2019-02-22T12:10:49.797Z| vmx| W115: InstantCloneReplicateFile: Source file path '/vmfs/volumes/5bf6a936-cc4b5753-ceab-0024e842aaff/cp-parent-11820eb1-bff8-4f27-937e-d51613ca77f7/GVM-W10.db' does not exist!

2019-02-22T12:10:49.825Z| vmx| I125: MigrateSetStateFinished: type=2 new state=12

2019-02-22T12:10:49.825Z| vmx| I125: MigrateSetState: Transitioning from state 11 to 12.

2019-02-22T12:10:49.825Z| vmx| I125: Migrate: Caching migration error message list:

2019-02-22T12:10:49.825Z| vmx| I125: [msg.checkpoint.migration.failedReceive] Failed to receive migration.

2019-02-22T12:10:49.826Z| vmx| I125: Migrate: cleaning up migration state.

--------------------------------------------------------------------

Before rename the vmx and vmsn file with namespaceMgr.dbFile => GVM-W10.db

After rename the master image to Win10RS5, the namespaceMgr.dbFile changed to Win10RS5.db on vmx file but vmsn still keep the GVM-W10.db not change.

As the result, the db file not copy to cp-parent-xxx and cause the problem.

View solution in original post

0 Kudos
6 Replies
nimdach
Contributor
Contributor
Jump to solution

Add more environment info for reference

1) ESXI 6.7 * 2

2) vCenter 6.7

3) Horizon view 7.5 and 7.6

4) Master template : Win10 RS3 + VMware tool 10.2.5 + Horizon view Agent 7.5

Also collect the normal vMotion log from vmkernl.log

=========================

018-10-23T03:58:19.785Z cpu2:2398971)Migrate: vm 2398972: 4082: Setting VMOTION info: Dest ts = 7610838961439697646, src ip = <10.201.186.218> dest ip = <10.201.186.219> Dest wid = 0 using SHARED swap, encrypted

2018-10-23T03:58:19.788Z cpu2:2398971)Hbr: 3588: Migration start received (worldID=2398972) (migrateType=1) (event=0) (isSource=0) (sharedConfig=1)

2018-10-23T03:58:19.817Z cpu1:2098342)MigrateNet: vm 2098342: 3254: Accepted connection from <10.201.186.218>

2018-10-23T03:58:19.817Z cpu1:2098342)MigrateNet: vm 2098342: 3342: dataSocket 0x430b8dd4f080 receive buffer size is 563272

2018-10-23T03:58:19.817Z cpu1:2098342)Migrate: 358: Remote machine is ESX 6.5 or newer.

2018-10-23T03:58:19.817Z cpu10:2398980)MigrateNet: 1751: 7610838961439697646 😧 Successfully bound connection to vmknic vmk3 - '10.201.186.219'

2018-10-23T03:58:19.817Z cpu1:2098342)MigrateNet: vm 2098342: 3254: Accepted connection from <10.201.186.218>

2018-10-23T03:58:19.817Z cpu1:2098342)MigrateNet: vm 2098342: 3342: dataSocket 0x430b8e0a3580 receive buffer size is 563272

2018-10-23T03:58:19.817Z cpu1:2098342)Migrate: 358: Remote machine is ESX 6.5 or newer.

2018-10-23T03:58:19.817Z cpu1:2098342)VMotionUtil: 5199: 7610838961439697646 😧 Stream connection 1 added.

2018-10-23T03:58:46.845Z cpu2:2398981)VMotionRecv: 761: 7610838961439697646 😧 Estimated network bandwidth 99.221 MB/s during pre-copy

2018-10-23T03:58:46.996Z cpu0:2398971)VSCSI: 3810: handle 8472(vscsi0:0):Creating Virtual Device for world 2398972 (FSS handle 10703114) numBlocks=83886080 (bs=512)

2018-10-23T03:58:46.996Z cpu0:2398971)VSCSI: 273: handle 8472(vscsi0:0):Input values: res=0 limit=-2 bw=-1 Shares=1000

2018-10-23T03:58:47.033Z cpu8:2398981)VMotionRecv: 2910: 7610838961439697646 😧 DONE paging in

2018-10-23T03:58:47.033Z cpu8:2398981)VMotionRecv: 2918: 7610838961439697646 😧 Estimated network bandwidth 110.593 MB/s during page-in

2018-10-23T03:58:47.053Z cpu9:2398972)VMMVMKCall: 243: Received INIT from world 2398972

========================

The normal vMotion get the right  IP address from Source and Destination vMotion TCP/IP stack

But Instant clone got the Soruce <127.0.0.1> and Destination <127.0.0.1> not sure the behavior is correct or bug from 7.5/7.6.

nimdach
Contributor
Contributor
Jump to solution

Re-config the cluster vSphere Availability and the issue gone. Verify on Horizon view 7.5 and 7.6 .

Disable the Admission Control from Cluter -> Configure -> vSphere Availability -> Edit vSphere HA .

The "VC_FAULT_FATAL" error message no longer display and VDI pool provisioning success.

nimdach
Contributor
Contributor
Jump to solution

Due to create VM behavior change after upgraded ESXI host to vSphere 6.7.

Before vSphere 6.7, the configuration parameters "namespaceMgr.dbFile" is default created under Edit Settings -> VM options -> Advanced

After vSphere 6.7, there is no "namespaceMgr.dbFile" by default settings.

Case 1:

If user migration master image from vSphere 6.5 to vSphere 6.7 need to double checking the namespaceMgr.dbFile be migrated as well.

Case 2:

Rename the master image after created the VDI snapshot. it always got the issue due to namespaceMgr.dbFile inconsistent

here is vmware-0.log from cp-parent-xxx

------------------------------------------------------------------

2019-02-22T12:10:49.797Z| vmx| W115: InstantCloneReplicateFile: Source file path '/vmfs/volumes/5bf6a936-cc4b5753-ceab-0024e842aaff/cp-parent-11820eb1-bff8-4f27-937e-d51613ca77f7/GVM-W10.db' does not exist!

2019-02-22T12:10:49.825Z| vmx| I125: MigrateSetStateFinished: type=2 new state=12

2019-02-22T12:10:49.825Z| vmx| I125: MigrateSetState: Transitioning from state 11 to 12.

2019-02-22T12:10:49.825Z| vmx| I125: Migrate: Caching migration error message list:

2019-02-22T12:10:49.825Z| vmx| I125: [msg.checkpoint.migration.failedReceive] Failed to receive migration.

2019-02-22T12:10:49.826Z| vmx| I125: Migrate: cleaning up migration state.

--------------------------------------------------------------------

Before rename the vmx and vmsn file with namespaceMgr.dbFile => GVM-W10.db

After rename the master image to Win10RS5, the namespaceMgr.dbFile changed to Win10RS5.db on vmx file but vmsn still keep the GVM-W10.db not change.

As the result, the db file not copy to cp-parent-xxx and cause the problem.

0 Kudos
LukaszDziwisz
Hot Shot
Hot Shot
Jump to solution

Hello,

I think we are hitting the same problem now and seeing a lot of Failed to Receive migration errors. So what is it that actually has to be done?

I usually build my GI and once I'm happy with it, I clone it to another machine, disable NIC,  log in to the new clone, rename it and remove it from the domain while it is offline. Then I power it on and add it back to the domain, customize it for different pool, take a snapshot and push to another pool.

If I go to advanced settings of the new VM I do see that the namespaceMgr.dbFile attribute is set with the original image db file. DO I just rename it here? Or do I have to edit vmx files as well?

0 Kudos
Levin1
Contributor
Contributor
Jump to solution

I also faced this issue in vSpere 6.5 U1 and am still facing it with 6.7 u1.

Mostly I can get around this by creating a new snapshot and configure a different LUN on the Horizon Pool and start a new rollout.

But I didn't find the root cause. Namespace file seems to be correct, admission control was already disabled.

Because VM-Log was disabled I was not able to check the log. I now enabled the VM-Log and check when the issue happen again.

Update:

I can see the same issue in my VM-Log. But I'm not sure how to solve it regarding your description.

0 Kudos
Levin1
Contributor
Contributor
Jump to solution

I checked with VMware Support. This is what I had to do:

Edit settings of your Master-VM --> VM Options --> advanced --> edit configuration --> remove value for "namespaceMgr.db"

After this change I successfully performed a push image operation.

0 Kudos