VMware Cloud Community
brogio
Contributor
Contributor

Fake VmWare ESX 3.5 failed SVmotion migration ?

This morning I tried to SVmotion a VMachine, this is what I see in the VmWare ESX server logs:

(...)

Feb 9 11:30:56 vmwareesx3 vmkernel: 2:23:51:27.947 cpu1:1383)Migrate: 7556: 1234175457408539: Waiting to receive all changed pages before resuming...

Feb 9 11:30:56 vmwareesx3 vmkernel: 2:23:51:27.947 cpu1:1383)Migrate: 7577: 1234175457408539: Received all changed pages.

Feb 9 11:30:56 vmwareesx3 vmkernel: 2:23:51:27.947 cpu1:1383)World: vm 1391: 900: Starting world prefault-mig-1383 with flags 1

Feb 9 11:30:56 vmwareesx3 vmkernel: 2:23:51:27.947 cpu2:1391)Swap: vm 1383: 7929: Starting opportunistic swapper for the migration swap file

Feb 9 11:30:56 vmwareesx3 vmkernel: 2:23:51:27.948 cpu1:1383)Migrate: 3830: 1234175457408539: Resume handshake successful

Feb 9 11:30:56 vmwareesx3 vmkernel: 2:23:51:27.950 cpu3:1391)Swap: vm 1383: 8133: Finish swapping in migration swap file. (faulted in 0 pages). Success.

Feb 9 11:31:17 vmwareesx3 vmkernel: 2:23:51:49.491 cpu0:1265)StorageMonitor: 196: vmhba1:0:1:0 status = 0/2 0x0 0x0 0x0

Feb 9 11:31:18 vmwareesx3 vmkernel: 2:23:51:50.491 cpu3:1066)<6>qla24xx_abort_command(0): handle to abort=1394

Feb 9 11:31:18 vmwareesx3 vmkernel: 2:23:51:50.491 cpu3:1027)StorageMonitor: 196: vmhba1:0:1:0 status = 0/2 0x0 0x0 0x0

Feb 9 11:31:18 vmwareesx3 vmkernel: 2:23:51:50.541 cpu0:1037)Fil3: 5004: WRITE error 0xbad00e5

Feb 9 11:31:35 vmwareesx3 vmkernel: 2:23:52:06.621 cpu0:1304)StorageMonitor: 196: vmhba1:0:1:0 status = 0/2 0x0 0x0 0x0

Feb 9 11:31:35 vmwareesx3 vmkernel: 2:23:52:07.546 cpu3:1066)<6>qla24xx_abort_command(0): handle to abort=1360

Feb 9 11:31:35 vmwareesx3 vmkernel: 2:23:52:07.547 cpu3:1107)StorageMonitor: 196: vmhba1:0:1:0 status = 0/2 0x0 0x0 0x0

Feb 9 11:31:35 vmwareesx3 vmkernel: 2:23:52:07.598 cpu2:1037)Fil3: 5004: WRITE error 0xbad00e5

Feb 9 11:32:00 vmwareesx3 vmkernel: 2:23:52:31.609 cpu0:1024)StorageMonitor: 196: vmhba1:0:1:0 status = 0/2 0x0 0x0 0x0

Feb 9 11:32:00 vmwareesx3 vmkernel: 2:23:52:32.592 cpu3:1066)<6>qla24xx_abort_command(0): handle to abort=1065

Feb 9 11:32:00 vmwareesx3 vmkernel: 2:23:52:32.593 cpu3:1027)StorageMonitor: 196: vmhba1:0:1:0 status = 0/2 0x0 0x0 0x0

Feb 9 11:32:01 vmwareesx3 vmkernel: 2:23:52:32.643 cpu2:1037)Fil3: 5004: WRITE error 0xbad00e5

Feb 9 11:32:18 vmwareesx3 vmkernel: 2:23:52:50.351 cpu0:1367)StorageMonitor: 196: vmhba1:0:1:0 status = 0/2 0x0 0x0 0x0

Feb 9 11:32:19 vmwareesx3 vmkernel: 2:23:52:51.346 cpu3:1066)<6>qla24xx_abort_command(0): handle to abort=1848

Feb 9 11:32:19 vmwareesx3 vmkernel: 2:23:52:51.347 cpu3:1367)StorageMonitor: 196: vmhba1:0:1:0 status = 0/2 0x0 0x0 0x0

Feb 9 11:32:19 vmwareesx3 vmkernel: 2:23:52:51.397 cpu2:1037)Fil3: 5004: WRITE error 0xbad00e5

Feb 9 11:32:35 vmwareesx3 vmkernel: 2:23:53:06.905 cpu3:1066)<6>qla24xx_abort_command(0): handle to abort=104

Feb 9 11:32:35 vmwareesx3 vmkernel: 2:23:53:06.905 cpu0:1035)StorageMonitor: 196: vmhba1:0:1:0 status = 0/2 0x0 0x0 0x0

Feb 9 11:35:45 vmwareesx3 vmkernel: 2:23:56:17.413 cpu0:1024)StorageMonitor: 196: vmhba1:0:1:0 status = 0/2 0x0 0x0 0x0

Feb 9 11:35:46 vmwareesx3 vmkernel: 2:23:56:18.349 cpu0:1066)<6>qla24xx_abort_command(0): handle to abort=1659

Feb 9 11:35:46 vmwareesx3 vmkernel: 2:23:56:18.349 cpu2:1318)StorageMonitor: 196: vmhba1:0:1:0 status = 0/2 0x0 0x0 0x0

Feb 9 11:35:46 vmwareesx3 vmkernel: 2:23:56:18.399 cpu1:1037)Fil3: 5004: WRITE error 0xbad00e5

Feb 9 11:39:36 vmwareesx3 vmkernel: 3:00:00:08.454 cpu0:1024)StorageMonitor: 196: vmhba1:0:1:0 status = 0/2 0x0 0x0 0x0

Feb 9 11:39:37 vmwareesx3 vmkernel: 3:00:00:09.401 cpu2:1066)<6>qla24xx_abort_command(0): handle to abort=775

Feb 9 11:39:37 vmwareesx3 vmkernel: 3:00:00:09.402 cpu2:1026)StorageMonitor: 196: vmhba1:0:1:0 status = 0/2 0x0 0x0 0x0

Feb 9 11:39:37 vmwareesx3 vmkernel: 3:00:00:09.452 cpu2:1037)Fil3: 5004: WRITE error 0xbad00e5

Feb 9 11:39:52 vmwareesx3 vmkernel: 3:00:00:24.475 cpu0:1286)StorageMonitor: 196: vmhba1:0:1:0 status = 0/2 0x0 0x0 0x0

Feb 9 11:39:53 vmwareesx3 vmkernel: 3:00:00:25.454 cpu2:1066)<6>qla24xx_abort_command(0): handle to abort=509

Feb 9 11:39:53 vmwareesx3 vmkernel: 3:00:00:25.454 cpu2:1026)StorageMonitor: 196: vmhba1:0:1:0 status = 0/2 0x0 0x0 0x0

Feb 9 11:39:53 vmwareesx3 vmkernel: 3:00:00:25.504 cpu2:1037)Fil3: 5004: WRITE error 0xbad00e5

Feb 9 11:40:44 vmwareesx3 vmkernel: 3:00:01:16.065 cpu0:1304)StorageMonitor: 196: vmhba1:0:1:0 status = 0/2 0x0 0x0 0x0

Feb 9 11:40:45 vmwareesx3 vmkernel: 3:00:01:17.072 cpu2:1066)<6>qla24xx_abort_command(0): handle to abort=1978

Feb 9 11:40:45 vmwareesx3 vmkernel: 3:00:01:17.073 cpu2:1274)StorageMonitor: 196: vmhba1:0:1:0 status = 0/2 0x0 0x0 0x0

Feb 9 11:40:45 vmwareesx3 vmkernel: 3:00:01:17.123 cpu2:1037)Fil3: 5004: WRITE error 0xbad00e5

Feb 9 11:41:00 vmwareesx3 vmkernel: 3:00:01:32.187 cpu0:1024)StorageMonitor: 196: vmhba1:0:1:0 status = 0/2 0x0 0x0 0x0

Feb 9 11:41:01 vmwareesx3 vmkernel: 3:00:01:33.126 cpu2:1066)<6>qla24xx_abort_command(0): handle to abort=967

Feb 9 11:41:01 vmwareesx3 vmkernel: 3:00:01:33.127 cpu0:1035)StorageMonitor: 196: vmhba1:0:1:0 status = 0/2 0x0 0x0 0x0

Feb 9 11:41:01 vmwareesx3 vmkernel: 3:00:01:33.177 cpu2:1037)Fil3: 5004: WRITE error 0xbad00e5

Feb 9 11:41:58 vmwareesx3 vmkernel: 3:00:02:30.219 cpu1:1382)DevFS: 2222: Unable to find device: 2073-DMotion-scsi0:00_test clone SSL Explorer-delta.vmdk

Feb 9 11:41:58 vmwareesx3 vmkernel: 3:00:02:30.536 cpu1:1382)VSCSI: 4060: Creating Virtual Device for world 1383 vscsi0:0 (handle 8240)

Feb 9 11:41:59 vmwareesx3 vmkernel: 3:00:02:30.637 cpu1:1382)DevFS: 2222: Unable to find device: 2f078-DMotion-scsi0:00_test clone SSL Explorer-delta.vmdk

Feb 9 11:41:59 vmwareesx3 vmkernel: 3:00:02:30.890 cpu2:1382)VSCSI: 4060: Creating Virtual Device for world 1383 vscsi0:0 (handle 8241)

Feb 9 11:41:59 vmwareesx3 vmkernel: 3:00:02:30.893 cpu7:1388)World: vm 1392: 900: Starting world vmware-vmx with flags 44

Feb 9 11:41:59 vmwareesx3 vmkernel: 3:00:02:31.377 cpu1:1382)DevFS: 2222: Unable to find device: c07e-DMotion-scsi0:00_test clone SSL Explorer-000001-delta.vmdk

Feb 9 11:41:59 vmwareesx3 vmkernel: 3:00:02:31.469 cpu1:1382)DevFS: 2222: Unable to find device: 18083-DMotion-scsi0:00_test clone SSL Explorer-000001-delta.vmdk

Feb 9 11:42:00 vmwareesx3 vmkernel: 3:00:02:31.718 cpu0:1382)DevFS: 2222: Unable to find device: 1089-DMotion-scsi0:00_test clone SSL Explorer-000001-delta.vmdk

Feb 9 11:42:00 vmwareesx3 vmkernel: 3:00:02:31.797 cpu0:1382)VSCSI: 4060: Creating Virtual Device for world 1383 vscsi0:0 (handle 8242)

(...)

About at 83% I see this:

(...)

Name: Reconfigure Virtual Machine

Target: test clone SSL Explorer

Initiated by: com.vmware.vnIntegrity

Status: A general systen error occurred: You must power off the virtual machine and complete the migration before invoking this operation

Related Events:

09/02/2009 11.40.17, Task: Reconfigure Virtual Machine

(...)

but the migration seems successfully.

Do you understand what happened here ?

0 Kudos
4 Replies
_David
Enthusiast
Enthusiast

If you look at the summary of the machine you will see that it hangs on two datastores. The failed vmotion makes it run on dmotion files on the new lun. To commit theese so you are back on the first lun.

Run on the ESX to get the UUID path that is used below "48f4b25a-389c3331-ce05-001cc477ac02"

vmware-cmd -l

Creates a snapshot

vmware-cmd /vmfs/volumes/48f4b25a-389c3331-ce05-001cc477ac02/machine/machine.vmx createsnapshot snapname snapdescription 1 1

Removes the snapshot

vmware-cmd /vmfs/volumes/48f4b25a-389c3331-ce05-001cc477ac02/machine/machine.vmx removesnapshots

Clears dmotion parent in the vmx file.

vmware-cmd /vmfs/volumes/48f4b25a-389c3331-ce05-001cc477ac02/machine/machine.vmx setconfig scsi0:0.DMotionParent ""

vmware-cmd /vmfs/volumes/48f4b25a-389c3331-ce05-001cc477ac02/machine/machine.vmx setconfig scsi0:1.DMotionParent ""

Then do a svmotion of vmx without moving the disks and you are back on the the first lun without rebooting the vm.

If you found this or any other answer useful please consider the use of the Helpful or correct buttons to award points
brogio
Contributor
Contributor

Thank you very much!

0 Kudos
eric_heilig
Contributor
Contributor

I have had similar results when trying to sVmotion VMs. Unfortunately for me, the migrations DID NOT complete successfully. The VMDKs were copied but the VMX file continued to point at the original files.

In both instances, I was forced to power-off the VM in question and complete the migration offline. Further, the original VMDK files were left at their original storage location even after the migrations were completed successfully. Once I brought the VMs back online and verified that they were functioning as expected at the NEW location, I had to manually remove the original VMDK files from the datastore.

I am not exactly sure what caused the sVmotion to fail but can anecdotally attribute it to high CPU utilization. The following KB article seems to suggest that high CPU load on a VM can generate similar error messages and cause VMotion to fail. Likewise, both of my VMs were under heavy load at the time of migration failure.

Cheers!

Eric

0 Kudos
eric_heilig
Contributor
Contributor

Another interesting thing to note is that even after the migration "comleted successfully", the VMX file continued to point to the original storage location.

To resolve things, I had to remove and re-add the VMDKs at their NEW storage location.

Eric

0 Kudos