This morning I tried to SVmotion a VMachine, this is what I see in the VmWare ESX server logs:
(...)
Feb 9 11:30:56 vmwareesx3 vmkernel: 2:23:51:27.947 cpu1:1383)Migrate: 7556: 1234175457408539: Waiting to receive all changed pages before resuming...
Feb 9 11:30:56 vmwareesx3 vmkernel: 2:23:51:27.947 cpu1:1383)Migrate: 7577: 1234175457408539: Received all changed pages.
Feb 9 11:30:56 vmwareesx3 vmkernel: 2:23:51:27.947 cpu1:1383)World: vm 1391: 900: Starting world prefault-mig-1383 with flags 1
Feb 9 11:30:56 vmwareesx3 vmkernel: 2:23:51:27.947 cpu2:1391)Swap: vm 1383: 7929: Starting opportunistic swapper for the migration swap file
Feb 9 11:30:56 vmwareesx3 vmkernel: 2:23:51:27.948 cpu1:1383)Migrate: 3830: 1234175457408539: Resume handshake successful
Feb 9 11:30:56 vmwareesx3 vmkernel: 2:23:51:27.950 cpu3:1391)Swap: vm 1383: 8133: Finish swapping in migration swap file. (faulted in 0 pages). Success.
Feb 9 11:31:17 vmwareesx3 vmkernel: 2:23:51:49.491 cpu0:1265)StorageMonitor: 196: vmhba1:0:1:0 status = 0/2 0x0 0x0 0x0
Feb 9 11:31:18 vmwareesx3 vmkernel: 2:23:51:50.491 cpu3:1066)<6>qla24xx_abort_command(0): handle to abort=1394
Feb 9 11:31:18 vmwareesx3 vmkernel: 2:23:51:50.491 cpu3:1027)StorageMonitor: 196: vmhba1:0:1:0 status = 0/2 0x0 0x0 0x0
Feb 9 11:31:18 vmwareesx3 vmkernel: 2:23:51:50.541 cpu0:1037)Fil3: 5004: WRITE error 0xbad00e5
Feb 9 11:31:35 vmwareesx3 vmkernel: 2:23:52:06.621 cpu0:1304)StorageMonitor: 196: vmhba1:0:1:0 status = 0/2 0x0 0x0 0x0
Feb 9 11:31:35 vmwareesx3 vmkernel: 2:23:52:07.546 cpu3:1066)<6>qla24xx_abort_command(0): handle to abort=1360
Feb 9 11:31:35 vmwareesx3 vmkernel: 2:23:52:07.547 cpu3:1107)StorageMonitor: 196: vmhba1:0:1:0 status = 0/2 0x0 0x0 0x0
Feb 9 11:31:35 vmwareesx3 vmkernel: 2:23:52:07.598 cpu2:1037)Fil3: 5004: WRITE error 0xbad00e5
Feb 9 11:32:00 vmwareesx3 vmkernel: 2:23:52:31.609 cpu0:1024)StorageMonitor: 196: vmhba1:0:1:0 status = 0/2 0x0 0x0 0x0
Feb 9 11:32:00 vmwareesx3 vmkernel: 2:23:52:32.592 cpu3:1066)<6>qla24xx_abort_command(0): handle to abort=1065
Feb 9 11:32:00 vmwareesx3 vmkernel: 2:23:52:32.593 cpu3:1027)StorageMonitor: 196: vmhba1:0:1:0 status = 0/2 0x0 0x0 0x0
Feb 9 11:32:01 vmwareesx3 vmkernel: 2:23:52:32.643 cpu2:1037)Fil3: 5004: WRITE error 0xbad00e5
Feb 9 11:32:18 vmwareesx3 vmkernel: 2:23:52:50.351 cpu0:1367)StorageMonitor: 196: vmhba1:0:1:0 status = 0/2 0x0 0x0 0x0
Feb 9 11:32:19 vmwareesx3 vmkernel: 2:23:52:51.346 cpu3:1066)<6>qla24xx_abort_command(0): handle to abort=1848
Feb 9 11:32:19 vmwareesx3 vmkernel: 2:23:52:51.347 cpu3:1367)StorageMonitor: 196: vmhba1:0:1:0 status = 0/2 0x0 0x0 0x0
Feb 9 11:32:19 vmwareesx3 vmkernel: 2:23:52:51.397 cpu2:1037)Fil3: 5004: WRITE error 0xbad00e5
Feb 9 11:32:35 vmwareesx3 vmkernel: 2:23:53:06.905 cpu3:1066)<6>qla24xx_abort_command(0): handle to abort=104
Feb 9 11:32:35 vmwareesx3 vmkernel: 2:23:53:06.905 cpu0:1035)StorageMonitor: 196: vmhba1:0:1:0 status = 0/2 0x0 0x0 0x0
Feb 9 11:35:45 vmwareesx3 vmkernel: 2:23:56:17.413 cpu0:1024)StorageMonitor: 196: vmhba1:0:1:0 status = 0/2 0x0 0x0 0x0
Feb 9 11:35:46 vmwareesx3 vmkernel: 2:23:56:18.349 cpu0:1066)<6>qla24xx_abort_command(0): handle to abort=1659
Feb 9 11:35:46 vmwareesx3 vmkernel: 2:23:56:18.349 cpu2:1318)StorageMonitor: 196: vmhba1:0:1:0 status = 0/2 0x0 0x0 0x0
Feb 9 11:35:46 vmwareesx3 vmkernel: 2:23:56:18.399 cpu1:1037)Fil3: 5004: WRITE error 0xbad00e5
Feb 9 11:39:36 vmwareesx3 vmkernel: 3:00:00:08.454 cpu0:1024)StorageMonitor: 196: vmhba1:0:1:0 status = 0/2 0x0 0x0 0x0
Feb 9 11:39:37 vmwareesx3 vmkernel: 3:00:00:09.401 cpu2:1066)<6>qla24xx_abort_command(0): handle to abort=775
Feb 9 11:39:37 vmwareesx3 vmkernel: 3:00:00:09.402 cpu2:1026)StorageMonitor: 196: vmhba1:0:1:0 status = 0/2 0x0 0x0 0x0
Feb 9 11:39:37 vmwareesx3 vmkernel: 3:00:00:09.452 cpu2:1037)Fil3: 5004: WRITE error 0xbad00e5
Feb 9 11:39:52 vmwareesx3 vmkernel: 3:00:00:24.475 cpu0:1286)StorageMonitor: 196: vmhba1:0:1:0 status = 0/2 0x0 0x0 0x0
Feb 9 11:39:53 vmwareesx3 vmkernel: 3:00:00:25.454 cpu2:1066)<6>qla24xx_abort_command(0): handle to abort=509
Feb 9 11:39:53 vmwareesx3 vmkernel: 3:00:00:25.454 cpu2:1026)StorageMonitor: 196: vmhba1:0:1:0 status = 0/2 0x0 0x0 0x0
Feb 9 11:39:53 vmwareesx3 vmkernel: 3:00:00:25.504 cpu2:1037)Fil3: 5004: WRITE error 0xbad00e5
Feb 9 11:40:44 vmwareesx3 vmkernel: 3:00:01:16.065 cpu0:1304)StorageMonitor: 196: vmhba1:0:1:0 status = 0/2 0x0 0x0 0x0
Feb 9 11:40:45 vmwareesx3 vmkernel: 3:00:01:17.072 cpu2:1066)<6>qla24xx_abort_command(0): handle to abort=1978
Feb 9 11:40:45 vmwareesx3 vmkernel: 3:00:01:17.073 cpu2:1274)StorageMonitor: 196: vmhba1:0:1:0 status = 0/2 0x0 0x0 0x0
Feb 9 11:40:45 vmwareesx3 vmkernel: 3:00:01:17.123 cpu2:1037)Fil3: 5004: WRITE error 0xbad00e5
Feb 9 11:41:00 vmwareesx3 vmkernel: 3:00:01:32.187 cpu0:1024)StorageMonitor: 196: vmhba1:0:1:0 status = 0/2 0x0 0x0 0x0
Feb 9 11:41:01 vmwareesx3 vmkernel: 3:00:01:33.126 cpu2:1066)<6>qla24xx_abort_command(0): handle to abort=967
Feb 9 11:41:01 vmwareesx3 vmkernel: 3:00:01:33.127 cpu0:1035)StorageMonitor: 196: vmhba1:0:1:0 status = 0/2 0x0 0x0 0x0
Feb 9 11:41:01 vmwareesx3 vmkernel: 3:00:01:33.177 cpu2:1037)Fil3: 5004: WRITE error 0xbad00e5
Feb 9 11:41:58 vmwareesx3 vmkernel: 3:00:02:30.219 cpu1:1382)DevFS: 2222: Unable to find device: 2073-DMotion-scsi0:00_test clone SSL Explorer-delta.vmdk
Feb 9 11:41:58 vmwareesx3 vmkernel: 3:00:02:30.536 cpu1:1382)VSCSI: 4060: Creating Virtual Device for world 1383 vscsi0:0 (handle 8240)
Feb 9 11:41:59 vmwareesx3 vmkernel: 3:00:02:30.637 cpu1:1382)DevFS: 2222: Unable to find device: 2f078-DMotion-scsi0:00_test clone SSL Explorer-delta.vmdk
Feb 9 11:41:59 vmwareesx3 vmkernel: 3:00:02:30.890 cpu2:1382)VSCSI: 4060: Creating Virtual Device for world 1383 vscsi0:0 (handle 8241)
Feb 9 11:41:59 vmwareesx3 vmkernel: 3:00:02:30.893 cpu7:1388)World: vm 1392: 900: Starting world vmware-vmx with flags 44
Feb 9 11:41:59 vmwareesx3 vmkernel: 3:00:02:31.377 cpu1:1382)DevFS: 2222: Unable to find device: c07e-DMotion-scsi0:00_test clone SSL Explorer-000001-delta.vmdk
Feb 9 11:41:59 vmwareesx3 vmkernel: 3:00:02:31.469 cpu1:1382)DevFS: 2222: Unable to find device: 18083-DMotion-scsi0:00_test clone SSL Explorer-000001-delta.vmdk
Feb 9 11:42:00 vmwareesx3 vmkernel: 3:00:02:31.718 cpu0:1382)DevFS: 2222: Unable to find device: 1089-DMotion-scsi0:00_test clone SSL Explorer-000001-delta.vmdk
Feb 9 11:42:00 vmwareesx3 vmkernel: 3:00:02:31.797 cpu0:1382)VSCSI: 4060: Creating Virtual Device for world 1383 vscsi0:0 (handle 8242)
(...)
About at 83% I see this:
(...)
Name: Reconfigure Virtual Machine
Target: test clone SSL Explorer
Initiated by: com.vmware.vnIntegrity
Status: A general systen error occurred: You must power off the virtual machine and complete the migration before invoking this operation
Related Events:
09/02/2009 11.40.17, Task: Reconfigure Virtual Machine
(...)
but the migration seems successfully.
Do you understand what happened here ?
If you look at the summary of the machine you will see that it hangs on two datastores. The failed vmotion makes it run on dmotion files on the new lun. To commit theese so you are back on the first lun.
Run on the ESX to get the UUID path that is used below "48f4b25a-389c3331-ce05-001cc477ac02"
vmware-cmd -l
Creates a snapshot
vmware-cmd /vmfs/volumes/48f4b25a-389c3331-ce05-001cc477ac02/machine/machine.vmx createsnapshot snapname snapdescription 1 1
Removes the snapshot
vmware-cmd /vmfs/volumes/48f4b25a-389c3331-ce05-001cc477ac02/machine/machine.vmx removesnapshots
Clears dmotion parent in the vmx file.
vmware-cmd /vmfs/volumes/48f4b25a-389c3331-ce05-001cc477ac02/machine/machine.vmx setconfig scsi0:0.DMotionParent ""
vmware-cmd /vmfs/volumes/48f4b25a-389c3331-ce05-001cc477ac02/machine/machine.vmx setconfig scsi0:1.DMotionParent ""
Then do a svmotion of vmx without moving the disks and you are back on the the first lun without rebooting the vm.
Thank you very much!
I have had similar results when trying to sVmotion VMs. Unfortunately for me, the migrations DID NOT complete successfully. The VMDKs were copied but the VMX file continued to point at the original files.
In both instances, I was forced to power-off the VM in question and complete the migration offline. Further, the original VMDK files were left at their original storage location even after the migrations were completed successfully. Once I brought the VMs back online and verified that they were functioning as expected at the NEW location, I had to manually remove the original VMDK files from the datastore.
I am not exactly sure what caused the sVmotion to fail but can anecdotally attribute it to high CPU utilization. The following KB article seems to suggest that high CPU load on a VM can generate similar error messages and cause VMotion to fail. Likewise, both of my VMs were under heavy load at the time of migration failure.
Cheers!
Eric
Another interesting thing to note is that even after the migration "comleted successfully", the VMX file continued to point to the original storage location.
To resolve things, I had to remove and re-add the VMDKs at their NEW storage location.
Eric