A strange thing happend today.
Our configuration is a 7 x HP blade center BL460C with ESX 3.5 and VC 2.5
HA and DRS are configured and tody at 11.10 AM our file server starts to vmotion true drs and created this error:
11:10 - A general system error occurred: Unknown failure migrating from another host
11:11 - Operation timed out
11:13 - A general system error occurred: invalid fault
After this I tried it manual. VC came with the following error
11:20 - The attempted operation cannot be perform in the current state (powered on)
I tried to powered it off. but that was not possible.
So I could not vmotion it manual and not powered it off because de server was OFF (strange)
Last I vmotion the other Vm's on that host to the other hosts, that worked! and after that I rebooted the ESX host.
After that The VM was locked, no other hosts was processing this server, I had to rebuild it and atached the disks.
That worked. 2 hours work and 2500 people could not work. Imagine!
This is what I could find in de vmkwarning log:
May 16 11:11:34 esx02 vmkernel: 16:18:11:39.529 cpu5:1118)WARNING: Migrate: 1242:
+1210929032662576: Failed: Failed to resume VM (0xbad0043) @0x9b6d3e++
May 16 11:38:57 esx02 vmkernel: 0:00:00:00.046 cpu4:1028)WARNING: Cpu: 521: version
+000006fb but BSP 000006f7++
May 16 11:38:57 esx02 vmkernel: 0:00:00:00.057 cpu5:1029)WARNING: Cpu: 521: version
+000006fb but BSP 000006f7++
May 16 11:38:57 esx02 vmkernel: 0:00:00:00.068 cpu6:1030)WARNING: Cpu: 521: version
+000006fb but BSP 000006f7++
May 16 11:38:57 esx02 vmkernel: 0:00:00:00.079 cpu7:1031)WARNING: Cpu: 521: version
+000006fb but BSP 000006f7++
May 16 11:38:58 esx02 vmkernel: 0:00:00:17.035 cpu2:1040)WARNING: ScsiScan: 319: Path
+'vmhba1:C0:T2:L0': Failed to get capacity information: Not supported++
+May 16
11:38:58 esx02 vmkernel: 0:00:00:17.046 cpu2:1040)WARNING: ScsiScan: 319: Path+
+'vmhba1:C0:T3:L0': Failed to get capacity information: Not supported++
this is the Vmkernel.1
May 16 11:10:57 esx02 vmkernel: 16:18:11:02.978 cpu3:1075)Migrate: vm 1076: 7332: Setting migration info ts = 1210929032662576, src ip = <172.16.6.2> dest i$
May 16 11:10:57 esx02 vmkernel: 16:18:11:02.979 cpu3:1075)World: vm 1117: 895: Starting world migSendHelper-1076 with flags 1
May 16 11:10:57 esx02 vmkernel: 16:18:11:02.979 cpu3:1075)World: vm 1118: 895: Starting world migRecvHelper-1076 with flags 1
May 16 11:10:57 esx02 vmkernel: 16:18:11:02.981 cpu2:1074)MigrateNet: vm 1074: 854: Accepted connection from <172.16.6.7>
May 16 11:11:24 esx02 vmkernel: 16:18:11:30.295 cpu6:1076)Migrate: 7258: 1210929032662576: Another pre-copy iteration needed with 55587 modified pages (last$
May 16 11:11:27 esx02 vmkernel: 16:18:11:32.828 cpu5:1076)Migrate: 7258: 1210929032662576: Another pre-copy iteration needed with 18311 modified pages (last$
May 16 11:11:28 esx02 vmkernel: 16:18:11:33.581 cpu6:1076)Migrate: 7258: 1210929032662576: Another pre-copy iteration needed with 10892 modified pages (last$
May 16 11:11:28 esx02 vmkernel: 16:18:11:34.086 cpu6:1076)Migrate: 7253: 1210929032662576: Stopping pre-copy: Not enough forward progress (Modified pages 10$
May 16 11:11:29 esx02 vmkernel: 16:18:11:34.564 cpu1:1075)FS3: 1974: Checking if lock holders are live for lock [type 10c00001 offset 19245056 v 7838, hb of$
May 16 11:11:29 esx02 vmkernel: gen 756199, mode 1, owner 47390347-346927d0-9f6d-001e0b5eaec4 mtime 3357265]
May 16 11:11:34 esx02 vmkernel: 16:18:11:39.529 cpu5:1118)WARNING: Migrate: 1242: 1210929032662576: Failed: Failed to resume VM (0xbad0043) @0x9b6d3e
May 16 11:11:34 esx02 vmkernel: 16:18:11:39.575 cpu1:1075)FS3: 1974: Checking if lock holders are live for lock [type 10c00001 offset 14221312 v 90, hb offs$
May 16 11:11:34 esx02 vmkernel: gen 756199, mode 1, owner 47390347-346927d0-9f6d-001e0b5eaec4 mtime 3357274]
May 16 11:11:38 esx02 vmkernel: 16:18:11:43.695 cpu0:1075)FS3: 1974: Checking if lock holders are live for lock [type 10c00001 offset 19335168 v 271, hb off$
May 16 11:11:38 esx02 vmkernel: gen 756199, mode 1, owner 47390347-346927d0-9f6d-001e0b5eaec4 mtime 3357271]
May 16 11:11:42 esx02 vmkernel: 16:18:11:47.754 cpu3:1075)FS3: 1974: Checking if lock holders are live for lock [type 10c00001 offset 19331072 v 269, hb off$
May 16 11:11:42 esx02 vmkernel: gen 756199, mode 1, owner 47390347-346927d0-9f6d-001e0b5eaec4 mtime 3357271]
May 16 11:11:46 esx02 vmkernel: 16:18:11:52.320 cpu3:1075)VSCSI: 4059: Creating Virtual Device for world 1076 vscsi0:0 (handle 8220)
May 16 11:11:46 esx02 vmkernel: 16:18:11:52.320 cpu3:1075)VSCSI: 4059: Creating Virtual Device for world 1076 vscsi0:1 (handle 8221)
May 16 11:11:46 esx02 vmkernel: 16:18:11:52.320 cpu3:1075)VSCSI: 4059: Creating Virtual Device for world 1076 vscsi0:2 (handle 8222)
May 16 11:11:46 esx02 vmkernel: 16:18:11:52.334 cpu3:1075)FS3: 1974: Checking if lock holders are live for lock [type 10c00001 offset 19331072 v 269, hb off$
May 16 11:11:46 esx02 vmkernel: gen 756199, mode 1, owner 47390347-346927d0-9f6d-001e0b5eaec4 mtime 3357271]
Our configuration is a 7 x HP blade center BL460C with ESX 3.5 and VC 2.5
HA and DRS are configured and tody at 11.10 AM our file server starts to vmotion true drs and created this error:
11:10 - A general system error occurred: Unknown failure migrating from another host
11:11 - Operation timed out
11:13 - A general system error occurred: invalid fault
After this I tried it manual. VC came with the following error
11:20 - The attempted operation cannot be perform in the current state (powered on)
I tried to powered it off. but that was not possible.
So I could not vmotion it manual and not powered it off because de server was OFF (strange)
Last I vmotion the other Vm's on that host to the other hosts, that worked! and after that I rebooted the ESX host.
After that The VM was locked, no other hosts was processing this server, I had to rebuild it and atached the disks.
That worked. 2 hours work and 2500 people could not work. Imagine!
This is what I could find in de vmkwarning log:
May 16 11:11:34 esx02 vmkernel: 16:18:11:39.529 cpu5:1118)WARNING: Migrate: 1242:
+1210929032662576: Failed: Failed to resume VM (0xbad0043) @0x9b6d3e++
May 16 11:38:57 esx02 vmkernel: 0:00:00:00.046 cpu4:1028)WARNING: Cpu: 521: version
+000006fb but BSP 000006f7++
May 16 11:38:57 esx02 vmkernel: 0:00:00:00.057 cpu5:1029)WARNING: Cpu: 521: version
+000006fb but BSP 000006f7++
May 16 11:38:57 esx02 vmkernel: 0:00:00:00.068 cpu6:1030)WARNING: Cpu: 521: version
+000006fb but BSP 000006f7++
May 16 11:38:57 esx02 vmkernel: 0:00:00:00.079 cpu7:1031)WARNING: Cpu: 521: version
+000006fb but BSP 000006f7++
May 16 11:38:58 esx02 vmkernel: 0:00:00:17.035 cpu2:1040)WARNING: ScsiScan: 319: Path
+'vmhba1:C0:T2:L0': Failed to get capacity information: Not supported++
+May 16
11:38:58 esx02 vmkernel: 0:00:00:17.046 cpu2:1040)WARNING: ScsiScan: 319: Path+
+'vmhba1:C0:T3:L0': Failed to get capacity information: Not supported++
this is the Vmkernel.1
May 16 11:10:57 esx02 vmkernel: 16:18:11:02.978 cpu3:1075)Migrate: vm 1076: 7332: Setting migration info ts = 1210929032662576, src ip = <172.16.6.2> dest i$
May 16 11:10:57 esx02 vmkernel: 16:18:11:02.979 cpu3:1075)World: vm 1117: 895: Starting world migSendHelper-1076 with flags 1
May 16 11:10:57 esx02 vmkernel: 16:18:11:02.979 cpu3:1075)World: vm 1118: 895: Starting world migRecvHelper-1076 with flags 1
May 16 11:10:57 esx02 vmkernel: 16:18:11:02.981 cpu2:1074)MigrateNet: vm 1074: 854: Accepted connection from <172.16.6.7>
May 16 11:11:24 esx02 vmkernel: 16:18:11:30.295 cpu6:1076)Migrate: 7258: 1210929032662576: Another pre-copy iteration needed with 55587 modified pages (last$
May 16 11:11:27 esx02 vmkernel: 16:18:11:32.828 cpu5:1076)Migrate: 7258: 1210929032662576: Another pre-copy iteration needed with 18311 modified pages (last$
May 16 11:11:28 esx02 vmkernel: 16:18:11:33.581 cpu6:1076)Migrate: 7258: 1210929032662576: Another pre-copy iteration needed with 10892 modified pages (last$
May 16 11:11:28 esx02 vmkernel: 16:18:11:34.086 cpu6:1076)Migrate: 7253: 1210929032662576: Stopping pre-copy: Not enough forward progress (Modified pages 10$
May 16 11:11:29 esx02 vmkernel: 16:18:11:34.564 cpu1:1075)FS3: 1974: Checking if lock holders are live for lock [type 10c00001 offset 19245056 v 7838, hb of$
May 16 11:11:29 esx02 vmkernel: gen 756199, mode 1, owner 47390347-346927d0-9f6d-001e0b5eaec4 mtime 3357265]
May 16 11:11:34 esx02 vmkernel: 16:18:11:39.529 cpu5:1118)WARNING: Migrate: 1242: 1210929032662576: Failed: Failed to resume VM (0xbad0043) @0x9b6d3e
May 16 11:11:34 esx02 vmkernel: 16:18:11:39.575 cpu1:1075)FS3: 1974: Checking if lock holders are live for lock [type 10c00001 offset 14221312 v 90, hb offs$
May 16 11:11:34 esx02 vmkernel: gen 756199, mode 1, owner 47390347-346927d0-9f6d-001e0b5eaec4 mtime 3357274]
May 16 11:11:38 esx02 vmkernel: 16:18:11:43.695 cpu0:1075)FS3: 1974: Checking if lock holders are live for lock [type 10c00001 offset 19335168 v 271, hb off$
May 16 11:11:38 esx02 vmkernel: gen 756199, mode 1, owner 47390347-346927d0-9f6d-001e0b5eaec4 mtime 3357271]
May 16 11:11:42 esx02 vmkernel: 16:18:11:47.754 cpu3:1075)FS3: 1974: Checking if lock holders are live for lock [type 10c00001 offset 19331072 v 269, hb off$
May 16 11:11:42 esx02 vmkernel: gen 756199, mode 1, owner 47390347-346927d0-9f6d-001e0b5eaec4 mtime 3357271]
May 16 11:11:46 esx02 vmkernel: 16:18:11:52.320 cpu3:1075)VSCSI: 4059: Creating Virtual Device for world 1076 vscsi0:0 (handle 8220)
May 16 11:11:46 esx02 vmkernel: 16:18:11:52.320 cpu3:1075)VSCSI: 4059: Creating Virtual Device for world 1076 vscsi0:1 (handle 8221)
May 16 11:11:46 esx02 vmkernel: 16:18:11:52.320 cpu3:1075)VSCSI: 4059: Creating Virtual Device for world 1076 vscsi0:2 (handle 8222)
May 16 11:11:46 esx02 vmkernel: 16:18:11:52.334 cpu3:1075)FS3: 1974: Checking if lock holders are live for lock [type 10c00001 offset 19331072 v 269, hb off$
May 16 11:11:46 esx02 vmkernel: gen 756199, mode 1, owner 47390347-346927d0-9f6d-001e0b5eaec4 mtime 3357271]