VMware Cloud Community
Teovmy
Contributor

A general system error occurred: Unknown failure migrating from another host

A strange thing happened today.

Our configuration is 7 x HP BL460c blade servers running ESX 3.5 and VC 2.5.

HA and DRS are configured, and today at 11:10 AM DRS started to VMotion our file server, which produced these errors:

11:10 - A general system error occurred: Unknown failure migrating from another host

11:11 - Operation timed out

11:13 - A general system error occurred: invalid fault

After this I tried it manually. VC returned the following error:

11:20 - The attempted operation cannot be performed in the current state (powered on)

I tried to power it off, but that was not possible.

So I could not VMotion it manually, and I could not power it off either, because the server was reported as OFF (strange).

Finally I VMotioned the other VMs on that host to the other hosts, which worked, and after that I rebooted the ESX host.

After that the VM was locked, even though no other host was running it. I had to rebuild it and attach the disks.

That worked. Two hours of work, and 2500 people could not work. Imagine!

This is what I could find in the vmkwarning log:

May 16 11:11:34 esx02 vmkernel: 16:18:11:39.529 cpu5:1118)WARNING: Migrate: 1242: 1210929032662576: Failed: Failed to resume VM (0xbad0043) @0x9b6d3e

May 16 11:38:57 esx02 vmkernel: 0:00:00:00.046 cpu4:1028)WARNING: Cpu: 521: version 000006fb but BSP 000006f7

May 16 11:38:57 esx02 vmkernel: 0:00:00:00.057 cpu5:1029)WARNING: Cpu: 521: version 000006fb but BSP 000006f7

May 16 11:38:57 esx02 vmkernel: 0:00:00:00.068 cpu6:1030)WARNING: Cpu: 521: version 000006fb but BSP 000006f7

May 16 11:38:57 esx02 vmkernel: 0:00:00:00.079 cpu7:1031)WARNING: Cpu: 521: version 000006fb but BSP 000006f7

May 16 11:38:58 esx02 vmkernel: 0:00:00:17.035 cpu2:1040)WARNING: ScsiScan: 319: Path 'vmhba1:C0:T2:L0': Failed to get capacity information: Not supported

May 16 11:38:58 esx02 vmkernel: 0:00:00:17.046 cpu2:1040)WARNING: ScsiScan: 319: Path 'vmhba1:C0:T3:L0': Failed to get capacity information: Not supported
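A side note on the "Cpu: 521" warnings: their uptime field is 0:00:00:00, so they appear to have been logged while the host was booting after the reboot, not during the failed migration. Assuming the two hex values are CPUID signatures (the log does not say so explicitly), a minimal Python sketch can split them into family, model and stepping using the standard CPUID leaf 1 layout. The function name is made up for this illustration.

```python
def decode_cpuid_signature(sig):
    """Split a CPUID leaf-1 EAX value into family, model and stepping."""
    stepping = sig & 0xF
    base_model = (sig >> 4) & 0xF
    base_family = (sig >> 8) & 0xF
    ext_model = (sig >> 16) & 0xF
    ext_family = (sig >> 20) & 0xFF
    # The extended fields only contribute for certain base family values.
    family = base_family + ext_family if base_family == 0xF else base_family
    if base_family in (0x6, 0xF):
        model = (ext_model << 4) | base_model
    else:
        model = base_model
    return {"family": family, "model": model, "stepping": stepping}


# Values copied from the warnings above; treating them as CPUID
# signatures is an assumption.
for label, raw in (("reported", "000006fb"), ("BSP", "000006f7")):
    print(label, decode_cpuid_signature(int(raw, 16)))
```

Under that assumption both values share family and model and differ only in stepping (11 versus 7), which would point to processors of different steppings in the same blade. Whether that has anything to do with the VMotion failure is not something these log lines can show.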

This is from vmkernel.1:

May 16 11:10:57 esx02 vmkernel: 16:18:11:02.978 cpu3:1075)Migrate: vm 1076: 7332: Setting migration info ts = 1210929032662576, src ip = <172.16.6.2> dest i$

May 16 11:10:57 esx02 vmkernel: 16:18:11:02.979 cpu3:1075)World: vm 1117: 895: Starting world migSendHelper-1076 with flags 1

May 16 11:10:57 esx02 vmkernel: 16:18:11:02.979 cpu3:1075)World: vm 1118: 895: Starting world migRecvHelper-1076 with flags 1

May 16 11:10:57 esx02 vmkernel: 16:18:11:02.981 cpu2:1074)MigrateNet: vm 1074: 854: Accepted connection from <172.16.6.7>

May 16 11:11:24 esx02 vmkernel: 16:18:11:30.295 cpu6:1076)Migrate: 7258: 1210929032662576: Another pre-copy iteration needed with 55587 modified pages (last$

May 16 11:11:27 esx02 vmkernel: 16:18:11:32.828 cpu5:1076)Migrate: 7258: 1210929032662576: Another pre-copy iteration needed with 18311 modified pages (last$

May 16 11:11:28 esx02 vmkernel: 16:18:11:33.581 cpu6:1076)Migrate: 7258: 1210929032662576: Another pre-copy iteration needed with 10892 modified pages (last$

May 16 11:11:28 esx02 vmkernel: 16:18:11:34.086 cpu6:1076)Migrate: 7253: 1210929032662576: Stopping pre-copy: Not enough forward progress (Modified pages 10$

May 16 11:11:29 esx02 vmkernel: 16:18:11:34.564 cpu1:1075)FS3: 1974: Checking if lock holders are live for lock [type 10c00001 offset 19245056 v 7838, hb of$

May 16 11:11:29 esx02 vmkernel: gen 756199, mode 1, owner 47390347-346927d0-9f6d-001e0b5eaec4 mtime 3357265]

May 16 11:11:34 esx02 vmkernel: 16:18:11:39.529 cpu5:1118)WARNING: Migrate: 1242: 1210929032662576: Failed: Failed to resume VM (0xbad0043) @0x9b6d3e

May 16 11:11:34 esx02 vmkernel: 16:18:11:39.575 cpu1:1075)FS3: 1974: Checking if lock holders are live for lock [type 10c00001 offset 14221312 v 90, hb offs$

May 16 11:11:34 esx02 vmkernel: gen 756199, mode 1, owner 47390347-346927d0-9f6d-001e0b5eaec4 mtime 3357274]

May 16 11:11:38 esx02 vmkernel: 16:18:11:43.695 cpu0:1075)FS3: 1974: Checking if lock holders are live for lock [type 10c00001 offset 19335168 v 271, hb off$

May 16 11:11:38 esx02 vmkernel: gen 756199, mode 1, owner 47390347-346927d0-9f6d-001e0b5eaec4 mtime 3357271]

May 16 11:11:42 esx02 vmkernel: 16:18:11:47.754 cpu3:1075)FS3: 1974: Checking if lock holders are live for lock [type 10c00001 offset 19331072 v 269, hb off$

May 16 11:11:42 esx02 vmkernel: gen 756199, mode 1, owner 47390347-346927d0-9f6d-001e0b5eaec4 mtime 3357271]

May 16 11:11:46 esx02 vmkernel: 16:18:11:52.320 cpu3:1075)VSCSI: 4059: Creating Virtual Device for world 1076 vscsi0:0 (handle 8220)

May 16 11:11:46 esx02 vmkernel: 16:18:11:52.320 cpu3:1075)VSCSI: 4059: Creating Virtual Device for world 1076 vscsi0:1 (handle 8221)

May 16 11:11:46 esx02 vmkernel: 16:18:11:52.320 cpu3:1075)VSCSI: 4059: Creating Virtual Device for world 1076 vscsi0:2 (handle 8222)

May 16 11:11:46 esx02 vmkernel: 16:18:11:52.334 cpu3:1075)FS3: 1974: Checking if lock holders are live for lock [type 10c00001 offset 19331072 v 269, hb off$

May 16 11:11:46 esx02 vmkernel: gen 756199, mode 1, owner 47390347-346927d0-9f6d-001e0b5eaec4 mtime 3357271]
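To make the sequence in this excerpt easier to follow, here is a small Python sketch that pulls out the pre-copy page counts and the failure line. The sample lines are copied from the log above, truncated exactly as posted. The function names are invented for this example, and reading the migration ID as microseconds since the Unix epoch is an assumption, not something the log states.

```python
import re
from datetime import datetime, timedelta, timezone

# Lines copied from the vmkernel excerpt, including the "$" truncation marks.
LOG = """\
May 16 11:11:24 esx02 vmkernel: 16:18:11:30.295 cpu6:1076)Migrate: 7258: 1210929032662576: Another pre-copy iteration needed with 55587 modified pages (last$
May 16 11:11:27 esx02 vmkernel: 16:18:11:32.828 cpu5:1076)Migrate: 7258: 1210929032662576: Another pre-copy iteration needed with 18311 modified pages (last$
May 16 11:11:28 esx02 vmkernel: 16:18:11:33.581 cpu6:1076)Migrate: 7258: 1210929032662576: Another pre-copy iteration needed with 10892 modified pages (last$
May 16 11:11:34 esx02 vmkernel: 16:18:11:39.529 cpu5:1118)WARNING: Migrate: 1242: 1210929032662576: Failed: Failed to resume VM (0xbad0043) @0x9b6d3e
"""

PRECOPY = re.compile(
    r"Migrate: \d+: (\d+): Another pre-copy iteration needed with (\d+) modified pages"
)
FAILED = re.compile(r"WARNING: Migrate: \d+: (\d+): Failed: (.+?) \((0x[0-9a-f]+)\)")


def precopy_counts(text):
    """Return the modified-page count of each pre-copy iteration, in order."""
    return [int(m.group(2)) for m in PRECOPY.finditer(text)]


def failures(text):
    """Return (migration id, message, status code) for each failure line."""
    return [(m.group(1), m.group(2), m.group(3)) for m in FAILED.finditer(text)]


def migration_id_as_utc(migration_id):
    """Assumption: the id is a timestamp in microseconds since the Unix epoch."""
    epoch = datetime(1970, 1, 1, tzinfo=timezone.utc)
    return epoch + timedelta(microseconds=int(migration_id))


print(precopy_counts(LOG))
print(failures(LOG))
print(migration_id_as_utc("1210929032662576").isoformat())
```

The counts fall from 55587 to 18311 to 10892 before the log reports "Stopping pre-copy: Not enough forward progress", and the resume on esx02 fails a few seconds later with status 0xbad0043. If the timestamp assumption holds, the ID decodes to 09:10:32 UTC on 2008-05-16, which would match the 11:10 start of the migration on a host running at UTC+2.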

Regards. @teovmy http://www.mikes.eu
8 Replies
Teovmy
Contributor

Does nobody have any idea which direction I should look in?

christianZ
Champion

For critical problems like this, I think the best option is to open a support case with VMware.

wgardiner
Hot Shot

Can you please also post any errors from your hostd.log at around that time?

kjb007
Immortal

I have seen a similar issue where an error during the VMotion process left the VM with corrupted memory. It put the VM that was being migrated into a limbo state where nothing could be done with it. In one case the VM was still running fine; in another, it was down.

Instead of rebuilding, I restarted hostd with 'service mgmt-vmware restart', removed the VM from inventory, and then added it back in. After that the VM was fine.

-KjB

vExpert/VCP/VCAP vmwise.com / @vmwise -KjB
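The recovery steps kjb007 describes could be written down as a dry-run plan like the Python sketch below. Only 'service mgmt-vmware restart' comes from the post itself. The two vmware-cmd calls are an assumed service-console equivalent of removing the VM from inventory and adding it back (kjb007 may well have used the VI Client), and the .vmx path is purely hypothetical. The script only prints the commands; it does not run anything.

```python
import shlex
from typing import List


def recovery_plan(vmx_path: str) -> List[str]:
    """Build the command sequence as strings, without executing it."""
    quoted = shlex.quote(vmx_path)  # protect paths that contain spaces
    return [
        # From the post: restart the management agent (hostd).
        "service mgmt-vmware restart",
        # Assumed CLI equivalent of "remove from inventory".
        "vmware-cmd -s unregister " + quoted,
        # Assumed CLI equivalent of "add back to inventory".
        "vmware-cmd -s register " + quoted,
    ]


if __name__ == "__main__":
    # Hypothetical path, for illustration only.
    for command in recovery_plan("/vmfs/volumes/datastore1/fileserver/fileserver.vmx"):
        print(command)
```

Keeping this as a printed plan rather than a script that executes the commands is deliberate: on a host in this state it is safer to review and run each step by hand.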
Teovmy
Contributor

@wgardiner,

This is the hostd.log from around that time:

--2008-05-16 11:11:27.289 'Memory checker' 3076440992 warning --Current value 156116 exceeds soft limit 122880.

--2008-05-16 11:11:28.174 'TaskManager' 126725040 info --Task Created : haTask-ha-root-pool-vim.ResourcePool.updateConfig-5837860

--2008-05-16 11:11:28.174 'TaskManager' 126725040 info --Task Completed : haTask-ha-root-pool-vim.ResourcePool.updateConfig-5837860

--2008-05-16 11:11:28.196 'TaskManager' 20597680 info --Task Created : haTask-ha-root-pool-vim.ResourcePool.updateConfig-5837861

--2008-05-16 11:11:28.197 'TaskManager' 20597680 info --Task Completed : haTask-ha-root-pool-vim.ResourcePool.updateConfig-5837861

--2008-05-16 11:11:57.301 'Memory checker' 114043824 warning --Current value 156116 exceeds soft limit 122880.

--2008-05-16 11:12:27.310 'Memory checker' 20863920 warning --Current value 156116 exceeds soft limit 122880.

--2008-05-16 11:12:57.321 'Memory checker' 85314480 warning --Current value 156116 exceeds soft limit 122880.

--2008-05-16 11:13:09.084 'EnvironmentBrowser' 65047472 info --Hw info file: /etc/vmware/hostd/hwInfo.xml

--2008-05-16 11:13:09.088 'EnvironmentBrowser' 65047472 info --Config target info loaded

--2008-05-16 11:13:27.332 'Memory checker' 105540528 warning --Current value 156116 exceeds soft limit 122880.

--2008-05-16 11:13:28.200 'TaskManager' 131337136 info --Task Created : haTask-ha-root-pool-vim.ResourcePool.updateConfig-5837941

--2008-05-16 11:13:28.200 'TaskManager' 131337136 info --Task Completed : haTask-ha-root-pool-vim.ResourcePool.updateConfig-5837941

--2008-05-16 11:13:28.222 'TaskManager' 21392304 info --Task Created : haTask-ha-root-pool-vim.ResourcePool.updateConfig-5837942

--2008-05-16 11:13:28.222 'TaskManager' 21392304 info --Task Completed : haTask-ha-root-pool-vim.ResourcePool.updateConfig-5837942

--2008-05-16 11:13:57.343 'Memory checker' 85314480 warning --Current value 156124 exceeds soft limit 122880.

--2008-05-16 11:14:08.529 'ha-eventmgr' 114043824 info --Event 523 : User root@127.0.0.1 logged in

--2008-05-16 11:14:10.054 'HTTP server /usr/lib/vmware/hostd/docroot/' 3076440992 warning --UnimplementedRequestHandler: HTTP method POST not supported for UR$

--2008-05-16 11:14:18.070 'ha-eventmgr' 66526128 info --Event 524 : User root logged out
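The only warnings that repeat throughout this hostd.log excerpt come from the 'Memory checker'. As a rough sketch, the Python snippet below works out how far the reported value sits above the soft limit, using two lines taken from the excerpt. The function name is invented for this example, and the log does not state the unit of these values, so none is assumed here.

```python
import re

# Two of the 'Memory checker' lines from the hostd.log excerpt.
LOG = """\
--2008-05-16 11:11:27.289 'Memory checker' 3076440992 warning --Current value 156116 exceeds soft limit 122880.
--2008-05-16 11:13:57.343 'Memory checker' 85314480 warning --Current value 156124 exceeds soft limit 122880.
"""

PATTERN = re.compile(
    r"--(\S+ \S+) 'Memory checker' \d+ warning "
    r"--Current value (\d+) exceeds soft limit (\d+)\."
)


def soft_limit_overshoot(text):
    """Return (timestamp, excess, excess as a fraction of the limit) per warning."""
    rows = []
    for timestamp, current, limit in PATTERN.findall(text):
        excess = int(current) - int(limit)
        rows.append((timestamp, excess, excess / int(limit)))
    return rows


for timestamp, excess, fraction in soft_limit_overshoot(LOG):
    print(timestamp, excess, "{:.1%}".format(fraction))
```

In both samples hostd is roughly 27% over its soft limit, and the value barely moves between 11:11 and 11:13. That suggests a steady condition rather than a spike at the moment of the failed migration, although the excerpt is too short to say more than that.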

Teovmy
Contributor

kjb007, did you ever find out why this happened?

kjb007
Immortal

I believe it to have been a network issue, but can't say for sure, as I haven't been able to reproduce it. Once I got my VM back, I didn't go back and pursue it further, but yours appears to be more extreme since you had to rebuild the vm. I was able to recover mine fairly quickly.

-KjB

Karthik_VCP5
Enthusiast

Hi all,

I'm also receiving the same error on a host in our VDI cluster. I tried the steps below without success. Any ideas?

1. Restarted the hostd service.
2. Restarted the vpxa service.
3. Restarted all management agents, including vpxa and hostd (services.sh restart).
4. Disconnected the host and reconnected it to VC.
5. Still no success.
6. Migrated the VM to another host and tried rebooting it there: that worked.
7. Migrated it back to the original host and rebooted: that failed.
8. Planning to reboot the host.
