VMware Cloud Community
Iain
Contributor

VMotion fails with error in VMkernel log

Hi All,

I'm attempting to VMotion between two VI3 (ESX 3.0.1) servers with VC 2.0.1.

Validation is 100% successful between the servers: no CD-ROMs attached, same networks, shared SAN (MSA1500).

When I tail the VMkernel log, I see this error and the VMotion fails at 10%:

Aug 24 14:16:44 esx1 vmkernel: 0:18:06:53.976 cpu3:1079)Migrate: vm 1080: 6838: Setting migration info ts = 4243996420, src ip = <10.130.17.134> dest ip = <10.130.17.138> Dest wid = 1267

Aug 24 14:16:44 esx1 vmkernel: 0:18:06:53.976 cpu3:1079)World: vm 1107: 690: Starting world migSendHelper-1080 with flags 1

Aug 24 14:16:44 esx1 vmkernel: 0:18:06:53.977 cpu3:1079)World: vm 1108: 690: Starting world migRecvHelper-1080 with flags 1

Aug 24 14:16:44 esx1 vmkernel: 0:18:06:53.979 cpu5:1069)MigrateNet: vm 1069: 763: Accepted connection from <10.130.17.138>

Aug 24 14:17:04 esx1 vmkernel: 0:18:07:13.981 cpu2:1107)WARNING: MigrateNet: 196: 4243996420: 5-0x3fa0d008:Sent only 4088 of 4096 bytes of message data: Timeout

Aug 24 14:17:04 esx1 vmkernel: 0:18:07:13.981 cpu2:1107)WARNING: Migrate: 6287: 4243996420: Couldn't send data for 7: Timeout

Aug 24 14:17:04 esx1 vmkernel: 0:18:07:13.981 cpu2:1107)WARNING: Migrate: 1148: 4243996420: Failed: Timeout (0xbad0020) @0x8c0968

Aug 24 14:17:04 esx1 vmkernel: 0:18:07:13.981 cpu2:1107)World: vm 1107: 3864: Killing self with status=0xbad0020:Timeout

Aug 24 14:17:04 esx1 vmkernel: 0:18:07:13.981 cpu4:1108)World: vm 1108: 3864: Killing self with status=0x0:Success

I note it fails at sending 4088 bytes of data and never sends the full 4096.
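
The short send can be pulled out of the log mechanically. Below is a minimal sketch in Python; the regular expression is only an assumption based on the two log lines quoted in this post (it is not a documented vmkernel log format), and the `short_sends` helper is a hypothetical name used for illustration.

```python
import re

# Two lines copied from the log in this post; in practice, read the
# vmkernel log file on the host instead of this sample string.
LOG = """\
Aug 24 14:16:44 esx1 vmkernel: 0:18:06:53.979 cpu5:1069)MigrateNet: vm 1069: 763: Accepted connection from <10.130.17.138>
Aug 24 14:17:04 esx1 vmkernel: 0:18:07:13.981 cpu2:1107)WARNING: MigrateNet: 196: 4243996420: 5-0x3fa0d008:Sent only 4088 of 4096 bytes of message data: Timeout
"""

# Assumed pattern: "WARNING: MigrateNet: <line>: <migration ts>: ...Sent only X of Y bytes"
SHORT_SEND = re.compile(
    r"WARNING: MigrateNet: \d+: (?P<ts>\d+): .*?"
    r"Sent only (?P<sent>\d+) of (?P<total>\d+) bytes"
)


def short_sends(text):
    """Return (migration ts, bytes sent, bytes expected) for each short send."""
    hits = []
    for line in text.splitlines():
        m = SHORT_SEND.search(line)
        if m:
            hits.append((m.group("ts"), int(m.group("sent")), int(m.group("total"))))
    return hits


for ts, sent, total in short_sends(LOG):
    print(f"migration {ts}: sent {sent} of {total} bytes ({total - sent} short)")
```

The migration `ts` value is the same number that appears in the "Setting migration info" line, so it can be used to group all messages belonging to one attempt.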

Both systems are Gigabit connected, and vmkping works perfectly, as does DNS resolution.

Any ideas?

Thanks

masaki
Virtuoso

Is the hardware of the hosts the same?

Are the NX/XD bits set?

Are the CPUs from the same family and revision?

Usually, when VMotion fails at 10%, the migration never really started at all.

Is the VMotion VLAN trunked, if you have implemented VLANs?

Is the VMkernel port group enabled?

Is the VMkernel port group default gateway correct?

Iain
Contributor

Thanks for your speedy response...

Both pieces of hardware are HP DL740s (one has 4 more CPUs and 20GB more RAM).

NX/XD are exposed; should these be hidden?

The CPUs are both Intel Xeon MP 3GHz (one host does NOT have hyperthreading on).

VMotion is enabled on both, with no VLAN IDs.

Both VMotion ports are on the same subnet (netmask 255.255.252.0) and have the same gateway address.

It just reports that the operation timed out, with the error above.
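
The subnet claim is easy to double-check. Here is a small sketch using Python's standard `ipaddress` module; the two addresses are the VMotion source and destination IPs from the log in the first post, and the netmask is the one quoted in this reply.

```python
import ipaddress

NETMASK = "255.255.252.0"

# VMotion addresses taken from the "src ip" / "dest ip" fields in the log.
src = ipaddress.ip_interface(f"10.130.17.134/{NETMASK}")
dst = ipaddress.ip_interface(f"10.130.17.138/{NETMASK}")

print("source network:     ", src.network)
print("destination network:", dst.network)
print("same subnet:        ", src.network == dst.network)
print("broadcast address:  ", src.network.broadcast_address)
```

Being on the same subnet only rules out a routing problem. It says nothing about which physical NIC or switch port each VMkernel interface actually uses.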

Iain
Contributor

Actually... both have Hyperthreading on. My mistake.

masaki
Virtuoso

Retry after disabling the NX/XD flags.

How many vSwitches do you have?

You could try putting the VMkernel port group on the same vSwitch as the service console (this is not usually recommended).

Iain
Contributor

I have hidden the NX flags, but no luck...

Aug 24 14:51:21 esx1 vmkernel: 0:18:41:31.223 cpu2:1079)Migrate: vm 1080: 6838: Setting migration info ts = 4243996424, src ip = <10.130.17.134> dest ip = <10.130.17.138> Dest wid = 1291

Aug 24 14:51:21 esx1 vmkernel: 0:18:41:31.223 cpu2:1079)World: vm 1115: 690: Starting world migSendHelper-1080 with flags 1

Aug 24 14:51:21 esx1 vmkernel: 0:18:41:31.224 cpu2:1079)World: vm 1116: 690: Starting world migRecvHelper-1080 with flags 1

Aug 24 14:51:21 esx1 vmkernel: 0:18:41:31.226 cpu5:1069)MigrateNet: vm 1069: 763: Accepted connection from <10.130.17.138>

Aug 24 14:51:41 esx1 vmkernel: 0:18:41:51.227 cpu3:1115)WARNING: MigrateNet: 196: 4243996424: 5-0x3fa0d5b0:Sent only 4088 of 4096 bytes of message data: Timeout

Aug 24 14:51:41 esx1 vmkernel: 0:18:41:51.227 cpu3:1115)WARNING: Migrate: 6287: 4243996424: Couldn't send data for 7: Timeout

Aug 24 14:51:41 esx1 vmkernel: 0:18:41:51.227 cpu3:1115)WARNING: Migrate: 1148: 4243996424: Failed: Timeout (0xbad0020) @0x8c0968

Aug 24 14:51:41 esx1 vmkernel: 0:18:41:51.227 cpu3:1115)World: vm 1115: 3864: Killing self with status=0xbad0020:Timeout

Aug 24 14:51:41 esx1 vmkernel: 0:18:41:51.228 cpu5:1116)World: vm 1116: 3864: Killing self with status=0x0:Success

Aug 24 15:04:53 esx1 vmkernel: 0:18:55:02.575 cpu6:1079)Migrate: vm 1080: 6838: Setting migration info ts = 4243996425, src ip = <10.130.17.134> dest ip = <10.130.17.138> Dest wid = 1295

Aug 24 15:04:53 esx1 vmkernel: 0:18:55:02.575 cpu6:1079)World: vm 1117: 690: Starting world migSendHelper-1080 with flags 1

Aug 24 15:04:53 esx1 vmkernel: 0:18:55:02.576 cpu6:1079)World: vm 1118: 690: Starting world migRecvHelper-1080 with flags 1

Aug 24 15:04:53 esx1 vmkernel: 0:18:55:02.578 cpu5:1069)MigrateNet: vm 1069: 763: Accepted connection from <10.130.17.138>

Aug 24 15:05:13 esx1 vmkernel: 0:18:55:22.579 cpu5:1117)WARNING: MigrateNet: 196: 4243996425: 5-0x3fa0d008:Sent only 4088 of 4096 bytes of message data: Timeout

Aug 24 15:05:13 esx1 vmkernel: 0:18:55:22.579 cpu5:1117)WARNING: Migrate: 6287: 4243996425: Couldn't send data for 7: Timeout

Aug 24 15:05:13 esx1 vmkernel: 0:18:55:22.579 cpu5:1117)WARNING: Migrate: 1148: 4243996425: Failed: Timeout (0xbad0020) @0x8c0968

Aug 24 15:05:13 esx1 vmkernel: 0:18:55:22.579 cpu5:1117)World: vm 1117: 3864: Killing self with status=0xbad0020:Timeout

Aug 24 15:05:13 esx1 vmkernel: 0:18:55:22.579 cpu1:1118)World: vm 1118: 3864: Killing self with status=0x0:Success

I can try adding the VMotion port group to the service console vSwitch, but connectivity appears to be perfect!?

masaki
Virtuoso

Post this output:

esxcfg-vswitch -l

esxcfg-vmknic -l

esxcfg-nics -l

esxcfg-vswif -l

Iain
Contributor

ESXCFG-VSWITCH -l

DESTINATION BOX

[root@esx02 root]# esxcfg-vswitch -l

Switch Name    Num Ports  Used Ports  Configured Ports  Uplinks
vSwitch0       64         6           64                vmnic5,vmnic1

  PortGroup Name      Internal ID  VLAN ID  Used Ports  Uplinks
  Corp Odd            portgroup0   0        0           vmnic1,vmnic5
  Legacy vmnic1       portgroup1   0        3           vmnic1,vmnic5

Switch Name    Num Ports  Used Ports  Configured Ports  Uplinks
vSwitch1       64         13          64                vmnic4

  PortGroup Name      Internal ID  VLAN ID  Used Ports  Uplinks
  Legacy vmnic4       portgroup3   0        6           vmnic4
  100Mbps 10.130.1.x  portgroup2   0        5           vmnic4

Switch Name    Num Ports  Used Ports  Configured Ports  Uplinks
vSwitch2       64         0           64

  PortGroup Name      Internal ID  VLAN ID  Used Ports  Uplinks
  Legacy vmnet_0      portgroup5   0        0
  Network0            portgroup4   0        0

Switch Name    Num Ports  Used Ports  Configured Ports  Uplinks
vSwitch3       64         4           64                vmnic2

  PortGroup Name      Internal ID  VLAN ID  Used Ports  Uplinks
  Corp Even           portgroup6   0        1           vmnic2
  Legacy bond0        portgroup7   0        1           vmnic2

Switch Name    Num Ports  Used Ports  Configured Ports  Uplinks
vSwitch4       64         3           64                vmnic3

  PortGroup Name      Internal ID  VLAN ID  Used Ports  Uplinks
  VMotion             portgroup12  0        1           vmnic3

Switch Name    Num Ports  Used Ports  Configured Ports  Uplinks
vSwitch5       64         3           64                vmnic0

  PortGroup Name      Internal ID  VLAN ID  Used Ports  Uplinks
  Legacy eth0         portgroup10  0        1           vmnic0

SOURCE BOX

[root@esx01 root]# esxcfg-vswitch -l

Switch Name    Num Ports  Used Ports  Configured Ports  Uplinks
vSwitch0       32         3           32                vmnic0

  PortGroup Name      Internal ID  VLAN ID  Used Ports  Uplinks
  VM Network          portgroup1   0        0           vmnic0
  Service Console     portgroup0   0        1           vmnic0

Switch Name    Num Ports  Used Ports  Configured Ports  Uplinks
vSwitch1       64         3           64                vmnic4,vmnic2

  PortGroup Name      Internal ID  VLAN ID  Used Ports  Uplinks
  Corp Odd            portgroup3   0        0           vmnic2,vmnic4

Switch Name    Num Ports  Used Ports  Configured Ports  Uplinks
vSwitch2       64         5           64                vmnic6,vmnic5

  PortGroup Name      Internal ID  VLAN ID  Used Ports  Uplinks
  Corp Even           portgroup4   0        2           vmnic5,vmnic6

Switch Name    Num Ports  Used Ports  Configured Ports  Uplinks
vSwitch3       64         3           64                vmnic1

  PortGroup Name      Internal ID  VLAN ID  Used Ports  Uplinks
  VMotion             portgroup5   0        1           vmnic1

ESXCFG-VMKNIC -l

SOURCE BOX:

Port Group  IP Address     Netmask        Broadcast      MAC Address        MTU   Enabled
VMotion     10.130.17.134  255.255.252.0  10.130.19.255  00:50:56:65:6c:22  1514  true

DESTINATION BOX:

Port Group  IP Address     Netmask        Broadcast      MAC Address        MTU   Enabled
VMotion     10.130.17.138  255.255.252.0  10.130.19.255  00:50:56:69:56:7e  1514  true

ESXCFG-NICS -l

SOURCE BOX:

Name    PCI       Driver  Link  Speed     Duplex  Description
vmnic0  01:04.00  tg3     Up    1000Mbps  Full    Broadcom Corporation NC7781 Gigabit Server Adapter (PCI-X, 10,100,1000-T)
vmnic1  01:05.00  tg3     Up    1000Mbps  Full    Broadcom Corporation NC7781 Gigabit Server Adapter (PCI-X, 10,100,1000-T)
vmnic2  03:01.00  e1000   Up    100Mbps   Full    Intel Corporation 82546EB Gigabit Ethernet Controller (Copper)
vmnic3  03:01.01  e1000   Up    100Mbps   Full    Intel Corporation 82546EB Gigabit Ethernet Controller (Copper)
vmnic4  0b:01.00  e1000   Up    100Mbps   Full    Intel Corporation 82546EB Gigabit Ethernet Controller (Copper)
vmnic5  0b:01.01  e1000   Up    100Mbps   Full    Intel Corporation 82546EB Gigabit Ethernet Controller (Copper)
vmnic6  0b:02.00  e1000   Up    100Mbps   Full    Intel Corporation 82546EB Gigabit Ethernet Controller (Copper)
vmnic7  0b:02.01  e1000   Up    100Mbps   Full    Intel Corporation 82546EB Gigabit Ethernet Controller (Copper)

DESTINATION BOX:

Name    PCI       Driver  Link  Speed     Duplex  Description
vmnic0  01:04.00  tg3     Up    1000Mbps  Full    Broadcom Corporation NC7781 Gigabit Server Adapter (PCI-X, 10,100,1000-T)
vmnic1  01:05.00  tg3     Up    100Mbps   Full    Broadcom Corporation NC7781 Gigabit Server Adapter (PCI-X, 10,100,1000-T)
vmnic2  03:01.00  e1000   Up    100Mbps   Full    Intel Corporation NC7131 Gigabit Server Adapter
vmnic3  03:02.00  e1000   Up    1000Mbps  Full    Intel Corporation NC7131 Gigabit Server Adapter
vmnic5  0c:04.00  e100    Up    100Mbps   Full    Intel Corporation NC3134 Fast Ethernet NIC (dual port)
vmnic4  0c:05.00  e100    Up    100Mbps   Full    Intel Corporation NC3134 Fast Ethernet NIC (dual port)

ESXCFG-VSWIF -l

SOURCE BOX:

Name    Port Group       IP Address     Netmask        Broadcast      Enabled  DHCP
vswif0  Service Console  10.130.17.170  255.255.252.0  10.130.19.255  true     false

DESTINATION BOX:

Name    Port Group       IP Address     Netmask        Broadcast      Enabled  DHCP
vswif0  Legacy eth0      10.130.1.129   255.255.252.0  10.130.3.255   true     false

masaki
Virtuoso

I would put the VMotion port group on the same vSwitch, with the same vmnic, on both hosts.

So for example:

Source (as it is now): vSwitch3, vmnic1

Destination: vSwitch3, vmnic1, and not vSwitch4, vmnic3 as it is now.

Then retry.
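
The comparison behind this suggestion can be sketched in Python. The two listings are excerpts of the `esxcfg-vswitch -l` output posted earlier in the thread; the regular expressions are a guess at the layout of that listing as posted, not an official format, and `find_portgroup` is a hypothetical helper name.

```python
import re

# Excerpts of the `esxcfg-vswitch -l` listings posted earlier in the thread.
SOURCE = """\
Switch Name Num Ports Used Ports Configured Ports Uplinks
vSwitch3 64 3 64 vmnic1
PortGroup Name Internal ID VLAN ID Used Ports Uplinks
VMotion portgroup5 0 1 vmnic1
"""
DESTINATION = """\
Switch Name Num Ports Used Ports Configured Ports Uplinks
vSwitch4 64 3 64 vmnic3
PortGroup Name Internal ID VLAN ID Used Ports Uplinks
VMotion portgroup12 0 1 vmnic3
"""

# Assumed row shapes: "<vSwitchN> <num> <used> <configured> [uplinks]" and
# "<port group name> <portgroupN> <vlan> <used> [uplinks]".
SWITCH = re.compile(r"^(vSwitch\d+)\s+\d+\s+\d+\s+\d+(?:\s+(\S+))?$")
PORTGROUP = re.compile(r"^(.+?)\s+portgroup\d+\s+\d+\s+\d+(?:\s+(\S+))?$")


def find_portgroup(listing, name):
    """Return (vswitch, uplinks) for the named port group, or None if absent."""
    current_switch = None
    for line in listing.splitlines():
        line = line.strip()
        sw = SWITCH.match(line)
        if sw:
            current_switch = sw.group(1)
            continue
        pg = PORTGROUP.match(line)
        if pg and pg.group(1) == name:
            return current_switch, (pg.group(2) or "")
    return None


src = find_portgroup(SOURCE, "VMotion")
dst = find_portgroup(DESTINATION, "VMotion")
print("source:     ", src)
print("destination:", dst)
print("same vSwitch and uplink:", src == dst)
```

With the configuration as posted, the two hosts differ in both the vSwitch name and the uplink, which is the mismatch the suggestion removes.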

Iain
Contributor

Well... that surprised me! It worked, and I can now VMotion across.

Many thanks. Can I award points?

masaki
Virtuoso

Sure you can. You must! 😆

Click to the right of the messages, where you see the greyed-out Helpful and Correct buttons.

See you on VMTN