Hi All,
I am attempting to VMotion between two VI3 (3.01) servers with VC 2.01.
Validation is 100% successful between the servers: no CD-ROMs etc., same networks, shared SAN (MSA1500).
When I tail the VMkernel log, I see this error, and the VMotion fails at 10%:
Aug 24 14:16:44 esx1 vmkernel: 0:18:06:53.976 cpu3:1079)Migrate: vm 1080: 6838: Setting migration info ts = 4243996420, src ip = <10.130.17.134> dest ip = <10.130.17.138> Dest wid = 1267
Aug 24 14:16:44 esx1 vmkernel: 0:18:06:53.976 cpu3:1079)World: vm 1107: 690: Starting world migSendHelper-1080 with flags 1
Aug 24 14:16:44 esx1 vmkernel: 0:18:06:53.977 cpu3:1079)World: vm 1108: 690: Starting world migRecvHelper-1080 with flags 1
Aug 24 14:16:44 esx1 vmkernel: 0:18:06:53.979 cpu5:1069)MigrateNet: vm 1069: 763: Accepted connection from <10.130.17.138>
Aug 24 14:17:04 esx1 vmkernel: 0:18:07:13.981 cpu2:1107)WARNING: MigrateNet: 196: 4243996420: 5-0x3fa0d008:Sent only 4088 of 4096 bytes of message data: Timeout
Aug 24 14:17:04 esx1 vmkernel: 0:18:07:13.981 cpu2:1107)WARNING: Migrate: 6287: 4243996420: Couldn't send data for 7: Timeout
Aug 24 14:17:04 esx1 vmkernel: 0:18:07:13.981 cpu2:1107)WARNING: Migrate: 1148: 4243996420: Failed: Timeout (0xbad0020) @0x8c0968
Aug 24 14:17:04 esx1 vmkernel: 0:18:07:13.981 cpu2:1107)World: vm 1107: 3864: Killing self with status=0xbad0020:Timeout
Aug 24 14:17:04 esx1 vmkernel: 0:18:07:13.981 cpu4:1108)World: vm 1108: 3864: Killing self with status=0x0:Success
Note that it sends only 4088 of the 4096 bytes of message data and never completes the full 4096.
Both systems are Gigabit-connected, and vmkping works perfectly, as does DNS resolution.
Any ideas?
Thanks
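To pick the short send out of a long vmkernel log quickly, a small filter can help. This is a minimal sketch that assumes the MigrateNet warning always has the format quoted above; the regular expression and the `short_sends` helper are hypothetical, not part of any VMware tool.

```python
import re

# Pattern for the MigrateNet warning quoted above (assumed format), e.g.
# "...WARNING: MigrateNet: 196: 4243996420: 5-0x3fa0d008:Sent only 4088 of 4096 bytes ..."
SHORT_SEND = re.compile(
    r"MigrateNet: \d+: (?P<mig>\d+): \S+:Sent only (?P<sent>\d+) of (?P<total>\d+) bytes"
)


def short_sends(lines):
    """Return (migration id, bytes sent, bytes expected, shortfall) per short send."""
    found = []
    for line in lines:
        m = SHORT_SEND.search(line)
        if m:
            sent, total = int(m.group("sent")), int(m.group("total"))
            found.append((m.group("mig"), sent, total, total - sent))
    return found


# Two lines copied from the log above; in practice you would feed in the
# whole vmkernel log instead.
SAMPLE_LINES = [
    "Aug 24 14:16:44 esx1 vmkernel: 0:18:06:53.979 cpu5:1069)MigrateNet: vm 1069: 763: "
    "Accepted connection from <10.130.17.138>",
    "Aug 24 14:17:04 esx1 vmkernel: 0:18:07:13.981 cpu2:1107)WARNING: MigrateNet: 196: "
    "4243996420: 5-0x3fa0d008:Sent only 4088 of 4096 bytes of message data: Timeout",
]

for mig, sent, total, short in short_sends(SAMPLE_LINES):
    print(f"migration {mig}: sent {sent} of {total} bytes ({short} short)")
```

On the sample lines this reports a single migration that stalled 8 bytes short of the 4096-byte message; the "Accepted connection" line is ignored because it does not carry a send count.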
Is the hosts' hardware the same?
Are the NX/XD bits set?
Are the CPUs from the same family and revision?
Usually, when VMotion fails at 10%, it has not really started at all.
If you have implemented VLANs, is the VMotion VLAN trunked?
Is the VMkernel port group enabled?
Is the VMkernel port group's default gateway correct?
Thanks for your speedy response...
Both pieces of hardware are HP DL740s (one has 4 more CPUs and 20 GB more RAM).
NX/XD is exposed; should it be hidden?
Both hosts have Intel Xeon MP 3 GHz CPUs (one host does NOT have Hyper-Threading on).
VMotion is enabled on both; no VLAN IDs.
Both VMotion ports are on the same subnet (netmask 255.255.252.0) and have the same gateway address.
It just reports that the operation timed out, with the error above.
Actually... both have Hyper-Threading on. My mistake.
Retry with the NX/XD flags disabled.
How many vSwitches do you have?
You could try putting the VMkernel port group on the same vSwitch as the service console (normally this is not recommended).
I have hidden the NX flags, but no luck:
Aug 24 14:51:21 esx1 vmkernel: 0:18:41:31.223 cpu2:1079)Migrate: vm 1080: 6838: Setting migration info ts = 4243996424, src ip = <10.130.17.134> dest ip = <10.130.17.138> Dest wid = 1291
Aug 24 14:51:21 esx1 vmkernel: 0:18:41:31.223 cpu2:1079)World: vm 1115: 690: Starting world migSendHelper-1080 with flags 1
Aug 24 14:51:21 esx1 vmkernel: 0:18:41:31.224 cpu2:1079)World: vm 1116: 690: Starting world migRecvHelper-1080 with flags 1
Aug 24 14:51:21 esx1 vmkernel: 0:18:41:31.226 cpu5:1069)MigrateNet: vm 1069: 763: Accepted connection from <10.130.17.138>
Aug 24 14:51:41 esx1 vmkernel: 0:18:41:51.227 cpu3:1115)WARNING: MigrateNet: 196: 4243996424: 5-0x3fa0d5b0:Sent only 4088 of 4096 bytes of message data: Timeout
Aug 24 14:51:41 esx1 vmkernel: 0:18:41:51.227 cpu3:1115)WARNING: Migrate: 6287: 4243996424: Couldn't send data for 7: Timeout
Aug 24 14:51:41 esx1 vmkernel: 0:18:41:51.227 cpu3:1115)WARNING: Migrate: 1148: 4243996424: Failed: Timeout (0xbad0020) @0x8c0968
Aug 24 14:51:41 esx1 vmkernel: 0:18:41:51.227 cpu3:1115)World: vm 1115: 3864: Killing self with status=0xbad0020:Timeout
Aug 24 14:51:41 esx1 vmkernel: 0:18:41:51.228 cpu5:1116)World: vm 1116: 3864: Killing self with status=0x0:Success
Aug 24 15:04:53 esx1 vmkernel: 0:18:55:02.575 cpu6:1079)Migrate: vm 1080: 6838: Setting migration info ts = 4243996425, src ip = <10.130.17.134> dest ip = <10.130.17.138> Dest wid = 1295
Aug 24 15:04:53 esx1 vmkernel: 0:18:55:02.575 cpu6:1079)World: vm 1117: 690: Starting world migSendHelper-1080 with flags 1
Aug 24 15:04:53 esx1 vmkernel: 0:18:55:02.576 cpu6:1079)World: vm 1118: 690: Starting world migRecvHelper-1080 with flags 1
Aug 24 15:04:53 esx1 vmkernel: 0:18:55:02.578 cpu5:1069)MigrateNet: vm 1069: 763: Accepted connection from <10.130.17.138>
Aug 24 15:05:13 esx1 vmkernel: 0:18:55:22.579 cpu5:1117)WARNING: MigrateNet: 196: 4243996425: 5-0x3fa0d008:Sent only 4088 of 4096 bytes of message data: Timeout
Aug 24 15:05:13 esx1 vmkernel: 0:18:55:22.579 cpu5:1117)WARNING: Migrate: 6287: 4243996425: Couldn't send data for 7: Timeout
Aug 24 15:05:13 esx1 vmkernel: 0:18:55:22.579 cpu5:1117)WARNING: Migrate: 1148: 4243996425: Failed: Timeout (0xbad0020) @0x8c0968
Aug 24 15:05:13 esx1 vmkernel: 0:18:55:22.579 cpu5:1117)World: vm 1117: 3864: Killing self with status=0xbad0020:Timeout
Aug 24 15:05:13 esx1 vmkernel: 0:18:55:22.579 cpu1:1118)World: vm 1118: 3864: Killing self with status=0x0:Success
I can try adding the VMotion port to the service console vSwitch, but connectivity appears to be perfect!?
Post this output:
esxcfg-vswitch -l
esxcfg-vmknic -l
esxcfg-nics -l
esxcfg-vswif -l
ESXCFG-VSWITCH -l
DESTINATION BOX
[root@esx02 root]# esxcfg-vswitch -l
Switch Name Num Ports Used Ports Configured Ports Uplinks
vSwitch0 64 6 64 vmnic5,vmnic1
PortGroup Name Internal ID VLAN ID Used Ports Uplinks
Corp Odd portgroup0 0 0 vmnic1,vmnic5
Legacy vmnic1 portgroup1 0 3 vmnic1,vmnic5
Switch Name Num Ports Used Ports Configured Ports Uplinks
vSwitch1 64 13 64 vmnic4
PortGroup Name Internal ID VLAN ID Used Ports Uplinks
Legacy vmnic4 portgroup3 0 6 vmnic4
100Mbps 10.130.1.x portgroup2 0 5 vmnic4
Switch Name Num Ports Used Ports Configured Ports Uplinks
vSwitch2 64 0 64
PortGroup Name Internal ID VLAN ID Used Ports Uplinks
Legacy vmnet_0 portgroup5 0 0
Network0 portgroup4 0 0
Switch Name Num Ports Used Ports Configured Ports Uplinks
vSwitch3 64 4 64 vmnic2
PortGroup Name Internal ID VLAN ID Used Ports Uplinks
Corp Even portgroup6 0 1 vmnic2
Legacy bond0 portgroup7 0 1 vmnic2
Switch Name Num Ports Used Ports Configured Ports Uplinks
vSwitch4 64 3 64 vmnic3
PortGroup Name Internal ID VLAN ID Used Ports Uplinks
VMotion portgroup12 0 1 vmnic3
Switch Name Num Ports Used Ports Configured Ports Uplinks
vSwitch5 64 3 64 vmnic0
PortGroup Name Internal ID VLAN ID Used Ports Uplinks
Legacy eth0 portgroup10 0 1 vmnic0
SOURCE BOX
[root@esx01 root]# esxcfg-vswitch -l
Switch Name Num Ports Used Ports Configured Ports Uplinks
vSwitch0 32 3 32 vmnic0
PortGroup Name Internal ID VLAN ID Used Ports Uplinks
VM Network portgroup1 0 0 vmnic0
Service Console portgroup0 0 1 vmnic0
Switch Name Num Ports Used Ports Configured Ports Uplinks
vSwitch1 64 3 64 vmnic4,vmnic2
PortGroup Name Internal ID VLAN ID Used Ports Uplinks
Corp Odd portgroup3 0 0 vmnic2,vmnic4
Switch Name Num Ports Used Ports Configured Ports Uplinks
vSwitch2 64 5 64 vmnic6,vmnic5
PortGroup Name Internal ID VLAN ID Used Ports Uplinks
Corp Even portgroup4 0 2 vmnic5,vmnic6
Switch Name Num Ports Used Ports Configured Ports Uplinks
vSwitch3 64 3 64 vmnic1
PortGroup Name Internal ID VLAN ID Used Ports Uplinks
VMotion portgroup5 0 1 vmnic1
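Reading the two listings side by side, the VMotion port group sits on vSwitch3/vmnic1 on the source but on vSwitch4/vmnic3 on the destination. With many vSwitches that is easy to miss by eye, so here is a rough sketch that pulls the vSwitch and uplinks for a named port group out of the flattened text. It assumes whitespace-separated columns and internal IDs of the form portgroupN, as pasted above; `find_portgroup` is a hypothetical helper, not an ESX command.

```python
import re

# Row patterns for the flattened "esxcfg-vswitch -l" listing pasted above
# (assumption: whitespace-separated columns, internal IDs of the form portgroupN).
SWITCH_ROW = re.compile(r"^(vSwitch\d+)\s+\d+\s+\d+\s+\d+(?:\s+(\S+))?$")
PORTGROUP_ROW = re.compile(r"^(.+?)\s+portgroup\d+\s+\d+\s+\d+(?:\s+(\S+))?$")


def find_portgroup(listing, name="VMotion"):
    """Return (vSwitch, [uplinks]) for the named port group, or None if absent."""
    current_switch = None
    for raw in listing.splitlines():
        line = raw.strip()
        switch = SWITCH_ROW.match(line)
        if switch:
            # Remember which vSwitch the following port group rows belong to.
            current_switch = switch.group(1)
            continue
        pg = PORTGROUP_ROW.match(line)
        if pg and pg.group(1) == name:
            uplinks = pg.group(2).split(",") if pg.group(2) else []
            return current_switch, uplinks
    return None


SOURCE_LISTING = """
Switch Name Num Ports Used Ports Configured Ports Uplinks
vSwitch0 32 3 32 vmnic0
PortGroup Name Internal ID VLAN ID Used Ports Uplinks
Service Console portgroup0 0 1 vmnic0
Switch Name Num Ports Used Ports Configured Ports Uplinks
vSwitch3 64 3 64 vmnic1
PortGroup Name Internal ID VLAN ID Used Ports Uplinks
VMotion portgroup5 0 1 vmnic1
"""

DEST_LISTING = """
Switch Name Num Ports Used Ports Configured Ports Uplinks
vSwitch3 64 4 64 vmnic2
PortGroup Name Internal ID VLAN ID Used Ports Uplinks
Corp Even portgroup6 0 1 vmnic2
Switch Name Num Ports Used Ports Configured Ports Uplinks
vSwitch4 64 3 64 vmnic3
PortGroup Name Internal ID VLAN ID Used Ports Uplinks
VMotion portgroup12 0 1 vmnic3
"""

print("source:", find_portgroup(SOURCE_LISTING))       # ('vSwitch3', ['vmnic1'])
print("destination:", find_portgroup(DEST_LISTING))    # ('vSwitch4', ['vmnic3'])
```

Matching on the portgroupN internal ID rather than splitting on spaces keeps multi-word names such as "Corp Even" or "Service Console" intact.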
ESXCFG-VMKNIC -l
SOURCE BOX:
Port Group IP Address Netmask Broadcast MAC Address MTU Enabled
VMotion 10.130.17.134 255.255.252.0 10.130.19.255 00:50:56:65:6c:22 1514 true
DESTINATION BOX:
Port Group IP Address Netmask Broadcast MAC Address MTU Enabled
VMotion 10.130.17.138 255.255.252.0 10.130.19.255 00:50:56:69:56:7e 1514 true
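As a quick arithmetic check of the vmknic output, both VMotion addresses with netmask 255.255.252.0 fall in the same /22, and the Broadcast column agrees with that. The sketch below uses Python's standard `ipaddress` module; `network_and_broadcast` is just an illustrative helper name.

```python
import ipaddress


def network_and_broadcast(ip, netmask):
    """Return (network in CIDR form, broadcast address) for an IP and dotted netmask."""
    net = ipaddress.IPv4Interface(f"{ip}/{netmask}").network
    return str(net), str(net.broadcast_address)


src = network_and_broadcast("10.130.17.134", "255.255.252.0")  # source VMotion vmknic
dst = network_and_broadcast("10.130.17.138", "255.255.252.0")  # destination VMotion vmknic
print(src, dst, src == dst)
```

Both come out as 10.130.16.0/22 with broadcast 10.130.19.255. The IP addressing is therefore consistent, which fits with vmkping succeeding, and suggests looking at the physical path (vSwitch and uplink) rather than the IP configuration.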
ESXCFG-NICS -l
SOURCE BOX:
Name PCI Driver Link Speed Duplex Description
vmnic0 01:04.00 tg3 Up 1000Mbps Full Broadcom Corporation NC7781 Gigabit Server Adapter (PCI-X, 10,100,1000-T)
vmnic1 01:05.00 tg3 Up 1000Mbps Full Broadcom Corporation NC7781 Gigabit Server Adapter (PCI-X, 10,100,1000-T)
vmnic2 03:01.00 e1000 Up 100Mbps Full Intel Corporation 82546EB Gigabit Ethernet Controller (Copper)
vmnic3 03:01.01 e1000 Up 100Mbps Full Intel Corporation 82546EB Gigabit Ethernet Controller (Copper)
vmnic4 0b:01.00 e1000 Up 100Mbps Full Intel Corporation 82546EB Gigabit Ethernet Controller (Copper)
vmnic5 0b:01.01 e1000 Up 100Mbps Full Intel Corporation 82546EB Gigabit Ethernet Controller (Copper)
vmnic6 0b:02.00 e1000 Up 100Mbps Full Intel Corporation 82546EB Gigabit Ethernet Controller (Copper)
vmnic7 0b:02.01 e1000 Up 100Mbps Full Intel Corporation 82546EB Gigabit Ethernet Controller (Copper)
DESTINATION BOX:
Name PCI Driver Link Speed Duplex Description
vmnic0 01:04.00 tg3 Up 1000Mbps Full Broadcom Corporation NC7781 Gigabit Server Adapter (PCI-X, 10,100,1000-T)
vmnic1 01:05.00 tg3 Up 100Mbps Full Broadcom Corporation NC7781 Gigabit Server Adapter (PCI-X, 10,100,1000-T)
vmnic2 03:01.00 e1000 Up 100Mbps Full Intel Corporation NC7131 Gigabit Server Adapter
vmnic3 03:02.00 e1000 Up 1000Mbps Full Intel Corporation NC7131 Gigabit Server Adapter
vmnic5 0c:04.00 e100 Up 100Mbps Full Intel Corporation NC3134 Fast Ethernet NIC (dual port)
vmnic4 0c:05.00 e100 Up 100Mbps Full Intel Corporation NC3134 Fast Ethernet NIC (dual port)
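Once the VMotion uplink of each host is known from the vSwitch listing (vmnic1 on the source, vmnic3 on the destination), this table gives its link state, speed, and duplex. The sketch below assumes the column layout pasted above (name, PCI, driver, link, speed, duplex, description); `nic_table` is a hypothetical helper.

```python
import re

# Row pattern for the "esxcfg-nics -l" listing pasted above (assumed layout:
# name, PCI, driver, link, speed, duplex, then a free-text description).
NIC_ROW = re.compile(r"^(vmnic\d+)\s+\S+\s+\S+\s+(Up|Down)\s+(\d+)Mbps\s+(Full|Half)\b")


def nic_table(listing):
    """Map vmnic name -> (link state, speed in Mbps, duplex)."""
    nics = {}
    for raw in listing.splitlines():
        m = NIC_ROW.match(raw.strip())
        if m:
            nics[m.group(1)] = (m.group(2), int(m.group(3)), m.group(4))
    return nics


SOURCE_NICS = """
Name PCI Driver Link Speed Duplex Description
vmnic0 01:04.00 tg3 Up 1000Mbps Full Broadcom Corporation NC7781 Gigabit Server Adapter
vmnic1 01:05.00 tg3 Up 1000Mbps Full Broadcom Corporation NC7781 Gigabit Server Adapter
"""

DEST_NICS = """
Name PCI Driver Link Speed Duplex Description
vmnic1 01:05.00 tg3 Up 100Mbps Full Broadcom Corporation NC7781 Gigabit Server Adapter
vmnic3 03:02.00 e1000 Up 1000Mbps Full Intel Corporation NC7131 Gigabit Server Adapter
"""

print("source vmnic1:", nic_table(SOURCE_NICS)["vmnic1"])
print("destination vmnic3:", nic_table(DEST_NICS)["vmnic3"])
```

On the pasted data both VMotion uplinks report Up at 1000Mbps Full, so negotiated speed alone does not appear to explain the timeout.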
ESXCFG-VSWIF -l
SOURCE BOX:
Name Port Group IP Address Netmask Broadcast Enabled DHCP
vswif0 Service Console 10.130.17.170 255.255.252.0 10.130.19.255 true false
DESTINATION BOX:
Name Port Group IP Address Netmask Broadcast Enabled DHCP
vswif0 Legacy eth0 10.130.1.129 255.255.252.0 10.130.3.255 true false
I would put the VMotion port group on the same vSwitch, with the same vmnic, on both hosts.
For example:
Source (as now):
vSwitch3, vmnic1
Destination:
vSwitch3, vmnic1
and not
vSwitch4, vmnic3
as it is now.
Then retry.
Well... that surprised me! It worked, and I can now VMotion across.
Many thanks. Can I award points?
Sure you can. You must! 😆
Click the greyed-out Helpful and Correct buttons on the right of the messages.
See you on VMTN