VMware Cloud Community
dlogan
Contributor

vMotion not working, cannot vmkping vMotion ports

I am having a problem with the vMotion interface. For some reason I cannot reach the vMotion vSwitch on the other machine (it doesn't matter which machine in the cluster I am on), so when I try to migrate a VM I get the following error message:

"The vMotion migrations failed because the ESX hosts were not able to connect over the vMotion network. Check the vMotion network settings and physical network configuration. vMotion migration [-1978010937:1299041522605750] failed to create a connection with remote host <10.0.0.234>: The ESX hosts failed to connect over the VMotion network Migration [-1978010937:1299041522605750] failed to connect to remote host <10.0.0.234>: Timeout"

The KB article states this could be because of network misconfiguration and gives some suggestions, which I've followed, but I cannot see what I've done wrong.

I have vSwitch3 set up with a vMotion port group (identical on both machines) using vmnic7. I can vmkping the local interface, so I know the ESX server is listening on that IP, but I can't vmkping the remote server (the symptom is the same on both servers). I can ping all points along the route to the other machine, e.g. both gateways are reachable from the source host and from the destination host, but once I move to vmkping the other server stops responding.

I have one Service Console vSwitch which is shared with the VMs, one vSwitch for iSCSI traffic, a separate Service Console vSwitch, and a vMotion vSwitch. The servers are reachable via both Service Consoles, and storage is visible in the vCenter server, so the iSCSI VMkernel ports are working fine.

[root@uts-arcs-esx41-svr02 ~]# vmware -v
VMware ESX 4.1.0 build-260247
[root@uts-arcs-esx41-svr02 ~]# vmware -l
VMware ESX 4.1.0 GA

I have two networks set up, 10.0.0.192/27 and 10.0.0.224/27, configured according to the documentation that I could find.

[root@uts-arcs-esx41-svr02 ~]# esxcfg-vswif -l
Name     Port Group/DVPort   IP Family IP Address                              Netmask                                 Broadcast        Enabled   TYPE
vswif0   Service Console 192/27 IPv4      10.0.0.219                          255.255.255.224                         10.0.0.223   true      STATIC
vswif1   Service Console 224/27 IPv4      10.0.0.228                          255.255.255.224                         10.0.0.255   true      STATIC

[root@uts-arcs-esx41-svr02 ~]# esxcfg-vswitch -l
Switch Name      Num Ports   Used Ports  Configured Ports  MTU     Uplinks
vSwitch0         128         7           128               1500    vmnic0,vmnic2,vmnic3

  PortGroup Name        VLAN ID  Used Ports  Uplinks
  VM Network            0        2           vmnic0,vmnic2,vmnic3
  Service Console 192/27  0        1           vmnic0,vmnic2,vmnic3

Switch Name      Num Ports   Used Ports  Configured Ports  MTU     Uplinks
vSwitch1         128         5           128               1500    vmnic1,vmnic5

  PortGroup Name        VLAN ID  Used Ports  Uplinks
  iSCSI2                0        1           vmnic5
  iSCSI1                0        1           vmnic1

Switch Name      Num Ports   Used Ports  Configured Ports  MTU     Uplinks
vSwitch2         128         3           128               1500    vmnic4

  PortGroup Name        VLAN ID  Used Ports  Uplinks
  Service Console 224/27  0        1           vmnic4

Switch Name      Num Ports   Used Ports  Configured Ports  MTU     Uplinks
vSwitch3         128         3           128               1500    vmnic7

  PortGroup Name        VLAN ID  Used Ports  Uplinks
  vMotion               0        1           vmnic7

[root@uts-arcs-esx41-svr02 ~]# esxcfg-nics -l
Name    PCI           Driver      Link Speed     Duplex MAC Address       MTU    Description
vmnic0  0000:01:00.00 bnx2        Up   1000Mbps  Full   b8:ac:6f:9a:55:3f 1500   Broadcom Corporation PowerEdge R710 BCM5709 Gigabit Ethernet
vmnic1  0000:01:00.01 bnx2        Up   1000Mbps  Full   b8:ac:6f:9a:55:41 1500   Broadcom Corporation PowerEdge R710 BCM5709 Gigabit Ethernet
vmnic2  0000:02:00.00 bnx2        Up   1000Mbps  Full   b8:ac:6f:9a:55:43 1500   Broadcom Corporation PowerEdge R710 BCM5709 Gigabit Ethernet
vmnic3  0000:02:00.01 bnx2        Up   1000Mbps  Full   b8:ac:6f:9a:55:45 1500   Broadcom Corporation PowerEdge R710 BCM5709 Gigabit Ethernet
vmnic4  0000:07:00.00 bnx2        Up   1000Mbps  Full   00:10:18:98:7b:e8 1500   Broadcom Corporation Broadcom NetXtreme II BCM5709 1000Base-T
vmnic5  0000:07:00.01 bnx2        Up   1000Mbps  Full   00:10:18:98:7b:ea 1500   Broadcom Corporation Broadcom NetXtreme II BCM5709 1000Base-T
vmnic6  0000:08:00.00 bnx2        Up   1000Mbps  Full   00:10:18:98:7b:ec 1500   Broadcom Corporation Broadcom NetXtreme II BCM5709 1000Base-T
vmnic7  0000:08:00.01 bnx2        Up   1000Mbps  Full   00:10:18:98:7b:ee 1500   Broadcom Corporation Broadcom NetXtreme II BCM5709 1000Base-T

[root@uts-arcs-esx41-svr02 ~]# esxcfg-vmknic -l
Interface  Port Group/DVPort   IP Family IP Address                              Netmask         Broadcast       MAC Address       MTU     TSO MSS   Enabled Type
vmk0       iSCSI1              IPv4      192.168.242.50                          255.255.255.0   192.168.242.255 00:50:56:79:44:ff 1500    65535     true    STATIC
vmk1       iSCSI2              IPv4      192.168.242.51                          255.255.255.0   192.168.242.255 00:50:56:7f:93:d9 1500    65535     true    STATIC
vmk2       vMotion             IPv4      10.0.0.199                          255.255.255.224 10.0.0.223  00:50:56:76:1a:b4 1500    65535     true    STATIC

[root@uts-arcs-esx41-svr02 ~]# esxcfg-route -l
VMkernel Routes:
Network          Netmask          Gateway          Interface
10.0.0.192        255.255.255.224  Local Subnet     vmk2
192.168.242.0    255.255.255.0    Local Subnet     vmk0
default          0.0.0.0          10.0.0.193   vmk2

[root@uts-arcs-esx41-svr02 ~]# ping 10.0.0.193
PING 10.0.0.193 (10.0.0.193) 56(84) bytes of data.
64 bytes from 10.0.0.193: icmp_seq=1 ttl=64 time=0.686 ms
64 bytes from 10.0.0.193: icmp_seq=2 ttl=64 time=0.586 ms

--- 10.0.0.193 ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 999ms
rtt min/avg/max/mdev = 0.586/0.636/0.686/0.050 ms

[root@uts-arcs-esx41-svr02 ~]# ping 10.0.0.225
PING 10.0.0.225 (10.0.0.225) 56(84) bytes of data.
64 bytes from 10.0.0.225: icmp_seq=1 ttl=64 time=5.39 ms
64 bytes from 10.0.0.225: icmp_seq=2 ttl=64 time=0.739 ms
64 bytes from 10.0.0.225: icmp_seq=3 ttl=64 time=0.604 ms

--- 10.0.0.225 ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 2004ms
rtt min/avg/max/mdev = 0.604/2.247/5.398/2.228 ms

[root@uts-arcs-esx41-svr02 ~]# vmkping 10.0.0.199
PING 10.0.0.199 (10.0.0.199): 56 data bytes
64 bytes from 10.0.0.199: icmp_seq=0 ttl=64 time=0.056 ms
64 bytes from 10.0.0.199: icmp_seq=1 ttl=64 time=0.029 ms
64 bytes from 10.0.0.199: icmp_seq=2 ttl=64 time=0.039 ms

--- 10.0.0.199 ping statistics ---
3 packets transmitted, 3 packets received, 0% packet loss
round-trip min/avg/max = 0.029/0.041/0.056 ms

[root@uts-arcs-esx41-svr02 ~]# vmkping 10.0.0.234
PING 10.0.0.234 (10.0.0.234): 56 data bytes

--- 10.0.0.234 ping statistics ---
3 packets transmitted, 0 packets received, 100% packet loss

Any help or pointers would be most appreciated.

Thanks

David

4 Replies
ThompsG
Virtuoso

Hi David,

Wowsa that's a complicated setup!

Looking through your configuration I have come up with the following information:

uts-arcs-esx41-svr02

vSwitch 0 - Service Console 1 (10.0.0.219/27) - vlan 0

vSwitch 1 - will ignore

vSwitch 2 - Service Console 2 (10.0.0.228/27) - vlan 0

vSwitch 3 - vmotion (10.0.0.199) - vlan 1

Now if I read the rest of the post correctly you have a second ESX server with, I'm assuming, a vmotion address of 10.0.0.234 - am I correct so far?

One thing you might want to look at is that your vMotion network is on the same subnet as your Service Console 1. This probably means you have the same gateway specified, so traffic is probably being routed over the Service Console 1 network, since the vMotion address you are trying to connect to is in a different subnet. You also seem to have different VLANs, i.e. none and VLAN 1.
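
As a rough check of the subnet point, here is a minimal Python sketch using the standard ipaddress module. The addresses come from the outputs in this thread; the /27 mask on the remote vMotion address 10.0.0.234 is an assumption, since that host's configuration is not shown, and the helper names are made up for illustration.

```python
import ipaddress

# Addresses taken from this thread.
# ASSUMPTION: the remote vMotion address uses a /27 mask (not shown in the post).
INTERFACES = {
    "svr02 vswif0 (Service Console 192/27)": "10.0.0.219/27",
    "svr02 vswif1 (Service Console 224/27)": "10.0.0.228/27",
    "svr02 vmk2 (vMotion)": "10.0.0.199/27",
    "remote vMotion (assumed /27)": "10.0.0.234/27",
}


def subnet_of(cidr):
    """Return the network that an 'address/prefix' string belongs to."""
    return ipaddress.ip_interface(cidr).network


def same_subnet(a, b):
    """True when both 'address/prefix' strings fall in the same network."""
    return subnet_of(a) == subnet_of(b)


if __name__ == "__main__":
    for name, cidr in INTERFACES.items():
        print(f"{name:40} {cidr:16} -> {subnet_of(cidr)}")
    local = INTERFACES["svr02 vmk2 (vMotion)"]
    remote = INTERFACES["remote vMotion (assumed /27)"]
    print("vMotion ports share a subnet:", same_subnet(local, remote))
```

With these inputs the local vMotion address falls in 10.0.0.192/27 and the remote one in 10.0.0.224/27, so the two vMotion ports are not on the same subnet.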

Any reason you cannot make the vmotion network a different network ID – say 192.168.1.0?
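
If vMotion does move to its own network, a candidate range can be checked against the networks already in use with something like the sketch below. The existing networks are the ones visible in the outputs in this thread; the /24 prefix on 192.168.1.0 and the helper name find_overlaps are assumptions for illustration only.

```python
import ipaddress

# Networks already visible in the esxcfg output posted in this thread.
EXISTING = ["10.0.0.192/27", "10.0.0.224/27", "192.168.242.0/24"]


def find_overlaps(candidate, existing=EXISTING):
    """Return the existing networks that overlap the candidate network."""
    cand = ipaddress.ip_network(candidate)
    return [net for net in existing if cand.overlaps(ipaddress.ip_network(net))]


if __name__ == "__main__":
    # ASSUMPTION: a /24 prefix for the suggested 192.168.1.0 network.
    for candidate in ("192.168.1.0/24", "10.0.0.192/26"):
        clashes = find_overlaps(candidate)
        status = "clashes with " + ", ".join(clashes) if clashes else "no overlap"
        print(f"{candidate}: {status}")
```

An empty result means the candidate does not collide with the Service Console or iSCSI ranges listed here; it says nothing about other networks at the site.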

Kind regards.

dlogan
Contributor

Hi Glen,

Many thanks for the suggestions. I've put the vMotion VMkernel address in the same subnet as Service Console 1 on both machines. Yes, it is fairly complex, but I need maximum redundancy due to data access requirements :) It makes sorting it out fun, but I'm a bit stumped on this one.

Yes, the second server has the vMotion port address of 10.0.0.234. This is vmkpingable (new word :-) ) from the second server (10.0.0.229).

Both servers have a Service Console in each of the subnets allowing a complete switch/routing failure and still ensuring access to the Service Consoles.

I'll check the VLANs. I decided to leave those out as yet another layer of complexity, and perhaps I've mucked that bit up. I have enough to look at for the moment. Maybe when I've more experience with VMware I might use those :)

Thanks and regards

David

dlogan
Contributor

Hi Glen,

The VLAN IDs are all 0 on all port groups and all vSwitches. I think the tabs make it a bit more difficult to read in the post.

Thanks

David

karanbehl
Contributor

Dear David,

As per the configuration in your post, the VMkernel NIC below has been created for vMotion in the 10.0.0.192/27 subnet, the same subnet as the port group "Service Console 192/27".

vmk2       vMotion             IPv4      10.0.0.199

However, no VMkernel NIC has been created for vMotion on the "Service Console 224/27" network.

Hence you are unable to vmkping the IP below, as there is no vmk NIC in that network.

vmkping 10.0.0.234 (vmkping is specifically for VMkernel adapters).

You will still be able to reach the other IPs in the subnet 10.0.0.224/27 with ping, for example:

ping 10.0.0.225
PING 10.0.0.225 (10.0.0.225) 56(84) bytes of data.
64 bytes from 10.0.0.225: icmp_seq=1 ttl=64 time=5.39 ms
64 bytes from 10.0.0.225: icmp_seq=2 ttl=64 time=0.739 ms
64 bytes from 10.0.0.225: icmp_seq=3 ttl=64 time=0.604 ms
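
The difference between ping and vmkping here can be sketched as a route lookup. ping runs in the Service Console and uses the vswif interfaces, while vmkping uses the VMkernel interfaces and the VMkernel route table. The VMkernel routes below are copied from the esxcfg-route -l output in the original post; the Service Console routes are assumed to be just the connected networks of vswif0 and vswif1, since that table was not posted. pick_route is a hypothetical helper that does a longest-prefix match.

```python
import ipaddress

# (network, gateway or None for a directly connected subnet, interface)
# Copied from the esxcfg-route -l output in the original post.
VMKERNEL_ROUTES = [
    ("10.0.0.192/27", None, "vmk2"),
    ("192.168.242.0/24", None, "vmk0"),
    ("0.0.0.0/0", "10.0.0.193", "vmk2"),
]

# ASSUMPTION: only the connected networks from esxcfg-vswif -l are modelled;
# the real Service Console route table was not posted.
SERVICE_CONSOLE_ROUTES = [
    ("10.0.0.192/27", None, "vswif0"),
    ("10.0.0.224/27", None, "vswif1"),
]


def pick_route(dest, routes):
    """Longest-prefix match: return (network, gateway, interface) or None."""
    addr = ipaddress.ip_address(dest)
    matches = []
    for net, gateway, iface in routes:
        network = ipaddress.ip_network(net)
        if addr in network:
            matches.append((network, gateway, iface))
    if not matches:
        return None
    network, gateway, iface = max(matches, key=lambda m: m[0].prefixlen)
    return (str(network), gateway, iface)


if __name__ == "__main__":
    print("ping 10.0.0.225    ->", pick_route("10.0.0.225", SERVICE_CONSOLE_ROUTES))
    print("vmkping 10.0.0.234 ->", pick_route("10.0.0.234", VMKERNEL_ROUTES))
```

Under these assumptions, 10.0.0.225 is directly connected for the Service Console via vswif1, whereas the VMkernel has no interface in 10.0.0.224/27 and would send traffic for 10.0.0.234 to the gateway 10.0.0.193 through vmk2.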

Please let me know if you have any queries, or if I have not understood the network properly. Thanks.

Regards,

Karan Behl
