VMware Cloud Community
lansley2000
Contributor

vMotion fails at 9% - Source host cannot connect to destination host

Hi,

I wonder if anyone could shed any light on why vMotion fails to or from an ESXi host that has just been restarted (restarted in order to test HA). I have the following setup:

3 x ESXi 5 DL360 G7s, each with two quad-port NICs

vSphere vCenter Server 5

1 x cluster configured for HA and DRS

2 x Procurve 2910-24G

The switches are not connected to each other.

Both switches are configured as such:

[Screenshot: ProCurve switch VLAN configuration (1894901.png)]

All ports are untagged. NO STP, no routing

Each host's vMotion vSwitch is connected to both switches by 1 x 1 Gb NIC.

I have configured a vSwitch on each host for vMotion. There are two VMkernel ports with two IP addresses in the same subnet, and two vmnics attached to the vSwitch. On each port group, one vmnic is set to active while the other is set to unused. I have enabled jumbo frames both on the vSwitch and on the 2910 switches. A VLAN has been configured on both 2910 switches for vMotion, with jumbo frames enabled and traffic set to 'untagged'. I can successfully vmkping all vMotion IPs on all ESXi hosts.

However, when I test HA by shutting down an ESXi host and then restart it, I am unable to vMotion to that host. When I test with vmkping I find that the restarted host can only vmkping itself, and no other host can vmkping it. The attempted vMotion fails at 9% with the error that the source host cannot connect to the destination host. If I restart both 2910 switches, I can then carry out a vMotion and the vmkping is successful.
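For reference, the vmkping tests I run look roughly like this (10.0.20.12 is only a placeholder for a remote host's vMotion address):

vmkping 10.0.20.12

vmkping -d -s 8972 10.0.20.12

The second form sets the don't-fragment bit with an 8972-byte payload, so it only succeeds if jumbo frames (MTU 9000) are working end to end.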

Please help?

Thanks


Accepted Solutions
rickardnobel
Champion

lansley2000 wrote:

I've since read up on the various load balancing options and can see that your method is preferred over the 'IP hash' method

I'll make the change back to 'port based' and link the two switches

Hello Simon, I think that is a good option, since IP hash load balancing is a bit special and really requires that both interfaces connect to the same physical switch, which must also have some specific configuration. Report back with the results after the new changes if you like.
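If you want to double-check the current policy from the ESXi Shell before changing it, something like this should show it (vSwitch1 is just an assumed name for the vMotion vSwitch):

esxcli network vswitch standard policy failover get --vswitch-name=vSwitch1

The Load Balancing line in the output shows whether the vSwitch is currently set to IP hash or the default port-based policy.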

My VMware blog: www.rickardnobel.se


26 Replies
Virtualinfra
Commander

Welcome to the community.

Please post a screenshot of the network configuration of the ESXi host.

Thanks & Regards Dharshan S VCP 4.0,VTSP 5.0, VCP 5.0
BharatR
Hot Shot

Hi

Have a look at these articles on vMotion failing at 10%:

http://kb.vmware.com/kb/1013150

http://kb.vmware.com/kb/1030845

Best regards, BharatR -- VCP4 Certification #: 79230. If you find this information useful, please award points for "correct" or "helpful".
rickardnobel
Champion

lansley2000 wrote:

2 x Procurve 2910-24G

The switches are not connected to each other.

Could you explain some more about the physical switch setup? You write above that the switches are not connected to each other, but how is the cabling set up from the switches to the hosts' NICs? This seems a likely cause of all kinds of network communication problems.

Could you also run a "show vlan 20" on the switches?

My VMware blog: www.rickardnobel.se
lansley2000
Contributor

Hi Guys,

Thanks for responding

The ESXI Network config:

[Screenshot: ESXi host network configuration (1894901_1.png)]

So vmnic2 will be active for vmk4 and vmnic5 will be unused.

The reverse is true for vmk3.

Jumbo frames are enabled
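For completeness, the same settings can be checked from the ESXi Shell (vMotion-1 and vMotion-2 are just placeholder names for the two port groups here):

esxcfg-vmknic -l

esxcli network vswitch standard portgroup policy failover get --portgroup-name=vMotion-1

esxcli network vswitch standard portgroup policy failover get --portgroup-name=vMotion-2

The first command lists the VMkernel ports with their MTU (9000 for jumbo frames), and the other two show the active/unused uplink override for each port group.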

I'm unable to run show vlan 20, as I presently don't have a connection to it

iSCSISW1 (2910-24G) is connected to ESX5i hosts x 3 for vMotion

iSCSISW2 (2910-24G) is connected to ESX5i hosts x 3 for vMotion

So vmnic2 will be connected to iSCSISW1 and vmnic5 will be connected to iSCSISW2

I've read that it's not recommended to connect the two switches if an HP MSA SAN is being used.

The strange thing here is that vMotion does work until you power down a host. On bringing the host back up, it is no longer able to vmkping any vMotion port apart from its own. If I reboot both switches, vMotion operation resumes.

Thanks again

rickardnobel
Champion

Hello,

the problem is almost certainly the lack of a connection between the switches.

You have set up two vMotion VMkernel ports on each host to be able to use multi-NIC vMotion? Since there is no guarantee which remote IP a VMkernel port will connect to, this setup will sometimes work and sometimes not, as you have observed.

It is really important that all vMotion VMkernel ports attach to the same layer 2 network, i.e. the same VLAN, and that all ports have the same connectivity. From what I understand of your setup, the two physical switches are separate and the two VMkernel ports on each host are attached to the two different switches?

What you must do is to create a tagged port for VLAN 20 on both switches and then physically connect them.

lansley2000 wrote:

I've read that it's not recommended to connect the two switches if an HP MSA SAN is being used.

Do you have any more information about this recommendation? It is a bit vague.

My VMware blog: www.rickardnobel.se
lansley2000
Contributor

Hi Rickard,

Thanks for the response

vMotion works 100% of the time, that is, until one host is restarted. Until that point, all hosts can see all other hosts' vMotion IPs.

If I were to tag VLAN 20 and connect the two switches, would that not be the same as using only one vMotion port per NIC and having these on the same switch?

I've been told by a storage specialist that when using the MSA 2000 G3, the switches used for iSCSI multipathing should not be connected.

Thanks again

Simon

rickardnobel
Champion

lansley2000 wrote:

If I were to tag VLAN 20 and connect the two switches, would that not be the same as using only one vMotion port per NIC and having these on the same switch?

If you only had one vMotion VMkernel port connected to one switch it would work, but you would have no failover if that switch failed. So the difference is that with two physical switches you get greater fault tolerance, and on ESXi 5 with multi-NIC vMotion also greater performance.

lansley2000 wrote:

I've been told by a storage specialist that when using the MSA 2000 G3, the switches used for iSCSI multipathing should not be connected.

If there are some issues with connected switches for this specific SAN (which seems a little strange, but the vendor should know best), this could still be achieved by simply not tagging VLAN 10 on the interconnect port. The two switches would then remain "separated" from the perspective of the iSCSI network.
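As a rough sketch, assuming VLAN 20 is the vMotion VLAN, VLAN 10 the iSCSI VLAN and port 24 the interconnect port (example values only), the interconnect would then carry just:

vlan 20

tagged 24

VLAN 10 is simply never added to that port, so the iSCSI sides of the two switches stay isolated from each other.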

My VMware blog: www.rickardnobel.se
lansley2000
Contributor

Hi Rickard,

I can link the two switches as you suggest, but I don't think that would fix it.

Surely, if I'm using two switches with all three hosts connected to both, I should at least be able to ping the other two hosts that have vMotion connections to that one switch, i.e. I should be able to ping 50% of the vMotion IPs. At the moment I cannot ping any after a host restart.

Thanks again

Simon

rickardnobel
Champion

[Diagram: assumed vMotion network layout (vmotion-1.PNG)]

Do I understand your environment correctly, that the vMotion setup is like the above? If so, from my perspective, communication inside VLAN 20 is not working predictably. Each node should be able to reach all other nodes inside a VLAN, and to achieve that you should connect the two switches with the vMotion VLAN tagged.

My VMware blog: www.rickardnobel.se
psplnalin123
Enthusiast

Just to isolate the issue, could you create two different switches, one for the vMotion network and the other for iSCSI traffic, and then check whether vMotion works?

lansley2000
Contributor

Hi Rickard,

That is correct

I have each host connected to both switches, which should mean that I can ping each host and each vMotion NIC on that host, which I can.

The trouble then occurs after a host reboot, when I'm unable to ping any vMotion NICs apart from the host's own.

After rebooting a host, even with the two switches connected, won't I still suffer the same problem?

Thanks for helping me with this

Simon

lansley2000
Contributor

Hello,

I already have a vSwitch for iSCSI and a vSwitch for vMotion.

iSCSI seems to work fine

Thanks

Simon

psplnalin123
Enthusiast

What I meant in my earlier post was: could you remove one vMotion port group and check? If vMotion then succeeds, Rickard is right in his post.

rickardnobel
Champion

lansley2000 wrote:

I have each host connected to both switches, which should mean that I can ping each host and each vMotion NIC on that host, which I can.

The trouble then occurs after a host reboot, when I'm unable to ping any vMotion NICs apart from the host's own.

I think it is a bit of luck that it actually worked from the beginning, since not all VMkernel ports have access to all other VMkernel ports. If we call the VMkernel ports 1-4, then VMkernel 1 on the first host can only reach VMkernel 3 on the other, but not 4. And it is not predictable which remote VMkernel port the local one will communicate with (at least not when they all share the same IP subnet).

lansley2000 wrote:

After rebooting a host, even with the two switches connected, won't I still suffer the same problem?

From my networking point of view, the current setup is incorrect since all nodes in the VLAN cannot directly reach all other nodes. If you connect the two switches, then you will have a "correctly" configured VLAN, and it should then always work. :-)

My VMware blog: www.rickardnobel.se
lansley2000
Contributor

Hiya Rickard,

I'm coming round to your way of thinking...

If I trunk the two switches, will I then have to 'tag' the ports on the physical switches used by the vMotion VLAN?

Does it also mean I will have to specify the VLAN ID on the vMotion vSwitches?

Is there anything else you think I may have missed?

Appreciate your ongoing help on this one :-)

Cheers

Simon

jose_maria_gonz
Virtuoso

Hi there,

Usually when vMotion fails at 10% it has something to do with the VMkernel interface. Are you able to vmkping your VMkernel IPs?

I hope I have helped you out

My Company: http://www.jmgvirtualconsulting.com

My Blog: http://www.josemariagonzalez.es

My Web TV show: http://www.virtualizacion.tv

My linkedin: http://es.linkedin.com/in/jmgvirtualconsulting

My Twitter: http://twitter.com/jose_m_gonzalez

rickardnobel
Champion

lansley2000 wrote:

If I trunk the two switches, will I then have to 'tag' the ports on the physical switches used by the vMotion VLAN?

Does it also mean I will have to specify the VLAN ID on the vMotion vSwitches?

Hello Simon,

if you are only using those switch ports for the vMotion interfaces, then they can stay "untagged" without problems and you do not need to specify any VLAN on the VMkernel port group.

Just select a suitable unused physical port on each switch and then run something like:

vlan 20

tagged 24 (the port number)

and just to be more safe:

vlan 1

no untagged 24 (the same port number)

This will make this port only carry tagged frames for the vMotion network and nothing else.
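Once the cable is in place, a quick verification (still assuming port 24 as above) is to run on both switches:

show vlan 20

which should now list the interconnect port as Tagged. After that, every host should be able to vmkping every other host's vMotion addresses again, also right after a host reboot.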

My VMware blog: www.rickardnobel.se
lansley2000
Contributor

Thanks Rickard

Of course this will make vMotion reliant on both physical switches being up and running.

If one switch dies, the vMotion capability will be lost, no?

Cheers

Simon

rickardnobel
Champion

lansley2000 wrote:

Of course this will make vMotion reliant on both physical switches being up and running.

If one switch dies, the vMotion capability will be lost, no?

No, vMotion should still work. If one physical switch goes down, the vmknics attached to it on the hosts will "sense" the lost connectivity and stop using it, but the interfaces to the surviving switch will remain active and working.

My VMware blog: www.rickardnobel.se