VMware Cloud Community
pmccready
Enthusiast

VMkernel / Virtual Network Issue

Hi there

Apologies for the lengthy post; my two questions are at the end if you want to skip the detail ;)

I've been doing some testing around virtual networking resiliency and have an issue with VMkernel traffic. The configuration is as follows:

I have configured two virtual switches: one for Service Console and VMkernel traffic, and a second for VMs. vSwitch0 (for SC and VMkernel) has two uplinks associated with it, with VLAN tagging configured for the respective VLAN IDs. The two uplinks are plugged into physical ports configured as trunk ports, on two different physical switches.

For load balancing, vSwitch0 is using the default "Route based on the originating virtual port ID" and Beacon Probing for Network Failover Detection.
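As a rough illustration of what "Route based on the originating virtual port ID" does, here is a simplified Python model. It assumes each virtual port (the Service Console, the VMkernel port, each VM NIC) is pinned to one active uplink by taking its port ID modulo the number of active uplinks. VMware does not document the exact mapping, so treat this as a sketch, not ESX's actual code; the function name and the uplink names vmnic0/vmnic1 are hypothetical.

```python
from typing import List


def select_uplink(virtual_port_id: int, active_uplinks: List[str]) -> str:
    """Pin a virtual port to one active uplink (simplified, assumed modulo mapping)."""
    if not active_uplinks:
        raise ValueError("no active uplinks: traffic for this port is dropped")
    # Each virtual port sticks to a single uplink; there is no per-packet balancing.
    return active_uplinks[virtual_port_id % len(active_uplinks)]


if __name__ == "__main__":
    both = ["vmnic0", "vmnic1"]
    for port in range(4):
        print(port, select_uplink(port, both))
    # After one uplink fails, every port is re-pinned to the surviving uplink.
    print(3, select_uplink(3, ["vmnic0"]))
```

The point of the model is that each virtual port uses exactly one uplink at a time, so a port's traffic is only as healthy as the single uplink it is currently pinned to.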

My problem is this: if I shut down one of the uplink ports for vSwitch0, failover works fine and traffic continues to flow to the Service Console and VMkernel. When I fail back, all traffic is fine at first, but after 30 seconds or so a few ping packets are dropped. Everything remains up and operational, though, so this case seems to work OK.

However, if I shut down one of the switches that the vSwitch0 uplinks are plugged into, failover again works fine, but when I power the switch back up, once it has finished its POST and VMware sees the switch ports as "up", I lose connectivity to the Service Console and VMkernel networks. Connectivity is lost for about 30 to 50 seconds and is then restored. By that time, however, all VMs have been shut down, because HA is set to shut down VMs if connectivity to a host is lost.

It looks as though VMware sees the ports as "up" before the physical switch ports are able to accept traffic.

My thinking is this: when I shut down a port and bring it up again, the time it takes the port to start passing traffic after spanning tree converges on the physical switch is not long enough for the host to be considered down. But when powering up a switch (a Cisco Catalyst 6500 series, by the way), the power-up process takes longer, and since the physical switch won't accept traffic at this point, the packets are dropped and the host appears "down".
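The 30 to 50 second gap lines up with classic spanning tree timers. Assuming the Catalyst ports run 802.1D-style STP (or PVST+) at default timers (15 s forward delay, 20 s max age) and without PortFast, a port that has just come up spends one forward delay in listening and one in learning before it forwards, and may first have to wait out max age. The Python sketch below only does that arithmetic under those assumptions; the function name is hypothetical and the numbers are defaults, not measurements from this switch.

```python
# Default IEEE 802.1D timer values in seconds (assumed unchanged on the switch).
FORWARD_DELAY_S = 15
MAX_AGE_S = 20


def stp_blackhole_seconds(wait_for_max_age: bool) -> int:
    """Seconds a port can report link up while still dropping frames (no PortFast)."""
    seconds = 2 * FORWARD_DELAY_S  # listening + learning
    if wait_for_max_age:
        seconds += MAX_AGE_S  # stale BPDU information has to age out first
    return seconds


if __name__ == "__main__":
    print("best case:", stp_blackhole_seconds(False), "s")
    print("worst case:", stp_blackhole_seconds(True), "s")
```

ESX sees link up at the start of this window, which fits the observation that the ports look "up" before they actually pass traffic.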

Therefore I have two questions:

  1. Is there a timeout value set somewhere that says how long a host should be "down" before HA will power down the guests (assuming that's what the Isolation Response value is set to)?

  2. Is it possible to stop VMware from forwarding traffic to an uplink port until the port is accepting data, i.e. after a switch has powered on, spanning tree has converged, etc.?

Any advice much appreciated.


Accepted Solutions
Texiwill
Leadership

Hello,

> 1. Is there a timeout value set somewhere that says how long a host should be "down" before HA will power down the guests (assuming that's what the Isolation Response value is set to).

Yes, it is part of the HA advanced options, referenced here.
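The option usually cited for this in VI3-era HA is das.failuredetectiontime, assumed here to default to 15000 ms; treat both the name and the default as assumptions and confirm them against the HA advanced options documentation referenced above. The Python sketch below only models the comparison involved, whether an outage outlasts the detection time; the helper name is hypothetical and real HA also checks isolation addresses before acting.

```python
# Assumed VI3 HA default for das.failuredetectiontime, in milliseconds.
DEFAULT_FAILURE_DETECTION_MS = 15_000


def isolation_response_fires(outage_s: float,
                             failure_detection_ms: int = DEFAULT_FAILURE_DETECTION_MS) -> bool:
    """Hypothetical helper: True if a network outage outlasts HA's detection time."""
    return outage_s * 1000 > failure_detection_ms


if __name__ == "__main__":
    # Outage lengths in the 30-50 s range reported in the question.
    for outage in (30, 50):
        print(outage, "s | default:", isolation_response_fires(outage),
              "| 60000 ms:", isolation_response_fires(outage, 60_000))
```

Raising the value trades a slower reaction to a genuinely failed host for tolerance of transient outages like this one.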

> 2. Is it possible to stop VMware from forwarding traffic to an uplink port until the port is accepting data, i.e. after a switch has powered on, Spanning Tree converged etc...

Yes, it's the 'failback' setting on the virtual switch.
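To show what the failback setting changes, here is a simplified Python model of an explicit failover order. It assumes failback = Yes moves traffic back to the highest-priority uplink as soon as its link is reported up, while failback = No keeps traffic on the uplink currently carrying it for as long as that uplink stays up. The class and method names are hypothetical, and real NIC teaming has more states than this.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class TeamingPolicy:
    failover_order: List[str]  # most preferred uplink first
    failback: bool = True

    def choose(self, link_up: Dict[str, bool], current: Optional[str]) -> Optional[str]:
        """Return the uplink to use, given link states and the uplink in use now."""
        usable = [nic for nic in self.failover_order if link_up.get(nic, False)]
        if not usable:
            return None  # no uplink has link: traffic is lost
        if not self.failback and current in usable:
            return current  # stay put even though a preferred uplink came back
        return usable[0]  # otherwise use the most preferred uplink with link


if __name__ == "__main__":
    # vmnic0 has just reported link up again after its switch rebooted.
    links = {"vmnic0": True, "vmnic1": True}
    print(TeamingPolicy(["vmnic0", "vmnic1"], failback=True).choose(links, "vmnic1"))
    print(TeamingPolicy(["vmnic0", "vmnic1"], failback=False).choose(links, "vmnic1"))
```

Because link state is all this model looks at, failback = Yes moves traffic onto a port that may still be in STP listening or learning, which is the behavior described in the question.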


Best regards,

Edward L. Haletky

VMware Communities User Moderator

====

Author of the book 'VMWare ESX Server in the Enterprise: Planning and Securing Virtualization Servers', Copyright 2008 Pearson Education.

CIO Virtualization Blog: http://www.cio.com/blog/index/topic/168354

As well as the Virtualization Wiki at http://www.astroarch.com/wiki/index.php/Virtualization

--
Edward L. Haletky
vExpert XIV: 2009-2023,
VMTN Community Moderator
vSphere Upgrade Saga: https://www.astroarch.com/blogs
GitHub Repo: https://github.com/Texiwill

Rubeck
Virtuoso

And is spanning-tree portfast enabled on all ESX-facing ports on the physical switch?

(or spanning-tree portfast trunk on trunk ports).
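A minimal Python model of the port states shows what PortFast changes. It assumes default 802.1D timers (15 s forward delay) and ignores the initial blocking state and max age; with PortFast, an edge port goes straight to forwarding once the link is up. The function name is hypothetical; the real switch-side setting is the spanning-tree portfast (trunk) command mentioned above.

```python
FORWARD_DELAY_S = 15  # default 802.1D forward delay, assumed unchanged


def port_state(seconds_since_link_up: float, portfast: bool) -> str:
    """Simplified STP state of a switch port some seconds after link up."""
    if portfast:
        return "forwarding"  # edge port skips listening and learning
    if seconds_since_link_up < FORWARD_DELAY_S:
        return "listening"
    if seconds_since_link_up < 2 * FORWARD_DELAY_S:
        return "learning"
    return "forwarding"


if __name__ == "__main__":
    for t in (0, 10, 20, 30):
        print(t, "s | no portfast:", port_state(t, portfast=False),
              "| portfast:", port_state(t, portfast=True))
```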

/Rubeck

pmccready
Enthusiast

Thank you both. The portfast trunk option does help, but from a networking point of view we'd prefer not to use it and to handle this within VMware instead, so I will do this using the advanced HA options.

Thanks again.

Rubeck
Virtuoso

I know that, from a network admin's point of view, enabling portfast on trunked uplinks on Cisco switches is not considered a good thing... but remember that vSwitches cannot be bridged within VMware (well, that's not strictly true: you could create a VM with a bridging function and two vNICs connected to two vSwitches. Highly unlikely, but possible). vSwitches do not support, or even know about, STP.

Even Cisco says it's the right thing to do; this document is also published at cisco.com.

There's no need for the network to recalculate the STP topology when it doesn't have to.

/Rubeck

Texiwill
Leadership

Hello,

Actually, the networking change suggested by Rubeck is not optional. It is a MUST with VMware ESX. VMware ESX has no knowledge of STP and does not participate in it, and STP without PortFast will cause all sorts of headaches, like what you are seeing, which the other options may not fix. These problems will not go away until PortFast is enabled when using STP. PortFast only needs to be enabled on the ports connected to your VMware ESX hosts. Until there is a vSwitch that supports STP (and we will not see one from VMware), it is your only real option.

If you are not using STP, it is not an issue.


Best regards,

Edward L. Haletky