Re: Second NIC causes network outage when reconnected

oss · ‎10-31-2006

Hi Folks

I have uncovered a strange issue with virtual switch NIC failover in ESX 3.0.1.

When testing NIC failover, everything works perfectly until the second NIC is reconnected; then there is about a 20-second loss of connectivity to the host and VMs.

For example, with two NICs:

Disable the physical switch port connected to vmnic0: no loss of connectivity.

Re-enable the switch port for vmnic0: no loss of connectivity.

Disable the physical switch port connected to vmnic1: no loss of connectivity.

Re-enable the switch port for vmnic1: about a 20-second loss of connectivity to the service console and running VMs.
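To put a number on the loss at each of the steps above, one option is to keep a continuous ping running against the service console or a VM and look for the longest gap in replies. The following is only a minimal sketch: the function name `longest_outage` and the `(timestamp, ok)` sample format are hypothetical, and collecting the samples (for example, by parsing ping output) is left out.

```python
def longest_outage(samples):
    """Return the longest outage in seconds.

    samples: list of (timestamp_seconds, ok) tuples in time order, where ok
    is True if the probe got a reply. An outage runs from the first failed
    probe to the next successful one. An outage still in progress when the
    trace ends is not counted.
    """
    longest = 0.0
    outage_start = None
    for timestamp, ok in samples:
        if not ok and outage_start is None:
            # First missed reply: an outage begins here.
            outage_start = timestamp
        elif ok and outage_start is not None:
            # Replies are back: close the outage and keep the longest one.
            longest = max(longest, timestamp - outage_start)
            outage_start = None
    return longest


# Hypothetical trace: replies stop at t=5 and resume at t=25.
trace = [(0, True), (1, True), (5, False), (10, False), (20, False), (25, True)]
print(longest_outage(trace))
```

With one probe per second, the result is only accurate to about a second, which is enough to tell a 2-second blip from a 20-second one.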

This issue is identical on both hosts. I have tried using the two onboard NICs and also one onboard NIC plus one port on a separate PCI NIC.

I have one vSwitch0 per ESX 3.0.1 host with the following configuration:

Failover and Load balancing

Load balancing: Port ID

Network Failure Detection: Link Status only

Notify Switches: Yes

Rolling Failover: No

Active adapters: vmnic0, vmnic1

Standby Adapters: None

Unused Adapters: None

The hosts are Dell 2850s with dual onboard 8254NXX NICs and a separate quad-port 82546GB PCI network card.

Any assistance with this would be much appreciated.

Gabrie1 · ‎11-01-2006

Hi

I'm having much the same issue. I have opened a support call with VMware to find out what the best solution is and which ESX configuration to use.

http://www.vmware.com/community/thread.jspa?threadID=57562&tstart=0

Gabrie

http://www.GabesVirtualWorld.com

joepje · ‎12-14-2006

Hi Gabrie,

What is the current status of your SR?

I am experiencing the same thing:

Failover works, but failback costs me a few pings.

I am considering an active/standby configuration; we have tested this and it is working now.

MattMeyer · ‎12-14-2006

Is the switch port configured in any special way? It kinda sounds like the switch is freaking out when the port is re-enabled; possibly something with the ARP table. Is the switch managed, and can you view any logs on the switch?

lambeth · ‎03-07-2007

Usually this is because the physical switch port is not configured as "portfast" (or "portfast trunk" if you are doing VLAN tagging on the vSwitch). That's the Cisco terminology; for other switches, look for whatever setting avoids going through the waiting states of spanning tree. There are a couple of other things you can tweak on the switch port to further trim down the time it takes to get the port into forwarding state, but I can't remember them off the top of my head. Basically, enabling portfast gets you down from around 30 seconds to around 2 seconds, and the other things can shave off some of that last 2, but that's probably not necessary.

FYI, if this is in fact your problem, the reason for the behavior you see is that the switch port will report a positive link state almost immediately but won't forward any frames until the spanning tree wait is over, usually around 30 seconds. ESX, however, attempts to start using the port as soon as it sees a positive link state.
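The timing described above can be sketched as simple arithmetic. This is a simplified model, not a measurement: it assumes the standard 802.1D forward delay default of 15 seconds, with the port spending one forward delay in listening and one in learning before it forwards. The function names are hypothetical, and the roughly 2 seconds that remain with portfast enabled are not modelled.

```python
# Assumed default 802.1D forward delay in seconds; switches allow tuning it.
FORWARD_DELAY_S = 15


def seconds_until_forwarding(portfast, forward_delay=FORWARD_DELAY_S):
    """Time after link-up before the switch port forwards frames."""
    if portfast:
        # Simplification: the port skips listening and learning entirely.
        return 0
    # Without portfast: listening, then learning, one forward delay each.
    return 2 * forward_delay


def blackhole_window(portfast, host_start_delay=0, forward_delay=FORWARD_DELAY_S):
    """Seconds during which the host sends on a port that is not forwarding.

    host_start_delay is how long the host waits after link-up before using
    the NIC; 0 models a host that uses the port as soon as link is up.
    """
    wait = seconds_until_forwarding(portfast, forward_delay)
    return max(0, wait - host_start_delay)


print(blackhole_window(portfast=False))  # 30
print(blackhole_window(portfast=True))   # 0
```

In this model the only ways to close the window are to shorten the switch side (portfast) or to delay the host side, which matches the advice to fix the switch port configuration.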


acr · ‎03-07-2007

Thx Lambeth, had exactly this today...!!

gpare911 · ‎07-03-2007

I have this problem, but it seems to affect only the VMkernel, not the service console.

When I unplug the cable, everything is OK, but when I plug it back in, the service console works and the VMkernel stops working.

When I look at the switch, the mac-address-table is pointing to the wrong port, as if the VMkernel is not sending the switch notification.

Can anybody help me, please?

Thanks

All
