I have uncovered a strange issue with virtual switch NIC failover in ESX 3.0.1.
When testing NIC failover, everything works perfectly until the second NIC is reconnected; then there is about a 20 second loss of connectivity to the host and VMs.
For example, with two NICs:
disable physical switch port connected to vmnic0 - result: no loss of connectivity
re-enable switch port for vmnic0 - result: no loss of connectivity
disable physical switch port connected to vmnic1 - result: no loss of connectivity
re-enable switch port for vmnic1 - result: about a 20 second loss of connectivity to the service console and running VMs
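To put a number on each step of the test, one option is to run a continuous ping during the test and measure the longest gap in replies afterwards. The following is only a minimal sketch: the `longest_outage` helper and the sample data are hypothetical, and it assumes you have already collected one (timestamp, reply received) pair per ping.

```python
from typing import List, Tuple


def longest_outage(samples: List[Tuple[float, bool]]) -> float:
    """Return the longest run of lost pings, in seconds.

    samples: (timestamp_seconds, reply_received) pairs in time order.
    An outage runs from the first lost ping to the next received reply.
    """
    longest = 0.0
    outage_start = None
    for timestamp, ok in samples:
        if not ok and outage_start is None:
            outage_start = timestamp
        elif ok and outage_start is not None:
            longest = max(longest, timestamp - outage_start)
            outage_start = None
    # An outage still open at the end is counted up to the last sample.
    if outage_start is not None and samples:
        longest = max(longest, samples[-1][0] - outage_start)
    return longest


# Hypothetical data: one ping per second, replies lost from t=5 up to t=24.
samples = [(float(t), not (5 <= t < 25)) for t in range(40)]
print(longest_outage(samples))  # 20.0
```

Running this once per step (disable vmnic0, re-enable vmnic0, and so on) makes it easy to compare the failover and failback cases side by side.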
This issue is identical on both hosts. I have tried using the two onboard NICs and also one onboard NIC plus one port on a separate PCI NIC card.
I have one vSwitch (vSwitch0) per ESX 3.0.1 host with the following configuration:
Failover and Load balancing
Load balancing: Port ID
Network Failure Detection: Link Status only
Notify Switches: Yes
Rolling Failover: No
Active adapters: vmnic0, vmnic1
Standby Adapters: None
Unused Adapters: None
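To show why this configuration puts a recovered NIC straight back into service, here is a rough Python model of the teaming policy. It is a sketch under assumptions, not VMware's implementation: the `Team` class is hypothetical, the modulo mapping of port IDs to uplinks is only an illustration of "Port ID" load balancing, and the rolling behaviour is simplified to "a recovered adapter stays idle until another adapter fails".

```python
from typing import Dict, List, Optional


class Team:
    """Hypothetical model of a vSwitch NIC team with link-status detection."""

    def __init__(self, active: List[str], rolling: bool = False) -> None:
        self.active = list(active)          # configured active adapters
        self.rolling = rolling              # "Rolling Failover" setting
        self.link_up: Dict[str, bool] = {nic: True for nic in active}
        self.in_use: List[str] = list(active)

    def _promote_spare(self) -> None:
        # Bring in one recovered adapter that is not carrying traffic yet.
        for nic in self.active:
            if self.link_up[nic] and nic not in self.in_use:
                self.in_use.append(nic)
                return

    def set_link(self, nic: str, up: bool) -> None:
        self.link_up[nic] = up
        if not up:
            if nic in self.in_use:
                self.in_use.remove(nic)
                if self.rolling:
                    self._promote_spare()
        elif not self.rolling and nic not in self.in_use:
            # Rolling Failover: No -> fail back as soon as link returns.
            self.in_use.append(nic)
            self.in_use.sort(key=self.active.index)
        elif self.rolling and not self.in_use:
            # Nothing else is carrying traffic, so use the recovered NIC.
            self.in_use.append(nic)

    def uplink_for_port(self, port_id: int) -> Optional[str]:
        # ASSUMPTION: simple modulo mapping, for illustration only.
        if not self.in_use:
            return None
        return self.in_use[port_id % len(self.in_use)]


team = Team(["vmnic0", "vmnic1"], rolling=False)
team.set_link("vmnic1", False)
print(team.in_use)  # ['vmnic0']
team.set_link("vmnic1", True)
print(team.in_use)  # ['vmnic0', 'vmnic1']
```

With `rolling=False` the re-enabled adapter carries traffic again the moment link status comes back, which is exactly the point in the test where the outage appears.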
The hosts are Dell 2850s with dual onboard 8254NXX NICs and a separate quad-port 82546GB PCI network card.
Any assistance with this would be much appreciated.
I'm having about the same issue. At the moment I have a support call open with VMware to find out what the best solution is and which ESX configuration to use.
What is the current status of your SR?
I am experiencing the same:
failover works, but failback costs me a few pings...
I was considering an active/standby configuration. We have since tested this and it is working now.
Is the switch port configured in any special way? It kinda sounds like the switch is freaking out when the port is re-enabled; possibly something with the ARP table. Is the switch managed, and can you view any logs on the switch?
Usually this is because the physical switch port is not configured as "portfast" (or "portfast trunk" if you are doing VLAN tagging on the vSwitch). That's the Cisco terminology; for other switches, look for whatever setting avoids going through the waiting states of spanning tree. There are a couple of other things you can tweak on the switch port to further trim the time it takes to get the port into the forwarding state, but I can't remember them off the top of my head. Basically, enabling portfast gets you down from around 30 seconds to around 2 seconds, and the other things can shave off some of that last 2 seconds, but that's probably not necessary.
FYI, if this is in fact your problem, the reason for the behavior you see is that the switch port reports a positive link state almost immediately but won't forward any frames until the spanning tree wait is over, usually around 30 seconds. ESX, however, starts using the port as soon as it sees a positive link state.
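The timing described above can be sketched with a small Python model. This is only an illustration: the `SwitchPort` class and `blackhole_seconds` function are made up for this post, the 15 second forward delay is the usual 802.1D default for each of the listening and learning stages, and the 2 second portfast figure is just the rough number quoted above.

```python
from dataclasses import dataclass


@dataclass
class SwitchPort:
    portfast: bool = False
    forward_delay: float = 15.0   # seconds per stage (common 802.1D default)
    portfast_delay: float = 2.0   # rough figure from the post above

    def forwarding_after(self) -> float:
        """Seconds between link-up and the port actually forwarding frames."""
        if self.portfast:
            return self.portfast_delay
        # Listening stage plus learning stage.
        return 2 * self.forward_delay


def blackhole_seconds(port: SwitchPort, host_uses_link_after: float = 0.0) -> float:
    """Time during which the host sends on a port that is not forwarding yet."""
    return max(0.0, port.forwarding_after() - host_uses_link_after)


print(blackhole_seconds(SwitchPort(portfast=False)))  # 30.0
print(blackhole_seconds(SwitchPort(portfast=True)))   # 2.0
```

Only traffic that gets moved back onto the re-enabled NIC falls into that window, which would explain why some connections drop while others keep working.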
Thx Lambeth, had exactly this today...!!
I have this problem, but it seems to affect only the VMkernel, not the service console.
When I unplug the cable, everything is OK, but when I plug it back in, the service console works and the VMkernel stops working.
When I look at the switch, the mac-address-table entry points to the wrong port, as if the VMkernel were not sending the switch notification.
Can anybody help me, please?
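The stale mac-address-table symptom can be illustrated with a toy model of how a switch learns addresses. This is a sketch only: the `MacTable` class, the MAC address, and the port names are all hypothetical, and real switches also age out and flush entries, which is ignored here.

```python
from typing import Dict, Optional


class MacTable:
    """Toy switch MAC table: learns the port from each frame's source MAC."""

    def __init__(self) -> None:
        self.entries: Dict[str, str] = {}

    def frame_received(self, src_mac: str, port: str) -> None:
        # The switch (re)learns the location of src_mac on every frame.
        self.entries[src_mac] = port

    def port_for(self, dst_mac: str) -> Optional[str]:
        return self.entries.get(dst_mac)


VMK_MAC = "00:50:56:00:00:01"  # hypothetical VMkernel MAC address

table = MacTable()
# During the failover the VMkernel traffic moved to the second uplink.
table.frame_received(VMK_MAC, "Gi0/2")

# Failback: the host now uses Gi0/1 again, but the switch has seen no
# frame from this MAC on Gi0/1 yet, so it still forwards to Gi0/2.
print(table.port_for(VMK_MAC))  # Gi0/2

# Once any frame sourced from that MAC (for example a notify frame)
# arrives on Gi0/1, the entry is corrected.
table.frame_received(VMK_MAC, "Gi0/1")
print(table.port_for(VMK_MAC))  # Gi0/1
```

If the table on the real switch stays on the old port after failback, that would be consistent with no frame from the VMkernel MAC reaching the switch on the re-enabled port.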