Hi,
I have an ESXi 5.5 host, and I wanted to test NIC teaming for the vmkernel management address. The setup:
a distributed vSwitch over 2 physical NICs (vmnic0 & vmnic1), each connected to a different physical switch,
a port group with a VLAN ID, failback set to No and load balancing set to Route based on physical NIC load,
and a management vmk on this port group.
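For reference, the uplink and vmk state can be checked from the ESXi shell; this is just a sketch, and the exact output fields vary by build:

esxcli network nic list                  (link state of vmnic0 / vmnic1)
esxcli network vswitch dvs vmware list   (DVS name, uplinks, MTU)
esxcli network ip interface list         (vmk interfaces and their port group)

(The teaming/failback policy of a distributed port group itself is set and checked from the vSphere Client, not esxcli.)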
I shut down one physical switch port (connected to one of the two vmnics):
the ESXi host saw the uplink go down and lost redundancy, and we lost one packet when pinging the ESXi host:
2015-03-18T09:44:54.003Z [79980B70 info 'Vimsvc.ha-eventmgr'] Event 2002 : Lost uplink redundancy on DVPorts: "1738/08 09 23 50 9b 30 e6 89-3c 8e 65 73 a1 83 f1 7b", "1738/08 09 23 50 9b 30 e6 89-3c 8e 65 73 a1 83 f1 7b". Physical NIC vmnic1 is down.
2015-03-18T09:51:40.842Z cpu6:33515)<6>igb: vmnic1 NIC Link is Down
So my traffic is through vmnic0.
I re-enabled the uplink of vmnic1 on the physical switch. I expected nothing to happen, because failback is set to No and my traffic was already on vmnic0.
But I lost the ESXi connection for a few seconds, dropping 7 to 10 packets when pinging the ESXi management address.
In the logs I see:
2015-03-18T09:47:09.003Z [799C1B70 info 'Vimsvc.ha-eventmgr'] Event 2004 : Uplink redundancy restored on DVPorts: "1738/08 09 23 50 9b 30 e6 89-3c 8e 65 73 a1 83 f1 7b", "1738/08 09 23 50 9b 30 e6 89-3c 8e 65 73 a
2015-03-18T09:47:11.965Z [799C1B70 info 'Vimsvc.ha-eventmgr'] Event 2005 : The dvPort 1738 link was down in the vSphere Distributed Switch in ha-datacenter
2015-03-18T09:47:11.966Z [799C1B70 info 'Vimsvc.ha-eventmgr'] Event 2006 : The dvPort 1738 was unblocked in the vSphere Distributed Switch in ha-datacenter.
2015-03-18T09:47:11.968Z [799C1B70 info 'Vimsvc.ha-eventmgr'] Event 2007 : The dvPort 1738 link was up in the vSphere Distributed Switch in ha-datacenter
(in vmkernel.log)
2015-03-18T09:52:11.753Z cpu2:33533)<6>igb: vmnic1 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: RX/TX
2015-03-18T09:52:12.954Z cpu0:32884)NetPort: 1632: disabled port 0x3000004
2015-03-18T09:52:12.954Z cpu0:32884)NetPort: 2905: resuming traffic on DV port 1738
2015-03-18T09:52:12.955Z cpu0:32884)Uplink: 6529: enabled port 0x3000004 with mac 0c:c4:7a:48:fe:ab
Is this behaviour "normal"?
Why does re-enabling one physical NIC in the team cause a loss of the management network for up to 10 seconds?
vinny
Hi Vinny,
How are your physical switches configured?
Do you have PortFast enabled on these ports to ensure that Spanning Tree is not causing issues?
Cheers,
Jon
Hi Jon,
My physical switches (Cisco) are configured like this.
vmnic1 (the NIC I disabled, then re-enabled):
interface GigabitEthernet1/x
description xx
switchport trunk encapsulation dot1q
switchport trunk allowed vlan xx
switchport mode trunk
logging event trunk-status
The other NIC, vmnic0:
interface GigabitEthernet1/x
description xxx
switchport trunk encapsulation dot1q
switchport trunk allowed vlan xx
switchport mode trunk
logging event link-status
logging event trunk-status
load-interval 30
storm-control broadcast level 33.00
no cdp enable
spanning-tree portfast
It looks like the network team did not set the same configuration on both switches; could that be an issue?
vinny
Hi Vinny,
I would certainly start by getting the ports set up the same, including enabling PortFast (not currently configured on the vmnic1 port).
Not related to your issue, but I would also enable CDP, so that you can see this port's information in the virtual switch.
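A sketch of what aligning the vmnic1 port might look like (the interface ID and VLAN list are placeholders carried over from the configs above; note that on a trunk port many IOS versions need the trunk keyword for PortFast to take effect):

interface GigabitEthernet1/x
 switchport trunk encapsulation dot1q
 switchport trunk allowed vlan xx
 switchport mode trunk
 logging event link-status
 logging event trunk-status
 spanning-tree portfast trunk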
Cheers,
Jon
I configured both ports with the same configuration and took a close look at the ESXi management MAC address:
the MAC address is learned on switch A;
I shut down the port on switch A:
the MAC address moves to switch B, no packet lost, everything's OK;
I do a no shutdown on the port on switch A:
the MAC address is seen on both switch A and switch B, all packets are lost (between 5 and 10), the ESXi host is not reachable;
then the MAC address is seen only on switch A, and the ESXi host is reachable again.
I triple-checked the port group and failback is set to No.
So why does the MAC address try to move back to the previous switch?
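For anyone reproducing this, the MAC learning can be watched on each physical switch with something like the following (the MAC address and VLAN are placeholders; older IOS versions spell it show mac-address-table):

show mac address-table address xxxx.xxxx.xxxx
show mac address-table dynamic vlan xx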
vinny