Hi,
I have an ESXi 5.5 host, and I wanted to test NIC teaming for the vmkernel management address. The setup:
a distributed vSwitch over 2 physical NICs (vmnic0 & vmnic1), each connected to a different physical switch,
a port group with a VLAN ID, failback set to No and load balancing set to Route based on physical NIC load,
and a management vmk on this port group.
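For reference, the uplink and vmk state can be checked from the ESXi shell; this is just a sketch, and the exact output fields vary by build:

esxcli network nic list                  (link state of vmnic0 / vmnic1)
esxcli network vswitch dvs vmware list   (DVS name, uplinks, MTU)
esxcli network ip interface list         (vmk interfaces and their port group)

(The teaming/failback policy of a distributed port group itself is set and checked from the vSphere Client, not esxcli.)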
I shut down one physical switch port (connected to one of the two vmnics):
the ESXi host saw the uplink go down and lost redundancy, and we lost one packet when pinging the ESXi host:
2015-03-18T09:44:54.003Z [79980B70 info 'Vimsvc.ha-eventmgr'] Event 2002 : Lost uplink redundancy on DVPorts: "1738/08 09 23 50 9b 30 e6 89-3c 8e 65 73 a1 83 f1 7b", "1738/08 09 23 50 9b 30 e6 89-3c 8e 65 73 a1 83 f1 7b". Physical NIC vmnic1 is down.
2015-03-18T09:51:40.842Z cpu6:33515)<6>igb: vmnic1 NIC Link is Down
So my traffic is through vmnic0.
I re-enabled the uplink of vmnic1 on the physical switch. I expected nothing to happen, because failback is set to No and my traffic was already on vmnic0.
But I lost the ESXi connection for a few seconds, dropping 7 to 10 packets when pinging the ESXi management address.
In the logs I see:
2015-03-18T09:47:09.003Z [799C1B70 info 'Vimsvc.ha-eventmgr'] Event 2004 : Uplink redundancy restored on DVPorts: "1738/08 09 23 50 9b 30 e6 89-3c 8e 65 73 a1 83 f1 7b", "1738/08 09 23 50 9b 30 e6 89-3c 8e 65 73 a
2015-03-18T09:47:11.965Z [799C1B70 info 'Vimsvc.ha-eventmgr'] Event 2005 : The dvPort 1738 link was down in the vSphere Distributed Switch in ha-datacenter
2015-03-18T09:47:11.966Z [799C1B70 info 'Vimsvc.ha-eventmgr'] Event 2006 : The dvPort 1738 was unblocked in the vSphere Distributed Switch in ha-datacenter.
2015-03-18T09:47:11.968Z [799C1B70 info 'Vimsvc.ha-eventmgr'] Event 2007 : The dvPort 1738 link was up in the vSphere Distributed Switch in ha-datacenter
(in vmkernel.log)
2015-03-18T09:52:11.753Z cpu2:33533)<6>igb: vmnic1 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: RX/TX
2015-03-18T09:52:12.954Z cpu0:32884)NetPort: 1632: disabled port 0x3000004
2015-03-18T09:52:12.954Z cpu0:32884)NetPort: 2905: resuming traffic on DV port 1738
2015-03-18T09:52:12.955Z cpu0:32884)Uplink: 6529: enabled port 0x3000004 with mac 0c:c4:7a:48:fe:ab
Is this behaviour "normal"?
Why does re-enabling one physical NIC in the team cause a loss of the management network for up to 10 seconds?
vinny
Hi Vinny,
How are your physical switches configured?
Do you have PortFast enabled on these ports to ensure that Spanning Tree is not causing issues?
Cheers,
Jon
Hi Jon,
My physical switches (Cisco) are configured like this.
vmnic1 (the NIC I disabled, then re-enabled):
interface GigabitEthernet1/x
description xx
switchport trunk encapsulation dot1q
switchport trunk allowed vlan xx
switchport mode trunk
logging event trunk-status
The other NIC, vmnic0:
interface GigabitEthernet1/x
description xxx
switchport trunk encapsulation dot1q
switchport trunk allowed vlan xx
switchport mode trunk
logging event link-status
logging event trunk-status
load-interval 30
storm-control broadcast level 33.00
no cdp enable
spanning-tree portfast
It looks like the network team did not set the same configuration on both switches; could that be an issue?
vinny
Hi Vinny,
I would certainly start by getting the ports set up the same, including enabling PortFast (not currently configured on the vmnic1 port).
Not related to your issue, but I would also enable CDP, so that you can see this port's information in the virtual switch.
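A sketch of what aligning the vmnic1 port might look like (the interface ID and VLAN list are placeholders carried over from the configs above; note that on a trunk port many IOS versions need the trunk keyword for PortFast to take effect):

interface GigabitEthernet1/x
 switchport trunk encapsulation dot1q
 switchport trunk allowed vlan xx
 switchport mode trunk
 logging event link-status
 logging event trunk-status
 spanning-tree portfast trunk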
Cheers,
Jon
I configured both ports with the same configuration and took a close look at the ESXi management MAC address:
the MAC address is learned on switch A;
I shut down the port on switch A:
the MAC address moves to switch B, no packet lost, everything's OK;
I do a no shutdown on the port on switch A:
the MAC address is seen on both switch A and switch B, all packets are lost (between 5 and 10), the ESXi host is not reachable;
then the MAC address is seen only on switch A, and the ESXi host is reachable again.
I triple-checked the port group and failback is set to No.
So why does the MAC address try to move back to the previous switch?
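For anyone reproducing this, the MAC learning can be watched on each physical switch with something like the following (the MAC address and VLAN are placeholders; older IOS versions spell it show mac-address-table):

show mac address-table address xxxx.xxxx.xxxx
show mac address-table dynamic vlan xx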
vinny