VMware Cloud Community
SaPu
Contributor

ESX disconnected

Hello all

My new vSphere infrastructure is connected to an EtherChannel on Cisco switches. The configuration of the channel is:

interface port-channel1101
  description esx01
  switchport mode trunk
  vpc 1101
  switchport trunk allowed vlan 2,70
  spanning-tree port type edge trunk
  logging event port link-status
  logging event port trunk-status

Port Channel Load-Balancing Configuration:
System: source-dest-port

Port Channel Load-Balancing Addresses Used Per-Protocol:
Non-IP: source-dest-mac
IP: source-dest-port source-dest-ip source-dest-mac

I use one vSS with 8 vmnics attached and teaming set to IP hash. As soon as I remove one or two vmnics from the vSS, I lose the connection to the ESX server. When I check on the Cisco switch, the port of the removed vmnic is still up. The only way to get the ESX host connected again is to reassign the removed vmnic to the vSS.
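For readers unfamiliar with IP-hash teaming, here is a minimal Python sketch of how the vSwitch side is commonly described as choosing an uplink: XOR the source and destination IP addresses and take the result modulo the number of uplinks. This is a simplified model, not the actual ESX implementation; the function name and the IP addresses are made up for illustration.

```python
import ipaddress


def ip_hash_uplink(src_ip: str, dst_ip: str, num_uplinks: int) -> int:
    """Return the uplink index under a simple (src XOR dst) mod N model.

    Assumption: this mirrors the commonly published description of
    IP-hash teaming; the real hash inside ESX may differ in detail.
    """
    src = int(ipaddress.IPv4Address(src_ip))
    dst = int(ipaddress.IPv4Address(dst_ip))
    return (src ^ dst) % num_uplinks


if __name__ == "__main__":
    # Hypothetical addresses; 8 uplinks as in the setup described above.
    for dst in ("10.0.70.20", "10.0.70.21", "10.0.70.22"):
        print(dst, "-> uplink index", ip_hash_uplink("10.0.70.10", dst, 8))
```

The point of the sketch is that the host picks an uplink per source/destination pair on its own, and the switch makes its own independent choice for traffic in the other direction. Neither side tells the other which members it still considers usable.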

Does anyone have an idea why this happens?

Regards.

SaPu

7 Replies
RBurns-WIS
Enthusiast

I see you're using 8 vmnics, and the Cisco config displays vpc 1101.  I assume you're using a Nexus 5000.  Please confirm along with the NXOS version you're running.

Can you post the config for a member interface? (Only the port-channel config is shown.)

Post the output for "show vpc consistency-params int po 1101" from each N5K.

One thing to test - Can you take the vPC out of the picture?  Try connecting all your uplinks to a single N5K in a PC and see if the issue remains.

I don't hear of too many customers running 8 member vPCs.  It's supported, just not common.

Thanks,

Robert

SaPu
Contributor

Hello Rob, you're right, we are using two Nexus 5020 switches. Unfortunately I have no access to the switches and the network admin is not in at the moment, but I can give you the config that I already have. I hope this is what you're asking for:

1101  Po1101(SU)  Eth      NONE      Eth102/1/1(P)   Eth102/1/2(P)   Eth102/1/17(P)  Eth102/1/18(P)
1101  Po1101(SU)  Eth      NONE      Eth103/1/1(P)   Eth103/1/2(P)   Eth103/1/17(P)  Eth103/1/18(P)

Eth102

--------------------------------------------------------------------------------
Port           Name               Status   Vlan      Duplex  Speed   Type
--------------------------------------------------------------------------------
Eth102/1/1     esx01              up       trunk     full    1000    --
Eth102/1/2     esx01              up       trunk     full    1000    --
Eth102/1/17    esx01              up       trunk     full    1000    --
Eth102/1/18    esx01              up       trunk     full    1000    --

Eth103

--------------------------------------------------------------------------------
Port           Name               Status   Vlan      Duplex  Speed   Type
--------------------------------------------------------------------------------
Eth103/1/1     esx01              up       trunk     full    1000    --
Eth103/1/2     esx01              up       trunk     full    1000    --
Eth103/1/17    esx01              up       trunk     full    1000    --
Eth103/1/18    esx01              up       trunk     full    1000    --

Yesterday we shut down all the ports on one switch; this helped to reconnect the ESX host.

For the other outputs I have to wait until the network admin is back.

Thanks.

SaPu

RBurns-WIS
Enthusiast

SaPu,

I'll need the interface config (show run int eth102/1/1). The one interface should suffice, but do confirm that each member interface is configured identically. When your network admin gets in, go ahead and post it here along with the other information I requested (switch software version etc.).

In the meantime, please also detail the following:

-Host NIC models/brand

-Driver version being used (ethtool -i vmnicx)

-ESX version & build

Tomorrow I'll whip this up in my lab and see if I can reproduce the issue. Let me know any other details relevant to reproducing it. I'll try to recreate your setup as closely as I can.

Regards,

Robert

SaPu
Contributor

Hello Rob

Here we go:

show vpc consistency-params int po 1101

Legend:
        Type 1 : vPC will be suspended in case of mismatch

Name                    Type  Local Value     Peer Value
----------------------  ----  --------------  --------------
STP Port Type           1     Normal Port     Normal Port
STP Port Guard          1     None            None
STP MST Simulate PVST   1     Default         Default
mode                    1     on              on
Speed                   1     1000 Mb/s       1000 Mb/s
Duplex                  1     full            full
Port Mode               1     trunk           trunk
Native Vlan             1     1               1
Shut Lan                1     No              No
Allowed VLANs           -     2,70
Local suspended VLANs   -     -               -

Software

BIOS: version 1.2.0

loader: version N/A

kickstart: version 4.2(1)N1(1)

system: version 4.2(1)N1(1)

power-seq: version v1.2

BIOS compile time: 06/19/08

kickstart image file is: bootflash:/n5000-uk9-kickstart.4.2.1.N1.1.bin

kickstart compile time: 4/29/2010 19:00:00 [04/30/2010 04:38:04]

system image file is: bootflash:/n5000-uk9.4.2.1.N1.1.bin

system compile time: 4/29/2010 19:00:00 [04/30/2010 05:51:47]

Hardware

cisco Nexus5020 Chassis ("40x10GE/Supervisor")

Intel(R) Celeron(R) M CPU with 2074284 kB of memory.

Device name: switch1

bootflash: 1003520 kB

Kernel uptime is 20 day(s), 23 hour(s), 54 minute(s), 4 second(s)

Last reset at 350095 usecs after Thu Jan 13 12:20:27 2011

Reason: Reset Requested by CLI command reload

System version: 4.2(1)N1(1)

Service:

plugin

Core Plugin, Ethernet Plugin

show run int eth102/1/1

interface Ethernet102/1/1

description esx01

switchport mode trunk

switchport trunk allowed vlan 2,70

no snmp trap link-status

channel-group 1101

(same configuration for all interfaces)

Host NIC models/brand

Name     Driver   Description

vmnic0   bnx2    Broadcom Corporation PowerEdge M710 BCM5709S Gigabit Ethernet

vmnic1   bnx2    Broadcom Corporation PowerEdge M710 BCM5709S Gigabit Ethernet

vmnic2   bnx2    Broadcom Corporation PowerEdge M710 BCM5709S Gigabit Ethernet

vmnic3   bnx2    Broadcom Corporation PowerEdge M710 BCM5709S Gigabit Ethernet

vmnic4   bnx2    Broadcom Corporation Broadcom NetXtreme II BCM5709 1000Base-SX

vmnic5   bnx2    Broadcom Corporation Broadcom NetXtreme II BCM5709 1000Base-SX

vmnic6   bnx2    Broadcom Corporation Broadcom NetXtreme II BCM5709 1000Base-SX

vmnic7   bnx2    Broadcom Corporation Broadcom NetXtreme II BCM5709 1000Base-SX

Driver version being used (ethtool -i vmnicx)

ethtool -i vmnic0

driver: bnx2

version: 2.0.7d-2vmw

firmware-version: 5.2.2 NCSI 2.0.8

bus-info: 0000:01:00.0

ethtool -i vmnic1

driver: bnx2

version: 2.0.7d-2vmw

firmware-version: 5.2.2 NCSI 2.0.8

bus-info: 0000:01:00.1

ethtool -i vmnic2

driver: bnx2

version: 2.0.7d-2vmw

firmware-version: 5.2.2 NCSI 2.0.8

bus-info: 0000:02:00.0

ethtool -i vmnic3

driver: bnx2

version: 2.0.7d-2vmw

firmware-version: 5.2.2 NCSI 2.0.8

bus-info: 0000:02:00.1

ethtool -i vmnic4

driver: bnx2

version: 2.0.7d-2vmw

firmware-version: 5.2.2

bus-info: 0000:05:00.0

ethtool -i vmnic5

driver: bnx2

version: 2.0.7d-2vmw

firmware-version: 5.2.2

bus-info: 0000:05:00.1

ethtool -i vmnic6

driver: bnx2

version: 2.0.7d-2vmw

firmware-version: 5.2.2

bus-info: 0000:07:00.0

ethtool -i vmnic7

driver: bnx2

version: 2.0.7d-2vmw

firmware-version: 5.2.2

bus-info: 0000:07:00.1

ESX version & build

VMware ESX 4.1.0 Build 320092

Did I forget something? Hope not. If yes, please just let me know.

Thanks for your help.

Regards.

SaPU

RBurns-WIS
Enthusiast

SaPU,

First, you should be using "source-dest-ip" as the hash for your N5Ks. Secondly, if you remove a NIC from the vSwitch, the N5K has no way of knowing that the uplink is no longer in use and will blackhole traffic. This is expected behavior. Removing a NIC from a vSwitch without first downing the port on the N5K is not the correct procedure. As far as the N5K is concerned there will always be a "link connected", and without any EtherChannel negotiation (such as LACP) the EtherChannel's health is not monitored.

With all eight ports connected to the vSwitch and online with the N5K, you can test failover/redundancy by either shutting the ports one by one on the N5K or just unplugging them. All traffic will fail over as expected.

If you need to purposely remove a NIC uplink from the vSwitch, you need to first down it on the N5K (allowing the N5K to redirect traffic to the other channel members) and then remove it from the port channel.
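The behaviour described in this reply can be illustrated with a small Python sketch. It models a static port channel (mode "on") as a switch that hashes each flow over every member whose physical link is up, with no knowledge of whether the host still uses that vmnic. The `Member` class, the function names and the modulo hash are assumptions for illustration only; the real N5K hash is more involved.

```python
from dataclasses import dataclass


@dataclass
class Member:
    """One physical link of the port channel (names are illustrative)."""
    name: str
    link_up: bool = True      # physical link state, all the switch sees in mode "on"
    in_vswitch: bool = True   # whether the ESX vSwitch still uses this vmnic


def pick_member(members, flow_hash):
    """Static channel: the switch hashes over every member whose link is up."""
    active = [m for m in members if m.link_up]
    return active[flow_hash % len(active)]


def lost_flows(members, flow_hashes):
    """Flows the switch sends to a vmnic that the vSwitch no longer uses."""
    return [h for h in flow_hashes if not pick_member(members, h).in_vswitch]


if __name__ == "__main__":
    members = [Member(f"vmnic{i}") for i in range(8)]
    flows = list(range(16))   # stand-ins for per-flow hash values

    members[7].in_vswitch = False   # removed from the vSwitch only
    print("port still up:", lost_flows(members, flows))

    members[7].link_up = False      # port shut on the switch first
    print("port shut:    ", lost_flows(members, flows))
```

In this toy model, removing vmnic7 from the vSwitch while its switch port stays up loses every flow that hashes to that member. Once the port is shut, the switch rehashes over the remaining seven links and nothing is lost.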

Let me know if you can test this successfully.

Regards,

Rob

SaPu
Contributor

Hey Rob

We already changed the hash to "source-dest-ip" on the N5K. You're right, the only way to get failover/redundancy is to take down the port on the switch or unplug the cable, and this works fine.

Does this mean that we have to wait until VMware supports dynamic LACP to get the full feature?

Regards

SaPu

RBurns-WIS
Enthusiast

Correct - or implement the Nexus 1000v :)

Robert
