VMware Cloud Community
athompson88
Enthusiast

VMware ESXi 6.5 - NIC teaming not working after hardware upgrade

I just upgraded the hardware in my home/lab box to use a new motherboard and an Intel i5-9600K. No issues during boot, and all VMs came up fine. The only hardware difference is that I'm now using two PCIe NICs instead of one PCI and one PCIe (no PCI slots on the new motherboard). Previously, I could disconnect the cable from either NIC without any interruption. Now everything seems to be pinned to the original PCIe card: if I pull that cable, everything freezes (though I can still ping the VMkernel NIC). The other cable can be pulled without interruption. Both NICs show as active and both are members of vSwitch0 (the only vSwitch).

I don't know that it would make a difference, but I run ESXi off a USB thumb drive, and all my storage is iSCSI from a NAS. ESXi 6.5.0 Update 2 (Build 8294253). Also, the USB port is v3.1. I seem to remember ESXi having issues running off a USB 3.0 port previously, but that was back in 2014 when I first built this host.

17 Replies
ThompsG
Virtuoso

Hi there,

Could you provide the output of the vSwitch configuration, including any overrides at the portgroup level? Also, what configuration do you have on the physical switch (if any; I'm not implying you need anything custom here)?

Kind regards.

athompson88
Enthusiast

Is there a command to provide an output, or do you just want a description?

athompson88
Enthusiast

I think I found what you are looking for.

[root@hawk:~] esxcfg-vswitch -l

Switch Name      Num Ports   Used Ports  Configured Ports  MTU     Uplinks

vSwitch0         2212        16          128               1500    vmnic1,vmnic0,vmnic2

  PortGroup Name        VLAN ID  Used Ports  Uplinks

  DMZ                   4        3           vmnic1,vmnic0,vmnic2

  Management Network    0        1           vmnic1,vmnic0,vmnic2

  VM Network            3        4           vmnic1,vmnic0,vmnic2

  ESXi Backend Network  3        1           vmnic1,vmnic0,vmnic2

[root@hawk:~] esxcli network vswitch standard list

vSwitch0

   Name: vSwitch0

   Class: etherswitch

   Num Ports: 2212

   Used Ports: 16

   Configured Ports: 128

   MTU: 1500

   CDP Status: listen

   Beacon Enabled: false

   Beacon Interval: 1

   Beacon Threshold: 3

   Beacon Required By:

   Uplinks: vmnic2, vmnic0, vmnic1

   Portgroups: DMZ, Management Network, VM Network, ESXi Backend Network

vmnic0: onboard Ethernet (oddly enough, ESXi detected it, and when connected it shows as active)

vmnic1: old PCIe NIC

vmnic2: new PCIe NIC

athompson88
Enthusiast

[root@hawk:~] esxcli network vswitch standard policy security get -v vSwitch0

   Allow Promiscuous: false

   Allow MAC Address Change: true

   Allow Forged Transmits: true

[root@hawk:~] esxcli network vswitch standard policy failover get -v vSwitch0

   Load Balancing: srcport

   Network Failure Detection: link

   Notify Switches: true

   Failback: true

   Active Adapters: vmnic1, vmnic0, vmnic2

   Standby Adapters:

   Unused Adapters:

[root@hawk:~] esxcli network vswitch standard policy shaping get -v vSwitch0

   Enabled: false

   Average Bandwidth: -1 Kbps

   Peak Bandwidth: -1 Kbps

   Burst Size: -1 Kib

athompson88
Enthusiast

More data:

[root@hawk:~] esxcli network nic list

Name    PCI Device    Driver  Admin Status  Link Status  Speed  Duplex  MAC Address         MTU  Description

------  ------------  ------  ------------  -----------  -----  ------  -----------------  ----  ------------------------------------------------

vmnic0  0000:00:1f.6  ne1000  Up            Down             0  Half    0c:9d:92:c1:36:a9  1500  Intel Corporation Ethernet Connection (7) I219-V

vmnic1  0000:03:00.0  ne1000  Up            Up            1000  Full    68:05:ca:27:7d:2a  1500  Intel Corporation Gigabit CT Desktop Adapter

vmnic2  0000:04:00.0  ne1000  Up            Up            1000  Full    68:05:ca:90:20:3a  1500  Intel Corporation Gigabit CT Desktop Adapter

[root@hawk:~] esxcli network nic get -n vmnic0

   Advertised Auto Negotiation: true

   Advertised Link Modes: Auto, 10BaseT/Half, 100BaseT/Half, 10BaseT/Full, 100BaseT/Full, 1000BaseT/Full

   Auto Negotiation: true

   Cable Type: Twisted Pair

   Current Message Level: -1

   Driver Info:

         Bus Info: 0000:00:1f:6

         Driver: ne1000

         Firmware Version: 0.5-4

         Version: 0.8.3

   Link Detected: false

   Link Status: Down

   Name: vmnic0

   PHYAddress: 0

   Pause Autonegotiate: false

   Pause RX: false

   Pause TX: false

   Supported Ports: TP

   Supports Auto Negotiation: true

   Supports Pause: false

   Supports Wakeon: true

   Transceiver:

   Virtual Address: 00:50:56:5f:69:c9

   Wakeon: MagicPacket(tm)

[root@hawk:~] esxcli network nic get -n vmnic1

   Advertised Auto Negotiation: true

   Advertised Link Modes: Auto, 10BaseT/Half, 100BaseT/Half, 10BaseT/Full, 100BaseT/Full, 1000BaseT/Full

   Auto Negotiation: true

   Cable Type: Twisted Pair

   Current Message Level: -1

   Driver Info:

         Bus Info: 0000:03:00:0

         Driver: ne1000

         Firmware Version: 1.8-0

         Version: 0.8.3

   Link Detected: true

   Link Status: Up

   Name: vmnic1

   PHYAddress: 0

   Pause Autonegotiate: false

   Pause RX: false

   Pause TX: false

   Supported Ports: TP

   Supports Auto Negotiation: true

   Supports Pause: false

   Supports Wakeon: true

   Transceiver:

   Virtual Address: 00:50:56:57:7d:2a

   Wakeon: MagicPacket(tm)

[root@hawk:~] esxcli network nic get -n vmnic2

   Advertised Auto Negotiation: true

   Advertised Link Modes: Auto, 10BaseT/Half, 100BaseT/Half, 10BaseT/Full, 100BaseT/Full, 1000BaseT/Full

   Auto Negotiation: false

   Cable Type: Twisted Pair

   Current Message Level: -1

   Driver Info:

         Bus Info: 0000:04:00:0

         Driver: ne1000

         Firmware Version: 1.8-0

         Version: 0.8.3

   Link Detected: true

   Link Status: Up

   Name: vmnic2

   PHYAddress: 0

   Pause Autonegotiate: false

   Pause RX: false

   Pause TX: false

   Supported Ports: TP

   Supports Auto Negotiation: true

   Supports Pause: false

   Supports Wakeon: true

   Transceiver:

   Virtual Address: 00:50:56:5b:f4:92

   Wakeon: MagicPacket(tm)

ThompsG
Virtuoso

Hi athompson88,

Thanks for that. In your post you mention that after disconnecting vmnic1 (the old PCIe card), the VMkernel port (ESXi management) still responds to pings - is that correct? That would mean ESXi management is still available, but VMs running on the "VM Network" portgroup become unavailable?

If so then could you run the following and post the output:

esxcli network vswitch standard portgroup policy failover get -p "VM Network"

This is just to make sure the load balancing algorithm hasn't been overridden at the portgroup level.

Also, on the "VM Network" portgroup, try changing the list of network adapters: make vmnic2 active and the others unused. Do you still lose communication with the VMs? (A command-line sketch follows below.)
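
If the command line is easier, something like this should do it - a sketch from memory, so double-check the flags against the esxcli help before running:

esxcli network vswitch standard portgroup policy failover set -p "VM Network" --active-uplinks=vmnic2

# Uplinks named in neither --active-uplinks nor --standby-uplinks become unused for this portgroup.
# To revert the portgroup to the vSwitch-level teaming policy afterwards:

esxcli network vswitch standard portgroup policy failover set -p "VM Network" --use-vswitch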

Can you also confirm that on the pSwitch ports you are trunking VLAN IDs 3 and 4 through? You must also have a native VLAN, since the Management Network portgroup is untagged, so make sure the configuration is identical across the physical ports on the pSwitch. (You can cross-check the VLAN assignments on the ESXi side with the command below.)
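
To list each portgroup with its VLAN ID on the host (standard esxcli on 6.5, as far as I know):

esxcli network vswitch standard portgroup list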

Kind regards.

athompson88
Enthusiast

If I unplug the vmnic1 cable, the web console becomes unresponsive, but I can still ping the ESXi host and even SSH into it. That let me get the following output for you, taken while vmnic1 was unplugged:

[root@hawk:~] esxcli network vswitch standard portgroup policy failover get -p "VM Network"

   Load Balancing: srcport

   Network Failure Detection: link

   Notify Switches: true

   Failback: true

   Active Adapters: vmnic1, vmnic2, vmnic0

   Standby Adapters:

   Unused Adapters:

   Override Vswitch Load Balancing: false

   Override Vswitch Network Failure Detection: false

   Override Vswitch Notify Switches: false

   Override Vswitch Failback: false

   Override Vswitch Uplinks: false

To your second point, I did try setting vmnic1 to unused on the "VM Network" portgroup and was still able to communicate with the VMs. If I then set vmnic2 to unused as well, it cut all communication (as I'd expect). Also, let me clarify what my networks are.

Management Network: VLAN 0 (which is really 192.168.1.x) - used for network hardware management only (switches, router, AP), not the ESXi host.

DMZ: VLAN 4 - what it sounds like

VM Network: VLAN 3 - private server subnet

ESXi Backend Network: VLAN 3 - VMkernel port (vmk0)

The ports on the physical switch (if that's what you meant by pSwitch) are configured as untagged for VLAN 1 and tagged for VLANs 3 and 4. To rule out the physical switch, I swapped the two cables where they attach to vmnic1 and vmnic2. The outcomes were the same.

Thanks.

ThompsG
Virtuoso

Hi there and thanks for the info!

So to confirm:

  • ESXi Backend Network is actually your ESXi management network, i.e. this is where the IP address that the Host Client connects to lives?
  • The VMs are unaffected by either disconnecting vmnic1 or vmnic2 - only ESXi management is impacted?

I'll continue to review your vSwitch config as well - I can't currently see anything wrong, though I am a little surprised that you have two portgroups with the same VLAN ID connected to the same network adapters. No big deal, just surprised.

Also, as vmnic0 is unused and disconnected, I would remove it from the vSwitch or at least set it to unused. I doubt this will resolve anything, but it will be tidier that way 😉 One way to do it from the shell is below.
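
Removing the uplink should just be the following (standard esxcli; verify the names first, of course):

esxcli network vswitch standard uplink remove -u vmnic0 -v vSwitch0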

Kind regards.

athompson88
Enthusiast

Disconnecting vmnic2 doesn't affect connectivity to the VM Network or the Management Network. Disconnecting vmnic1 affects connectivity to the Management Network as described (ping and SSH available, web console unresponsive) and seems to kill all access to everything on the VM Network.

athompson88
Enthusiast

I did a few more tests just now. Strangely enough, I can ping hosts on the VM Network as well, but I can't SSH into them. If I SSH into the ESXi host and then try to ping hosts on the VM Network from there, that terminal session freezes and I can't even Ctrl-C out of it. I can open another session to the ESXi host, though. Terminal sessions also freeze if I run any command that touches the datastores (for example "df" or "ls /vmfs/volumes").

scriptermad
Enthusiast

Press Alt+F12 on the ESXi console while the issue is happening and you will see real-time logging. It could help you see whether there are storage errors.
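
If SSH is more convenient, tailing the VMkernel log should show the same messages (standard log location, at least on 6.x):

tail -f /var/log/vmkernel.log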

athompson88
Enthusiast

I'll look into the logging.

More strange behavior: it seems that right after I disconnect vmnic1, all communications work for about 5 seconds and then begin having issues.

athompson88
Enthusiast

On a side note, I'm not particularly concerned with NIC failure. I mainly want the two cards in place to allow greater I/O to the NAS, which also has two gigabit NICs in it. Since we've already determined that both cards are being actively used, my primary needs are met. I am still interested in determining the issue from a purely academic standpoint, though. (See my note on round robin below.)
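
From what I've read, once multipathing is working I may also need to switch the path selection policy to round robin so iSCSI I/O actually spreads across both NICs. Something like this, if I'm reading the docs right (the naa. ID here is just a placeholder for one of my devices):

esxcli storage nmp device set --device naa.xxxxxxxxxxxxxxxx --psp VMW_PSP_RR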

athompson88
Enthusiast

Firstly, good thought on the storage errors because that seems to be exactly what's going on.

A quick update. I've decided to return the new PCIe card and just go with the onboard NIC in addition to the original PCIe card; this will save me $40. Since the onboard NIC behaves the same as the new PCIe card in successful testing (maintaining connectivity when vmnic1 is removed from the port groups, leaving only vmnic0 to service them), and is also Intel, I feel confident it will do what I need. For reference, the motherboard is an ASUS TUF Z390-PRO Gaming.

With that out of the way, I have attached 5 photos taken of the output while testing with physical connections.

01 - vmnic0 unplugged and plugged back in

02-04 - vmnic1 unplugged

05 - vmnic1 plugged back in

Thanks again for the help.

athompson88
Enthusiast

I went through the web console event log and noticed the following were generated.

"Lost path redundancy to storage device naa.6e843b67af803dad0e48d466fda02ad0. Path vmhba64:C0:T0:L0 is down. Affected datastores: iscsi-data1."

"Lost path redundancy to storage device naa.6e843b6fb6aea05de4a7d4157d8ae2d9. Path vmhba64:C1:T1:L0 is down. Affected datastores: iscsi-data3."

It seems the path is going down with the loss of just one NIC. Does the software iSCSI adapter bind to a specific NIC? If so, then I wouldn't be using both NICs for iSCSI traffic after all. (The command I believe checks this is below.)
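
If I'm reading the esxcli reference correctly, the current binding can be checked with this (vmhba64 is the software iSCSI adapter from the log messages above):

esxcli iscsi networkportal list --adapter=vmhba64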

Attached is a screenshot of the full event log.

ThompsG
Virtuoso

Hi athompson88,

Snap! I was about to suggest it might be a storage issue. Can you confirm whether you have configured port binding as per this article? https://kb.vmware.com/s/article/2045040

This will give you multiple paths to storage, potentially increasing performance as well as giving you redundancy 🙂 In outline, the setup looks something like the commands below.
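
A rough sketch of the steps, from memory - the portgroup names iSCSI-1/iSCSI-2 are just examples, vmhba64 is your software iSCSI adapter, and the KB article is the authority on the details:

# One portgroup per VMkernel port, each pinned to a single active uplink

esxcli network vswitch standard portgroup add -p iSCSI-1 -v vSwitch0

esxcli network vswitch standard portgroup policy failover set -p iSCSI-1 --active-uplinks=vmnic0

esxcli network vswitch standard portgroup add -p iSCSI-2 -v vSwitch0

esxcli network vswitch standard portgroup policy failover set -p iSCSI-2 --active-uplinks=vmnic1

# One VMkernel interface in each portgroup (then give each an address on the iSCSI subnet with "esxcli network ip interface ipv4 set")

esxcli network ip interface add -i vmk1 -p iSCSI-1

esxcli network ip interface add -i vmk2 -p iSCSI-2

# Bind both VMkernel ports to the software iSCSI adapter and rescan

esxcli iscsi networkportal add -A vmhba64 -n vmk1

esxcli iscsi networkportal add -A vmhba64 -n vmk2

esxcli storage core adapter rescan -A vmhba64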

Kind regards.

athompson88
Enthusiast

Well, it's better, but still not working quite right. I created two new vmks specifically for iSCSI use and then set them up per the document. Here is what is happening now:

1) Both NICs connected --> No issues

2) vmnic0 disconnected --> Connections lost

3) vmnic0 connected --> No issues

4) vmnic1 disconnected --> Connections lost

5) vmnic1 connected --> No issues

6) vmnic1 disconnected --> No issues  <--- here's where it gets weird

7) vmnic1 connected --> No issues

8) vmnic0 disconnected --> Connections lost

9) vmnic0 connected --> No issues

10) vmnic0 disconnected --> No issues  <--- and again

11) vmnic0 connected --> No issues

12) vmnic1 disconnected --> Connections lost

13) vmnic1 connected --> No issues

14) vmnic1 disconnected --> No issues

15) vmnic1 connected --> No issues

To summarize, it seems things fail on the first disconnect of a given link, but remain intact if further disconnects of the same link occur. If the disconnect then happens on the other link, connections fail the first time, but similarly survive additional disconnects. And if you alternate back and forth between links, failures occur every time. It's as if it only knows how to recover from a failure of one given link at a time: when a different link fails it has to relearn how to recover that one, but then forgets how to recover from the original. (The commands I've been using to watch the paths are below.)
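
For anyone following along, I believe these are the right commands to watch the iSCSI sessions and path states during the tests (both standard esxcli, as far as I can tell):

esxcli iscsi session list

esxcli storage core path list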
