VMware Cloud Community
MoldRiteBud
Contributor

Can only use first available port on NIC for VM traffic

The server in question is an HP DL380 Gen9 running ESXi 6.0 U2.  In addition to the onboard 4x 1Gb and 2x 10Gb SFP+ ports, it has an Intel quad port NIC and an HP (Broadcom) quad port NIC.  The host is part of a 3-node HA cluster, and it had been working up until recently.  We upgraded our iSCSI network to 10Gb and connected the previously unused onboard SFP+ ports to it.  Our configuration has vSwitch0 with 2 of the 1Gb ports for management/vMotion, vSwitch1 for iSCSI, and vSwitch2 for VM traffic.  They are all 'standard' vSwitches (not distributed).  There are no issues with vSwitch0 and vSwitch1: management and vMotion work fine, and all iSCSI devices are visible and accessible.

When I vMotion a machine to the host, it stops being able to pass traffic on the VM network.  Two VMs on the host can ping each other, but cannot ping anything outside the host (including the default gateway).  After a week of hair pulling, I've narrowed it down to this:  the vSwitch for VM traffic has ports (vmNics) from 3 physical adapters.  If I make only the first available port from each adapter active on the vSwitch, VM traffic moves just fine.  If I add any other port from any of the physical adapters to the 'active' list, the VMs can no longer talk to the network.

For VM traffic, I have the following adapters / ports available:

Onboard 1Gb:  vmNic 1, 2, 3 (0 used for management / vMotion)

Intel quad port: vmNic 6, 8, 9 (7 used for management / vMotion)

HP quad port: vmNic 10, 11, 12, 13

The only configuration that will pass VM traffic is to set vmNic 1, 6, and 10 as 'active' on the vSwitch and leave the rest inactive.  Any other combination will not work.  It's maddening because this worked fine before lighting up the 10Gb.  Taking that out of the mix and reverting to the previous configuration does NOT fix the problem.  I'm inclined to think that there is some type of obscure config file that got hosed or confused, but I cannot figure it out.
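
For anyone who wants to see the state I'm working from, this is roughly what I've been running from the ESXi shell to check the uplinks and teaming settings (vSwitch2 is my VM traffic switch; the name and vmnic numbers will obviously differ on other hosts):

# list physical NICs and their link state
esxcli network nic list

# show uplinks, MTU, and port groups on the VM traffic vSwitch
esxcli network vswitch standard list -v vSwitch2

# show the load balancing policy and active/standby uplink order
esxcli network vswitch standard policy failover get -v vSwitch2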

Sorry for the length, but I wanted to present enough useful information.  Thanks in advance for any advice, before I employ the nuclear option and wipe and reload ESXi.

5 Replies
a_p_
Leadership

Just guessing.

Did you just replace the vmnics on the iSCSI vSwitch, or did you "destroy" the iSCSI configuration (port binding), and recreate it after replacing the vmnics?
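
If you want to double-check the binding from the shell, something like this should list the iSCSI adapters and the vmknics currently bound to them (vmhba33 below is just a placeholder; use the adapter name from the first command):

esxcli iscsi adapter list
esxcli iscsi networkportal list -A vmhba33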

André

MoldRiteBud
Contributor

At first I just replaced them.  During troubleshooting over the past week, I did do a reset of the network configuration from the ESXi console and reconstructed everything from scratch, with no improvement.

MoldRiteBud
Contributor

Fixed the issue.  On the vSwitch, under 'Teaming and failover', I changed the load balancing policy from 'Route based on originating virtual port' to 'Route based on IP hash'.
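
I made the change in the UI, but the equivalent from the ESXi shell should be roughly this (vSwitch2 is my VM traffic switch):

# set the load balancing policy to IP hash
esxcli network vswitch standard policy failover set -v vSwitch2 -l iphash

# verify the change
esxcli network vswitch standard policy failover get -v vSwitch2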

a_p_
Leadership

That's indeed a reason for what you're seeing, in case you are aggregating physical switch ports.

However, please keep in mind that not all aggregation options are available with standard vSwitches (see https://kb.vmware.com/kb/1004048).

André

MoldRiteBud
Contributor

Fair enough, except that I am not using link aggregation.
