We have about 100 ESX servers, each with 6 NICs, all running ESX 3.5 Update 2.
The network setup is the same on all ESX servers:
vmnic0 and 2: Service Console + VMotion
vmnic1, 3, 4 and 5: VM Network
Both NIC teams use "Route based on the originating virtual port ID" and all vmnics are Active. All settings are at their defaults.
By chance I looked at the network performance and saw that all the traffic is going through vmnic1! There are about 30 VMs on a host. I checked the other hosts, and the situation is the same on every one:
PORT ID UPLINK USED BY DTYP DNAME PKTTX/s MbTX/s PKTRX/s MbRX/s %DRPTX %DRPRX
16777217 Y vmnic0 H vSwitch0 6.54 0.02 5.99 0.00 0.00 0.00
16777218 N 0:NCP H vSwitch0 0.00 0.00 0.00 0.00 0.00 0.00
16777219 Y vmnic2 H vSwitch0 0.00 0.00 2.91 0.00 0.00 0.00
16777220 N 0:CDP H vSwitch0 0.00 0.00 0.00 0.00 0.00 0.00
16777221 N 0:vmk-tcpip-10.11.33 H vSwitch0 0.00 0.00 1.09 0.00 0.00 0.00
16777222 N 0:vswif0 H vSwitch0 6.54 0.02 4.18 0.00 0.00 0.00
33554433 N 0:CDP H vSwitch1 0.00 0.00 0.00 0.00 0.00 0.00
50331649 Y vmnic1 H vSwitch2 250.14 1.28 256.67 0.40 0.00 0.00
50331650 N 0:NCP H vSwitch2 0.00 0.00 0.00 0.00 0.00 0.00
50331651 Y vmnic3 H vSwitch2 0.00 0.00 68.85 0.16 0.00 0.00
50331652 Y vmnic4 H vSwitch2 0.00 0.00 68.85 0.16 0.00 0.00
50331653 Y vmnic5 H vSwitch2 0.00 0.00 68.85 0.16 0.00 0.00
50331654 N 0:CDP H vSwitch2 0.00 0.00 0.00 0.00 0.00 0.00
How is this possible? I thought the first VM would take pNIC1, the second pNIC2, and so on.
When I look with ifconfig, vmnic3, 4 and 5 all show TX bytes 0 (???):
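As I understand it, the port-ID policy picks an uplink once, when a virtual port connects, and never rebalances based on load. A minimal sketch of that behavior (the exact mapping ESX uses is not documented; the simple modulo mapping and the port IDs below are assumptions for illustration only):

```python
# Sketch of "Route based on originating virtual port ID" teaming.
# ASSUMPTION: uplink = port_id % number_of_active_uplinks. The real
# ESX algorithm is not public; the point is only that the choice
# depends on the port ID, never on how busy each uplink already is.

def pick_uplink(port_id, uplinks):
    """Statically map a vSwitch port ID to one uplink."""
    return uplinks[port_id % len(uplinks)]

uplinks = ["vmnic1", "vmnic3", "vmnic4", "vmnic5"]

# A VM keeps the same uplink no matter how much it transmits...
assert pick_uplink(50331656, uplinks) == pick_uplink(50331656, uplinks)

# ...and VMs whose port IDs happen to fall in the same residue class
# all land on the same vmnic, regardless of load.
busy_ports = [50331656, 50331660, 50331664]
print({p: pick_uplink(p, uplinks) for p in busy_ports})
```

Under this (assumed) mapping all three example ports land on vmnic1, which would look exactly like the esxtop output above: one hot uplink, the others idle on transmit.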
vmnic4 Link encap:Ethernet HWaddr 00:17:A4:77:20:38
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:502580341 errors:0 dropped:0 overruns:0 frame:0
TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:2450857820 (2337.3 Mb) TX bytes:0 (0.0 b)
Interrupt:121
The output shown is almost identical on every host.
Is there anybody who has seen the same situation? Or an explanation for these performance counters?
If you read this article posted on Scott Lowe's blog, you will see the exact issue you are seeing in your environment. To resolve it, you have a few options. Check out the "Without Aggregation" section.
Without LACP, you can separate your VMs into different port groups for better vmnic balancing. You can create multiple port groups with different vmnic configurations for balancing and redundancy.
With LACP, use NIC teaming with "Route based on IP hash" together with physical switches that have been configured for EtherChannel or static 802.3ad/LACP.
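For comparison, with IP-hash teaming the uplink is chosen per source/destination IP pair rather than per virtual port, so even a single VM talking to many peers spreads across the team. A rough sketch (the commonly cited formula, XOR of the last IP octets modulo the team size, is an assumption here, not a quote from VMware's implementation):

```python
# Sketch of "Route based on IP hash" teaming.
# ASSUMPTION: uplink index = (last octet of src IP XOR last octet of
# dst IP) % number of active uplinks. Treat this as an approximation
# of the real hash, used only to show per-connection spreading.

def ip_hash_uplink(src_ip, dst_ip, uplinks):
    """Pick an uplink from a source/destination IPv4 pair."""
    src_last = int(src_ip.rsplit(".", 1)[1])
    dst_last = int(dst_ip.rsplit(".", 1)[1])
    return uplinks[(src_last ^ dst_last) % len(uplinks)]

uplinks = ["vmnic1", "vmnic3", "vmnic4", "vmnic5"]

# One VM talking to several destinations can use several uplinks:
for dst in ("10.11.33.10", "10.11.33.11", "10.11.33.12", "10.11.33.13"):
    print(dst, "->", ip_hash_uplink("10.11.33.50", dst, uplinks))
```

The flip side is that the physical switch must see the team as one channel (EtherChannel / static 802.3ad), which is why this option ties you to a single switch or a stack.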
Hopefully this sheds some light on the issue.
Thanks for this answer; I had already read that article. But it says the following:
+There is no guarantee that the VMs will be evenly distributed across
the uplinks on a vSwitch or port group. Although I’ve seen references
in the VMware documentation to indicate that VMs are balanced in some
fashion (I could not find those references, however), real-world
experience seems to indicate otherwise. In my tests, I had instances
where four VMs were all “linked” (via their virtual port ID) to the
same uplink.+
We have about 100 hosts, and in every NIC team only one vmnic handles the transmit traffic. I don't think this is what VMware intended with this option.
Route based on virtual port ID is the best way to achieve our goal, because we can use 4 switches. With a channel we can use only 1 switch (or 2 with stacking). That is not the situation we want.
Separating the VMs into different port groups is a nice option when the environment is small, but not in ours, because there are too many ESX servers.