We have about 100 ESX servers, each with 6 NICs, all running ESX 3.5 Update 2.
The network setup is the same on all ESX servers:
vmnic0 and 2: Service Console + VMotion
vmnic1, 3, 4 and 5: VM Network
Both NIC teams use "Route based on the originating virtual port ID" and all vmnics are Active. All settings are at their defaults.
By chance I looked at the network performance and saw that all the traffic is going through vmnic1! There are about 30 VMs on a host. I checked the other hosts, and the situation is the same on every one:
PORT ID UPLINK USED BY DTYP DNAME PKTTX/s MbTX/s PKTRX/s MbRX/s %DRPTX %DRPRX
16777217 Y vmnic0 H vSwitch0 6.54 0.02 5.99 0.00 0.00 0.00
16777218 N 0:NCP H vSwitch0 0.00 0.00 0.00 0.00 0.00 0.00
16777219 Y vmnic2 H vSwitch0 0.00 0.00 2.91 0.00 0.00 0.00
16777220 N 0:CDP H vSwitch0 0.00 0.00 0.00 0.00 0.00 0.00
16777221 N 0:vmk-tcpip-10.11.33 H vSwitch0 0.00 0.00 1.09 0.00 0.00 0.00
16777222 N 0:vswif0 H vSwitch0 6.54 0.02 4.18 0.00 0.00 0.00
33554433 N 0:CDP H vSwitch1 0.00 0.00 0.00 0.00 0.00 0.00
50331649 Y vmnic1 H vSwitch2 250.14 1.28 256.67 0.40 0.00 0.00
50331650 N 0:NCP H vSwitch2 0.00 0.00 0.00 0.00 0.00 0.00
50331651 Y vmnic3 H vSwitch2 0.00 0.00 68.85 0.16 0.00 0.00
50331652 Y vmnic4 H vSwitch2 0.00 0.00 68.85 0.16 0.00 0.00
50331653 Y vmnic5 H vSwitch2 0.00 0.00 68.85 0.16 0.00 0.00
50331654 N 0:CDP H vSwitch2 0.00 0.00 0.00 0.00 0.00 0.00
How is this possible? I thought the first VM would take pNIC1, the second pNIC2, and so on.
When I look with ifconfig, vmnic3, 4 and 5 all show TX bytes 0 (???):
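As I understand it, the port-ID policy picks an uplink once, when a virtual port connects, and never rebalances based on load. A minimal sketch of that behavior (the exact mapping ESX uses is not documented; the simple modulo mapping and the port IDs below are assumptions for illustration only):

```python
# Sketch of "Route based on originating virtual port ID" teaming.
# ASSUMPTION: uplink = port_id % number_of_active_uplinks. The real
# ESX algorithm is not public; the point is only that the choice
# depends on the port ID, never on how busy each uplink already is.

def pick_uplink(port_id, uplinks):
    """Statically map a vSwitch port ID to one uplink."""
    return uplinks[port_id % len(uplinks)]

uplinks = ["vmnic1", "vmnic3", "vmnic4", "vmnic5"]

# A VM keeps the same uplink no matter how much it transmits...
assert pick_uplink(50331656, uplinks) == pick_uplink(50331656, uplinks)

# ...and VMs whose port IDs happen to fall in the same residue class
# all land on the same vmnic, regardless of load.
busy_ports = [50331656, 50331660, 50331664]
print({p: pick_uplink(p, uplinks) for p in busy_ports})
```

Under this (assumed) mapping all three example ports land on vmnic1, which would look exactly like the esxtop output above: one hot uplink, the others idle on transmit.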
vmnic4 Link encap:Ethernet HWaddr 00:17:A4:77:20:38
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:502580341 errors:0 dropped:0 overruns:0 frame:0
TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:2450857820 (2337.3 Mb) TX bytes:0 (0.0 b)
Interrupt:121
The output shown is almost identical on every host.
Is there anybody who has seen the same situation? Or an explanation for these performance counters?
If you read this article posted on Scott Lowe's blog, you will see the exact issue you are seeing in your environment. To resolve it, you have a few options. Check out the "Without Aggregation" section.
Without LACP, you can separate your VMs into different port groups for better vmnic balancing. You can create multiple port groups with different vmnic configurations for balancing and redundancy.
With LACP, use NIC teaming with "Route based on IP hash" together with physical switches that have been configured for EtherChannel or static 802.3ad/LACP.
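For comparison, with IP-hash teaming the uplink is chosen per source/destination IP pair rather than per virtual port, so even a single VM talking to many peers spreads across the team. A rough sketch (the commonly cited formula, XOR of the last IP octets modulo the team size, is an assumption here, not a quote from VMware's implementation):

```python
# Sketch of "Route based on IP hash" teaming.
# ASSUMPTION: uplink index = (last octet of src IP XOR last octet of
# dst IP) % number of active uplinks. Treat this as an approximation
# of the real hash, used only to show per-connection spreading.

def ip_hash_uplink(src_ip, dst_ip, uplinks):
    """Pick an uplink from a source/destination IPv4 pair."""
    src_last = int(src_ip.rsplit(".", 1)[1])
    dst_last = int(dst_ip.rsplit(".", 1)[1])
    return uplinks[(src_last ^ dst_last) % len(uplinks)]

uplinks = ["vmnic1", "vmnic3", "vmnic4", "vmnic5"]

# One VM talking to several destinations can use several uplinks:
for dst in ("10.11.33.10", "10.11.33.11", "10.11.33.12", "10.11.33.13"):
    print(dst, "->", ip_hash_uplink("10.11.33.50", dst, uplinks))
```

The flip side is that the physical switch must see the team as one channel (EtherChannel / static 802.3ad), which is why this option ties you to a single switch or a stack.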
Hopefully this sheds some light on the issue.
Thanks for this answer; I had already read that article. But it says the following:
+There is no guarantee that the VMs will be evenly distributed across
the uplinks on a vSwitch or port group. Although I’ve seen references
in the VMware documentation to indicate that VMs are balanced in some
fashion (I could not find those references, however), real-world
experience seems to indicate otherwise. In my tests, I had instances
where four VMs were all “linked” (via their virtual port ID) to the
same uplink.+
We have about 100 hosts, and in every NIC team only one vmnic handles the transmit traffic. I don't think this is what VMware intended with this option.
Route based on virtual port ID is the best way to achieve our goal, because we can use 4 switches. With a channel we can use only 1 switch (or 2 with stacking). That is not the situation we want.
Separating the VMs into different port groups is a nice option when the environment is small, but not in ours, because there are too many ESX servers.