PreferredUser
Contributor

Faster network throughput to virtual machine

I have done a lot of searching on NIC teaming/bonding/aggregating and have it set up on my host machine in the virtual switch. However, there seems to be a lot of conflicting information on the value of teaming NICs in the virtual machine. I am trying to get faster network throughput to a couple of machines and am not having a lot of success in my searching.

So far I have found some tuning guides, but I cannot get the speeds I got when the server was running on its own hardware.

Any suggestions/solutions or even suggestions for search terms would be most appreciated.

10 Replies
Erik_Zandboer
Expert

Hi,

First you should determine what you want to speed up. If you want to speed up networking for a single VM to a single IP address outside VMware, no teaming policy is going to help. In that case you'd have to manually assign multiple NICs to the VM and force them over separate physical NICs.

If you want to speed up a number of VMs inside a single ESX server, you could get some result by just adding a few Gbit NICs to the vSwitch. In the default policy (originating port ID), each VM gets one of the physical uplinks assigned. If you have more VMs than uplinks, they'll start to share the uplinks.
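Something like this, just to illustrate the idea (a sketch with made-up port IDs and uplink names, not VMware internals):

```python
# Illustrative sketch only -- not VMware code. It mimics the idea behind
# "route based on originating virtual port ID": every virtual switch port is
# pinned to exactly one physical uplink, so a single vNIC never spans two uplinks.

uplinks = ["vmnic0", "vmnic1", "vmnic2", "vmnic3"]  # physical NICs on the vSwitch

def uplink_for_port(virtual_port_id: int) -> str:
    """Pin a virtual switch port to one uplink (simple modulo spread)."""
    return uplinks[virtual_port_id % len(uplinks)]

# Three VMs with one vNIC each on (hypothetical) ports 16, 17 and 18:
for port in (16, 17, 18):
    print(f"virtual port {port} -> {uplink_for_port(port)}")
# Each VM lands on a different uplink, but any single VM is still capped
# at the speed of one physical NIC.
```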

If you have a single server that has many sessions to different IPs, you could consider creating a port aggregate/etherchannel and creating a team on the vSwitch where the uplink used is based on "IP hash". In this mode ESX hashes the source and destination IP of each packet, and the hash result determines the uplink to use. This way, a single VM can truly use multiple NICs.

Remember though, if you have a single target session this will not result in load balancing (since for every packet the hash result of the IPs will be the same). If you have a backup VM that receives data from two external IPs, you could create an IP hash team with two uplinks; adding more uplinks will not help. Also, remember this: it could very well be that when you have one source IP (the VM) and two destination IPs, the hash output is still the same, resulting in only one uplink being used! Sometimes it takes a little "fiddling" with the IPs. If you have a hundred active sessions, this will almost always balance automagically (strength in numbers). In rare cases the target IPs might have been chosen in such a manner that balancing still won't occur.
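To make that concrete, here is a minimal sketch of the idea (a simplification, not the exact ESX hashing algorithm; the addresses and uplink names are made up):

```python
# Simplified illustration of "route based on IP hash" -- not the exact ESX
# algorithm. The point: the chosen uplink is a pure function of the
# (source IP, destination IP) pair, so one pair always rides one uplink.

import ipaddress

uplinks = ["vmnic0", "vmnic1"]  # two uplinks in the etherchannel

def uplink_for_flow(src: str, dst: str) -> str:
    # XOR the two addresses and take the result modulo the number of uplinks.
    h = int(ipaddress.ip_address(src)) ^ int(ipaddress.ip_address(dst))
    return uplinks[h % len(uplinks)]

backup_vm = "10.0.0.50"
print(uplink_for_flow(backup_vm, "10.0.0.101"))  # one source/dest pair -> always the same uplink
print(uplink_for_flow(backup_vm, "10.0.0.102"))  # a second client may land on the other uplink -- or not
# With ~100 client IPs the flows spread out well on their own; with only two
# clients you can be unlucky and both hashes pick the same uplink.
```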



Visit my blog at http://www.vmdamentals.com

PreferredUser
Contributor

Erik,

Thank you for the reply. Yes, I know that speed cannot be increased if there is just one IP. In this case there are just under 100 field engineers who go out and collect data. When they return to the office they upload their data to the server, and then their team works on the data, so there are a lot of individual IPs. The issue is that when multiple teams are working in the office and someone comes in and uploads their work, it slows everyone down.

There are only three VMs on the hardware and three 4-port NICs. I currently have two of the 4-port NICs dedicated to the vSwitch for that server, so there should be plenty of bandwidth to the switch (with one management port and the rest for traffic). But does all the bandwidth at the switch translate to bandwidth to the VM? I am just trying to figure out if I need to add NICs to the VM. I read that there is a limit of 4 vNICs per virtual machine, so I may not need all the hardware NICs on the vSwitch; however, I have not found any info on teaming the vNICs on the VM. I followed some of the tuning guides and have configured the VM to use the VMXNET driver rather than the E1000 driver, but if there is just one vNIC on the VM and 8 uplinks on the vSwitch it would seem to be a bottleneck. However, no matter what I try, when I add additional vNICs to the VM I cannot team them.

And yes, the vSwitch is properly configured to communicate with the Cisco hardware through all the NICs (link aggregation and 802.1q).

So the question is: how do I team the vNICs on the VM, or otherwise get the bandwidth from the vSwitch to the VM?

Erik_Zandboer
Expert

Hm... You say that when someone is uploading, everyone else slows down... That makes me think the storage underneath may be the bottleneck. Verify the disk performance during such an upload, and double-check the storage configuration on the box, meaning how many spindles in what RAID setup, etc.

If all the VMs work on the same set of disks, I would expect the I/O behaviour to be pretty much random. That in turn could heavily impact the upload performance (which is probably sequential in itself). It would cause all disks to start executing random I/Os, possibly at their maximum, which in turn would slow down the whole lot.

As with all performance issues, I always start off by browsing through the performance graphs, and any graph clipping at a certain level catches my attention. For example, if during such an upload I see a constant disk write speed of 50 MB/s, and I know that a RAID1 set of two SATA drives is underneath, I already have the answer. The same goes for pretty much every graph that is clipping within vCenter.
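Put as a quick back-of-the-envelope check (the figures below are generic ballpark assumptions, not measurements from your system):

```python
# Back-of-the-envelope check: can the disks keep up with the network?
# All figures below are generic ballpark assumptions, not measured values.

GBE_LINK_MB_S = 125            # ~1 Gbit/s expressed in MB/s
SATA_SEQ_WRITE_MB_S = 70       # one SATA drive, sequential writes (rough)

# RAID1 of two SATA drives: every write goes to both members, so the set is
# capped at roughly the speed of a single drive.
raid1_write_ceiling = SATA_SEQ_WRITE_MB_S

print(f"RAID1 SATA write ceiling ~{raid1_write_ceiling} MB/s vs GbE ~{GBE_LINK_MB_S} MB/s")
# A write graph sitting flat around that ceiling during an upload says the
# storage, not the network, is the bottleneck.
```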



Visit my blog at http://www.vmdamentals.com

PreferredUser
Contributor

Disk I/O was one of the first areas addressed when this first started. The host is now connected to a direct-attached storage device (12 × 15K drives in RAID 10) through multiple SCSI channels. The throughput there is blazingly fast, much faster than when it was on the iSCSI SAN.
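For what it's worth, a rough estimate with generic per-spindle figures (assumptions, not benchmarks of this particular array) backs that up:

```python
# Rough sanity check on the DAS, using generic per-spindle figures
# (assumptions, not benchmarks of this particular array).

SPINDLES = 12
SEQ_MB_S_PER_15K_DISK = 100      # ballpark sequential throughput of one 15K drive
RANDOM_IOPS_PER_15K_DISK = 180   # ballpark random IOPS of one 15K drive

# RAID 10: reads can use all spindles; writes hit both members of each
# mirrored pair, so roughly half the spindles count for write throughput.
seq_write_estimate = SPINDLES // 2 * SEQ_MB_S_PER_15K_DISK        # ~600 MB/s
random_write_iops = SPINDLES * RANDOM_IOPS_PER_15K_DISK // 2      # ~1080 IOPS

print(f"~{seq_write_estimate} MB/s sequential writes, ~{random_write_iops} random write IOPS")
# Either figure is far beyond what a single ~125 MB/s GbE stream needs,
# which fits the observation that the disks are not the bottleneck.
```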

When I look at the performance logs, nothing ever really spikes other than the network traffic, which keeps leading me back to a bottleneck between the vSwitch and the VM.

Erik_Zandboer
Expert

If the network spikes, disk IOPS should spike as well (during uploading). CPU usage should probably show up during the spikes, too.

The drives used should be able to outperform a fully saturated 1 GbE link, I guess... Maybe you have a lot of packet loss during the transfer? It must come from somewhere... I assume you have VMware Tools installed in the guest.



Visit my blog at http://www.vmdamentals.com

PreferredUser
Contributor

IOPS rise, but there is so much available throughput that it is not noticeable. CPU usage barely blips.

Yes, I have the Tools installed. I had to, so I could use the VMXNET drivers.

danm66
Expert

The virtual NIC will run as fast as the physical hardware will let it, so NIC teaming within a VM is not necessary or of use. In general, on a physical system the speed of a network adapter isn't limited by the driver, but by the chipset and protocols of the hardware. So, with a virtual machine, the virtual network adapter is only limited by the buffers and such within the driver or the underlying physical hardware. This is especially relevant with the 10 GbE modular NICs, where you dice up the bandwidth into virtual adapters.

Speaking of driver limitations, do you have the storage adapter type set to LSI Logic? Also, I have seen some systems run better with the e1000 nic type, but that was after updating the driver with the latest release from Intel for the OS.

I would be interested to see what esxtop reports for the network utilisation during these peak periods.
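If it helps when you get back, esxtop can also log to CSV in batch mode (esxtop -b, with -d for the sample interval and -n for the number of samples, if I remember the flags correctly). A trivial helper like this (purely illustrative, the names are my own) turns an observed Mbit/s figure into link utilisation:

```python
# Tiny helper for reading esxtop's network numbers: turn an observed Mbit/s
# figure into a percentage of a given link. Purely illustrative.

def utilisation_pct(observed_mbit_s: float, link_mbit_s: float = 1000.0) -> float:
    return 100.0 * observed_mbit_s / link_mbit_s

# e.g. if esxtop shows ~950 Mbit/s transmitted on one uplink during an upload:
print(f"{utilisation_pct(950):.0f}% of a single GbE uplink")   # -> 95%
# One uplink pinned near 100% while the others sit idle would point at the
# teaming policy rather than at total bandwidth.
```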

mcowger
Immortal

So you have 8 uplinks from the vSwitch into the physical switch, but how many uplinks does that physical switch have into the network your FEs are using? If its uplink is a single GigE link or something, that could be your bottleneck.

I only mention it because I missed it before.
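As a rough illustration of how quickly a single inter-switch link gets eaten (all numbers below are assumptions, not measurements from your network):

```python
# Quick arithmetic on the inter-switch link (all numbers are assumptions).

uplink_mbit_s = 1000          # a single GbE uplink between switches
concurrent_uploads = 5        # field engineers uploading at the same time
per_client_mbit_s = 600       # what one client can push over GbE, roughly

demand = concurrent_uploads * per_client_mbit_s
print(f"demand ~{demand} Mbit/s vs uplink {uplink_mbit_s} Mbit/s")
# Five simultaneous uploads would oversubscribe a single GbE uplink ~3x,
# which would feel like "everyone slows down" even though the vSwitch
# itself has uplinks to spare.
```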






--Matt
VCP, vExpert, Unix Geek
VCDX #52 | blog.cowger.us
PreferredUser
Contributor

"Speaking of driver limitations, do you have the storage adapter type set to LSI Logic? Also, I have seen some systems run better with the e1000 nic type, but that was after updating the driver with the latest release from Intel for the OS.

I would be interested to see what esxtop reports for the network utilisation during these peak periods."

I will need to check on the storage adapter. I changed from E1000 to VMXNET based on several tuning guides. The NICs are running the latest driver release from Intel.

It will be a couple of days before I get back from my current assignment, however I will post the results from esxtop.

PreferredUser
Contributor

"So you have 8 uplinks for the vSwitch into the physical switch - but how many uplinks does that physical switch have into the network your FE's are using? If its uplink is single gige or something, that could be your bottleneck...."

The FEs are split between two switches; all switches are connected with StackWise (Cisco-speak), which has 128 Gbps of capacity.
