I am running a latency-sensitive application on a VM in vSphere 5.1. The TCP/IP transfer rate is 124 Mbit/s. I have an intermittent problem where the network speed drops and then slowly climbs back to full speed.
I have attached a diagram of data gathered from the ESXi performance counters. The network speed drops from 124 Mbit/s to about 100 Mbit/s.
I have followed this guide: http://www.vmware.com/files/pdf/techpaper/VMW-Tuning-Latency-Sensitive-Workloads.pdf
Changing the network adapter from e1000 to VMXNET3 improved the performance. With the e1000 I had this problem constantly; now it only shows up about 1 time out of 10.
I have disabled virtual interrupt coalescing and also LRO.
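For reference, a sketch of those two changes (the guest interface name eth0 and the vNIC name ethernet0 are assumptions; adjust for your setup):

```shell
# Guest (Linux): turn off LRO on the vmxnet3 interface
ethtool -K eth0 lro off

# Host side: disable virtual interrupt coalescing for the vNIC,
# added to the VM's .vmx file while the VM is powered off:
# ethernet0.coalescingScheme = "disabled"
```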
Are the VMware Tools themselves up to date? Any errors on the physical switch side? This might also be a valid use case for VM DirectPath I/O, where you can map a PCI device directly to a VM, although you lose vMotion (and some other advanced) functionality.
Yes, I believe it is the latest version of VMware Tools. Initially I was running with a straight TP cable connected directly to the equipment.
At the moment I have a small unmanaged switch between the equipment and the server.
The ESXi host is installed as a standalone server and is not part of a datacenter, so vMotion is not needed in this case.
I noticed also that
Here is another example. The marker is placed where the red circle is.
This diagram shows the data receive rate.
This diagram shows the receive packet drops. It is interesting that the data rate drops at the same time as the packet drops occur.
Here is the disk latency diagram.
I do have all the data collected from the net, mem, disk, cpu and system parts of the ESXi performance.
The CPU peaks at about 360 MHz per core.
In my opinion it seems like the network is the problem, but I could be wrong.
There are many ESXi/ESX host components that can contribute to network performance.
Validate that each troubleshooting step below is true for your environment. Each step provides instructions or a link to a document for validating the step and taking corrective action as necessary. The steps are ordered in the most appropriate sequence to isolate the issue and identify the proper resolution. Please do not skip a step.
If your problem still exists after trying the steps in this article:
I have performed all of steps 1-4. There has been an improvement, at least in the diagram; the transfer rate looks much more stable.
In addition to NIC teaming I am also running dual vNICs. I read about that in this article: http://www.confio.com/vm-resources/vmware-tips/vmware-host-dropped-packets/
In the diagram below you can see a normal fully working transmission to the left and a faulty one to the right.
My next step is to try to increase the RX ring buffer in the Linux guest: "ethtool -G <interface> rx 4096"
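A sketch of what that would look like in the guest (the interface name eth0 is an assumption; `ethtool -g` first shows the current and pre-set maximum ring sizes):

```shell
# Show current and maximum RX/TX ring sizes for the interface
ethtool -g eth0

# Grow the RX ring to the reported maximum (4096 on vmxnet3)
ethtool -G eth0 rx 4096
```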
I will also try the VisualEsxtop tool to see if I can capture some DRPRX counts.
The problem with those data drops is that the server does not manage to empty the equipment's ring buffer in time, which leads to loss of data as it is overwritten.
The ring buffer only lasts for 2 seconds in the equipment.
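Back-of-the-envelope, at the observed 124 Mbit/s that 2-second window corresponds to roughly 31 MB which must be read out before it is overwritten:

```shell
RATE_MBIT=124   # observed transfer rate, Mbit/s
WINDOW_S=2      # the equipment's ring buffer lasts ~2 seconds
BUFFER_MB=$(( RATE_MBIT * WINDOW_S / 8 ))
echo "${BUFFER_MB} MB must be read out within ${WINDOW_S} s"
```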
It could be a life saver in your case, where low latency and high throughput are requirements. You map an entire vmnic to a VM, so make sure your host has at least 2 NICs so one remains available for managing the host itself.