We are having a very similar issue.
ESXi 6.0u1, VM with Windows Server 2012 R2, SQL Server 2014 SP1 + latest CU, HP BL460 Gen9 blade, latest firmware everywhere.
When we run a simple query from this VM (using the Oracle 12 x64 client) against a physical server in the same blade enclosure running Windows Server 2012 and Oracle 12, the query takes 10 minutes to complete instead of 45 seconds. Simple SELECT query, nothing fancy. Using E1000 instead of VMXNET3, the problem is gone. Also, if we install Wireshark, and therefore WinPcap, we can't reproduce the problem: as soon as WinPcap is installed, the problem isn't there. We uninstall it, reboot, and the problem is back.
When we run the same query from the same problematic server, but against a test VM running Oracle that we spawned just for this purpose instead of the physical Oracle server, the problem isn't there.
If we migrate the problematic SQL server to another ESXi cluster that uses HP BL460 Gen8 servers, the problem isn't there either.
No solution yet, except the E1000 workaround.
We are having very similar issues with Windows Server 2012 R2 guests using VMXNET3 NICs, but only after upgrading them to VM hardware version 11. We have switched all our affected VMs to the E1000e NIC type for now, and the issue seems to be temporarily resolved.
We have noticed the issue mainly on our more heavily used SQL and IIS machines.
Have you reported this to VMware?
I'm currently working with VMware support on the problem described in the post of my colleague, pcarignan.
As of today, we have found the following workarounds:
- Use E1000e instead of VMXNET3
- Use jumbo frames (not really good for us, but it may be a good option for others...)
- Disable LRO on VMXNET3 (see the sketch after this list): http://pubs.vmware.com/vsphere-60/index.jsp?topic=%2Fcom.vmware.vsphere.networking.doc%2FGUID-ECC80415-442C-44E9-BA7A-852DDB174B9F.html
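For the LRO workaround, the linked doc covers the host-side settings; inside the guest we do roughly the following. This is only a sketch, assuming the VMXNET3 driver exposes the usual "Recv Segment Coalescing" advanced properties; the display names and the 'Ethernet0' adapter name are examples from our setup and may differ on yours:

# Find the VMXNET3 adapters in the guest first
Get-NetAdapter | Where-Object { $_.InterfaceDescription -like '*vmxnet3*' }

# Turn off receive coalescing (LRO/RSC) at the driver level, for IPv4 and IPv6
# ('Ethernet0' is an example adapter name; property display names vary by driver version)
Set-NetAdapterAdvancedProperty -Name 'Ethernet0' -DisplayName 'Recv Segment Coalescing (IPv4)' -DisplayValue 'Disabled'
Set-NetAdapterAdvancedProperty -Name 'Ethernet0' -DisplayName 'Recv Segment Coalescing (IPv6)' -DisplayValue 'Disabled'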
I've also found that when Wireshark and WinPcap are installed on the impacted VM, our slow SQL query problem does not show up.
We also used iperf to measure the bandwidth between the impacted servers, and after that our problem was gone... until the VM was rebooted...
I hope this information can help you guys in some way.
I'll let you know if we find another workaround.
I just spoke with VMware support and got this answer: "VMware is aware of this issue, there is an active PR for it, and we are working on a more permanent fix. Note that this issue should only affect hardware version 11 and Windows 2012 in that combo."
For us, disabling LRO on VMXNET3 will be the workaround while we wait for the fix.
Just to add to this - I posted a similar thread (not having seen this one) and someone pointed me here.
We are also running vSphere 6 with Windows Server 2012 R2. We spent several days looking for answers, and simply disabling LRO on the NIC in the OS fixed our problem.
Thank you all very much for your input - it's been of great help. I'm a little disappointed that VMware haven't put this in a KB article - they are normally very good at this.
Does anyone have any idea of fix dates? I guess the fix will be in the form of an ESX patch, at which point we will want to re-enable LRO on the NIC.
Yep, we're also seeing this exact same issue: Windows 2012 R2 and SQL 2014 using VMXNET3 in a vSphere 6 environment, on hardware version 11 VMs.
We changed the NICs to E1000e on both the SQL and application servers, and the issue has now been resolved.
VMware really needs to fix this ASAP; we have wasted a lot of time trying to resolve this.
Thanks for that ticket!
Has the issue been solved?
I'm actually in a case where I'm encountering latency on my SQL database since I moved the machine from vCenter 5.5 to 6 and upgraded the VM hardware version from 8 to 11.
I haven't changed the network adapter from VMXNET3 to E1000e yet, but I will try soon.
I made the change and it is working perfectly, as it should. So using the E1000e network interface seems to be the workaround.
We had the exact same issue on our SQL servers after upgrading to vSphere 6. This is due to the Receive Segment Coalescing (RSC) feature in newer VMXNET3 NICs on Windows 2012.
The easiest way to solve this is to turn it off completely in the server OS (in an admin command prompt):
netsh int tcp set global rsc=disabled
netsh int tcp show global
This is quick and easy, with no downtime or interruptions, and it does not involve changing NIC types.
After doing this on all our SQL and application servers, everything was ok.
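If you prefer PowerShell, the same global setting can be toggled with the built-in cmdlets. As far as I can tell this is equivalent to the netsh commands above:

# Disable Receive Segment Coalescing globally in the guest OS
Set-NetOffloadGlobalSetting -ReceiveSegmentCoalescing Disabled

# Verify the new global setting
Get-NetOffloadGlobalSetting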
I totally agree with HeineD. Like I said in my Nov 25, 2015 post, disabling RSC in Windows is the best workaround. With this approach, you can still use VMXNET3 instead of E1000.
For your information, there is an official VMware KB article about this issue: http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=2129176
We've deployed an automated PowerShell script on our impacted SQL servers using the commands described in the KB. This way, we can apply the workaround without any server interruption.
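To give you an idea, here is a minimal sketch of that kind of script. It is not our exact script; matching on "vmxnet3" is just one way to find the right adapters, and a real deployment should add logging and error handling:

# Disable RSC per adapter on every VMXNET3 NIC in the guest
Get-NetAdapter |
    Where-Object { $_.InterfaceDescription -like '*vmxnet3*' } |
    Disable-NetAdapterRsc

# Confirm RSC is now off for IPv4 and IPv6 on all adapters
Get-NetAdapterRsc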
Thanks for this workaround! It helped us out.
Just wondering, are there any side effects from disabling RSC?