We are having a very similar issue.
ESXi 6.0u1, VM with Windows Server 2012 R2, SQL Server 2014 SP1 + latest CU, HP BL460 Gen9 blade, latest firmware everywhere.
When we run a simple query from this VM (using the Oracle 12 x64 client) against a physical server in the same blade enclosure running Windows Server 2012 and Oracle 12, the query takes 10 minutes to complete instead of 45 seconds. Simple SELECT query, nothing fancy. Using E1000 instead of VMXNET3, the problem is gone. Also, if we install Wireshark, and therefore WinPcap, we can't reproduce the problem: as soon as WinPcap is installed, the problem isn't there. We uninstall it, reboot, and the problem is back.
When we run the same query from the same problematic server, but against a test VM running Oracle that we spawned just for this purpose instead of the physical Oracle server, the problem isn't there.
If we migrate the problematic SQL server to another ESXi cluster that uses HP BL460 Gen8 servers, the problem isn't there either.
No solution yet, except the E1000 workaround.
We are having very similar issues with Windows Server 2012 R2 guests using VMXNET3 NICs, but only after upgrading them to VM hardware version 11. We have switched all our affected VMs to the E1000e NIC type for now, and the issue seems to be temporarily resolved.
We have noticed the issue mainly on our more heavily used SQL and IIS machines.
Have you reported this to VMware?
I'm currently working with VMware support on the problem described in the post of my colleague, pcarignan.
As of today, we have found the following workarounds:
- Use E1000e instead of VMXNET3
- Use jumbo frames (not really good for us, but it may be a good option for others...)
- Disable LRO on VMXNET3 (see the sketch after this list): http://pubs.vmware.com/vsphere-60/index.jsp?topic=%2Fcom.vmware.vsphere.networking.doc%2FGUID-ECC80415-442C-44E9-BA7A-852DDB174B9F.html
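For the LRO workaround, the linked doc covers the host-side settings; inside the guest we do roughly the following. This is only a sketch, assuming the VMXNET3 driver exposes the usual "Recv Segment Coalescing" advanced properties; the display names and the 'Ethernet0' adapter name are examples from our setup and may differ on yours:

# Find the VMXNET3 adapters in the guest first
Get-NetAdapter | Where-Object { $_.InterfaceDescription -like '*vmxnet3*' }

# Turn off receive coalescing (LRO/RSC) at the driver level, for IPv4 and IPv6
# ('Ethernet0' is an example adapter name; property display names vary by driver version)
Set-NetAdapterAdvancedProperty -Name 'Ethernet0' -DisplayName 'Recv Segment Coalescing (IPv4)' -DisplayValue 'Disabled'
Set-NetAdapterAdvancedProperty -Name 'Ethernet0' -DisplayName 'Recv Segment Coalescing (IPv6)' -DisplayValue 'Disabled'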
I've also found that when Wireshark and WinPcap are installed on the impacted VM, our slow SQL query problem does not show up.
We also used iperf to measure the bandwidth between the impacted servers, and after that our problem was gone... until the VM was rebooted...
I hope this information can help you guys in some way.
I'll let you know if we find another workaround.
I just spoke with VMware support and got this answer: "VMware is aware of this issue, there is an active PR for it, and we are working on a more permanent fix. Note that this issue should only affect hardware version 11 and Windows 2012 in that combo."
For us, disabling LRO on VMXNET3 will be the workaround while we wait for the fix.
Just to add to this - I posted a similar thread (not having seen this one) and someone pointed me here.
We are also running vSphere 6 with Windows Server 2012 R2. We spent several days looking for answers, and simply disabling LRO on the NIC in the OS fixed our problem.
Thank you all very much for your input - it's been of great help. I'm a little disappointed that VMware haven't put this in a KB article - they are normally very good at this.
Does anyone have any idea of fix dates? I guess the fix will be in the form of an ESX patch, at which point we will want to re-enable LRO on the NIC.
Yep, we're also seeing this exact same issue: Windows 2012 R2 and SQL 2014 using VMXNET3 in a vSphere 6 environment, on hardware version 11 VMs.
We changed the NICs to E1000e on both the SQL and application servers, and the issue has now been resolved.
VMware really needs to fix this ASAP; we have wasted a lot of time trying to resolve this.
Thanks for that ticket!
Has the issue been solved?
I'm actually in a case where I'm encountering latency on my SQL database since I moved the machine from vCenter 5.5 to 6 and upgraded the VM hardware version from 8 to 11.
I haven't changed the network adapter from VMXNET3 to E1000e yet, but I will try soon.
I made the change and it is working perfectly, as it should. So using the E1000e network interface seems to be the workaround.
We had the exact same issue on our SQL servers after upgrading to vSphere 6. This is due to the Receive Segment Coalescing (RSC) feature in newer VMXNET3 NICs on Windows 2012.
The easiest way to solve this is to turn it off completely in the server OS (in an admin command prompt):
netsh int tcp set global rsc=disabled
netsh int tcp show global
This is quick and easy, with no downtime or interruptions, and it does not involve changing NIC types.
After doing this on all our SQL and application servers, everything was ok.
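If you prefer PowerShell, the same global setting can be toggled with the built-in cmdlets. As far as I can tell this is equivalent to the netsh commands above:

# Disable Receive Segment Coalescing globally in the guest OS
Set-NetOffloadGlobalSetting -ReceiveSegmentCoalescing Disabled

# Verify the new global setting
Get-NetOffloadGlobalSetting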
I totally agree with HeineD. Like I said in my Nov 25, 2015 post, disabling RSC in Windows is the best workaround. With this approach, you can still use VMXNET3 instead of E1000.
For your information, there is an official VMware KB article about this issue: http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=2129176
We've deployed an automated PowerShell script on our impacted SQL servers using the commands described in the KB. This way, we can apply the workaround without any server interruption.
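To give you an idea, here is a minimal sketch of that kind of script. It is not our exact script; matching on "vmxnet3" is just one way to find the right adapters, and a real deployment should add logging and error handling:

# Disable RSC per adapter on every VMXNET3 NIC in the guest
Get-NetAdapter |
    Where-Object { $_.InterfaceDescription -like '*vmxnet3*' } |
    Disable-NetAdapterRsc

# Confirm RSC is now off for IPv4 and IPv6 on all adapters
Get-NetAdapterRsc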
Thanks for this workaround! It helped us out.
Just wondering, are there any side effects from disabling RSC?