VMware Cloud Community
boom
Contributor

Poor network performance

Hello,

We have virtualized some Windows domain controllers on ESX 2.5.3.

Now, clients sometimes experience long logon delays.

Packet analysis shows that the virtual servers are reachable but respond with very small packets, around 60 bytes each (see the traces below).

Has anyone had this problem, or does anyone have an idea what could be wrong?

*********************

16:32:42.805730 IP server.aaa.com.445 > client.aaa.com.1053: P 9116:9182(66) ack 8695 win 63547
16:32:42.805730 IP client.aaa.com.1053 > server.aaa.com.445: P 8695:8758(63) ack 9182 win 63452
16:32:42.821355 IP server.aaa.com.445 > client.aaa.com.1053: P 9182:9248(66) ack 8758 win 63484
16:32:42.821355 IP client.aaa.com.1053 > server.aaa.com.445: P 8758:8821(63) ack 9248 win 63386
16:32:42.821355 IP server.aaa.com.445 > client.aaa.com.1053: P 9248:9314(66) ack 8821 win 63421
16:32:42.821355 IP client.aaa.com.1053 > server.aaa.com.445: P 8821:8884(63) ack 9314 win 63320
16:32:42.836980 IP server.aaa.com.445 > client.aaa.com.1053: P 9314:9380(66) ack 8884 win 63358
16:32:42.836980 IP client.aaa.com.1053 > server.aaa.com.445: P 8884:8947(63) ack 9380 win 63254
16:32:42.852605 IP server.aaa.com.445 > client.aaa.com.1053: P 9380:9446(66) ack 8947 win 63295
16:32:42.852605 IP client.aaa.com.1053 > server.aaa.com.445: P 8947:9010(63) ack 9446 win 63188
16:32:42.852605 IP server.aaa.com.445 > client.aaa.com.1053: P 9446:9512(66) ack 9010 win 63232
16:32:42.852605 IP client.aaa.com.1053 > server.aaa.com.445: P 9010:9073(63) ack 9512 win 63122
16:32:42.868230 IP server.aaa.com.445 > client.aaa.com.1053: P 9512:9578(66) ack 9073 win 63169
16:32:42.868230 IP client.aaa.com.1053 > server.aaa.com.445: P 9073:9136(63) ack 9578 win 63056
16:32:42.868230 IP server.aaa.com.445 > client.aaa.com.1053: P 9578:9644(66) ack 9136 win 63106
16:32:42.868230 IP client.aaa.com.1053 > server.aaa.com.445: P 9136:9199(63) ack 9644 win 64512
16:32:42.883855 IP server.aaa.com.445 > client.aaa.com.1053: P 9644:9710(66) ack 9199 win 63043
16:32:42.883855 IP client.aaa.com.1053 > server.aaa.com.445: P 9199:9262(63) ack 9710 win 64446
16:32:42.883855 IP server.aaa.com.445 > client.aaa.com.1053: P 9710:9776(66) ack 9262 win 62980
16:32:42.883855 IP client.aaa.com.1053 > server.aaa.com.445: P 9262:9325(63) ack 9776 win 64380
16:32:42.899480 IP server.aaa.com.445 > client.aaa.com.1053: P 9776:9842(66) ack 9325 win 62917
16:32:42.899480 IP client.aaa.com.1053 > server.aaa.com.445: P 9325:9388(63) ack 9842 win 64314
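
For anyone who wants to check a longer capture for the same pattern, here is a rough Python sketch that counts segments and averages the TCP payload size per sender. It assumes the tcpdump text format shown in the trace above; the function name `summarize_payloads` and the regular expression are my own illustration, not part of any tool.

```python
import re
from collections import defaultdict

# Matches tcpdump text lines in the format shown in the trace, e.g.
# "16:32:42.805730 IP a.445 > b.1053: P 9116:9182(66) ack 8695 win 63547"
# (assumption: other tcpdump versions may print a different layout)
LINE_RE = re.compile(
    r"^(?P<ts>\d{2}:\d{2}:\d{2}\.\d+) IP (?P<src>\S+) > (?P<dst>\S+): "
    r"\S+ \d+:\d+\((?P<length>\d+)\)"
)

def summarize_payloads(lines):
    """Return {sender: (segment_count, mean_payload_bytes)}."""
    sizes = defaultdict(list)
    for line in lines:
        match = LINE_RE.match(line.strip())
        if match:  # silently skip blank or non-matching lines
            sizes[match.group("src")].append(int(match.group("length")))
    return {src: (len(v), sum(v) / len(v)) for src, v in sizes.items()}

# First four lines of the trace above
sample = [
    "16:32:42.805730 IP server.aaa.com.445 > client.aaa.com.1053: P 9116:9182(66) ack 8695 win 63547",
    "16:32:42.805730 IP client.aaa.com.1053 > server.aaa.com.445: P 8695:8758(63) ack 9182 win 63452",
    "16:32:42.821355 IP server.aaa.com.445 > client.aaa.com.1053: P 9182:9248(66) ack 8758 win 63484",
    "16:32:42.821355 IP client.aaa.com.1053 > server.aaa.com.445: P 8758:8821(63) ack 9248 win 63386",
]

for sender, (count, mean) in summarize_payloads(sample).items():
    print(f"{sender}: {count} segments, mean payload {mean:.1f} bytes")
```

A steady stream of request/response pairs with payloads this small on port 445 suggests many tiny SMB operations rather than bulk transfers.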

6 Replies
kucharski
Commander

Since this is ESX 2.x, which virtual NIC driver are you using: vmxnet or vlance? Do you have VMware Tools installed on each of the virtual machines? And do you have gigabit connections for your virtual machines?

Michael

boom
Contributor

Yes, we are using VMXNET with 1 Gbit virtual NICs.

Texiwill
Leadership

Hello,

What happens if you switch to PCNET32? This is the first step in the analysis, as the vmxnet driver makes assumptions about the network that may not be valid in your case. Run the same test with this option and compare the results. If the results are similar, then it makes no difference which driver you use.

Next, go to the service console (SC) and, while your test is running, run esxtop in batch mode to capture the vmnic information. What are the packet and byte transfer rates? Are you hitting the limits?

Are all these VMs on the same vSwitch?

Are you using load balancing or failover on your vSwitches?

Best regards,

Edward

--
Edward L. Haletky
vExpert XIV: 2009-2023,
VMTN Community Moderator
vSphere Upgrade Saga: https://www.astroarch.com/blogs
GitHub Repo: https://github.com/Texiwill
boom
Contributor

Thank you for the suggestions. We will see what we can do under our conditions.

VMs are on different switches and we are not using load balancing/failover features.

The problem occurs on production servers and only during periods of higher load, such as every morning when everybody logs on and downloads their GPOs and logon scripts.

We have not been able to reproduce it in our lab nor in production under lower load conditions.

As the problem affects many users at the same time and is not readily reproducible, we cannot experiment much with different options in production without a clear troubleshooting plan.

So far we have captured network traces, which show the small packets.

We have also measured the performance counters inside the VMs.

When we increase the load and the problem occurs:

CPU peaks at 30%

Network load is around 5-8 Mbit/s

so neither looks too bad.

I would like to know, in principle, if this can happen because of virtualization. Has anyone seen something similar?

JonT
Enthusiast

We have a large (15+ pgs.) thread going about network performance:

http://www.vmware.com/community/thread.jspa?threadID=77227&start=0&tstart=75

Most of the troubles that we are looking at are for ESX 3.0.1. Aside from the questions already asked, here are mine:

1. What are the virtual hardware specs of your converted domain controllers?

2. What hardware are you using for your ESX hosts?

3. (asked before) Have you installed VMware Tools on these converted servers?

This almost sounds like your guest machines are having Windows-type issues, not VMware issues.

boom
Contributor

After some additional analysis of the network traffic, we concluded that the problem is related to a Windows 2003 bug in which the server does not grant (for whatever reason) an opportunistic lock (oplock) to clients trying to read shared files.

Without the oplock, clients avoid buffering and read the file byte by byte(!!!), which results in a h-u-u-u-ge number of tiny packets going back and forth.
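
To give a feel for the scale, here is a back-of-the-envelope Python sketch. The 64 KiB file size and the 4096-byte buffered read size are made-up numbers for illustration only; the 63 and 66 byte figures are the payload sizes from the trace, and treating each request/response pair as one read is an assumption.

```python
import math

def read_round_trips(file_size_bytes, bytes_per_read):
    """Number of request/response exchanges needed to read a whole file."""
    if bytes_per_read <= 0:
        raise ValueError("bytes_per_read must be positive")
    return math.ceil(file_size_bytes / bytes_per_read)

FILE_SIZE = 64 * 1024        # hypothetical 64 KiB logon script
REQUEST_PAYLOAD = 63         # client -> server payload seen in the trace
RESPONSE_PAYLOAD = 66        # server -> client payload seen in the trace

unbuffered = read_round_trips(FILE_SIZE, 1)      # byte-by-byte reads
buffered = read_round_trips(FILE_SIZE, 4096)     # hypothetical buffered reads

# TCP payload moved if every 1-byte read costs one request/response pair
unbuffered_payload = unbuffered * (REQUEST_PAYLOAD + RESPONSE_PAYLOAD)

print(f"byte-by-byte: {unbuffered} round trips, {unbuffered_payload} payload bytes")
print(f"buffered:     {buffered} round trips")
```

Under these assumptions every round trip also pays the network latency once, which would explain why logons crawl even though CPU and bandwidth look fine.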

Solution: http://support.microsoft.com/kb/319440

Thank you all for your advice
