We have a problem with our server service at one of our customers' sites. The customer is using Windows 2008 R2 (64-bit) and VMware ESX 4.5.
He says he has 80 VMs running on 6 hosts.
Our service is a server which accepts connections on TCP/IP sockets. The customer says that the more connections there are, the slower the server gets.
We wrote a benchmark and indeed, the response time scales with the number of connections. Between one connection and, say, 30 connections,
the application startup time (the time when all initial traffic between the client and the server is done) goes from 10 to 23 seconds.
We reproduced the setup in-house and cannot see any relation between the number of connected clients and the server response time.
So my question: Is there some setting on the VM host that can be adjusted to give the VM more resources, more CPU, or whatever?
The service uses select(), for what it's worth.
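For anyone who wants to reproduce the measurement, here is a rough sketch of the kind of benchmark described above. It is not our actual service or benchmark: the echo server, the function names (serve, measure_round_trip) and the "ping" payload are all made up for illustration. It holds a number of idle connections open against a minimal select() loop and times request/response round trips on one active connection.

```python
import select
import socket
import threading
import time


def serve(listener, stop):
    """Minimal select()-based echo server (hypothetical stand-in for the real service)."""
    clients = []
    while not stop.is_set():
        # Wait up to 0.1 s for the listener or any client to become readable.
        readable, _, _ = select.select([listener] + clients, [], [], 0.1)
        for sock in readable:
            if sock is listener:
                conn, _ = listener.accept()
                clients.append(conn)
            else:
                data = sock.recv(4096)
                if data:
                    sock.sendall(data)
                else:  # peer closed the connection
                    clients.remove(sock)
                    sock.close()
    for sock in clients:
        sock.close()


def measure_round_trip(n_idle, rounds=50):
    """Average request/response time in seconds with n_idle extra connections open."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
    listener.listen(128)
    addr = listener.getsockname()
    stop = threading.Event()
    thread = threading.Thread(target=serve, args=(listener, stop))
    thread.start()
    idle = [socket.create_connection(addr) for _ in range(n_idle)]
    active = socket.create_connection(addr)
    try:
        start = time.perf_counter()
        for _ in range(rounds):
            active.sendall(b"ping")
            reply = b""
            while len(reply) < 4:  # recv() may return fewer bytes than sent
                chunk = active.recv(4096)
                if not chunk:
                    raise ConnectionError("server closed the connection")
                reply += chunk
        elapsed = time.perf_counter() - start
    finally:
        stop.set()
        thread.join()
        for sock in idle + [active]:
            sock.close()
        listener.close()
    return elapsed / rounds


if __name__ == "__main__":
    for n in (1, 10, 30):
        print(n, "idle connections:", measure_round_trip(n), "s per round trip")
```

Running the same script inside the customer's VM and on a physical machine would at least show whether the slowdown follows the connection count or the environment.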
--
Christoph
Are you looking for more of a dynamic solution, i.e. one that adds resources automatically once the number of connections gets past a certain threshold?
Well, he's definitely not running ESX 4.5 (because that doesn't exist).
select() is a pretty inefficient way to listen on many sockets, and it's extremely CPU inefficient, which is probably why you are seeing this.
You could potentially add another vCPU to the VM (if the host isn't overloaded and the application uses threads to listen on these sockets) and it would help some, but the right way to fix this is to fix the app. If you can't do that, then just throw more hardware at it (given the caveats above).
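To illustrate what "fixing the app" could look like, here is a minimal sketch of an echo server that lets the runtime choose the I/O mechanism instead of calling select() directly. It assumes Python's asyncio, which uses I/O completion ports on Windows and epoll or kqueue on most Unix systems; the handler name and the "ping" payload are hypothetical, and this is not the poster's application.

```python
import asyncio


async def handle(reader, writer):
    """Echo each chunk back until the peer disconnects."""
    try:
        while data := await reader.read(4096):
            writer.write(data)
            await writer.drain()
    finally:
        writer.close()


async def demo():
    # Port 0 lets the OS pick a free port; a real service would use a fixed one.
    server = await asyncio.start_server(handle, "127.0.0.1", 0)
    host, port = server.sockets[0].getsockname()[:2]
    try:
        reader, writer = await asyncio.open_connection(host, port)
        writer.write(b"ping")
        await writer.drain()
        reply = await reader.readexactly(4)
        writer.close()
        await writer.wait_closed()
    finally:
        server.close()  # stop accepting new connections
    return reply


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

Each connection gets its own coroutine, so the per-call cost no longer depends on rebuilding and scanning a descriptor set for every wakeup.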
Well, he's definitely not running ESX 4.5 (because that doesn't exist).
Ha! Aside from that, which I thought was understood, I agree with mcowger about the fixing of the app first. If that is not in the cards then you would need to look at the hardware additions in order to scale as needed.
James,
thanks for answering. I'm looking for a constant solution, not a dynamic one: giving the VM enough resources from the start.
BTW, select() may or may not be questionable. The application doesn't seem to use much CPU. The server is 15 years old and
based on this technique, and I don't see a reason to redesign it now just because it seems to run slowly in one configuration.
( see this for example:)
I tried to build the same situation in house and it doesn't show this behaviour.
--
Christoph
Regarding the ESX version, I may have got the wrong information from my customer or may have noted the version incorrectly.
I'm trying to get the exact product name/version.
EDIT: It is VMWARE "ESX VSPHERE 4.00 SERVER" they are using.
--
Christoph
I still have no idea what could possibly be tuned, or why the server (guest) gets so slow or unresponsive when users start to connect.
The server (guest) not only acts as the license server but also is the file and database server for the clients in the network.
I would guess that each of the 30-50 clients is doing some 50 to 100 file opens through the host to the guest.
Is there a way to monitor the host/guest as to what resources are being used? Is there a tool for this?
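In the meantime, a rough guest-side check of the file-open cost could look like the sketch below. The function name time_file_opens, the 4 KiB scratch file and the default count of 100 are assumptions for illustration; it only times repeated open/read/close cycles in a given directory and says nothing about host-side contention.

```python
import os
import tempfile
import time


def time_file_opens(directory, count=100):
    """Create one scratch file, then time `count` open/read/close cycles on it."""
    fd, path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(b"x" * 4096)
        start = time.perf_counter()
        for _ in range(count):
            with open(path, "rb") as f:
                f.read()
        return (time.perf_counter() - start) / count  # seconds per open
    finally:
        os.remove(path)  # always clean up the scratch file


if __name__ == "__main__":
    print(time_file_opens(tempfile.gettempdir()), "s per open")
```

Pointing the directory argument at the shared folder the clients use, once from a client and once on the guest itself, would show whether the file opens alone account for the delay.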
--
Christoph