I have a fairly large virtual desktop environment of Windows XP SP2 workstations hosted across several ESX 3.5 servers. The majority of the workstations are configured with a single CPU (~3.4 GHz) and 2 GB of RAM. The desktops are used by developers running BEA Weblogic Workshop. The developers are complaining not only of slow builds but also of slow performance when navigating through the OS (they connect to the workstations using RDP). Builds on a physical machine, I am told, typically take 3-6 minutes to complete (depending on the size of the application), but on the virtual workstations they were taking 10-20 minutes.
In an attempt to troubleshoot the performance problems, we have doubled the RAM to 4 GB and increased the CPU count to 2 on a couple of test machines. We have also disabled the virus scanner on the workstations. Although the developers say that performance within the operating system has improved, the builds still take considerably longer to complete.
The next thing I am looking into is disk IO. Unfortunately, 99% of the time that I have dealt with performance issues within virtual machines it has been either memory or CPU related, so now that I am looking into disk IO, I'm wondering specifically which disk performance counters will be the best to look at to help diagnose any potential disk IO issues. What would be acceptable from a disk usage perspective, in terms of KBps?
There doesn't seem to be any memory ballooning. There is a small amount of memory swapping occurring, but only about 80 MB on average.
I realise this probably doesn't have a straightforward answer and more info will likely be required, but I'm just looking for a place to start at this point.
I haven't ruled out settings within the application yet either. I have the developers looking into that part.
Initially, the virtual machines were stored locally on the ESX host, but in an attempt to alleviate the performance issues and to load balance the virtual machines, I connected the host to the SAN and moved the virtual machines from local storage to a shared LUN on our SAN. I also created a DRS cluster with another identical host for load balancing purposes.
Unfortunately, I haven't seen much in the way of application performance improvements.
Any help would be appreciated.
There's nothing that we can tell about performance from your description. I'd be willing to guess that you've got a bottleneck in one of your resources. You can probably identify it and remove it by following the on this community.
FYI, I'm currently migrating content to that page from its original location, the performance analysis methods whitepaper. Stick with the wiki, as it's being updated constantly.
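As a place to start on the disk IO question, here is a minimal sketch in Python, assuming you have collected per-command latency samples from esxtop's disk views (the DAVG/cmd and KAVG/cmd counters, in milliseconds). It looks at latency rather than KBps, since throughput alone says little about whether the storage is keeping up. The function name, the input shape, and the 25 ms / 2 ms limits are assumptions for illustration: the limits are commonly quoted rules of thumb, not official values, so adjust them for your array.

```python
from statistics import mean

# Rule-of-thumb limits in milliseconds (assumptions, not official thresholds).
DAVG_LIMIT_MS = 25.0  # latency attributed to the device/array
KAVG_LIMIT_MS = 2.0   # latency attributed to the ESX kernel (queuing)


def summarize_latency(samples, davg_limit=DAVG_LIMIT_MS, kavg_limit=KAVG_LIMIT_MS):
    """Average (davg_ms, kavg_ms) samples and flag likely bottlenecks.

    `samples` is a hypothetical input shape: a list of (DAVG, KAVG) tuples
    copied by hand or exported from esxtop for one LUN or adapter.
    """
    if not samples:
        raise ValueError("no samples given")

    davg = mean(s[0] for s in samples)
    kavg = mean(s[1] for s in samples)

    findings = []
    if davg > davg_limit:
        findings.append("device latency high: check the array or LUN contention")
    if kavg > kavg_limit:
        findings.append("kernel latency high: check queuing on the host")

    return {
        "davg_ms": davg,
        "kavg_ms": kavg,
        # Latency seen by the guest is device latency plus kernel latency.
        "gavg_ms": davg + kavg,
        "findings": findings,
    }


if __name__ == "__main__":
    # Made-up sample values, for illustration only.
    report = summarize_latency([(30.0, 1.0), (40.0, 1.0)])
    print(report)
```

Running this over samples taken during a build and again while idle would show whether latency climbs under load on the shared LUN.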