VMware Cloud Community
branfarm1
Contributor

Diagnosing poor VM performance

Hi there,

I have a 3 host ESX cluster with the following stats:

2 hosts with 8 x 2.6GHz procs w/16GB RAM

1 host with 8 x 2.6GHz procs w/12GB RAM

These hosts are connected to an HP MSA1500CS with 1 shelf of disks, via a Brocade Fibre Channel switch. I have 2 LUNs: one with 8 x 146GB drives, RAID 5, 957GB, and one with 6 x 300GB drives, RAID 5, 1.36TB. I have 27 VMs, evenly distributed across both LUNs.

Almost all of the VMs are running Windows, with a few Linux VMs sprinkled in. The VMs are running apps that tend to be CPU/memory heavy rather than disk heavy.

My problem is that the VMs are just dog slow and performance seems to be affected by every little thing that happens on other servers. For instance, if I try to deploy a machine from a template it takes over an hour (30 GB Windows 2003 template), and during that time other servers bog down heavily. Also, anytime one VM is doing anything heavy it tends to spill over into other VMs and hurts their performance.
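For scale, the deployment time above can be turned into a rough throughput figure. This is only a back-of-the-envelope sketch: it assumes the full 30 GB is actually copied and that the deploy took exactly one hour (it took longer, so the real rate is lower). The function name is made up for illustration.

```python
def effective_throughput_mb_s(size_gb: float, duration_s: float) -> float:
    """Average copy rate in MB/s for size_gb copied in duration_s seconds."""
    return size_gb * 1024 / duration_s

# 30 GB template, deployed in one hour (3600 s); treat this as an upper bound
rate = effective_throughput_mb_s(30, 3600)
print(f"{rate:.1f} MB/s")  # prints "8.5 MB/s"
```

A clone both reads and writes, so if the template and the new VM sit on the same array, the array is handling roughly twice that rate in total. Either way, single-digit MB/s on a Fibre Channel array points toward storage rather than CPU.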

I've read that 10 VMs is the recommended max per LUN, so I know I'm exceeding this recommendation. Without another place to put my VMs, though, I'm stuck with the two LUNs. Can anybody recommend anything I can do to achieve better performance? Any thoughts on how I can figure out where the true bottleneck is?

Thanks,

--Brandon

10 Replies
weinstein5
Immortal

How many HBAs do you have in your hosts? How many CPUs are you assigning to your VMs?

If you find this or any other answer useful please consider awarding points by marking the answer correct or helpful

branfarm1
Contributor

Hi there -- thanks for the response.

Each host has 1 HBA. I typically assign 1 or 2 CPUs because I know that the host has to have as many physical CPUs available as the VM has assigned before it will service that VM.

I also wanted to note that I'm running ESX 3.5.0-64607 on each host.

weinstein5
Immortal

What applications are the VMs running? One recommendation: I would drop all the VMs back to 1 virtual CPU.

If you find this or any other answer useful please consider awarding points by marking the answer correct or helpful

branfarm1
Contributor

We run a lot of custom written software -- we are a software development company. I can definitely try moving back to 1 CPU on all the VMs. Question though... I know the developers will feel like I am handicapping them by removing CPUs -- will changing a 2 CPU VM into a 1 CPU VM change any ability to test threading and multiprocessor support? I'm no developer, so I don't understand how that all works...

weinstein5
Immortal

Yes, dropping the vCPU count to one will keep them from testing true multiprocessor behavior (multithreaded code still runs, just not in parallel), but I am trying to eliminate vCPU scheduling as a possible cause. Also, how much memory is assigned to each of the VMs? Do you have memory limits set? Are these applications doing a lot of disk I/O?

If you find this or any other answer useful please consider awarding points by marking the answer correct or helpful

branfarm1
Contributor

Most VMs have 1 GB assigned. I do have a couple that have 2 GB and 1 that has 4 GB because it runs a memory heavy Java application. I don't have memory limits set at this point. When I look at my hosts, though, the memory utilization isn't past 60-70% on any of them. There isn't much disk I/O from the VMs; the applications are mostly for processing data. The disk I/O that does occur is mainly logging.
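As a quick sanity check on memory, the numbers in this thread can be added up. This is a rough sketch and the exact split is an assumption: "a couple" is taken to mean two 2 GB VMs, which leaves 24 VMs at 1 GB to reach 27 in total. It also ignores service console and per-VM overhead.

```python
# Physical RAM per host, from the original post (GB)
host_ram_gb = [16, 16, 12]

# Assumed split of the 27 VMs: 24 x 1 GB, 2 x 2 GB, 1 x 4 GB
vm_ram_gb = [1] * 24 + [2] * 2 + [4]

def overcommit_ratio(vm_ram, host_ram):
    """Total RAM assigned to VMs divided by total physical RAM in the cluster."""
    return sum(vm_ram) / sum(host_ram)

ratio = overcommit_ratio(vm_ram_gb, host_ram_gb)
print(f"{sum(vm_ram_gb)} GB assigned / {sum(host_ram_gb)} GB physical = {ratio:.2f}")
```

A ratio below 1.0 means the cluster as a whole is not overcommitted on memory under these assumptions, which fits the 60-70% utilization seen on the hosts.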

_David
Enthusiast

Do you have CPU affinity set on any VMs? Can you look at the CPU Ready value of your VMs in the performance chart? CPU Ready shows how long a VM has to wait to get CPU time; a high value means there is competition for your resources. Also check whether you have reserved CPU on your VMs or your resource pools, since CPU reservations lock up resources. Install VMware Tools on all your VMs.
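The performance chart reports CPU Ready as a summation in milliseconds, which is hard to judge on its own. A minimal sketch of converting it to a percentage follows; it assumes the real-time chart's 20-second sample interval and assumes the VM-level value is the sum across all vCPUs, so it is divided by the vCPU count. The function name is made up for illustration.

```python
def cpu_ready_percent(ready_ms: float, interval_s: float = 20.0, vcpus: int = 1) -> float:
    """Convert a CPU Ready summation (ms) to a percent of the sample interval, per vCPU.

    interval_s=20 is an assumption matching the real-time chart; use the
    interval of whichever chart the value came from.
    """
    return ready_ms / (interval_s * 1000.0 * vcpus) * 100.0

# 2000 ms of ready time in a 20 s sample, 1 vCPU VM
print(f"{cpu_ready_percent(2000):.1f}%")            # 10.0%
# The same 2000 ms on a 2 vCPU VM, averaged per vCPU
print(f"{cpu_ready_percent(2000, vcpus=2):.1f}%")   # 5.0%
```

What counts as "high" depends on the workload, but sustained values of several percent per vCPU are usually worth a closer look.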

If you found this or any other answer useful please consider the use of the Helpful or correct buttons to award points

Ken_Cline
Champion

There are some really good documents here that might help you. I'd suggest starting with Performance Monitoring and Analysis.

Ken Cline

Technical Director, Virtualization

Wells Landers

TVAR Solutions, A Wells Landers Group Company

VMware Communities User Moderator

Ken Cline VMware vExpert 2009 VMware Communities User Moderator Blogging at: http://KensVirtualReality.wordpress.com/
Alp1
Enthusiast

I would agree with Ken.

Take the time to analyze the performance monitors in esxtop or the VIC, looking for high consumers and potential bottlenecks. You may also want to see how much swapping or ballooning is going on. But I'm not so sure that the problem is limited to the VM level, since the cluster is having issues with simple deployments as well.
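If you capture esxtop in batch mode (`esxtop -b`) to a CSV file, a short script can pick out the high consumers for you. The sketch below is hedged: it assumes perfmon-style column headers ending in `\% Ready`, and the host name, VM names and numbers in the sample are made up purely to exercise the code. Check the header row of your own capture before relying on it.

```python
import csv
import io
from statistics import mean

def average_ready_by_column(csv_text: str) -> dict:
    """Average every column whose header ends in '\\% Ready'."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    ready_cols = [i for i, name in enumerate(header) if name.endswith(r"\% Ready")]
    samples = {i: [] for i in ready_cols}
    for row in reader:
        for i in ready_cols:
            try:
                samples[i].append(float(row[i]))
            except (IndexError, ValueError):
                continue  # skip short rows and non-numeric cells
    return {header[i]: mean(vals) for i, vals in samples.items() if vals}

# Made-up sample in the assumed header format (not real esxtop output)
sample = "\n".join([
    r'"(PDH-CSV 4.0) (UTC)(0)","\\esx1\Group Cpu(101:web01)\% Ready","\\esx1\Group Cpu(102:db01)\% Ready"',
    '"01/01/2009 10:00:00","2.0","12.0"',
    '"01/01/2009 10:00:05","4.0","18.0"',
])

# List the busiest columns first
for name, avg in sorted(average_ready_by_column(sample).items(), key=lambda kv: -kv[1]):
    print(f"{avg:6.1f}  {name}")
```

The same pattern works for other counters by changing the suffix being matched, for example the swap or balloon columns, if your capture has them.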

Carlo Alpuerto

Senior Systems Engineer

VKernel Corp

www.vkernel.com

MattG
Expert

The best-practice recommendations for VMs per LUN are just that: recommendations. Everyone's setup and load is different. You need to focus on finding the issue before changing your setup.

Based on the config and the VM template issue, I would guess you are having a disk performance issue. Places to look for disk troubles:

  • On the ESX host, the /var/log/vmkernel log files (the current file and the rotated vmkernel.x copies). If you have too many VMs on a single LUN you will see SCSI reservation errors.

  • Perfmon from inside the VMs. When you are having disk issues, you won't necessarily see CPU/memory spikes related to them. Monitor the Perfmon Avg. Disk sec/Read and Avg. Disk sec/Write counters. They should be below 50ms at all times.

  • Windows Event Logs. You may see disk timeout events.
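For the vmkernel log check in the list above, a rough way to see when reservation problems cluster is to count matching lines per hour. This is a sketch under assumptions: it matches any line containing "reservation conflict" (case-insensitive) and assumes syslog-style timestamps at the start of each line. The sample lines are made up and simplified, not real vmkernel output.

```python
import re
from collections import Counter

CONFLICT = re.compile(r"reservation conflict", re.IGNORECASE)

def conflicts_per_hour(lines):
    """Count lines mentioning a reservation conflict, keyed by 'Mon DD HH'."""
    counts = Counter()
    for line in lines:
        if CONFLICT.search(line):
            counts[line[:9]] += 1  # first 9 chars of a syslog timestamp: 'Jan  5 10'
    return counts

# Made-up, simplified sample lines for illustration only
sample_lines = [
    "Jan  5 10:15:32 esx1 vmkernel: SCSI reservation conflict (sample line)",
    "Jan  5 10:47:01 esx1 vmkernel: SCSI Reservation Conflict (sample line)",
    "Jan  5 11:02:10 esx1 vmkernel: unrelated message (sample line)",
]
print(dict(conflicts_per_hour(sample_lines)))

# Against a real log, something like:
#   with open("/var/log/vmkernel") as f:
#       print(conflicts_per_hour(f))
```

If the busy hours line up with template deployments or snapshots, that supports the idea that the LUNs are carrying too much.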

-MattG If you find this information useful, please award points for "correct" or "helpful".