First of all, this is my first post here, so I hope you guys will be able to help me out.
I just started a new position at work and one of my first jobs is to investigate the poor performance we are getting out of our VMs on two ESXi 5.5 hosts.
The hosts have pretty low CPU utilization, but esxtop reports very high %USED for a couple VMs. Any idea on where to start?
Host CPU & RAM usage:
esxtop results on that host:
I'm kind of a newbie regarding best practices for VM performance so any help is greatly appreciated.
Please check the power-saving settings configured in the BIOS. Most servers (except some brand-new ones) ship with a balanced power-saving scheme by default, which is good for the environment and bad for performance.
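If the BIOS hands power management over to the hypervisor ("OS Control" mode), you can also check and set the policy from the ESXi shell. A quick sketch, assuming the `/Power/CpuPolicy` advanced option present on 5.x hosts (verify the option name on your build before relying on it):

```
# Show the current CPU power management policy on the ESXi host
esxcli system settings advanced list -o /Power/CpuPolicy

# "static" corresponds to the High Performance policy in the vSphere Client
esxcli system settings advanced set -o /Power/CpuPolicy -s "static"
```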
Can you please give the complete hardware details?
I would export the VMs in question as templates and then test them in a lab on another late-model host, just to see how they fare there.
BIOS is already in High Performance mode.
I'll check disk stats with esxtop in a couple of hours, when usage on the servers is higher. Would higher disk latency have that kind of effect on host and VM CPUs?
Hardware info for host:
Storage is a Dell EqualLogic PS6100E in a RAID5 config going through a PowerConnect 7024.
Please check the config of the ports on the PowerConnect 7024. I'm not too familiar with Dell switches, but on Cisco/HP we normally configure the ports as edge ports in order to keep STP from slowing things down.
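For reference, on Cisco IOS that looks like the snippet below; the PowerConnect 7024 CLI is largely Cisco-like, but I don't have one handy, so treat the exact Dell syntax as something to verify against the 7024 CLI guide:

```
! Cisco IOS example: mark a host/SAN-facing port as an STP edge port
interface GigabitEthernet1/0/1
 spanning-tree portfast
 spanning-tree bpduguard enable
```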
Please also monitor the average disk queue length inside your heavily loaded VMs (Perfmon or Resource Monitor) to decide whether the disk system is the bottleneck here.
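Inside a Windows guest you can grab that counter from the command line instead of clicking through Perfmon, e.g.:

```
:: Sample Avg. Disk Queue Length every 5 seconds, 12 samples (one minute)
typeperf "\PhysicalDisk(_Total)\Avg. Disk Queue Length" -si 5 -sc 12
```

As a rough rule of thumb, a sustained value much above ~2 per spindle backing the volume points at the disk system.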
Do you have access to the array? I would log onto its admin interface and look at the volumes to see whether one is more active than the others. If you have multiple datastores connected, each one should have an associated volume; try moving workloads around to even out the usage. If one workload is consistently higher than the rest, consider using QoS, or you may need a stronger array. 13 ms isn't terrible; the reference I gave (and others) say 20-30 ms is more of a problem.
The EqualLogic PS6100"E" is a unit with 7.2k rpm SATA/NL-SAS HDDs. To set up the EQL and the software iSCSI (swISCSI) on ESXi you have to tweak some advanced parameters; read the configuration guides. The same is true for the PowerConnect switch.
- DelayedAck, NoopTimeout, LoginTimeout for the swISCSI
- One vSwitch with 2 VMkernel ports (start without jumbo frames!)
- RSTP, flow control, jumbo frames on the switch
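The swISCSI parameters above can be inspected and set from the ESXi shell. A sketch, assuming `esxcli iscsi adapter param` on 5.5; the adapter name `vmhba33` is a placeholder, and the key names and values should be checked against the `param get` output and the EqualLogic configuration guide before setting anything:

```
# List the tunable parameters on the software iSCSI adapter
esxcli iscsi adapter param get -A vmhba33

# Disable DelayedAck (commonly recommended for EqualLogic arrays)
esxcli iscsi adapter param set -A vmhba33 -k DelayedAck -v false

# Adjust NoopOut/Login timeouts per the EqualLogic guide
esxcli iscsi adapter param set -A vmhba33 -k NoopOutTimeout -v 30
esxcli iscsi adapter param set -A vmhba33 -k LoginTimeout -v 60
```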
We also have PC70xx switch gear together with EQL, and yes, they are not the fastest on earth. But we have customers with up to 40 VMs spread across 3 Dell servers. SAN HQ can help you find TCP retransmits and show the general performance, but that doesn't help much if you have no historical data or can't compare against a known-good environment.
Thanks to everyone who helped.
I'm fairly certain at this point that it is indeed a storage issue and not a CPU issue. Still investigating, but the fact that we're using a RAID 6 configuration seems to be hurting our storage performance a lot.
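For anyone finding this later: the RAID write penalty is why parity level matters so much here. Every random write costs 4 back-end I/Os on RAID 5 and 6 on RAID 6, so write-heavy workloads lose a third of their effective write IOPS going to RAID 6. A back-of-envelope sketch, assuming a 24-bay PS6100E and ~75 random IOPS per 7.2k spindle (both ballpark assumptions, not measured figures):

```shell
disks=24           # assumed: PS6100E is a 24-bay enclosure
iops_per_disk=75   # assumed: rough figure for a 7.2k NL-SAS spindle

raw=$((disks * iops_per_disk))   # total raw back-end IOPS
raid5=$((raw / 4))               # RAID 5: 4 back-end I/Os per random write
raid6=$((raw / 6))               # RAID 6: 6 back-end I/Os per random write

echo "raw=$raw raid5_writes=$raid5 raid6_writes=$raid6"
# -> raw=1800 raid5_writes=450 raid6_writes=300
```

Reads are largely unaffected by the penalty, which is why the problem only shows up clearly under write-heavy load.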