haxxN
Contributor

Very poor VM performance, host CPU utilization low. Please advise.

Hello,

First of all this is my first post here. I hope you guys will be able to help me out.

I just started a new position at work and one of my first jobs is to investigate the poor performance we are getting out of our VMs on two ESXi 5.5 hosts.

The hosts have fairly low CPU utilization, but esxtop reports very high %USED for a couple of VMs. Any idea where to start?

Host CPU & RAM usage:

pastedImage_0.png

esxtop results on that host:

pastedImage_1.png

I'm kind of a newbie regarding best practices for VM performance so any help is greatly appreciated.

Thanks!

13 Replies
larstr
Champion

haxxn,

Please check your power saving settings that are configured in BIOS. Most servers (except some of the brand new ones) come with a balanced power saving scheme by default which is good for the environment and bad for performance.

Lars

IRIX201110141
Virtuoso

- Check the disk latency within esxtop for the VMs and the storage devices

- Remove VM Snapshots

Btw. vSphere 5.5 runs out of support in 10/2018.

Regards,

Joerg

Dave_the_Wave
Hot Shot

Can you please give complete hardware details?

I would export templates out of the VMs in question, and then lab them in another late model host just to see how they fare there.

haxxN
Contributor

BIOS is already in High Performance mode.

I will check disk stats with esxtop in a couple of hours, when usage on the servers is higher. Would higher disk latency have that effect on host and VM CPUs?

Hardware info for host:
pastedImage_0.png

Storage is a Dell EqualLogic PS6100E in a RAID5 config going through a PowerConnect 7024.

larstr
Champion

haxxN,

Please check the config of the ports on the PowerConnect 7024. I'm not too familiar with Dell switches, but on Cisco / HP we normally configure the ports as edge ports in order to avoid STP slowing things down.

Please also monitor the avg disk queue length inside your heavily loaded VMs (perfmon or Resource Monitor) in order to decide whether the disk system is the bottleneck here.

Lars

Marmotte94
Enthusiast

Hi,

Do you see high latency in the disk view of esxtop?

If yes, please look at cpu load on your ESXi host.

Thank you.

Please, visit my blog http://www.purplescreen.eu/

haxxN
Contributor

Switch is correctly configured.

I checked the avg disk queue and it seems a bit high:

pastedImage_0.png
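
To judge whether an average disk queue length is "a bit high", it can help to relate it to throughput and latency. By Little's law, the average number of outstanding requests is roughly the request rate multiplied by the average time each request takes. The sketch below only illustrates that relationship; the function name and the sample numbers are hypothetical and not taken from the screenshot.

```python
def expected_queue_length(iops: float, latency_ms: float) -> float:
    """Estimate average outstanding I/Os via Little's law.

    iops       -- completed I/O operations per second
    latency_ms -- average time per I/O in milliseconds
    """
    if iops < 0 or latency_ms < 0:
        raise ValueError("iops and latency_ms must be non-negative")
    # Convert milliseconds to seconds before multiplying by the rate.
    return iops * (latency_ms / 1000.0)


# Hypothetical workload: 500 IOPS at 13 ms per I/O gives a queue of about 6.5.
print(expected_queue_length(500, 13))
```

Read the other way round, a queue length that stays high while IOPS stay modest implies that each I/O is taking a long time, which points at storage rather than CPU.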

haxxN
Contributor

Here is esxtop's disk view, not sure where I should look for latency, please advise:

pastedImage_0.png
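
In the esxtop disk views, the latency columns are DAVG/cmd (device), KAVG/cmd (kernel) and GAVG/cmd (guest, roughly the sum of the other two). When the interactive screen is hard to read, esxtop can also be run in batch mode (`esxtop -b`, with `-d` for the sample interval and `-n` for the number of samples) and the resulting CSV analysed offline. The sketch below is one possible way to average the latency columns of such a file. The header substring it matches and the simplified sample headers are assumptions; check them against the headers of a real capture.

```python
import csv
import io
from collections import defaultdict

# Assumption: latency counters in the batch CSV contain this substring.
LATENCY_MARKER = "MilliSec/Command"

# Simplified, hypothetical sample; real headers are longer and include
# the host name and device identifier.
SAMPLE = (
    '"Time","disk0 Average Driver MilliSec/Command",'
    '"disk0 Average Kernel MilliSec/Command"\n'
    '"10:00:00","12.0","0.5"\n'
    '"10:00:05","14.0","1.5"\n'
)


def mean_latencies(csv_text: str, marker: str = LATENCY_MARKER) -> dict:
    """Return {column name: mean value} for columns whose header contains marker."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    wanted = [i for i, name in enumerate(header) if marker in name]
    totals = defaultdict(float)
    counts = defaultdict(int)
    for row in reader:
        for i in wanted:
            try:
                value = float(row[i])
            except (ValueError, IndexError):
                continue  # skip blank or malformed samples
            totals[header[i]] += value
            counts[header[i]] += 1
    return {name: totals[name] / counts[name] for name in totals}


for column, mean in mean_latencies(SAMPLE).items():
    print(f"{column}: {mean:.1f} ms")
```

A high device latency with a low kernel latency would suggest the delay is in the array or the network path rather than in the host.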

sjesse
Leadership

What are you using for storage? This shows a latency of 13 ms when accessing your storage device. A good reference for interpreting these metrics is

ESXTOP - Yellow Bricks

haxxN
Contributor

Storage is on a Dell EqualLogic PS6100E SAN array.

sjesse
Leadership

Do you have access to it? I would log on to its admin interface, look at the volumes, and see whether one is more active than the others. If you have multiple datastores connected, each one should have an associated volume; try moving workloads to even out the usage. If one workload is much heavier than the rest, consider using QoS, or you may need a stronger array. 13 ms isn't terrible; the reference I gave and others say 20-30 ms is more of a problem.
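
The rule of thumb above can be written down as a small check, for example to flag samples collected over a day. This is only a sketch: the 20 ms and 30 ms boundaries come from the numbers quoted in this thread, while the function name and the wording of the labels are made up for illustration.

```python
def classify_latency(latency_ms: float,
                     warn_ms: float = 20.0,
                     bad_ms: float = 30.0) -> str:
    """Label a storage latency sample using rule-of-thumb thresholds."""
    if latency_ms < 0:
        raise ValueError("latency_ms must be non-negative")
    if latency_ms < warn_ms:
        return "not terrible"
    if latency_ms < bad_ms:
        return "problem range"
    return "well into problem range"


# Hypothetical samples in milliseconds, including the 13 ms seen above.
for sample in (13, 22, 45):
    print(sample, "ms:", classify_latency(sample))
```

A single sample says little; what matters is how often and for how long the latency sits in the upper ranges during busy hours.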

IRIX201110141
Virtuoso

The EqualLogic PS6100"e" is a unit with 7.2k rpm SATA/NL-SAS HDDs. When setting up the EQL and the software iSCSI (swISCSI) on ESXi, you have to tweak some advanced parameters[1] as described in the instruction manuals[2]. The same is true for the PowerConnect switch.

[1] DelayedAck, NoopTimeout, and LoginTimeout for the swISCSI

- One vSwitch with 2 VMKs (start without JumboFrames!)

- RSTP, FlowControl, Jumboframes in Switch

We also have PC70xx switch gear together with EQL, and yes, they are not the fastest on earth. But customers run up to 40 VMs spread across 3 Dell servers. SANHQ can help to find out whether there are TCP retransmits and can show the general performance. But that doesn't help much if you have no historical data or cannot compare against a "good" environment.

[2] Rapid EqualLogic Configuration portal | Dell Deutschland

Regards,

Joerg

haxxN
Contributor

Thanks to everyone who helped.

I'm fairly certain at this point that it is indeed a storage issue and not a CPU issue. Still investigating but the fact we're using a RAID 6 configuration seems to hurt our storage performance a lot.
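
The effect of the RAID level on write-heavy workloads can be estimated with the usual write-penalty rule of thumb: each logical write costs about 2 back-end I/Os on RAID 10, 4 on RAID 5, and 6 on RAID 6. The sketch below applies that rule. The drive count, the per-drive IOPS figure and the read/write mix are hypothetical placeholders, not measurements from this array, and the calculation ignores controller cache.

```python
# Commonly quoted back-end I/Os per logical write (rule of thumb).
RAID_WRITE_PENALTY = {"RAID 10": 2, "RAID 5": 4, "RAID 6": 6}


def effective_iops(raw_iops: float, write_fraction: float, raid_level: str) -> float:
    """Estimate front-end IOPS for a given raw IOPS budget and write share."""
    if not 0.0 <= write_fraction <= 1.0:
        raise ValueError("write_fraction must be between 0 and 1")
    penalty = RAID_WRITE_PENALTY[raid_level]
    read_fraction = 1.0 - write_fraction
    # Reads cost one back-end I/O each, writes cost `penalty` each.
    return raw_iops / (read_fraction + write_fraction * penalty)


# Hypothetical: 24 drives at about 75 IOPS each, 50% writes.
raw = 24 * 75
for level in ("RAID 10", "RAID 5", "RAID 6"):
    print(level, round(effective_iops(raw, 0.5, level)))
```

With these placeholder numbers the same set of disks delivers noticeably fewer usable IOPS on RAID 6 than on RAID 5, which is consistent with the suspicion that the RAID level contributes to the latency.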
