I'm running vCenter 5.5 (build 1476327) and about 35 hosts, all running ESXi 5.5 (build 1746018). I'm having about ten to twelve VMs freeze daily. The programs continue to run, but I'm unable to RDP in, and when I open the VM console the VM is frozen. This is happening across multiple clusters and hosts. I've tried migrating the machines to see if that would help. I've taken a cluster down and installed the latest BIOS, firmware, NIC drivers, and storage array drivers, as well as the latest VM updates. I was hoping someone else had run into this problem and found a solution.
Thanks for any and all help.
Are these Windows 2012 VMs? There is a known issue with Windows 2012 and e1000 vNICs that can cause network problems, but I have not seen it cause the entire VM to freeze. If the entire VM is freezing, I would suspect some sort of storage issue. Are all these VMs on the same LUN? Have you checked for storage latency at the hypervisor level?
No... It seems to be happening to all Windows versions. They are not all on
the same LUNs, but I'm running pretty lean on the shared storage. I have
roughly twenty 2 TB datastores, each running about 15 VMs, with about 50 to
100 GB of free space in each datastore. Do you think this may be part of the
problem?
How do I check storage latency at the hypervisor level?
The amount of free space you have should not impact performance. However, if VMs are completely freezing, I suspect you have a storage issue, and I would rule this out first. Here are some blog posts and a KB article to help with storage latency troubleshooting:
Troubleshooting Storage Performance in vSphere – Part 2 | VMware vSphere Blog - VMware Blogs
ESXTOP
VMware KB: Using esxtop to identify storage performance issues for ESX / ESXi (multiple versions)
Let me know what you find.
OK... I just checked everything through esxtop and I don't seem to be
having any storage issues.
The GAVG is around 6 ms and the KAVG is about 0.66 ms.
I did notice that I'm having some CPU wait time issues on some machines. I
suspect some overcommitting. Do you think this could be an issue?
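For anyone following along, here is a rough sketch of how the esxtop numbers above can be sanity-checked. It relies on the relationship GAVG = DAVG + KAVG to back out the device latency. The thresholds (DAVG 25 ms, KAVG 2 ms, GAVG 25 ms) are commonly cited rules of thumb and are an assumption here, not hard limits; the function name is made up for illustration.

```python
# Rule-of-thumb latency thresholds in milliseconds.
# These are assumptions (commonly cited guidance), not hard limits.
THRESHOLDS_MS = {"DAVG": 25.0, "KAVG": 2.0, "GAVG": 25.0}


def check_latency(gavg_ms, kavg_ms):
    """Return {counter: (value_ms, over_threshold)} for the esxtop counters.

    DAVG (device latency) is derived from GAVG = DAVG + KAVG.
    """
    values = {
        "DAVG": gavg_ms - kavg_ms,
        "KAVG": kavg_ms,
        "GAVG": gavg_ms,
    }
    return {
        name: (value, value > THRESHOLDS_MS[name])
        for name, value in values.items()
    }


# Figures reported in this thread: GAVG about 6 ms, KAVG about 0.66 ms.
report = check_latency(6.0, 0.66)
for name, (value, over) in report.items():
    status = "over threshold" if over else "ok"
    print(f"{name}: {value:.2f} ms ({status})")
```

With these inputs none of the three counters crosses its threshold, which agrees with the conclusion that storage latency does not look like the culprit, at least at the moment the sample was taken.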
That depends on how high your CPU ready time is. What average (in ms) are you currently seeing?
Here is what I have.
The host is a DL585 G7 with 512 GB of RAM and four 12-core processors.
I have 119 VMs and 168 vCPUs. About half the VMs are running around 2 or
3 ms, and the other half are running anywhere from 20 to 50 ms.
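To put those figures in context, here is a minimal sketch that converts CPU ready time from milliseconds to a percentage and works out the vCPU-to-core ratio for this host. It assumes the millisecond values are summation values from the vCenter real-time performance chart, which samples every 20 seconds; if they come from a different chart interval the `interval_s` argument has to change. The function name is made up for illustration.

```python
def ready_percent(ready_ms, interval_s=20):
    """Convert a CPU ready summation value (ms) to a percentage.

    interval_s is the chart sampling interval in seconds; 20 s is the
    vCenter real-time chart interval (assumed to be what was read here).
    """
    return ready_ms / (interval_s * 1000.0) * 100.0


# Host described in this thread: four 12-core processors, 168 vCPUs.
PHYSICAL_CORES = 4 * 12
VCPUS = 168
ratio = VCPUS / PHYSICAL_CORES

print(f"vCPU to physical core ratio: {ratio:.1f}:1")
for ms in (2, 3, 20, 50):
    print(f"{ms} ms ready = {ready_percent(ms):.3f}% of a 20 s interval")
```

Under that assumption even 50 ms works out to a fraction of a percent, well below the roughly 5% per vCPU that is often quoted as the point where ready time starts to matter.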
A better approach would be to check both %VMWAIT and ready time (%RDY) in esxtop.