Forgive me - I'm a n00b. Not much experience with VMware, but we're having a dickens of a problem and I'm looking for suggestions.
One VM guest is having major virtual disk read/write latency. It's spiking beyond 9000 ms! The average is well over 500 ms. This all started on February 25th, and we are not aware of any changes made to the system. This is a Cognos server which builds data cubes nightly. Normally it takes 2 hours to build the cube, but over the last few days that time has doubled, and last night it almost tripled. I've attached a couple of screen grabs from the performance graphs in vCenter. No other guests are having problems on any of our hosts.
We relocated the guest to a fresh, empty datastore. The latency seemed to improve, but only slightly (maybe a 10% improvement). This amount of latency is orders of magnitude higher than we should see, isn't it? Any suggestions or ideas on what we can do? From the OS perspective, everything looks fine and dandy. No "delayed write failed" errors, nothing at all. Operating the GUI feels sluggish; the latency is definitely felt there.
Any suggestions are GREATLY appreciated. Thank you, thank you, thank you.
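For anyone who wants to put numbers on graphs like these, here is a minimal Python sketch that summarizes a list of latency samples (average, peak, and share of samples over a threshold). The sample values are made up for illustration and are not taken from the attached screen grabs. The function name and the 20 ms threshold are assumptions too, so pick whatever limit your storage team considers acceptable.

```python
from statistics import mean


def summarize_latency(samples_ms, threshold_ms=20.0):
    """Return average, peak, and percent of samples above threshold_ms.

    threshold_ms is an assumed cutoff, not a VMware-defined limit.
    """
    if not samples_ms:
        raise ValueError("no latency samples supplied")
    over = [s for s in samples_ms if s > threshold_ms]
    return {
        "avg_ms": mean(samples_ms),
        "max_ms": max(samples_ms),
        "pct_over": 100.0 * len(over) / len(samples_ms),
    }


# Hypothetical samples in milliseconds, e.g. typed in from an exported chart.
samples = [4, 6, 850, 9200, 12, 430]
summary = summarize_latency(samples)
print(summary)
```

Running the same summary on samples taken before and after a change (new datastore, new host) gives a quick way to tell whether the change actually helped.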
It seems like it's just the one vdisk that has the high latency, and if this is a Windows guest, it's probably the C: drive since it's scsi 0:0. Within the guest, check where your swap file is and whether the guest has started swapping. Your fix may be as simple as giving the guest more memory.
Hi, thanks for replying. The pagefile is on C:, and is configured to be 1.5x the memory. This is pretty much our standard configuration for all our VM guests, and we don't seem to be experiencing problems with them. Paging on this guest is noticeable but not excessive (about 2 GB out of 6 GB max). Also, take a look at scsi 0:1 - it is actually experiencing high latency too, but not nearly as bad as the C: drive.
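As a rough sanity check on those paging numbers, the arithmetic can be sketched in a few lines of Python. This assumes the "6 GB max" figure is the maximum pagefile size and that the 1.5x rule applies to it exactly. The function name is hypothetical.

```python
def pagefile_stats(pagefile_max_gb, pagefile_used_gb, ratio=1.5):
    """Return (implied RAM in GB, percent of pagefile in use).

    ratio is the pagefile-to-RAM multiplier (1.5x in this thread).
    """
    implied_ram_gb = pagefile_max_gb / ratio
    used_pct = 100.0 * pagefile_used_gb / pagefile_max_gb
    return implied_ram_gb, used_pct


# Figures quoted above: about 2 GB used out of a 6 GB maximum.
ram_gb, used_pct = pagefile_stats(6, 2)
print(f"implied RAM: {ram_gb:.1f} GB, pagefile in use: {used_pct:.0f}%")
```

Under those assumptions the guest has roughly 4 GB of RAM and about a third of the pagefile in use, which fits "noticeable but not excessive".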
Relocating the guest did nothing to improve the speed of the cube build last night.
We are attempting to migrate it to a new host to see if that makes any difference.
mcowger - I'm not exactly sure how to find that. I don't have direct host access.
UPDATE - migrating the guest to a completely new host in a different cluster has brought the disk latency back into normal ranges. I attached a new screen grab. Perhaps we're seeing a communication bottleneck on that fibre channel. We're having our storage engineer look into it.
I'll post more updates as we discover the root cause. Or if we start having the problem again!
Thanks for everyone's help.
UPDATE: We discovered we had a bad SFP in the SAN switch. We replaced it and the latency is down to under 2 ms. We've migrated several guests back to it and everything is running fine.
Thanks again to those who commented!