OK, so it sounds like you are maintaining that the %RDY is probably related to the multiple vCPUs, despite %CSTP remaining at 0?
Yeah, I guess it's worth a shot. I don't think we ever asked what the historical performance stats look like for the VM, or what you see within the guest when users complain about performance: is processor usage high, is memory usage peaking, etc.?
That is exactly the problem. The guest doesn't report any performance bottlenecks (via perfmon) when the client complains of performance issues.
Well, I suppose it could be that the host is just CPU constrained, regardless of how many vCPUs this guest has, which would explain the high %RDY but the low %CSTP. For that conclusion to hold, though, I would expect you to see the pCPUs at fairly high utilization.
The hard part of this performance game is that you can't always be sure it's an ESX limitation. Just today a coworker was telling me about a client that was complaining about web serving performance. A comparable physical system delivered web content 4x quicker than the virtual one, until my coworker tried using the IP address in the URL instead of the hostname. 90 minutes of troubleshooting and looking at performance stats, only to find a DNS issue!
So, in that vein, I would suggest trying 2 vCPUs or lightening the load on the host, just to rule that out. If it doesn't help, what else can you tell us about the server and how the users access it?
The pCPUs are right around 40-50% each; would you consider that constrained? As you suggested, I am going to move the VM to a less utilized host, and if the %RDY remains high, I will decrease the number of vCPUs. If you think of anything else, let me know!
I did have a lengthy meeting with the group experiencing the performance issues, and it turns out they have their application installed in a non-standard config in which multiple roles are installed on the same server when they are supposed to be split out. What we are going to do for now is provide them with enough virtual machines (staying mindful of CPU allocation) so they can properly divide the workloads. Once we have reconfigured the environment, we will revisit any issues which arise. Due to the lack of metrics to support a resource constraint on the guest, I am hopeful the problem lies within the applications themselves. Unfortunately, HP refuses to provide support due to their non-standard configuration.
Ok, I just moved the guest to another host in the same cluster and now it isn't even showing up in ESXTOP so I can't see what the %RDY is. Is there any other way to grab this info?
Are you sure DRS didn't move it to another host? Does vm-support -x list the VM?
Yes I am sure.
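On the question of grabbing %RDY some other way: one option is to run esxtop in batch mode on the host (for example `esxtop -b -d 5 -n 60 > esxtop.csv`) and pull the ready column out of the CSV afterwards. Below is a rough Python sketch of that. The column header shape (`\\host\Group Cpu(id:vmname)\% Ready`) is an assumption and may differ between versions, and the host name `esx01`, the VM name `patchsrv`, and the sample numbers are made up for illustration.

```python
import csv
import io
import re
from typing import List

# Assumed header shape in esxtop batch output:
#   \\host\Group Cpu(1234:vmname)\% Ready
READY_COL = re.compile(r"\\Group Cpu\((\d+):(.+)\)\\% Ready$")


def ready_by_vm(csv_text: str, vm_name: str) -> List[float]:
    """Return the group-level % Ready samples for one VM, in file order."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)

    # Find the first column whose header names this VM's "% Ready" counter.
    idx = None
    for i, name in enumerate(header):
        match = READY_COL.search(name)
        if match and match.group(2) == vm_name:
            idx = i
            break
    if idx is None:
        return []  # VM not present in this capture

    return [float(row[idx]) for row in reader if len(row) > idx and row[idx]]


# Made-up sample in the assumed format (two samples, two counters).
SAMPLE = "\n".join([
    r'"(PDH-CSV 4.0) (UTC)(0)","\\esx01\Group Cpu(4242:patchsrv)\% Ready","\\esx01\Group Cpu(4242:patchsrv)\% Used"',
    '"01/01/2024 10:00:00","172.50","210.00"',
    '"01/01/2024 10:00:05","98.00","305.10"',
]) + "\n"

if __name__ == "__main__":
    print(ready_by_vm(SAMPLE, "patchsrv"))
    print(ready_by_vm(SAMPLE, "not-on-this-host"))
```

An empty list back means the VM is not in that host's capture at all, which is a quick way to confirm it really is running somewhere else.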
I wanted to revisit this question because I have another guest with similar symptoms. This one is our patching server, so I can reproduce the "performance" issues.
Here are the details:
Application confirmed to be multi-threaded
Originally running with 2 vCPUs; the Proc Time (according to the OS) was 100%, with a Processor Queue Length of approx. 35 while the workload was running
Increased to 4 vCPUs; Proc Time (according to the OS) dropped by 30%, but the queue length remained the same and the "sluggish" performance persisted. At this time, I checked esxtop. When it was running on one of the hosts, the %RDY peaked at 175%. While I was looking at it, DRS moved it to another host, and since then it hasn't shown up in esxtop with regard to any metric (network, memory, CPU, disk)
I guess my point here is that tracking performance problems is fairly difficult, and %RDY, although it has been described as something which needs to be below 30, doesn't seem to be a reliable indicator: the slowness occurs whether or not %RDY is high.
Are there other suggestions? What concerns me the most is the Queue Length as reported by the OS.
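To put the numbers above on a per-vCPU basis, here is a small Python sketch. It assumes the 175% was read from the collapsed VM group row in esxtop, where %RDY is summed over the worlds in the group, so dividing by the vCPU count gives only an approximate per-vCPU figure (the group also contains a few non-vCPU worlds). The function names are just for illustration.

```python
def per_vcpu_ready(group_ready_pct: float, num_vcpus: int) -> float:
    """Approximate %RDY per vCPU from the group-level esxtop figure."""
    if num_vcpus <= 0:
        raise ValueError("num_vcpus must be positive")
    return group_ready_pct / num_vcpus


def queue_per_cpu(queue_length: float, num_vcpus: int) -> float:
    """Guest Processor Queue Length divided across the guest's vCPUs."""
    if num_vcpus <= 0:
        raise ValueError("num_vcpus must be positive")
    return queue_length / num_vcpus


if __name__ == "__main__":
    print(per_vcpu_ready(175, 4))  # 43.75 (% ready per vCPU, 4-vCPU config)
    print(queue_per_cpu(35, 2))    # 17.5  (threads waiting per vCPU, 2 vCPUs)
    print(queue_per_cpu(35, 4))    # 8.75  (threads waiting per vCPU, 4 vCPUs)
```

Read this way, 175% across 4 vCPUs is roughly 44% ready time per vCPU, and a queue of 35 is still close to 9 waiting threads per vCPU even after going to 4 vCPUs.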
Is this resolved? I am facing something similar with a VM running SAS. The bare-metal installation performs well, but a VM with the same configuration is seconds slower. This is Linux 7, and CPU is the bottleneck; the I/O is all good. SAS operations simply slow down, and when I decrease the vCPUs from 32 to 8 it gets better.