Hi,
I have installed ESX 3.5 on a Dell 2950 with the following configuration:
RAM: 16GB
CPU: 2 quad-core (2.33 GHz) Intel Xeon
Number of VMs: 17
Windows XP VMs: 16 (1 vCPU, 512MB RAM)
RHEL 5.3: 1 (1 vCPU, 2GB RAM)
The ESX 3.5 CPU scheduling algorithm works fine as long as all VMs stay up and running. But our business requirement means we revert at least 4 VMs (particularly Windows VMs) every 5 to 10 minutes. Each VM has a task to complete within 5 minutes that is CPU-intensive (50-100%), though not continuously.
The task inside a VM consumes CPU very erratically, e.g.
(25..4..0..3..23..67..0..0..0..78..98..100..100..100..100..2..23..3..3..22..11);
these numbers were taken on a bare-metal machine (collected from Task Manager
while the task was running).
The ESX scheduler reclaims CPU from a VM whenever its usage drops below roughly 3-10%, and the idle world's share climbs to 70-90%. But the VMs are idle only for a few random seconds at a time; then they need CPU heavily again. In that situation the scheduler does not allocate enough CPU to the VMs that need it, even though idle CPU is available. As a result, all jobs take about double the time to finish (10 to 12 minutes instead of 5).
Moreover, when I revert a VM its state changes completely; it is effectively a fresh VM. Normally the ESX CPU scheduler tries to keep a VM on the same physical CPU, because that CPU's cache may still hold the VM's data. But in my situation the scheduler no longer needs to maintain that cache affinity when placing a freshly reverted VM.
What I need is:
1. The scheduler should allocate CPU to a VM as soon as it is requested (without any delay).
2. Reduce the CPU time accumulating in the idle world (%USED).
3. How do I tune the CPU settings so that my jobs finish in the original 5 minutes?
Below are statistics collected from esxtop while the VMs were requesting more CPU:
8:04:24am up 108 days 16:38, 134 worlds; CPU load average: 0.34, 0.59, 0.60
PCPU(%): 86.14, 37.19, 44.54, 36.68, 40.62, 61.22, 65.86, 37.80 ; used total: 51.26
CCPU(%): 11 us, 71 sy, 17 id, 1 wa ; cs/sec: 2656
ID GID NAME NWLD %USED %RUN %SYS %WAIT %RDY %IDLE %OVRLP %CSTP %MLMTD
1 1 idle 8 356.50 361.61 0.00 0.00 404.71 0.00 0.20 0.00 0.00
2 2 system 6 0.22 0.21 0.01 574.13 0.43 0.00 0.00 0.00 0.00
6 6 helper 23 1.11 1.19 0.02 2198.88 3.38 0.00 0.10 0.00 0.00
7 7 drivers 11 0.05 0.05 0.00 1053.90 0.00 0.00 0.00 0.00 0.00
9 9 console 1 82.17 82.09 0.15 10.91 2.82 10.86 0.17 0.00 0.00
15 15 vmware-vmkauthd 1 0.00 0.00 0.00 95.82 0.00 0.00 0.00 0.00 0.00
16 16 Linux (27) 5 6.75 6.80 0.02 463.34 9.00 80.44 0.64 0.00 0.00
289025 289025 Win03 (26) 5 15.33 15.29 0.12 459.51 4.36 50.93 0.25 0.00 0.00
289055 289055 Win15 (43) 5 19.95 20.23 0.06 456.67 2.22 62.22 0.88 0.00 0.00
289056 289056 Win12 (26) 5 78.00 78.08 0.09 397.59 3.46 14.69 0.51 0.00 0.00
289057 289057 Win06 (26) 5 59.91 61.04 0.10 415.30 2.78 24.32 1.36 0.00 0.00
289058 289058 Win07 (26) 5 15.81 15.81 0.14 462.39 1.86 50.80 0.29 0.00 0.00
289059 289059 Win02 (26) 5 10.47 10.57 0.22 465.13 3.44 74.56 1.92 0.00 0.00
289060 289060 Win09 (26) 5 8.85 8.69 0.49 468.55 1.91 70.22 1.03 0.00 0.00
289061 289061 Win11 (26) 5 3.82 3.78 0.18 475.48 0.77 92.23 0.32 0.00 0.00
289062 289062 Win04 (26) 5 3.47 3.37 0.23 475.37 1.32 91.87 0.38 0.00 0.00
289063 289063 Win13 (26) 5 44.90 44.84 0.26 430.53 3.86 26.50 0.28 0.00 0.00
289064 289064 Win08 (26) 5 17.00 17.08 0.23 460.09 2.76 41.30 0.39 0.00 0.00
289065 289065 Win16 (26) 6 2.84 1.82 1.31 572.65 0.53 0.00 0.33 0.00 0.00
289066 289066 Win14 (26) 6 2.67 1.68 1.30 572.84 0.50 0.00 0.39 0.00 0.00
289067 289067 Win10 (26) 5 1.65 11.61 800.00 500.00 800.00 0.00 800.00 0.00 0.00
289068 289068 Win05 (26) 6 3.23 2.69 0.72 571.73 0.58 0.00 0.30 0.00 0.00
289069 289069 Win01 (27) 1 12.18 12.15 0.14 78.64 5.25 0.00 0.14 0.00 0.00
Thanks in advance for your response.
Interesting reading, but IMHO there is very little you can do to tune the scheduler yourself.
Probably your best option would be to upgrade to vSphere 4, where the scheduler has been much improved.
Best regards,
Linjo
If you find this information useful, please award points for "correct" or "helpful".
Hi Linjo,
Thanks for your response, but I cannot upgrade to vSphere 4; it is deployed on several production servers. I hope there are some settings we can use to tune the scheduler. All we need to do is reduce the CPU time accumulating in the idle world.
For more information on this problem, the following table may help:
ID | GID | NAME | NWLD | %USED | %RUN | %SYS | %WAIT | %RDY | %IDLE | %OVRLP | %CSTP | %MLMTD |
1816010 | 291703 | vmware-vmx | 1 | 0.07 | 0.07 | 0 | 95.7 | 0.02 | 0 | 0.06 | 0 | 0 |
1816011 | 291703 | vmm0:Win01_ | 1 | 8.41 | 8.36 | 0.25 | 86.74 | 0.69 | 71.65 | 1.05 | 0 | 0 |
1816012 | 291703 | vmware-vmx | 1 | 0 | 0 | 0 | 95.79 | 0 | 0 | 0 | 0 | 0 |
1816013 | 291703 | mks:Win01( | 1 | 0.26 | 0.26 | 0 | 95.1 | 0.43 | 0 | 0.01 | 0 | 0 |
1816014 | 291703 | vcpu-0:Win0 | 1 | 0.05 | 0.05 | 0 | 95.74 | 0.01 | 0 | 0 | 0 | 0 |
So we can see that the %WAIT time for this particular VM is very high across all its worlds (vmware-vmx, vmm0, mks, etc.).
Note: the VM is not waiting for I/O; it is mostly CPU-intensive.
Have you thought about using reservations? You could reserve a certain number of MHz for each VM. I've never used CPU reservations, so I can't give you solid advice, but I would start with a value such as 500 MHz or maybe 1000 MHz and go up from there little by little.
The drawback is that you only have 18.64 GHz to give out, so you won't be able to reserve much more than 1 GHz per VM if you want to keep all 17 VMs running.
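That budget can be sanity-checked with a quick calculation (a sketch using the hardware figures from this thread: 2 quad-core 2.33 GHz Xeons and 17 VMs; hypervisor overhead is ignored):

```python
# Rough CPU-reservation budget for this host.
cores = 2 * 4                        # 2 sockets x 4 cores
core_mhz = 2330                      # 2.33 GHz per core
host_mhz = cores * core_mhz          # total schedulable MHz
vms = 17

per_vm_even_split = host_mhz // vms  # MHz per VM if reserved equally
print(host_mhz)                      # 18640, i.e. the 18.64 GHz quoted above
print(per_vm_even_split)             # 1096, so ~1 GHz per VM is the ceiling
```

So reserving a full 1000 MHz per VM already commits almost the entire host, leaving nothing for bursts.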
You should also take a look at the Hyperthreading and Affinity sections of the Resource Management Guide (in the "Advanced" chapter) and see whether any of that applies to you. There are situations where turning off Hyperthreading (in VMware's host configuration screen, and only if your CPUs support it) can improve performance, because there would be only one active VM per core at a time. Again, something you could experiment with.
Hope this helps!
Hi,
Thank you, PacketRacer, but I have already tried CPU reservations, with each VM reserving a minimum of 1000 MHz. I didn't see the contention relieved; instead I saw low utilization. For example, without reservations the overall CPU throughput was 49%; with them it was below 25%, and CPU wait time increased enormously. This doesn't work for this scenario.
I also tried scheduling affinity for each VM: every physical CPU (8 in total) was assigned 2 VMs. Instead of improving throughput, I saw the reverse effect. So I dropped all my tinkering and let the ESX 3.5 scheduler do things its own way.
As you can see in the table above, the CPU wait time is huge. Do you have any idea how to reduce it?
I think you may be misinterpreting the esxtop statistics here.
Based on your output, your Win01 VM is spending only 0.25% (%SYS) on system services on its behalf and 1.05% (%OVRLP) on behalf of other VM worlds. Most of its CPU time is spent in %WAIT (86.74%), of which 71.65% is %IDLE. That leaves 86.74 - 71.65 = 15.09% spent waiting on some resource, I/O or otherwise.
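The key point is that esxtop's %WAIT includes voluntary guest idle time, so the truly-blocked fraction is %WAIT minus %IDLE. Written out with the Win01 row's numbers:

```python
# Decompose esxtop %WAIT for the vmm0:Win01 world (numbers from the table above).
pct_wait = 86.74   # total time the world was not running or ready to run
pct_idle = 71.65   # the portion of %WAIT where the guest was voluntarily idle

# What remains is time blocked on a real resource (storage I/O, etc.).
blocked = round(pct_wait - pct_idle, 2)
print(blocked)     # 15.09
```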
It sounds like you are running into contention of some sort. Have you verified what kind of response times you are getting from storage, etc.?
-KjB
Hi,
The task inside the VM writes its result to a file; would that generate much I/O? I am not sure about the disk I/O numbers in esxtop. Please use the tables below to identify where the contention is.
ID | GID | NAME | NWLD | %USED | %RUN | %SYS | %WAIT | %RDY | %IDLE | %OVRLP | %CSTP | %MLMTD |
1 | 1 | idle | 8 | 701.3 | 707.83 | 0 | 0 | 98.22 | 0 | 0.76 | 0 | 0 |
2 | 2 | system | 6 | 0.12 | 0.12 | 0 | 600 | 0.02 | 0 | 0 | 0 | 0 |
6 | 6 | helper | 23 | 0.41 | 0.46 | 0 | 2300 | 0.72 | 0 | 0.05 | 0 | 0 |
7 | 7 | drivers | 11 | 0.06 | 0.06 | 0 | 1100 | 0 | 0 | 0 | 0 | 0 |
9 | 9 | console | 1 | 5.12 | 5.05 | 0.11 | 85.22 | 10.49 | 85.17 | 0.08 | 0 | 0 |
15 | 15 | vmware-vmkauthd | 1 | 0 | 0 | 0 | 100 | 0 | 0 | 0 | 0 | 0 |
16 | 16 | Linux | 5 | 14.82 | 14.88 | 0 | 484.98 | 3.93 | 82.35 | 0.5 | 0 | 0 |
84679 | 84679 | Win16 | 5 | 4.24 | 3.99 | 0.1 | 499.37 | 0.37 | 65.58 | 1.85 | 0 | 0 |
84689 | 84689 | Win12 | 5 | 1.82 | 1.8 | 0.04 | 500 | 0.61 | 92.14 | 0.17 | 0 | 0 |
84693 | 84693 | Win13 | 5 | 2.3 | 2.31 | 0.02 | 500 | 0.15 | 72.84 | 0.3 | 0 | 0 |
84695 | 84695 | Win08 | 5 | 12.34 | 12.36 | 0.03 | 489.98 | 1.41 | 68.5 | 0.15 | 0 | 0 |
84698 | 84698 | Win15 | 5 | 3.42 | 3.42 | 0.03 | 499.81 | 0.55 | 79.66 | 0.15 | 0 | 0 |
84699 | 84699 | Win07 | 5 | 12.22 | 12.25 | 0.04 | 490.87 | 0.66 | 44.95 | 0.56 | 0 | 0 |
84700 | 84700 | Win05 | 5 | 16.48 | 16.24 | 0.05 | 486.39 | 1.1 | 8.69 | 0.1 | 0 | 0 |
84701 | 84701 | Win09 | 5 | 7.05 | 7.05 | 0.04 | 496.07 | 0.61 | 70.51 | 0.49 | 0 | 0 |
84702 | 84702 | Win01 | 5 | 1.85 | 1.84 | 0.03 | 500 | 0.24 | 88.06 | 0.27 | 0 | 0 |
84703 | 84703 | Win10 | 5 | 4.34 | 4.31 | 0.1 | 498.78 | 0.7 | 82.85 | 0.21 | 0 | 0 |
84704 | 84704 | Win11 | 6 | 1.96 | 0.97 | 1.17 | 600 | 0.18 | 0 | 1.45 | 0 | 0 |
84705 | 84705 | Win03 | 6 | 2.53 | 1.54 | 1.21 | 600 | 0.25 | 0 | 0.25 | 0 | 0 |
84706 | 84706 | Win02 | 6 | 2.01 | 1.04 | 1.09 | 600 | 0.17 | 0 | 0.81 | 0 | 0 |
84707 | 84707 | Win04 | 6 | 2.74 | 1.34 | 1.66 | 600 | 0.32 | 0 | 0.48 | 0 | 0 |
84708 | 84708 | Win06 | 1 | 0.64 | 0.62 | 0.03 | 100 | 0.09 | 0 | 0.02 | 0 | 0 |
84709 | 84709 | Win14 | 6 | 5.98 | 5.32 | 0.8 | 343.36 | 0.33 | 0 | 0.18 | 0 | 0 |
Disk I/O from esxtop:
ID | GID | NAME | DEVICE | NWD | NDV | DQLEN | WQLEN | ACTV | QUED | %USD | LOAD | CMDS/s | READS/s | WRITES/s | MBREAD/s | MBWRTN/s |
2 | 2 | system | - | 3 | - | 0 | 0 | 0 | 0 | 0 | 0 | 7.03 | 0 | 7.03 | 0 | 0.02 |
6 | 6 | helper | - | 3 | - | 0 | 0 | 0 | 0 | 0 | 0 | 44.57 | 0 | 0 | 0 | 0 |
9 | 9 | console | - | 1 | - | 0 | 0 | 0 | 0 | 0 | 0 | 222.86 | 127.69 | 95.17 | 0.26 | 0.26 |
15 | 15 | vmware-vmkauthd | - | 1 | - | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
16 | 16 | Linux | - | 3 | - | 0 | 0 | 0 | 0 | 0 | 0 | 2.41 | 0 | 2.41 | 0 | 0.03 |
84679 | 84679 | Win16 | - | 3 | - | 0 | 0 | 0 | 0 | 0 | 0 | 1.2 | 0 | 1.2 | 0 | 0.01 |
84689 | 84689 | Win12 | - | 3 | - | 0 | 0 | 0 | 0 | 0 | 0 | 2.41 | 0.2 | 2.21 | 0 | 0.01 |
84690 | 84690 | Win11 | - | 3 | - | 0 | 0 | 0 | 0 | 0 | 0 | 35.34 | 15.26 | 20.08 | 0.58 | 0.53 |
84691 | 84691 | Win03 | - | 3 | - | 0 | 0 | 0 | 0 | 0 | 0 | 30.52 | 20.48 | 10.04 | 1.09 | 0.4 |
84692 | 84692 | Win04 | - | 3 | - | 0 | 0 | 0 | 0 | 0 | 0 | 125.08 | 27.91 | 96.77 | 0.56 | 1.28 |
84693 | 84693 | Win13 | - | 3 | - | 0 | 0 | 0 | 0 | 0 | 0 | 42.76 | 25.7 | 17.07 | 0.62 | 0.13 |
84694 | 84694 | Win14 | - | 3 | - | 0 | 0 | 0 | 0 | 0 | 0 | 46.98 | 28.31 | 18.67 | 0.92 | 0.54 |
84695 | 84695 | Win08 | - | 3 | - | 0 | 0 | 0 | 0 | 0 | 0 | 131.11 | 91.95 | 39.15 | 1.07 | 0.7 |
84696 | 84696 | Win06 | - | 3 | - | 0 | 0 | 0 | 0 | 0 | 0 | 34.13 | 16.66 | 17.47 | 0.31 | 0.14 |
84697 | 84697 | Win02 | - | 3 | - | 0 | 0 | 0 | 0 | 0 | 0 | 47.58 | 37.34 | 10.24 | 2.02 | 0.24 |
84698 | 84698 | Win15 | - | 3 | - | 0 | 0 | 0 | 0 | 0 | 0 | 42.16 | 13.65 | 28.51 | 0.24 | 0.3 |
84699 | 84699 | Win07 | - | 3 | - | 0 | 0 | 0 | 0 | 0 | 0 | 45.37 | 29.92 | 15.46 | 0.69 | 0.51 |
84700 | 84700 | Win05 | - | 3 | - | 0 | 0 | 0 | 0 | 0 | 0 | 131.71 | 92.76 | 38.95 | 1.02 | 0.34 |
84701 | 84701 | Win09 | - | 3 | - | 0 | 0 | 0 | 0 | 0 | 0 | 191.74 | 152.39 | 39.35 | 1.14 | 0.36 |
84702 | 84702 | Win01 | - | 3 | - | 0 | 0 | 0 | 0 | 0 | 0 | 61.84 | 61.84 | 0 | 8.78 | 0 |
84703 | 84703 | Win10 | - | 1 | - | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
I appreciate your response.
I would definitely validate your storage configuration. A quick look shows that the VMs with high %WAIT are also the ones attempting 100+ IOPS. Are all of these systems on the same datastore? If so, how large is the datastore, and how many disks back it?
-KjB
Yes, all the VMs are on the same datastore. My server has two hard disks: one is allocated to ESX itself (the core), the other to the VMs (VMFS). The disk is 275 GB. Each Windows VM is 8 GB including its virtual disk space; the Linux VM is 80 GB. Around 210 GB is occupied by VMs.
That is definitely the issue at hand. Two disks give you between 150 (SATA) and 300 (FC) IOPS, and you are trying to force 1250 through them. You need to add more spindles, or some additional datastores, to get any further improvement.
-KjB
I am really sorry, but I am not getting it. How did you come up with these numbers?
Looking at your esxtop output, I added up the CMDS/s to get the 1250 number. As for how many IOPS your datastore can handle: generally speaking, a SATA disk can perform 50-80 IOPS (depending on the RPM of the device), and a SCSI/FC disk 100-150, depending on the device. You only have two disks in your datastore, so it can handle 100-300 IOPS in total before you have to start waiting for I/O.
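The arithmetic can be sketched like this (the per-spindle IOPS figures are the rough rules of thumb quoted above, not measurements of this particular hardware):

```python
# Compare offered I/O load against a rough estimate of datastore capacity.
cmds_per_sec = 1119.04             # total CMDS/s on the VM LUN (esxtop output below)
disks = 2                          # spindles backing the datastore

# Rule-of-thumb per-spindle IOPS ranges.
sata_iops = (50, 80)
scsi_fc_iops = (100, 150)

cap_low = disks * sata_iops[0]     # worst case for this box
cap_high = disks * scsi_fc_iops[1] # best case for this box
oversubscription = cmds_per_sec / cap_high

print(cap_low, cap_high)           # 100 300
print(round(oversubscription, 1))  # 3.7 -- oversubscribed even in the best case
```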
-KjB
Thanks for the information. My hard disks are 10,000 RPM. Sometimes I see some very large numbers in the disk I/O view; for example, look at Win07:
249931 249931 Win02 (33) - 3 - 0 0 0 0 0 0.00 0.95 0.19 0.76 0.00 0.00
249932 249932 Win03 (32) - 3 - 0 0 0 0 0 0.00 0.76 0.38 0.38 0.01 0.00
249933 249933 Win04 (32) - 3 - 0 0 0 0 0 0.00 11.25 0.19 11.06 0.00 0.38
249934 249934 Win05 (31) - 3 - 0 0 0 0 0 0.00 9.16 1.91 7.25 0.02 0.07
249935 249935 Win06 (31) - 3 - 0 0 0 0 0 0.00 254.06 19.07 230.03 0.32 53.11
249936 249936 Win07 (29) - 3 - 0 0 0 0 0 0.00 3518437297766402048.00 3518437297766402048.00 0.57 3355443284761.95
249937 249937 Win08 (31) - 3 - 0 0 0 0 0 0.00 12.97 0.00 12.97 0.00 0.39
249938 249938 Win01 (2) - 3 - 0 0 0 0 0 0.00 16.98 16.98 0.00 0.81 0.00
Another sample:
249938 249938 Win01 (2) - 3 - 0 0 0 0 0 0.00 3.10 0.00 3.10 0.00 0.02
249940 249940 Win03 (32) - 3 - 0 0 0 0 0 0.00 167.37 148.53 18.84 4.27 0.10
249941 249941 Win06 (31) - 3 - 0 0 0 0 0 0.00 2.62 0.00 2.62 0.00 0.02
249942 249942 Win05 (31) - 3 - 0 0 0 0 0 0.00 33.14 6.91 25.75 4194304010522.71 0.27
But this only shows up at random moments.
You need to answer the question "what are my VMs waiting for" before you can answer the question "how do I reduce CPU wait time." Like KjB said, it's almost certainly storage!
Do this:
1) run esxtop
2) hit the 'u' key to go to the disk device screen
3) grab that output and post it
4) then hit the f key to show new fields
5) turn off B, C and G
6) turn on H in the "add / remove field" screen
7) hit enter to go back to stats screen
8) capture the output again and post it
Try to do this during a busy time, if possible. It's difficult to tell what's going on just by looking at a snapshot.
Hi,
The snapshot below was taken while the system was busy.
DEVICE | PATH/WORLD/PARTITION | NPH | NWD | NPN | DQLEN | WQLEN | ACTV | QUED | %USD | LOAD | CMDS/s | READS/s | WRITES/s | MBREAD/s | MBWRTN/s |
vmhba1:0:0 | - | 1 | 4 | 9 | 128 | 0 | 0 | 0 | 0 | 0 | 53.6 | 0 | 53.6 | 0 | 0.72 |
vmhba1:1:0 | - | 1 | 61 | 2 | 32 | 0 | 12 | 0 | 37 | 0.38 | 1119.04 | 742.15 | 321.39 | 32.28 | 2.67 |
vmhba32:0:0 | - | 1 | 4 | 0 | 16 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
Here is the second table, after disabling B, C, and G and enabling H:
DEVICE | DQLEN | WQLEN | ACTV | QUED | %USD | LOAD | DAVG/cmd | KAVG/cmd | GAVG/cmd | QAVG/cmd | DAVG/rd | KAVG/rd | GAVG/rd | QAVG/rd | DAVG/wr | KAVG/wr | GAVG/wr | QAVG/wr |
vmhba1:0:0 | 128 | 0 | 0 | 0 | 0 | 0 | 0.08 | 0.01 | 0.09 | 0 | 0 | 0 | 0 | 0 | 0.08 | 0.01 | 0.09 | 0 |
vmhba1:1:0 | 32 | 0 | 16 | 0 | 50 | 0.5 | 21.4 | 0.06 | 21.47 | 0.01 | 28.35 | 0.08 | 28.43 | 0.02 | 0.17 | 0.02 | 0.19 | 0 |
vmhba32:0:0 | 16 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
Thanks
From the output you posted:
You are pushing 1119.04 CMDS/s at that LUN. The ESX device queue for it is 32 and only 50% used, so ESX is not queuing commands; but the device itself is adding 28.43 ms of latency to every read (DAVG/rd), while the ESX host adds little to no latency in the vmkernel itself (KAVG).
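One way to see that the disks, not the vmkernel, are the bottleneck: by Little's law, the observed read rate times the per-read device latency gives the sustained number of reads in flight (a sketch using the READS/s and DAVG/rd figures from the two tables above):

```python
# Little's law: outstanding I/Os = arrival rate x time in service.
reads_per_sec = 742.15    # READS/s for vmhba1:1:0 (first table)
davg_rd_ms = 28.35        # device latency per read, DAVG/rd (second table)

outstanding = reads_per_sec * (davg_rd_ms / 1000.0)
print(round(outstanding, 1))   # 21.0 reads in flight at the device
# That is close to the observed ACTV of 16 and well inside the device queue
# depth (DQLEN = 32), while KAVG of ~0.06 ms shows the vmkernel adds almost
# nothing: the latency lives in the spindles themselves.
```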
So your VMs are spending a lot of time waiting for read data, which is causing the problems you are seeing. You must address the storage before you will get better performance.
-KjB
Thanks for your info. The task inside the VM just launches a browser and loads URLs; I don't do any disk-related activity in my task. I write only one file to disk, and it is less than 100 KB. So why is esxtop showing this much disk activity? And what is being read from disk, when my task has no read activity at all? Can we get any details on the disk read operations?
ESX can't give you that data; you need to look inside the VM itself. Maybe antivirus, or something of that nature?
-KjB