Hi,
I have installed ESX 3.5 on a Dell 2950 with the following configuration:
RAM: 16GB
CPU: 2 quad-core (2.33 GHz) Intel Xeon
Number of VMs: 17
Windows XP VMs: 16 (1 vCPU, 512MB RAM)
RHEL 5.3: 1 (1 vCPU, 2GB RAM)
The ESX 3.5 CPU scheduling algorithm works fine as long as all VMs stay up and running. But our business requirement means we revert at least 4 VMs (particularly Windows VMs) every 5 to 10 minutes. Each VM has a task to complete within 5 minutes that is CPU-intensive (50-100%), though not continuously.
The task inside a VM consumes CPU very erratically, e.g.
(25..4..0..3..23..67..0..0..0..78..98..100..100..100..100..2..23..3..3..22..11);
these numbers were taken on a bare-metal machine (collected from Task Manager
while the task was running).
The ESX scheduler reclaims CPU from a VM whenever its usage drops below roughly 3-10%, and the idle world's share climbs to 70-90%. But the VMs are idle only for a few random seconds at a time; then they need CPU heavily again. In that situation the scheduler does not allocate enough CPU to the VMs that need it, even though idle CPU is available. As a result, all jobs take about double the time to finish (10 to 12 minutes instead of 5).
Moreover, when I revert a VM its state changes completely; it is effectively a fresh VM. Normally the ESX CPU scheduler tries to keep a VM on the same physical CPU, because that CPU's cache may still hold the VM's data. But in my situation the scheduler no longer needs to maintain that cache affinity when placing a freshly reverted VM.
What I need is:
1. The scheduler should allocate CPU to a VM as soon as it is requested (without any delay).
2. Reduce the CPU time accumulating in the idle world (%USED).
3. How do I tune the CPU settings so that my jobs finish in the original 5 minutes?
Below are statistics collected from esxtop while the VMs were requesting more CPU:
8:04:24am up 108 days 16:38, 134 worlds; CPU load average: 0.34, 0.59, 0.60
PCPU(%): 86.14, 37.19, 44.54, 36.68, 40.62, 61.22, 65.86, 37.80 ; used total: 51.26
CCPU(%): 11 us, 71 sy, 17 id, 1 wa ; cs/sec: 2656
ID GID NAME NWLD %USED %RUN %SYS %WAIT %RDY %IDLE %OVRLP %CSTP %MLMTD
1 1 idle 8 356.50 361.61 0.00 0.00 404.71 0.00 0.20 0.00 0.00
2 2 system 6 0.22 0.21 0.01 574.13 0.43 0.00 0.00 0.00 0.00
6 6 helper 23 1.11 1.19 0.02 2198.88 3.38 0.00 0.10 0.00 0.00
7 7 drivers 11 0.05 0.05 0.00 1053.90 0.00 0.00 0.00 0.00 0.00
9 9 console 1 82.17 82.09 0.15 10.91 2.82 10.86 0.17 0.00 0.00
15 15 vmware-vmkauthd 1 0.00 0.00 0.00 95.82 0.00 0.00 0.00 0.00 0.00
16 16 Linux (27) 5 6.75 6.80 0.02 463.34 9.00 80.44 0.64 0.00 0.00
289025 289025 Win03 (26) 5 15.33 15.29 0.12 459.51 4.36 50.93 0.25 0.00 0.00
289055 289055 Win15 (43) 5 19.95 20.23 0.06 456.67 2.22 62.22 0.88 0.00 0.00
289056 289056 Win12 (26) 5 78.00 78.08 0.09 397.59 3.46 14.69 0.51 0.00 0.00
289057 289057 Win06 (26) 5 59.91 61.04 0.10 415.30 2.78 24.32 1.36 0.00 0.00
289058 289058 Win07 (26) 5 15.81 15.81 0.14 462.39 1.86 50.80 0.29 0.00 0.00
289059 289059 Win02 (26) 5 10.47 10.57 0.22 465.13 3.44 74.56 1.92 0.00 0.00
289060 289060 Win09 (26) 5 8.85 8.69 0.49 468.55 1.91 70.22 1.03 0.00 0.00
289061 289061 Win11 (26) 5 3.82 3.78 0.18 475.48 0.77 92.23 0.32 0.00 0.00
289062 289062 Win04 (26) 5 3.47 3.37 0.23 475.37 1.32 91.87 0.38 0.00 0.00
289063 289063 Win13 (26) 5 44.90 44.84 0.26 430.53 3.86 26.50 0.28 0.00 0.00
289064 289064 Win08 (26) 5 17.00 17.08 0.23 460.09 2.76 41.30 0.39 0.00 0.00
289065 289065 Win16 (26) 6 2.84 1.82 1.31 572.65 0.53 0.00 0.33 0.00 0.00
289066 289066 Win14 (26) 6 2.67 1.68 1.30 572.84 0.50 0.00 0.39 0.00 0.00
289067 289067 Win10 (26) 5 1.65 11.61 800.00 500.00 800.00 0.00 800.00 0.00 0.00
289068 289068 Win05 (26) 6 3.23 2.69 0.72 571.73 0.58 0.00 0.30 0.00 0.00
289069 289069 Win01 (27) 1 12.18 12.15 0.14 78.64 5.25 0.00 0.14 0.00 0.00
Thanks in advance for your response.
Interesting reading, but IMHO there is very little you can do to tune the scheduler yourself.
Probably your best option would be to upgrade to vSphere 4, where the scheduler has been much improved.
Best regards,
Linjo
If you find this information useful, please award points for "correct" or "helpful".
Hi Linjo,
Thanks for your response, but I cannot upgrade to vSphere 4; it is deployed on several production servers. I hope there are some settings we can use to tune the scheduler. All we need to do is reduce the CPU time accumulating in the idle world.
For more information on this problem, the following table may help:
ID | GID | NAME | NWLD | %USED | %RUN | %SYS | %WAIT | %RDY | %IDLE | %OVRLP | %CSTP | %MLMTD |
1816010 | 291703 | vmware-vmx | 1 | 0.07 | 0.07 | 0 | 95.7 | 0.02 | 0 | 0.06 | 0 | 0 |
1816011 | 291703 | vmm0:Win01_ | 1 | 8.41 | 8.36 | 0.25 | 86.74 | 0.69 | 71.65 | 1.05 | 0 | 0 |
1816012 | 291703 | vmware-vmx | 1 | 0 | 0 | 0 | 95.79 | 0 | 0 | 0 | 0 | 0 |
1816013 | 291703 | mks:Win01( | 1 | 0.26 | 0.26 | 0 | 95.1 | 0.43 | 0 | 0.01 | 0 | 0 |
1816014 | 291703 | vcpu-0:Win0 | 1 | 0.05 | 0.05 | 0 | 95.74 | 0.01 | 0 | 0 | 0 | 0 |
So we can see that the %WAIT time for this particular VM is very high across all its worlds (vmware-vmx, vmm0, mks, etc.).
Note: the VM is not waiting for I/O; it is mostly CPU-intensive.
Have you thought about using reservations? You could reserve a certain number of MHz for each VM. I've never used CPU reservations, so I can't give you solid advice, but I would start with a value such as 500 MHz or maybe 1000 MHz and go up from there little by little.
The drawback is that you only have 18.64 GHz to give out, so you won't be able to reserve much more than 1 GHz per VM if you want to keep all 17 VMs running.
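That budget can be sanity-checked with a quick calculation (a sketch using the hardware figures from this thread: 2 quad-core 2.33 GHz Xeons and 17 VMs; hypervisor overhead is ignored):

```python
# Rough CPU-reservation budget for this host.
cores = 2 * 4                        # 2 sockets x 4 cores
core_mhz = 2330                      # 2.33 GHz per core
host_mhz = cores * core_mhz          # total schedulable MHz
vms = 17

per_vm_even_split = host_mhz // vms  # MHz per VM if reserved equally
print(host_mhz)                      # 18640, i.e. the 18.64 GHz quoted above
print(per_vm_even_split)             # 1096, so ~1 GHz per VM is the ceiling
```

So reserving a full 1000 MHz per VM already commits almost the entire host, leaving nothing for bursts.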
You should also take a look at the Hyperthreading and Affinity sections of the Resource Management Guide (in the "Advanced" chapter) and see whether any of that applies to you. There are situations where turning off Hyperthreading (in VMware's host configuration screen, and only if your CPUs support it) can improve performance, because there would be only one active VM per core at a time. Again, something you could experiment with.
Hope this helps!
Hi,
Thank you, PacketRacer, but I have already tried CPU reservations, with each VM reserving a minimum of 1000 MHz. I didn't see the contention relieved; instead I saw low utilization. For example, without reservations the overall CPU throughput was 49%; with them it was below 25%, and CPU wait time increased enormously. This doesn't work for this scenario.
I also tried scheduling affinity for each VM: every physical CPU (8 in total) was assigned 2 VMs. Instead of improving throughput, I saw the reverse effect. So I dropped all my tinkering and let the ESX 3.5 scheduler do things its own way.
As you can see in the table above, the CPU wait time is huge. Do you have any idea how to reduce it?
I think you may be misinterpreting the esxtop statistics here.
Based on your output, your Win01 VM is spending only 0.25% (%SYS) on system services on its behalf and 1.05% (%OVRLP) on behalf of other VM worlds. Most of its CPU time is spent in %WAIT (86.74%), of which 71.65% is %IDLE. That leaves 86.74 - 71.65 = 15.09% spent waiting on some resource, I/O or otherwise.
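The key point is that esxtop's %WAIT includes voluntary guest idle time, so the truly-blocked fraction is %WAIT minus %IDLE. Written out with the Win01 row's numbers:

```python
# Decompose esxtop %WAIT for the vmm0:Win01 world (numbers from the table above).
pct_wait = 86.74   # total time the world was not running or ready to run
pct_idle = 71.65   # the portion of %WAIT where the guest was voluntarily idle

# What remains is time blocked on a real resource (storage I/O, etc.).
blocked = round(pct_wait - pct_idle, 2)
print(blocked)     # 15.09
```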
It sounds like you are running into contention of some sort. Have you verified what kind of response times you are getting from storage, etc.?
-KjB
Hi,
The task inside the VM writes its result to a file; would that generate much I/O? I am not sure about the disk I/O numbers in esxtop. Please use the tables below to identify where the contention is.
ID | GID | NAME | NWLD | %USED | %RUN | %SYS | %WAIT | %RDY | %IDLE | %OVRLP | %CSTP | %MLMTD |
1 | 1 | idle | 8 | 701.3 | 707.83 | 0 | 0 | 98.22 | 0 | 0.76 | 0 | 0 |
2 | 2 | system | 6 | 0.12 | 0.12 | 0 | 600 | 0.02 | 0 | 0 | 0 | 0 |
6 | 6 | helper | 23 | 0.41 | 0.46 | 0 | 2300 | 0.72 | 0 | 0.05 | 0 | 0 |
7 | 7 | drivers | 11 | 0.06 | 0.06 | 0 | 1100 | 0 | 0 | 0 | 0 | 0 |
9 | 9 | console | 1 | 5.12 | 5.05 | 0.11 | 85.22 | 10.49 | 85.17 | 0.08 | 0 | 0 |
15 | 15 | vmware-vmkauthd | 1 | 0 | 0 | 0 | 100 | 0 | 0 | 0 | 0 | 0 |
16 | 16 | Linux | 5 | 14.82 | 14.88 | 0 | 484.98 | 3.93 | 82.35 | 0.5 | 0 | 0 |
84679 | 84679 | Win16 | 5 | 4.24 | 3.99 | 0.1 | 499.37 | 0.37 | 65.58 | 1.85 | 0 | 0 |
84689 | 84689 | Win12 | 5 | 1.82 | 1.8 | 0.04 | 500 | 0.61 | 92.14 | 0.17 | 0 | 0 |
84693 | 84693 | Win13 | 5 | 2.3 | 2.31 | 0.02 | 500 | 0.15 | 72.84 | 0.3 | 0 | 0 |
84695 | 84695 | Win08 | 5 | 12.34 | 12.36 | 0.03 | 489.98 | 1.41 | 68.5 | 0.15 | 0 | 0 |
84698 | 84698 | Win15 | 5 | 3.42 | 3.42 | 0.03 | 499.81 | 0.55 | 79.66 | 0.15 | 0 | 0 |
84699 | 84699 | Win07 | 5 | 12.22 | 12.25 | 0.04 | 490.87 | 0.66 | 44.95 | 0.56 | 0 | 0 |
84700 | 84700 | Win05 | 5 | 16.48 | 16.24 | 0.05 | 486.39 | 1.1 | 8.69 | 0.1 | 0 | 0 |
84701 | 84701 | Win09 | 5 | 7.05 | 7.05 | 0.04 | 496.07 | 0.61 | 70.51 | 0.49 | 0 | 0 |
84702 | 84702 | Win01 | 5 | 1.85 | 1.84 | 0.03 | 500 | 0.24 | 88.06 | 0.27 | 0 | 0 |
84703 | 84703 | Win10 | 5 | 4.34 | 4.31 | 0.1 | 498.78 | 0.7 | 82.85 | 0.21 | 0 | 0 |
84704 | 84704 | Win11 | 6 | 1.96 | 0.97 | 1.17 | 600 | 0.18 | 0 | 1.45 | 0 | 0 |
84705 | 84705 | Win03 | 6 | 2.53 | 1.54 | 1.21 | 600 | 0.25 | 0 | 0.25 | 0 | 0 |
84706 | 84706 | Win02 | 6 | 2.01 | 1.04 | 1.09 | 600 | 0.17 | 0 | 0.81 | 0 | 0 |
84707 | 84707 | Win04 | 6 | 2.74 | 1.34 | 1.66 | 600 | 0.32 | 0 | 0.48 | 0 | 0 |
84708 | 84708 | Win06 | 1 | 0.64 | 0.62 | 0.03 | 100 | 0.09 | 0 | 0.02 | 0 | 0 |
84709 | 84709 | Win14 | 6 | 5.98 | 5.32 | 0.8 | 343.36 | 0.33 | 0 | 0.18 | 0 | 0 |
Disk I/O from esxtop:
ID | GID | NAME | DEVICE | NWD | NDV | DQLEN | WQLEN | ACTV | QUED | %USD | LOAD | CMDS/s | READS/s | WRITES/s | MBREAD/s | MBWRTN/s |
2 | 2 | system | - | 3 | - | 0 | 0 | 0 | 0 | 0 | 0 | 7.03 | 0 | 7.03 | 0 | 0.02 |
6 | 6 | helper | - | 3 | - | 0 | 0 | 0 | 0 | 0 | 0 | 44.57 | 0 | 0 | 0 | 0 |
9 | 9 | console | - | 1 | - | 0 | 0 | 0 | 0 | 0 | 0 | 222.86 | 127.69 | 95.17 | 0.26 | 0.26 |
15 | 15 | vmware-vmkauthd | - | 1 | - | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
16 | 16 | Linux | - | 3 | - | 0 | 0 | 0 | 0 | 0 | 0 | 2.41 | 0 | 2.41 | 0 | 0.03 |
84679 | 84679 | Win16 | - | 3 | - | 0 | 0 | 0 | 0 | 0 | 0 | 1.2 | 0 | 1.2 | 0 | 0.01 |
84689 | 84689 | Win12 | - | 3 | - | 0 | 0 | 0 | 0 | 0 | 0 | 2.41 | 0.2 | 2.21 | 0 | 0.01 |
84690 | 84690 | Win11 | - | 3 | - | 0 | 0 | 0 | 0 | 0 | 0 | 35.34 | 15.26 | 20.08 | 0.58 | 0.53 |
84691 | 84691 | Win03 | - | 3 | - | 0 | 0 | 0 | 0 | 0 | 0 | 30.52 | 20.48 | 10.04 | 1.09 | 0.4 |
84692 | 84692 | Win04 | - | 3 | - | 0 | 0 | 0 | 0 | 0 | 0 | 125.08 | 27.91 | 96.77 | 0.56 | 1.28 |
84693 | 84693 | Win13 | - | 3 | - | 0 | 0 | 0 | 0 | 0 | 0 | 42.76 | 25.7 | 17.07 | 0.62 | 0.13 |
84694 | 84694 | Win14 | - | 3 | - | 0 | 0 | 0 | 0 | 0 | 0 | 46.98 | 28.31 | 18.67 | 0.92 | 0.54 |
84695 | 84695 | Win08 | - | 3 | - | 0 | 0 | 0 | 0 | 0 | 0 | 131.11 | 91.95 | 39.15 | 1.07 | 0.7 |
84696 | 84696 | Win06 | - | 3 | - | 0 | 0 | 0 | 0 | 0 | 0 | 34.13 | 16.66 | 17.47 | 0.31 | 0.14 |
84697 | 84697 | Win02 | - | 3 | - | 0 | 0 | 0 | 0 | 0 | 0 | 47.58 | 37.34 | 10.24 | 2.02 | 0.24 |
84698 | 84698 | Win15 | - | 3 | - | 0 | 0 | 0 | 0 | 0 | 0 | 42.16 | 13.65 | 28.51 | 0.24 | 0.3 |
84699 | 84699 | Win07 | - | 3 | - | 0 | 0 | 0 | 0 | 0 | 0 | 45.37 | 29.92 | 15.46 | 0.69 | 0.51 |
84700 | 84700 | Win05 | - | 3 | - | 0 | 0 | 0 | 0 | 0 | 0 | 131.71 | 92.76 | 38.95 | 1.02 | 0.34 |
84701 | 84701 | Win09 | - | 3 | - | 0 | 0 | 0 | 0 | 0 | 0 | 191.74 | 152.39 | 39.35 | 1.14 | 0.36 |
84702 | 84702 | Win01 | - | 3 | - | 0 | 0 | 0 | 0 | 0 | 0 | 61.84 | 61.84 | 0 | 8.78 | 0 |
84703 | 84703 | Win10 | - | 1 | - | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
I appreciate your response.
I would definitely validate your storage configuration. A quick look shows that the VMs with high %WAIT are also the ones attempting 100+ IOPS. Are all of these systems on the same datastore? If so, how large is the datastore, and how many disks back it?
-KjB
Yes, all the VMs are on the same datastore. My server has two hard disks: one is allocated to ESX itself (the core), the other to the VMs (VMFS). The disk is 275 GB. Each Windows VM is 8 GB including its virtual disk space; the Linux VM is 80 GB. Around 210 GB is occupied by VMs.
That is definitely the issue at hand. Two disks give you between 150 (SATA) and 300 (FC) IOPS, and you are trying to force 1250 through them. You need to add more spindles, or some additional datastores, to get any further improvement.
-KjB
I am really sorry, but I am not getting it. How did you come up with these numbers?
Looking at your esxtop output, I added up the CMDS/s to get the 1250 number. As for how many IOPS your datastore can handle: generally speaking, a SATA disk can perform 50-80 IOPS (depending on the RPM of the device), and a SCSI/FC disk 100-150, depending on the device. You only have two disks in your datastore, so it can handle 100-300 IOPS in total before you have to start waiting for I/O.
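The arithmetic can be sketched like this (the per-spindle IOPS figures are the rough rules of thumb quoted above, not measurements of this particular hardware):

```python
# Compare offered I/O load against a rough estimate of datastore capacity.
cmds_per_sec = 1119.04             # total CMDS/s on the VM LUN (esxtop output below)
disks = 2                          # spindles backing the datastore

# Rule-of-thumb per-spindle IOPS ranges.
sata_iops = (50, 80)
scsi_fc_iops = (100, 150)

cap_low = disks * sata_iops[0]     # worst case for this box
cap_high = disks * scsi_fc_iops[1] # best case for this box
oversubscription = cmds_per_sec / cap_high

print(cap_low, cap_high)           # 100 300
print(round(oversubscription, 1))  # 3.7 -- oversubscribed even in the best case
```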
-KjB
Thanks for the information. My hard disks are 10,000 RPM. Sometimes I see some very large numbers in the disk I/O view; for example, look at Win07:
249931 249931 Win02 (33) - 3 - 0 0 0 0 0 0.00 0.95 0.19 0.76 0.00 0.00
249932 249932 Win03 (32) - 3 - 0 0 0 0 0 0.00 0.76 0.38 0.38 0.01 0.00
249933 249933 Win04 (32) - 3 - 0 0 0 0 0 0.00 11.25 0.19 11.06 0.00 0.38
249934 249934 Win05 (31) - 3 - 0 0 0 0 0 0.00 9.16 1.91 7.25 0.02 0.07
249935 249935 Win06 (31) - 3 - 0 0 0 0 0 0.00 254.06 19.07 230.03 0.32 53.11
249936 249936 Win07 (29) - 3 - 0 0 0 0 0 0.00 3518437297766402048.00 3518437297766402048.00 0.57 3355443284761.95
249937 249937 Win08 (31) - 3 - 0 0 0 0 0 0.00 12.97 0.00 12.97 0.00 0.39
249938 249938 Win01 (2) - 3 - 0 0 0 0 0 0.00 16.98 16.98 0.00 0.81 0.00
Another sample:
249938 249938 Win01 (2) - 3 - 0 0 0 0 0 0.00 3.10 0.00 3.10 0.00 0.02
249940 249940 Win03 (32) - 3 - 0 0 0 0 0 0.00 167.37 148.53 18.84 4.27 0.10
249941 249941 Win06 (31) - 3 - 0 0 0 0 0 0.00 2.62 0.00 2.62 0.00 0.02
249942 249942 Win05 (31) - 3 - 0 0 0 0 0 0.00 33.14 6.91 25.75 4194304010522.71 0.27
But this only shows up at random moments.
You need to answer the question "what are my VMs waiting for" before you can answer the question "how do I reduce CPU wait time." Like KjB said, it's almost certainly storage!
Do this:
1) run esxtop
2) hit the 'u' key to go to the disk device screen
3) grab that output and post it
4) then hit the f key to show new fields
5) turn off B, C and G
6) turn on H in the "add / remove field" screen
7) hit enter to go back to stats screen
8) capture the output again and post it
Try to do this during a busy time, if possible. It's difficult to tell what's going on just by looking at a snapshot.
Hi,
The snapshot below was taken while the system was busy.
DEVICE | PATH/WORLD/PARTITION | NPH | NWD | NPN | DQLEN | WQLEN | ACTV | QUED | %USD | LOAD | CMDS/s | READS/s | WRITES/s | MBREAD/s | MBWRTN/s |
vmhba1:0:0 | - | 1 | 4 | 9 | 128 | 0 | 0 | 0 | 0 | 0 | 53.6 | 0 | 53.6 | 0 | 0.72 |
vmhba1:1:0 | - | 1 | 61 | 2 | 32 | 0 | 12 | 0 | 37 | 0.38 | 1119.04 | 742.15 | 321.39 | 32.28 | 2.67 |
vmhba32:0:0 | - | 1 | 4 | 0 | 16 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
Here is the second table, after disabling B, C, and G and enabling H:
DEVICE | DQLEN | WQLEN | ACTV | QUED | %USD | LOAD | DAVG/cmd | KAVG/cmd | GAVG/cmd | QAVG/cmd | DAVG/rd | KAVG/rd | GAVG/rd | QAVG/rd | DAVG/wr | KAVG/wr | GAVG/wr | QAVG/wr |
vmhba1:0:0 | 128 | 0 | 0 | 0 | 0 | 0 | 0.08 | 0.01 | 0.09 | 0 | 0 | 0 | 0 | 0 | 0.08 | 0.01 | 0.09 | 0 |
vmhba1:1:0 | 32 | 0 | 16 | 0 | 50 | 0.5 | 21.4 | 0.06 | 21.47 | 0.01 | 28.35 | 0.08 | 28.43 | 0.02 | 0.17 | 0.02 | 0.19 | 0 |
vmhba32:0:0 | 16 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
Thanks
From the output you posted:
You are pushing 1119.04 CMDS/s at that LUN. The ESX device queue for it is 32 and only 50% used, so ESX is not queuing commands; but the device itself is adding 28.43 ms of latency to every read (DAVG/rd), while the ESX host adds little to no latency in the vmkernel itself (KAVG).
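One way to see that the disks, not the vmkernel, are the bottleneck: by Little's law, the observed read rate times the per-read device latency gives the sustained number of reads in flight (a sketch using the READS/s and DAVG/rd figures from the two tables above):

```python
# Little's law: outstanding I/Os = arrival rate x time in service.
reads_per_sec = 742.15    # READS/s for vmhba1:1:0 (first table)
davg_rd_ms = 28.35        # device latency per read, DAVG/rd (second table)

outstanding = reads_per_sec * (davg_rd_ms / 1000.0)
print(round(outstanding, 1))   # 21.0 reads in flight at the device
# That is close to the observed ACTV of 16 and well inside the device queue
# depth (DQLEN = 32), while KAVG of ~0.06 ms shows the vmkernel adds almost
# nothing: the latency lives in the spindles themselves.
```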
So your VMs are spending a lot of time waiting for read data, which is causing the problems you are seeing. You must address the storage before you will get better performance.
-KjB
Thanks for your info. The task inside the VM just launches a browser and loads URLs; I don't do any disk-related activity in my task. I write only one file to disk, and it is less than 100 KB. So why is esxtop showing this much disk activity? And what is being read from disk, when my task has no read activity at all? Can we get any details on the disk read operations?
ESX can't give you that data; you need to look inside the VM itself. Maybe antivirus, or something of that nature?
-KjB