VMware Cloud Community
saravanan_ad
Contributor

How to tune ESX 3.5 CPU scheduling to meet my requirements

Hi,

I have installed ESX 3.5 on a Dell 2950 with the following configuration:

RAM: 16 GB

CPU: 2 × quad-core Intel Xeon (2.33 GHz)

Number of VMs: 17

Windows XP VMs: 16 (1 vCPU, 512 MB RAM)

RHEL 5.3: 1 (1 vCPU, 2 GB RAM)

The ESX 3.5 CPU scheduler works fine as long as all the VMs stay up and running. However, our business process requires reverting at least 4 VMs (Windows VMs in particular) every 5 to 10 minutes. Every VM has a task to finish within 5 minutes; the task is CPU-intensive (50-100%), but not continuously.

The task inside the VM consumes CPU very irregularly, for example (25..4..0..3..23..67.0..0..0....78..98..100..100..100..100..2...23...3....3.22.11). These numbers were taken on a bare-metal machine (collected from Task Manager while the task was running).

Whenever a VM's CPU usage drops below 10% or 3%, the ESX scheduler takes that VM's CPU away, and the IDLE world's %USED rises to 70-90%. But the VMs are idle for only a few random seconds; then they need CPU heavily again. In that situation, the scheduler does not allocate enough CPU to the VMs that need it, even though it has idle CPU. As a result, every job takes about twice as long to finish (10 to 12 minutes instead of 5).

Moreover, when I revert a VM, its state changes completely: it is a fresh VM. Normally, the ESX CPU scheduler tries to keep a VM on the same physical CPU, because that CPU's cache may still hold the VM's data. In my situation, however, the scheduler does not need to maintain this CPU affinity when placing a reverted VM.

What I need is:

1. The scheduler should allocate CPU to a VM as soon as possible (without any delay).

2. Reduce the CPU time accumulated by the idle world (%USED).

3. How should I tune the CPU settings so that my jobs finish within the same 5 minutes?

Below are statistics collected from esxtop while the VMs were demanding more CPU:

8:04:24am up 108 days 16:38, 134 worlds; CPU load average: 0.34, 0.59, 0.60
PCPU(%): 86.14, 37.19, 44.54, 36.68, 40.62, 61.22, 65.86, 37.80 ; used total: 51.26
CCPU(%): 11 us, 71 sy, 17 id, 1 wa ; cs/sec: 2656

ID GID NAME NWLD %USED %RUN %SYS %WAIT %RDY %IDLE %OVRLP %CSTP %MLMTD
1 1 idle 8 356.50 361.61 0.00 0.00 404.71 0.00 0.20 0.00 0.00
2 2 system 6 0.22 0.21 0.01 574.13 0.43 0.00 0.00 0.00 0.00
6 6 helper 23 1.11 1.19 0.02 2198.88 3.38 0.00 0.10 0.00 0.00
7 7 drivers 11 0.05 0.05 0.00 1053.90 0.00 0.00 0.00 0.00 0.00
9 9 console 1 82.17 82.09 0.15 10.91 2.82 10.86 0.17 0.00 0.00
15 15 vmware-vmkauthd 1 0.00 0.00 0.00 95.82 0.00 0.00 0.00 0.00 0.00
16 16 Linux (27) 5 6.75 6.80 0.02 463.34 9.00 80.44 0.64 0.00 0.00
289025 289025 Win03 (26) 5 15.33 15.29 0.12 459.51 4.36 50.93 0.25 0.00 0.00
289055 289055 Win15 (43) 5 19.95 20.23 0.06 456.67 2.22 62.22 0.88 0.00 0.00
289056 289056 Win12 (26) 5 78.00 78.08 0.09 397.59 3.46 14.69 0.51 0.00 0.00
289057 289057 Win06 (26) 5 59.91 61.04 0.10 415.30 2.78 24.32 1.36 0.00 0.00
289058 289058 Win07 (26) 5 15.81 15.81 0.14 462.39 1.86 50.80 0.29 0.00 0.00
289059 289059 Win02 (26) 5 10.47 10.57 0.22 465.13 3.44 74.56 1.92 0.00 0.00
289060 289060 Win09 (26) 5 8.85 8.69 0.49 468.55 1.91 70.22 1.03 0.00 0.00
289061 289061 Win11 (26) 5 3.82 3.78 0.18 475.48 0.77 92.23 0.32 0.00 0.00
289062 289062 Win04 (26) 5 3.47 3.37 0.23 475.37 1.32 91.87 0.38 0.00 0.00
289063 289063 Win13 (26) 5 44.90 44.84 0.26 430.53 3.86 26.50 0.28 0.00 0.00
289064 289064 Win08 (26) 5 17.00 17.08 0.23 460.09 2.76 41.30 0.39 0.00 0.00
289065 289065 Win16 (26) 6 2.84 1.82 1.31 572.65 0.53 0.00 0.33 0.00 0.00
289066 289066 Win14 (26) 6 2.67 1.68 1.30 572.84 0.50 0.00 0.39 0.00 0.00
289067 289067 Win10 (26) 5 1.65 11.61 800.00 500.00 800.00 0.00 800.00 0.00 0.00
289068 289068 Win05 (26) 6 3.23 2.69 0.72 571.73 0.58 0.00 0.30 0.00 0.00
289069 289069 Win01 (27) 1 12.18 12.15 0.14 78.64 5.25 0.00 0.14 0.00 0.00
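As a rough sanity check on the esxtop header above, the Python sketch below recomputes the average physical CPU utilization from the eight PCPU(%) values (the mean reproduces the "used total" of 51.26 that esxtop printed) and converts the idle group's %USED into an equivalent number of idle cores. It assumes esxtop's convention that a group's percentages are summed across its worlds, which is why the 8-world idle group can exceed 100%. It only reads the numbers; it is not a scheduler setting.

```python
# PCPU(%) values copied from the esxtop header above (8 physical cores).
pcpu = [86.14, 37.19, 44.54, 36.68, 40.62, 61.22, 65.86, 37.80]

# The mean of the per-PCPU values matches esxtop's "used total" figure.
used_total = sum(pcpu) / len(pcpu)

# %USED of the "idle" group (one idle world per PCPU), summed across its 8 worlds.
idle_used = 356.50
idle_cores = idle_used / 100.0  # equivalent number of fully idle cores

print(f"average PCPU used: {used_total:.2f}%")
print(f"idle world: {idle_cores:.2f} of {len(pcpu)} cores "
      f"({idle_used / len(pcpu):.1f}% of total capacity)")
```

At this sample, roughly 3.6 of the 8 cores' worth of time went to the idle world.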

Thanks in advance for your responses.

Linjo
Leadership

Interesting reading, but IMHO there is very little you can do to tune the scheduler yourself.

Probably the best thing for you would be to upgrade to vSphere 4, where the scheduler has been much improved.

Best regards,

Linjo

saravanan_ad
Contributor

Hi Linjo,

Thanks for your response, but I cannot upgrade to vSphere 4: it is deployed on several production servers. I hope there are some good settings we can use to tune the scheduler. All we need is to reduce the CPU time accumulated by the idle world.

saravanan_ad
Contributor

For more information on this problem, see the following per-world table:

ID GID NAME NWLD %USED %RUN %SYS %WAIT %RDY %IDLE %OVRLP %CSTP %MLMTD
1816010 291703 vmware-vmx 1 0.07 0.07 0 95.7 0.02 0 0.06 0 0
1816011 291703 vmm0:Win01_ 1 8.41 8.36 0.25 86.74 0.69 71.65 1.05 0 0
1816012 291703 vmware-vmx 1 0 0 0 95.79 0 0 0 0 0
1816013 291703 mks:Win01( 1 0.26 0.26 0 95.1 0.43 0 0.01 0 0
1816014 291703 vcpu-0:Win0 1 0.05 0.05 0 95.74 0.01 0 0 0 0

So we can see that %WAIT is very high for every world of this VM (vmware-vmx, vmm0, mks, etc.).

Note: the VM is not waiting for I/O; its workload is mostly CPU-intensive.

PacketRacer
Enthusiast

Have you thought about using reservations? You could reserve a certain amount of MHz for each VM. I've never used CPU reservations, so I can't give you solid advice on them, but I would start with a value such as 500 MHz or maybe 1000 MHz and increase it little by little.

The drawback is that you only have 18.64 GHz to give out (8 cores × 2.33 GHz), so you won't be able to reserve much more than 1 GHz per VM if you want to keep all 17 VMs running.

You should also take a look at the Hyperthreading and Affinity sections of the Resource Management Guide (in the "Advanced" chapter) and see whether any of that applies to you. There are situations where turning off Hyperthreading (in VMware's host configuration screen, if your CPUs support it) can improve performance, because there would be only one active VM per core at a time. Again, this is something you could experiment with.
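PacketRacer's budget can be checked with a few lines of Python. This is a sketch of the arithmetic only: it treats the full 8 × 2.33 GHz as reservable, while in practice ESX keeps some capacity for the service console and virtualization overhead, so the real admission-control limit is likely somewhat lower.

```python
# Host capacity from the original post: 2 quad-core Xeons at 2.33 GHz.
sockets, cores_per_socket, core_mhz = 2, 4, 2330
vm_count = 17

total_mhz = sockets * cores_per_socket * core_mhz  # 18,640 MHz = 18.64 GHz
per_vm_mhz = total_mhz / vm_count  # largest equal reservation for every VM

print(f"total: {total_mhz} MHz, max equal reservation: {per_vm_mhz:.0f} MHz per VM")

# Check a few candidate per-VM reservations against the raw budget.
for reservation in (500, 1000, 1200):
    needed = reservation * vm_count
    verdict = "fits" if needed <= total_mhz else "does not fit"
    print(f"{reservation} MHz x {vm_count} VMs = {needed} MHz -> {verdict}")
```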

Hope this helps!

saravanan_ad
Contributor

Hi,

Thank you, PacketRacer. I have already tried CPU reservations, with each VM reserving a minimum of 1000 MHz. I didn't see any contention; instead I saw low utilization. For example, without reservations the overall CPU throughput was 49%; with them it dropped below 25%, and the CPU wait time increased enormously. This doesn't work for this scenario.

I also tried setting scheduling affinity for each VM, for example assigning 2 VMs to each of the 8 physical CPUs. Instead of improving throughput, it had the reverse effect. So I dropped all my tinkering and let the ESX 3.5 scheduler do its best.

As you can see in the table above, the CPU wait time is huge. Do you have any idea how to reduce it?

kjb007
Immortal

I think you may be misinterpreting the esxtop statistics here.

Based on your output, your Win01_ VM spends only 0.25% (%SYS) in system services on behalf of this VM and 1.05% (%OVRLP) on behalf of other VM worlds. Most of its time is spent in %WAIT (86.74%), of which 71.65% is %IDLE. That leaves 86.74 - 71.65 = 15.09% spent waiting for some resource, I/O or otherwise.

Sounds like you are running into contention of some sort. Have you verified what kind of response times you are getting from storage, etc.?
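kjb007's reading can be expressed as a tiny Python helper, assuming (as his arithmetic does) that esxtop's %WAIT already includes %IDLE, so %WAIT - %IDLE approximates time blocked on something other than an idle guest. Apply it to the vCPU world (the vmm0 row) rather than to a VM's group row: group-level percentages are summed over all of a VM's worlds, which is why group rows show %WAIT values near 500 or 600. The function name is illustrative.

```python
def non_idle_wait(wait_pct: float, idle_pct: float) -> float:
    """Return the part of %WAIT not accounted for by %IDLE.

    Assumes %IDLE is a subset of %WAIT for the same world, as in the
    WAIT - IDLE arithmetic above.
    """
    if idle_pct > wait_pct:
        raise ValueError("%IDLE should not exceed %WAIT for the same world")
    return wait_pct - idle_pct


# vmm0:Win01_ row from the per-world table: %WAIT 86.74, %IDLE 71.65.
blocked = non_idle_wait(86.74, 71.65)
print(f"vmm0:Win01_ waiting on something other than idle: {blocked:.2f}%")
```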

-KjB

vExpert/VCP/VCAP vmwise.com / @vmwise -KjB
saravanan_ad
Contributor

Hi,

The task inside the VM writes its results to a file. Would that generate much I/O? I am not sure how to read the disk I/O numbers in esxtop. Please use the tables below to identify where the contention is.

ID GID NAME NWLD %USED %RUN %SYS %WAIT %RDY %IDLE %OVRLP %CSTP %MLMTD
1 1 idle 8 701.3 707.83 0 0 98.22 0 0.76 0 0
2 2 system 6 0.12 0.12 0 600 0.02 0 0 0 0
6 6 helper 23 0.41 0.46 0 2300 0.72 0 0.05 0 0
7 7 drivers 11 0.06 0.06 0 1100 0 0 0 0 0
9 9 console 1 5.12 5.05 0.11 85.22 10.49 85.17 0.08 0 0
15 15 vmware-vmkauthd 1 0 0 0 100 0 0 0 0 0
16 16 Linux 5 14.82 14.88 0 484.98 3.93 82.35 0.5 0 0
84679 84679 Win16 5 4.24 3.99 0.1 499.37 0.37 65.58 1.85 0 0
84689 84689 Win12 5 1.82 1.8 0.04 500 0.61 92.14 0.17 0 0
84693 84693 Win13 5 2.3 2.31 0.02 500 0.15 72.84 0.3 0 0
84695 84695 Win08 5 12.34 12.36 0.03 489.98 1.41 68.5 0.15 0 0
84698 84698 Win15 5 3.42 3.42 0.03 499.81 0.55 79.66 0.15 0 0
84699 84699 Win07 5 12.22 12.25 0.04 490.87 0.66 44.95 0.56 0 0
84700 84700 Win05 5 16.48 16.24 0.05 486.39 1.1 8.69 0.1 0 0
84701 84701 Win09 5 7.05 7.05 0.04 496.07 0.61 70.51 0.49 0 0
84702 84702 Win01 5 1.85 1.84 0.03 500 0.24 88.06 0.27 0 0
84703 84703 Win10 5 4.34 4.31 0.1 498.78 0.7 82.85 0.21 0 0
84704 84704 Win11 6 1.96 0.97 1.17 600 0.18 0 1.45 0 0
84705 84705 Win03 6 2.53 1.54 1.21 600 0.25 0 0.25 0 0
84706 84706 Win02 6 2.01 1.04 1.09 600 0.17 0 0.81 0 0
84707 84707 Win04 6 2.74 1.34 1.66 600 0.32 0 0.48 0 0
84708 84708 Win06 1 0.64 0.62 0.03 100 0.09 0 0.02 0 0
84709 84709 Win14 6 5.98 5.32 0.8 343.36 0.33 0 0.18 0 0

DISK I/O ESXTOP:

ID GID NAME DEVICE NWD NDV DQLEN WQLEN ACTV QUED %USD LOAD CMDS/s READS/s WRITES/s MBREAD/s MBWRTN/s
2 2 system - 3 - 0 0 0 0 0 0 7.03 0 7.03 0 0.02
6 6 helper - 3 - 0 0 0 0 0 0 44.57 0 0 0 0
9 9 console - 1 - 0 0 0 0 0 0 222.86 127.69 95.17 0.26 0.26
15 15 vmware-vmkauthd - 1 - 0 0 0 0 0 0 0 0 0 0 0
16 16 Linux - 3 - 0 0 0 0 0 0 2.41 0 2.41 0 0.03
84679 84679 Win16 - 3 - 0 0 0 0 0 0 1.2 0 1.2 0 0.01
84689 84689 Win12 - 3 - 0 0 0 0 0 0 2.41 0.2 2.21 0 0.01
84690 84690 Win11 - 3 - 0 0 0 0 0 0 35.34 15.26 20.08 0.58 0.53
84691 84691 Win03 - 3 - 0 0 0 0 0 0 30.52 20.48 10.04 1.09 0.4
84692 84692 Win04 - 3 - 0 0 0 0 0 0 125.08 27.91 96.77 0.56 1.28
84693 84693 Win13 - 3 - 0 0 0 0 0 0 42.76 25.7 17.07 0.62 0.13
84694 84694 Win14 - 3 - 0 0 0 0 0 0 46.98 28.31 18.67 0.92 0.54
84695 84695 Win08 - 3 - 0 0 0 0 0 0 131.11 91.95 39.15 1.07 0.7
84696 84696 Win06 - 3 - 0 0 0 0 0 0 34.13 16.66 17.47 0.31 0.14
84697 84697 Win02 - 3 - 0 0 0 0 0 0 47.58 37.34 10.24 2.02 0.24
84698 84698 Win15 - 3 - 0 0 0 0 0 0 42.16 13.65 28.51 0.24 0.3
84699 84699 Win07 - 3 - 0 0 0 0 0 0 45.37 29.92 15.46 0.69 0.51
84700 84700 Win05 - 3 - 0 0 0 0 0 0 131.71 92.76 38.95 1.02 0.34
84701 84701 Win09 - 3 - 0 0 0 0 0 0 191.74 152.39 39.35 1.14 0.36
84702 84702 Win01 - 3 - 0 0 0 0 0 0 61.84 61.84 0 8.78 0
84703 84703 Win10 - 1 - 0 0 0 0 0 0 0 0 0 0 0

I appreciate your response.

kjb007
Immortal

I would definitely validate your storage configuration. A quick look shows that the VMs with high %WAIT are also the ones attempting 100+ IOPS. Are all of these systems on the same datastore? If so, how large is the datastore, and how many disks back it?

-KjB

saravanan_ad
Contributor

Yes, all the VMs are on the same datastore. My datastore has two hard disks: one is allocated to ESX (core) and the other to the VMs (VMFS). The disk size is 275 GB. Each Windows VM takes 8 GB including its virtual disk; the Linux VM takes 80 GB. Around 210 GB is occupied by the VMs.

kjb007
Immortal

That is definitely the issue at hand. Two disks can deliver between 150 (SATA) and 300 (FC) IOPS, and you are trying to push 1250. You need to add more spindles, or add some additional datastores, to get any further improvement.

-KjB

saravanan_ad
Contributor

I'm really sorry, but I don't follow. How did you come up with this number?

kjb007
Immortal

Looking at your esxtop output, I added up the CMDS/s column to get the 1250 number. As for how many IOPS your datastore can handle: generally speaking, a SATA disk can perform 50-80 IOPS (depending on the RPM of the device), and a SCSI/FC disk can do 100-150, depending on the device. You only have two disks in your datastore, so it can handle 100-300 IOPS in total before I/O starts having to wait.
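kjb007's arithmetic can be reproduced with a short Python sketch: sum the CMDS/s column of the disk table and compare it with the rule-of-thumb capacity he quotes. The per-disk IOPS ranges are his estimates, not measured values, and the disk count is a parameter because the earlier reply suggests only one of the two disks actually holds the VMFS datastore.

```python
# CMDS/s per world group, copied from the disk esxtop table earlier in the thread.
cmds_per_sec = {
    "system": 7.03, "helper": 44.57, "console": 222.86, "vmware-vmkauthd": 0.0,
    "Linux": 2.41, "Win16": 1.2, "Win12": 2.41, "Win11": 35.34, "Win03": 30.52,
    "Win04": 125.08, "Win13": 42.76, "Win14": 46.98, "Win08": 131.11, "Win06": 34.13,
    "Win02": 47.58, "Win15": 42.16, "Win07": 45.37, "Win05": 131.71, "Win09": 191.74,
    "Win01": 61.84, "Win10": 0.0,
}

# Rule-of-thumb per-disk IOPS ranges from the reply above (not measurements).
per_disk_iops = {"SATA": (50, 80), "SCSI/FC": (100, 150)}
# kjb007 assumed 2 disks; use 1 if only one disk backs the VMFS datastore.
disks = 2

demand = sum(cmds_per_sec.values())
print(f"total demand: {demand:.1f} IOPS")
for kind, (low, high) in per_disk_iops.items():
    low_cap, high_cap = low * disks, high * disks
    print(f"{kind}: {low_cap}-{high_cap} IOPS capacity; "
          f"demand is {demand / high_cap:.1f}x the upper bound")
```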

-KjB

saravanan_ad
Contributor

Thanks for the information. My hard disks spin at 10,000 RPM. Sometimes I see some very large numbers in the disk I/O output; for example, see Win07:

ID GID NAME DEVICE NWD NDV DQLEN WQLEN ACTV QUED %USD LOAD CMDS/s READS/s WRITES/s MBREAD/s MBWRTN/s
249931 249931 Win02 (33) - 3 - 0 0 0 0 0 0.00 0.95 0.19 0.76 0.00 0.00
249932 249932 Win03 (32) - 3 - 0 0 0 0 0 0.00 0.76 0.38 0.38 0.01 0.00
249933 249933 Win04 (32) - 3 - 0 0 0 0 0 0.00 11.25 0.19 11.06 0.00 0.38
249934 249934 Win05 (31) - 3 - 0 0 0 0 0 0.00 9.16 1.91 7.25 0.02 0.07
249935 249935 Win06 (31) - 3 - 0 0 0 0 0 0.00 254.06 19.07 230.03 0.32 53.11
249936 249936 Win07 (29) - 3 - 0 0 0 0 0 0.00 3518437297766402048.00 3518437297766402048.00 0.57 3355443284761.95
249937 249937 Win08 (31) - 3 - 0 0 0 0 0 0.00 12.97 0.00 12.97 0.00 0.39
249938 249938 Win01 (2) - 3 - 0 0 0 0 0 0.00 16.98 16.98 0.00 0.81 0.00

Another sample:

249938 249938 Win01 (2) - 3 - 0 0 0 0 0 0.00 3.10 0.00 3.10 0.00 0.02
249940 249940 Win03 (32) - 3 - 0 0 0 0 0 0.00 167.37 148.53 18.84 4.27 0.10
249941 249941 Win06 (31) - 3 - 0 0 0 0 0 0.00 2.62 0.00 2.62 0.00 0.02
249942 249942 Win05 (31) - 3 - 0 0 0 0 0 0.00 33.14 6.91 25.75 4194304010522.71 0.27

These values show up only in random seconds, though.

PacketRacer
Enthusiast

You need to answer the question "what are my VMs waiting for" before you can answer the question "how do I reduce CPU wait time." Like KjB said, it's almost certainly storage!

Do this:

1) run esxtop

2) hit the 'u' key to go to the disk device screen

3) grab that output and post it

4) then hit the f key to show new fields

5) turn off B, C and G

6) turn on H

7) hit enter to go back to stats screen

8) capture the output again and post it

Try to do this during a busy time, if possible. It's difficult to tell what's going on just by looking at a snapshot.
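To follow this advice without relying on one interactive snapshot, one option not raised in the thread is esxtop's batch mode (for example `esxtop -b -d 5 -n 60 > capture.csv`; confirm the flags with `esxtop -h` on your host), which writes samples as CSV. The Python sketch below assumes that layout, a single header row of counter names followed by one row per sample with the timestamp first, and reports the median of every column whose name contains a given substring. The median is used so that occasional absurd spikes, like the Win07 values shown above, do not dominate. The function name and the counter names in the example are invented for illustration; check the real header of your capture.

```python
import csv
import io
from statistics import median


def median_counters(csv_text: str, substring: str) -> dict[str, float]:
    """Median of every column whose header contains `substring`.

    Assumes one header row, then one row per sample, with the timestamp in
    the first column. Missing or non-numeric cells are skipped.
    """
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    rows = list(reader)
    result: dict[str, float] = {}
    for col, name in enumerate(header[1:], start=1):
        if substring not in name:
            continue
        values = []
        for row in rows:
            try:
                values.append(float(row[col]))
            except (IndexError, ValueError):
                continue  # skip missing or non-numeric samples
        if values:
            result[name] = median(values)
    return result


# Hypothetical two-sample capture; real esxtop counter names will differ.
sample = (
    '"Time","disk(vmhba1:1:0)\\Commands/sec","disk(vmhba1:1:0)\\Read Latency"\n'
    '"08:04:24","1100.0","28.0"\n'
    '"08:04:29","1140.0","29.0"\n'
)
print(median_counters(sample, "vmhba1:1:0"))
```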

saravanan_ad
Contributor

Hi,

The snapshot below was taken while the system was busy.

DEVICE PATH/WORLD/PARTITION NPH NWD NPN DQLEN WQLEN ACTV QUED %USD LOAD CMDS/s READS/s WRITES/s MBREAD/s MBWRTN/s
vmhba1:0:0 - 1 4 9 128 0 0 0 0 0 53.6 0 53.6 0 0.72
vmhba1:1:0 - 1 61 2 32 0 12 0 37 0.38 1119.04 742.15 321.39 32.28 2.67
vmhba32:0:0 - 1 4 0 16 0 0 0 0 0 0 0 0 0 0

Here is the second table, after disabling fields B, C and G and enabling H:

DEVICE DQLEN WQLEN ACTV QUED %USD LOAD DAVG/cmd KAVG/cmd GAVG/cmd QAVG/cmd DAVG/rd KAVG/rd GAVG/rd QAVG/rd DAVG/wr KAVG/wr GAVG/wr QAVG/wr
vmhba1:0:0 128 0 0 0 0 0 0.08 0.01 0.09 0 0 0 0 0 0.08 0.01 0.09 0
vmhba1:1:0 32 0 16 0 50 0.5 21.4 0.06 21.47 0.01 28.35 0.08 28.43 0.02 0.17 0.02 0.19 0
vmhba32:0:0 16 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0

Thanks

kjb007
Immortal

From the output you posted:

You are trying to run 1119.04 commands per second (CMDS/s). Your ESX device queue depth (DQLEN) for that LUN is 32 and it is 50% used (%USD), so ESX is not queuing any commands (QUED is 0). However, the device itself is adding about 28.35 ms of latency to every read (DAVG/rd), while the vmkernel adds very little to none (KAVG/rd is 0.08 ms), so the guest sees about 28.43 ms per read (GAVG/rd).

So your VMs are spending a lot of time waiting for read data, which is causing the problems you are seeing. You must address the storage before you will be able to get better performance.
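The figures read off above can be checked with a small Python sketch, assuming the usual esxtop definitions %USD = ACTV / DQLEN × 100, LOAD = (ACTV + QUED) / DQLEN, and GAVG = DAVG + KAVG. The first snapshot's displayed %USD and LOAD (37 and 0.38) are the rounded results for ACTV = 12. The function name is illustrative.

```python
def queue_stats(actv: int, qued: int, dqlen: int) -> tuple[float, float]:
    """Return (%USD, LOAD) for a device queue under the assumed esxtop definitions."""
    usd = 100.0 * actv / dqlen      # share of the device queue slots in use
    load = (actv + qued) / dqlen    # > 1.0 would mean commands are queuing in ESX
    return usd, load


# vmhba1:1:0 from the two snapshots above (DQLEN 32, nothing queued).
print(queue_stats(actv=12, qued=0, dqlen=32))  # first snapshot
print(queue_stats(actv=16, qued=0, dqlen=32))  # second snapshot

# Read latency on vmhba1:1:0: guest-visible latency = device + vmkernel.
davg_rd, kavg_rd = 28.35, 0.08
gavg_rd = davg_rd + kavg_rd
print(f"GAVG/rd = {gavg_rd:.2f} ms (esxtop reported 28.43)")
```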

-KjB

saravanan_ad
Contributor

Thanks for the info. The task inside the VM just launches a browser and loads URLs in it. My task doesn't do any disk-related activity; it writes only one file to disk, which is less than 100 KB. So why is esxtop showing such heavy disk activity? And what is it reading from disk, when my task has no read activity? Can we get any details about the disk read operations?

kjb007
Immortal

ESX can't provide you that data. You need to look inside the VM itself. Maybe antivirus, or something of that nature?

-KjB
