VMware Cloud Community
Kake
Contributor

Poor performance ESX 3.5 what to do...

Hello,

I have had a single ESX 3.5 host running for our development team for years now. It runs 14 Windows guests, a mix of 64-bit and 32-bit. Until now there have been no problems.

Now I have started to hear that the development environment is performing poorly. Checking on that, I noticed that the servers are actually running terribly slowly. Windows tells me that 100% CPU is in use, while the VMware performance monitor says that only about 300 - 500 MHz is in use.

So I started to monitor. All servers have the same issue: CPU usage in the guest is 100%, yet VMware says utilisation per CPU is under 30%. I wondered whether it could be due to a lot of disk traffic, but the average disk traffic is 44 Mbit/s, which doesn't look that bad. I also checked network utilization, and it's hardly anything.

Now I have to wonder why my servers aren't getting resources from the ESX host. Any ideas?

Current hardware

ProLiant BL460c G1

2 x Quad-Core Intel Xeon, 3000 MHz

RAM 16384 MB

HBA QLogic QMH2462 4Gb FC

Disk from (HP EVA 4000 FSCSI)
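
For scale, the figures quoted above can be set against the nominal capacity of this hardware. This is only a rough sketch: it assumes capacity is simply cores times clock speed, and since it is not clear whether the 300 - 500 MHz figure is per guest or host-wide, both readings are shown. The function and constant names are made up for illustration.

```python
# Nominal capacity of the host described above (2 x quad-core at 3000 MHz).
CORES = 2 * 4
CLOCK_MHZ = 3000
HOST_MHZ = CORES * CLOCK_MHZ  # ignores any hypervisor overhead (assumption)


def percent_of(used_mhz, capacity_mhz):
    """Return used_mhz as a percentage of capacity_mhz."""
    return 100.0 * used_mhz / capacity_mhz


def mbit_to_mbyte(mbit_per_s):
    """Convert megabits per second to megabytes per second."""
    return mbit_per_s / 8.0


for used in (300, 500):
    print(f"{used} MHz = {percent_of(used, CLOCK_MHZ):.1f}% of one core, "
          f"{percent_of(used, HOST_MHZ):.1f}% of the host")

print(f"44 Mbit/s = {mbit_to_mbyte(44):.1f} MB/s")
```

On either reading, the guests are receiving only a small fraction of what the hardware could nominally deliver.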

9 Replies
Virtualinfra
Commander

Hi Kake,

SSH to your ESX host and run the esxtop command.

Then press "c" for the CPU view and paste the result here, so we can see what is happening and why the resources are not reaching the VMs.

Regards

Note: Make sure you have done all your checking from the guest OS side first. Treat the Windows guest as if it were a physical machine using 100% CPU and ask how you would troubleshoot that; do the basic troubleshooting in Windows before you move on to ESX.

Thanks & Regards Dharshan S VCP 4.0,VTSP 5.0, VCP 5.0
idle-jam
Immortal

Have a look at this: http://www.yellow-bricks.com/esxtop/ The threshold values listed there might help you.

Kake
Contributor

3:18:33pm up  1:56, 96 worlds; CPU load average: 0.04, 0.04, 0.07
PCPU(%):  20.58,   1.62,   1.32,   0.34,   0.05,   4.71,   0.15,   3.31 ;   used total:   4.01
CCPU(%):   1 us,   2 sy,  97 id,   0 wa ;       cs/sec:    342
Display ESX cpu on
     ID    GID NAME             NWLD   %USED    %RUN    %SYS   %WAIT    %RDY
      1      1 idle                8  764.47  764.86    0.00    0.00   35.03
      2      2 system              6    0.01    0.01    0.00  599.87    0.00
      6      6 helper             22    0.03    0.03    0.00 2199.79    0.03
      7      7 drivers            14    0.00    0.00    0.00 1400.00    0.00
      9      9 console             1    4.15    3.97    0.00   81.67   14.36
     15     15 vmware-vmkauthd     1    0.00    0.00    0.00  100.00    0.00
     34     34 EllaDev64           5    1.84    1.81    0.08  497.86    0.36
     35     35 EllaDevDC01         6    3.11    3.11    0.00  596.47    0.44
     36     36 EllaDevApp          5    1.57    1.57    0.01  498.11    0.31
     37     37 EllaDev2k364        6    1.06    1.06    0.00  598.71    0.24
     38     38 EllaDevCRM          5    2.03    2.04    0.00  497.83    0.14
     39     39 EllaDevCRM2         6    4.82    4.77    0.16  595.16    0.03
     40     40 EllaDevCRM64        6    2.01    1.90    0.21  597.77    0.37
     41     41 SAPJPROD            5   10.80   10.79    0.09  488.69    0.55

Not all the machines are currently running.

We actually had another blade server with the same set of CPUs and half the memory. We swapped it in and booted ESX on the new hardware. Same results, so it is not a hardware problem. Actually things were even worse: guests were getting something like 33 MHz of CPU.

We also noticed that from time to time we get normal CPU usage. We are now hoping to back up as much as we can.

Virtualinfra
Commander

I need the %MLMTD and %CSTP values from esxtop.

After starting esxtop, press f, select those fields, and paste the output.

Make sure every virtual machine has its CPU limit set to Unlimited. To check, go to Edit Settings for the virtual machine, open the Resources tab, click CPU, and make sure the Unlimited box at the bottom right is ticked.

As we can see, there are non-zero %RDY values, which indicate that a VM was ready to run but was not being given CPU.
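
One rough way to scan a pasted esxtop CPU screen for %RDY is sketched below. It assumes the nine-column layout shown in the output earlier in this thread (ID, GID, NAME, NWLD, %USED, %RUN, %SYS, %WAIT, %RDY). The 10% threshold is a rule-of-thumb assumption, not an official limit, and group-level values are summed across all worlds in the group. The function names are hypothetical.

```python
SAMPLE = """\
     ID    GID NAME             NWLD   %USED    %RUN    %SYS   %WAIT    %RDY
      1      1 idle                8  764.47  764.86    0.00    0.00   35.03
      9      9 console             1    4.15    3.97    0.00   81.67   14.36
     34     34 EllaDev64           5    1.84    1.81    0.08  497.86    0.36
     41     41 SAPJPROD            5   10.80   10.79    0.09  488.69    0.55
"""


def parse_cpu_rows(text):
    """Parse data rows of a pasted esxtop CPU screen into dictionaries."""
    rows = []
    for line in text.splitlines():
        parts = line.split()
        # Data rows have nine fields and start with a numeric ID;
        # the header row and blank lines are skipped.
        if len(parts) != 9 or not parts[0].isdigit():
            continue
        rows.append({
            "name": parts[2],
            "nwld": int(parts[3]),
            "used": float(parts[4]),
            "rdy": float(parts[8]),
        })
    return rows


def high_ready(rows, threshold=10.0):
    """Names of groups (excluding idle) whose %RDY exceeds the threshold."""
    return [r["name"] for r in rows
            if r["name"] != "idle" and r["rdy"] > threshold]


rows = parse_cpu_rows(SAMPLE)
print(high_ready(rows))
```

On the sample rows, only the service console crosses the assumed threshold; the guest %RDY values are all below 1%.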

Thanks & Regards Dharshan S VCP 4.0,VTSP 5.0, VCP 5.0
Kake
Contributor

Hello,

I've been monitoring this for some time now. We rebooted the ESX host and, after bringing all the guests up, ended up in a similar situation.

Then we rebooted again, this time booting the guests one by one. Yesterday we got up to 5 guests while still getting good performance. After that the CPU usage crashed. This morning I took the number of guests up to 10. When I booted the 11th, it crashed again. I took the 11th down and now I am able to keep steady performance. I'll try to boot up the 11th guest in a moment to see if I still get the same results.

While the CPU usage is crashed, the guest says it is using 100% CPU. However, it is getting under 100 MHz and is unable to get any more.

I took a screenshot from one guest while it was unable to get CPU. Later I took a shot of its CPU usage in VMware. As you can see, there is a period of over 10 minutes when it was unable to get CPU. That is the period when the 11th guest was booted up. After the 11th guest was shut down, the CPU usage returned to normal.

Here is what esxtop looks like at the moment.

9:01:35am up 16:33, 101 worlds; CPU load average: 0.20, 0.20, 0.17
PCPU(%):  11.67,  29.55,  31.41,  15.74,  12.15,   8.70,  13.24,  10.33 ;   used total:  16.60
CCPU(%):   1 us,   1 sy,  98 id,   0 wa ;       cs/sec:    832

NAME              %USED    %RUN    %SYS   %WAIT    %RDY   %IDLE  %OVRLP   %CSTP  %MLMTD
idle             667.74  670.05    0.00    0.00  129.92    0.00    0.00    0.00    0.00
system             0.12    0.12    0.00  599.85    0.00    0.00    0.00    0.00    0.00
helper             0.01    0.02    0.00 2199.85    0.03    0.00    0.00    0.00    0.00
drivers            0.01    0.01    0.00 1299.96    0.00    0.00    0.00    0.00    0.00
console            4.12    4.21    0.01   93.25    2.54   92.55    0.97    0.00    0.00
vmware-vmkauthd    0.00    0.00    0.00  100.00    0.00    0.00    0.00    0.00    0.00
EllaDev64         29.72   29.74    0.08  665.75    4.48  166.42    0.64    0.00    0.00
EllaDevDC01        3.79    3.79    0.00  595.45    0.74   95.68    0.21    0.00    0.00
SAPJPROD           2.72    2.65    0.09  497.17    0.17   89.76    0.09    0.00    0.00
SAPJAPP2003        5.44    5.47    0.01  493.41    1.09   88.14    0.45    0.00    0.00
EllaDevOulu       71.80   71.70    0.53  527.05    1.23   28.03    0.67    0.00    0.00
EllaDevOra         3.13    3.10    0.05  496.78    0.11   92.61    0.06    0.00    0.00
EllaDevApp         1.14    1.01    0.26  498.83    0.16    0.00    0.18    0.00    0.00
EllaDev2k364       6.48    6.42    0.11  592.99    0.61   93.47    0.17    0.00    0.00
EllaDevCRM         1.73    1.73    0.01  498.14    0.12   96.46    0.07    0.00    0.00

I'll do more tests after the people using these virtual machines get some work done :)

Scissor
Virtuoso

Are any of your Guests configured with more than 1 vCPU?

Kake
Contributor

No, we had one vCPU per guest when this problem first emerged. Since then we have experimented with adding another CPU to a guest. Results have been really poor. We are currently able to use at most 30 - 40% of that CPU capacity.

I ran Super Pi on one of our guests. The 32 MB run took 44 minutes; on my laptop it takes 29 minutes. What I noticed was that the guest was able to get around 3 GHz of CPU until it dropped down to nothing. Then for 5-10 minutes there was no CPU available. After that CPU usage started to rise again, and it did that a few times. So CPU performance looks like a rollercoaster in the VMware performance monitor. During that CPU cap I saw no %RDY in esxtop.

After adding another CPU to the guest, %CSTP started to rise to 30 - 60%.

The %MLMTD value that I forgot to post last time is a steady 0.00.

I've been wondering about the %IDLE and %WAIT levels. For the one guest I was checking, %USED was low, but at the same time %IDLE was 0.00% and %WAIT was 600+%. When %USED rose to, say, 40+%, %IDLE also went up.
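
The relationship between %WAIT and %IDLE can be put into numbers with a rough sketch. It rests on two assumptions: that %WAIT includes %IDLE (so the difference is time spent blocked rather than idle), and that each non-vCPU world in a VM group sits at roughly 100% wait. NWLD is taken as 5 from the earlier CPU screen, which may no longer hold. The function name is made up for illustration.

```python
def vcpu_blocked_percent(wait, idle, nwld, vcpus=1):
    """Estimate how long the vCPU world(s) were blocked but not idle.

    Assumes %WAIT includes %IDLE and that every non-vCPU world in the
    group contributes about 100% to the group's %WAIT.
    """
    helper_wait = 100.0 * (nwld - vcpus)  # assumed share of non-vCPU worlds
    return wait - idle - helper_wait


# Values from the esxtop output posted earlier in the thread;
# NWLD = 5 is carried over from the first CPU screen (assumption).
samples = {
    "EllaDevApp": (498.83, 0.00, 5),
    "EllaDevCRM": (498.14, 96.46, 5),
}

for name, (wait, idle, nwld) in samples.items():
    blocked = vcpu_blocked_percent(wait, idle, nwld)
    print(f"{name}: about {blocked:.1f}% blocked (not idle, not running)")
```

If those assumptions hold, a guest with low %USED and 0.00 %IDLE is spending nearly all of its time blocked on something other than the CPU scheduler, which would fit with %RDY staying low during the gaps.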

7:51:48am up 8 days 15:23, 115 worlds; CPU load average: 0.31, 0.29, 0.28

ADAPTR CID TID LID  WID NCHNS NTGTS NLUNS NVMS AQLEN LQLEN WQLEN ACTV QUED %USD  LOAD   CMDS/s  READS/s WRITES/s MBREAD/s MBWRTN/s
vmhba1   -   -   -    -     1     2    28  626  4096     0     0   -    -    -     -    315.35   125.98   189.37     0.57     2.57
vmhba2   -   -   -    -     1     2    28  626  4096     0     0   -    -    -     -      0.00     0.00     0.00     0.00     0.00

MBREAD/s and MBWRTN/s are pretty low.

DAVG/cmd is between 0.7 and 4.3, so that seems pretty good.
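
As a quick sanity check on the adapter figures above, the average I/O size can be worked out from the throughput and command rate. This is a minimal sketch that assumes MBREAD/s and MBWRTN/s are in units of 1024 KB; the function name is hypothetical.

```python
def avg_io_kb(mb_read_s, mb_written_s, cmds_s):
    """Average I/O size in KB, from MB/s read and written and commands/s."""
    if cmds_s == 0:
        return 0.0  # idle adapter: no commands, so no meaningful size
    return (mb_read_s + mb_written_s) * 1024.0 / cmds_s


# (MBREAD/s, MBWRTN/s, CMDS/s) copied from the esxtop adapter view above.
adapters = {
    "vmhba1": (0.57, 2.57, 315.35),
    "vmhba2": (0.00, 0.00, 0.00),
}

for name, (rd, wr, cmds) in adapters.items():
    print(f"{name}: {rd + wr:.2f} MB/s total, "
          f"{avg_io_kb(rd, wr, cmds):.1f} KB per command")
```

A few MB/s of small I/Os at low latency suggests the storage path is lightly loaded at the time of this sample.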

bulletprooffool
Champion

Hi Kake,

Invariably, ESX performance issues stem from one of four areas:

  • Memory
  • CPU
  • Disk Latency
  • Network

Memory and CPU availability are relatively easy to check from the performance tab (though of course esxtop will give you more info).

More often than not, though, I have found that performance issues are related to storage.

Have a look at this post:

http://networkadminkb.com/kb/Knowledge%20Base/VMWare/How%20to%20fix%20Slow%20File%20Systems%20on%20V...

Could you also answer a few questions to help us here?

Are all VMs performing poorly, or just certain ones (perhaps 64bit VMs)?

What OS are the VMs that are behaving poorly?

Are VMs behaving poorly on all ESX hosts?

Is there any correlation between the poorly performing VMs (OS / 32 or 64 bit / storage)?

One day I will virtualise myself . . .
Kake
Contributor

Thanks for the new info. I'll read the article, but first I'll answer your questions.

When we first noticed the problem, all our virtual machines were affected. Basically the entire system was down. Now we are able to run at least two guests with normal operation. The trick was to shut down ESX, boot it up with the guests kept down, and then slowly boot the guests up one by one. After 5 guests were up, we started to notice the caps again.

We have both 64-bit and 32-bit Windows 2kx servers, and both have the same problem. I attached a CPU graph from 3 example guests. You can see that two of them are currently having a problem. The guest at the bottom of the picture is working fine. If I had to say whether 32-bit guests are working better than 64-bit guests, I would say that it is even worse for the 64-bit systems.

We aren't noticing any problems with our other ESX servers. Unfortunately we do not have vCenter or vMotion, so we cannot simply move the virtual machines.

I am planning on cloning one of the more important guests to another ESX host so that people can do some work.

I have pretty good spare hardware available. I wonder if I could set that up with ESXi and share the disks with our problem ESX host, then move some guests to it.
