We installed a CentOS 5.5 guest as a PostgreSQL database server in vSphere 4.
This guest, "verik" (2 vCPUs, 16 GB RAM, PAE kernel), is the only guest running on the vSphere host (2x 6-core AMD Opteron 2439SE, 128 GB RAM).
We are seeing very poor performance (for example, when running pg_dumpall). In esxtop, %CSTP seems quite high, especially compared to %RUN, which is surprising since there is no contention whatsoever for host resources.
9:38:02am up 17:54, 131 worlds; CPU load average: 0.01, 0.02, 0.01
PCPU USED(%): 11.2 0.6 0.4 0.3 0.3 0.2 0.4 2.1 0.3 0.3 2.9 0.1 AVG: 1.6
PCPU UTIL(%): 11.4 1.1 0.9 0.9 0.8 0.6 0.9 2.5 0.7 0.6 3.1 0.2 AVG: 2.0
CCPU(%): 7 us, 2 sy, 91 id, 0 wa ; cs/sec: 509
ID  GID  NAME             NWLD    %USED     %RUN   %SYS    %WAIT   %RDY  %IDLE  %OVRLP  %CSTP  %MLMTD
 1    1  idle               12  1181.67  1184.71   0.00     0.00  15.87   0.00    2.47   0.00    0.00
 2    2  system              7     0.01     0.01   0.00   700.00   0.00   0.00    0.00   0.00    0.00
 6    6  helper             75     0.16     0.16   0.00  7500.00   0.05   0.00    0.00   0.00    0.00
 7    7  drivers             9     0.01     0.01   0.00   900.00   0.00   0.00    0.00   0.00    0.00
 8    8  vmotion             4     0.00     0.00   0.00   400.00   0.00   0.00    0.00   0.00    0.00
10   10  console             2    10.59    10.83   0.02   189.27   0.00  89.19    0.35   0.00    0.00
15   15  vmkapimod           7     0.03     0.03   0.00   700.00   0.00   0.00    0.00   0.00    0.00
17   17  FT                  1     0.00     0.00   0.00   100.00   0.00   0.00    0.00   0.00    0.00
18   18  vobd.4261           8     0.00     0.00   0.00   800.00   0.00   0.00    0.00   0.00    0.00
19   19  net-cdp.4269        1     0.00     0.00   0.00   100.00   0.00   0.00    0.05   0.00    0.00
20   20  vmware-vmkauthd     1     0.00     0.00   0.00   100.00   0.00   0.00    0.00   0.00    0.00
26   26  verik               4     5.12     4.75   0.35   348.10   0.05  18.24    0.51  47.30    0.00
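To make the %CSTP-versus-%RUN comparison concrete, here is a minimal Python sketch that parses one group row from the snapshot above (assuming whitespace-separated columns in the header's order) and computes the ratio for "verik". esxtop sums per-world percentages for a group, which is why a value like the idle group's %USED can exceed 100%. The helper name `parse_row` is hypothetical.

```python
# Minimal sketch: parse one group row of an esxtop CPU-panel snapshot that was
# copied as plain text, then compare co-stop (%CSTP) with run time (%RUN).
# Assumes whitespace-separated columns in the order of the header above.
HEADER = "ID GID NAME NWLD %USED %RUN %SYS %WAIT %RDY %IDLE %OVRLP %CSTP %MLMTD".split()


def parse_row(line: str) -> dict:
    """Map one esxtop group row onto the header columns (hypothetical helper)."""
    fields = line.split()
    if len(fields) != len(HEADER):
        raise ValueError(f"expected {len(HEADER)} columns, got {len(fields)}")
    row = dict(zip(HEADER, fields))
    # Every column after NAME is numeric.
    for key in HEADER[3:]:
        row[key] = float(row[key])
    return row


verik = parse_row("26 26 verik 4 5.12 4.75 0.35 348.10 0.05 18.24 0.51 47.30 0.00")
ratio = verik["%CSTP"] / verik["%RUN"]
print(f"{verik['NAME']}: %RUN={verik['%RUN']:.2f} %CSTP={verik['%CSTP']:.2f} ratio={ratio:.1f}")
```

For verik the ratio comes out at roughly 10:1, meaning that during this sample its worlds spent about ten times as much time co-stopped as running.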
We have tried a variety of 2.6.18 PAE kernels (including builds with CONFIG_HZ=100 and boots with the divider=10 kernel parameter). Performance was more or less consistently poor with all of them.
Host BIOS settings are:
HyperTransport Technology: HT 3
HT Assist: Enabled
Virtualization Technology: Enabled
DRAM Prefetcher: Enabled
Hardware Prefetch Training on Software Prefetch: Enabled
Hardware Prefetcher: Enabled
Demand-Based Power Management: Disabled
We have also tried running this guest with 1 vCPU (2.6.18-194.8.1.el5PAE):
Timing of the first run after a reboot:
time pg_dumpall>/dev/null
real 8m1.764s
user 0m12.649s
sys 0m3.133s
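One way to read these numbers: `time` charges user and sys time only to pg_dumpall and the child processes it waits for, not to the PostgreSQL server backend that actually reads the data. The minimal Python sketch below (the `parse_time_output` helper is hypothetical) converts the three lines to seconds and shows that pg_dumpall itself used only about 16 seconds of CPU during an 8-minute run, which suggests the time went elsewhere, for example to the server process or the guest kernel.

```python
# Minimal sketch: convert the "real/user/sys" lines printed by bash's `time`
# into seconds and compare the CPU time charged to pg_dumpall (and its waited-for
# children) with the wall-clock time.
import re

TIME_RE = re.compile(r"^(real|user|sys)\s+(\d+)m([\d.]+)s$")


def parse_time_output(text: str) -> dict:
    """Return {'real': s, 'user': s, 'sys': s} in seconds (hypothetical helper)."""
    out = {}
    for line in text.strip().splitlines():
        m = TIME_RE.match(line.strip())
        if m:
            label, minutes, seconds = m.groups()
            out[label] = int(minutes) * 60 + float(seconds)
    return out


t = parse_time_output("""
real 8m1.764s
user 0m12.649s
sys 0m3.133s
""")
cpu = t["user"] + t["sys"]
print(f"wall {t['real']:.1f}s, CPU {cpu:.1f}s ({100 * cpu / t['real']:.1f}% of wall time)")
```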
While this is running, the system is unresponsive (e.g., top, which should refresh every second, may refresh only every 7 seconds). The guest is spending a lot of time in system (kernel) mode, and we are not sure what it is doing:
Cpu(s): 17.3%us, 81.9%sy, 0.0%ni, 0.0%id, 0.0%wa, 0.0%hi, 0.8%si, 0.0%st
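The same picture shows up in top's summary line. A minimal Python sketch that splits it into fields (the helper name is hypothetical; labels are as top prints them: us=user, sy=system, ni=nice, id=idle, wa=I/O wait, hi/si=hardware/software interrupts, st=steal):

```python
# Minimal sketch: split top's "Cpu(s):" summary line into {label: percent}
# so the dominant CPU state stands out.
import re


def parse_cpu_line(line: str) -> dict:
    """Return {'us': 17.3, 'sy': 81.9, ...} from a top 'Cpu(s):' line."""
    return {label: float(value) for value, label in re.findall(r"([\d.]+)%(\w+)", line)}


cpu = parse_cpu_line("Cpu(s): 17.3%us, 81.9%sy, 0.0%ni, 0.0%id, 0.0%wa, 0.0%hi, 0.8%si, 0.0%st")
busiest = max(cpu, key=cpu.get)
print(busiest, cpu[busiest], "total:", round(sum(cpu.values()), 1))
```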
The same pg_dumpall takes 7 minutes on a multi-core physical server (4x 2.8 GHz Xeon).
It takes 17 minutes at best (often much longer) on "verik" with 2 vCPUs.
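Expressed as ratios against the 7-minute physical baseline (simple arithmetic on the figures above, sketched in Python), the 1-vCPU run takes about 1.15 times the physical time, while the best 2-vCPU run takes about 2.43 times as long:

```python
# Minimal sketch: express the reported pg_dumpall run times relative to the
# 7-minute physical baseline. All figures come from the post above.
def to_seconds(minutes: int, seconds: float = 0.0) -> float:
    return minutes * 60 + seconds


baseline = to_seconds(7)  # physical 4x 2.8 GHz Xeon
runs = {
    "verik, 1 vCPU (first run after reboot)": to_seconds(8, 1.764),
    "verik, 2 vCPUs (best case)": to_seconds(17),
}
for label, secs in runs.items():
    print(f"{label}: {secs / baseline:.2f}x the physical time")
```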
We'd really like to migrate our physical PostgreSQL server into VMware, but we cannot justify doing so unless it performs at least as well as on physical hardware. (We have tried storing the database on local disk, an NFS mount, and the host datastore, with similar performance in all cases. We have also verified adequate bandwidth on our links to storage and front-end networks.)
We have reviewed many postings on running guests with more than 1 vCPU and think we understand how vSphere CPU coscheduling works. But in our scenario, with no contention for CPU resources, we're not sure why we're having performance issues.
Any suggestions?
Thanks!
Craig
First off, why a 32-bit OS for a VM with 16 GB of RAM?
My first and very strong suggestion is to switch to the 64-bit version of CentOS; PAE kernels are known to perform poorly. We run several 64-bit CentOS VMs and have not seen any unexplained performance problems.
You could try to tweak the existing 32-bit OS a bit, but I do not expect a big improvement. Keep divider=10 in your kernel parameters, but if you want to try a single CPU again, add the "nosmp noapic nolapic" kernel parameters. Those can reduce CPU ready time (%RDY) quite a bit; at least I have seen that happen on a busy ESX host. Also reserve some huge pages in the Linux kernel and configure PostgreSQL to use them.
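On the kernel side, huge pages are reserved with the `vm.nr_hugepages` sysctl. The minimal Python sketch below computes a value for it from the huge page size reported in /proc/meminfo (2048 kB on x86-64 and on 32-bit PAE kernels); the 4 GiB target and the helper names are hypothetical. How PostgreSQL is then made to use those pages depends on its version (a `huge_pages` setting only exists from PostgreSQL 9.4 onward), so that part is not shown.

```python
# Minimal sketch: size the vm.nr_hugepages sysctl so that a shared-memory
# region of a given size fits entirely in huge pages.
# HYPOTHETICAL: the 4 GiB target and the helper names are illustrative only.


def hugepage_size_bytes(meminfo_text: str) -> int:
    """Read 'Hugepagesize:' (reported in kB) from /proc/meminfo contents."""
    for line in meminfo_text.splitlines():
        if line.startswith("Hugepagesize:"):
            return int(line.split()[1]) * 1024
    raise ValueError("Hugepagesize not found; kernel may lack huge page support")


def nr_hugepages(target_bytes: int, page_bytes: int) -> int:
    """Smallest number of huge pages that covers target_bytes (ceiling division)."""
    return -(-target_bytes // page_bytes)


# On the guest itself you would pass open("/proc/meminfo").read() instead;
# this one-line sample mirrors the 2048 kB huge page size of x86-64/PAE kernels.
sample_meminfo = "Hugepagesize:     2048 kB\n"
page = hugepage_size_bytes(sample_meminfo)
pages = nr_hugepages(4 * 1024**3, page)
print(f"vm.nr_hugepages = {pages}")
```

The printed line can be added to /etc/sysctl.conf and loaded with `sysctl -p`.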
Your BIOS settings also seem non-optimal. The general recommendation is to disable CPU prefetching features on ESX hosts, since when a CPU is executing multiple processes (VMs), prefetching results in a high number of misses, which is just a waste of CPU cycles.
Hardware Prefetch Training on Software Prefetch: disabled
Hardware Prefetcher: disabled
But still, go for a 64-bit OS.
Any suggestions?
Yes. You can't compare physical with virtual.
Try reducing the number of vCPUs in that VM to 1. Just TRY it.
One thing to keep in mind when looking at multiple vCPUs is whether the app is truly multi-threaded. I assume PostgreSQL is (my DB ignorance showing there, sorry). If so, that's likely not an issue.
Also, is this PAE kernel capable of SMP? I know that's like asking "is it plugged in?", but I've got to ask. Believe it or not, I see that a lot when folks around here have CPU issues.
Geez, I stand to learn a bit about PostgreSQL and PAE from all of this...
Thanks for your reply. Your suggestions helped a lot.
We installed a fresh 64-bit CentOS 5.5 guest (1 vCPU) and made the two BIOS changes you recommended. A pg_dump that previously took hours finished in just over a minute. That's the kind of performance we were expecting. I would not have guessed that a 32-bit PAE CentOS 5.5 would perform so much worse...
We are not using the "divider=10" and "nosmp noapic nolapic" kernel boot parameters. I wasn't sure from your comments whether you use these only with the 32-bit kernel or with your 64-bit kernels too. According to the VMware software compatibility notes, no kernel options should be needed at all with CentOS 5.4 (and, I would imagine, 5.5).
Thanks again for your feedback!
Well, the VMware documentation of Linux kernel parameters is about timekeeping best practices, and for that purpose it is correct: no additional parameters are required to achieve an accurate clock. The kernel parameters "nosmp noapic nolapic" work with uniprocessor 64-bit Linux VMs as well, and setting them will improve performance to some degree. Just remember to remove them once you need SMP.
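Because these parameters have to come off again before the VM gets a second vCPU, a small helper can toggle them. This is a minimal Python sketch that works on a command-line string only; the example parameters are hypothetical, and on CentOS 5 the real command line lives on the `kernel` line of /boot/grub/grub.conf.

```python
# Minimal sketch: add or strip the uniprocessor-only boot parameters
# discussed in this thread on a kernel command line string.
# HYPOTHETICAL: the example command line below is illustrative only.
UP_ONLY = ("nosmp", "noapic", "nolapic")


def set_uniprocessor(cmdline: str, enable: bool) -> str:
    """Return cmdline with UP_ONLY params appended (enable=True) or removed."""
    params = [p for p in cmdline.split() if p not in UP_ONLY]
    if enable:
        params.extend(UP_ONLY)
    return " ".join(params)


base = "ro root=LABEL=/ divider=10"  # hypothetical existing parameters
up = set_uniprocessor(base, enable=True)
smp = set_uniprocessor(up, enable=False)
print(up)
print(smp)
```

Filtering before appending keeps the helper idempotent, so running it twice with `enable=True` does not duplicate the parameters.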