I have not seen that problem in our environment, but I would also be interested to hear whether anyone has and what the cause may be.
Hi mikemast!
Very good question! We also have a 32-bit RHEL 3 VM with Oracle 184.108.40.206 that shows strange behaviour. The performance graph shows CPU utilization of about 20%, but inside the VM Linux reports 100%, most of it in the system CPU state.
The system is extremely slow.
No one knows exactly when this started. The VM came from ESX 2.5.2 and was hot migrated to 3.0.1. Then we installed all 3.0.1 patches.
The VMware Tools are not the current version; I need to update them.
If you have any news on this topic, please let me know! ...
Sorry, I forgot.
We are also running a RHEL 3 guest OS, on an older kernel. When we try to update to a newer kernel we see some weird issues.
What kernel are you on? That might be part of the problem. We fell back to the 32 kernel and a lot of issues went away when we were on 2.5.2.
Since the upgrade to 3.0.1 it only happens when I start VMware Tools.
Check your kernel; we have had issues with the newer ones. I have asked VMware and Red Hat many questions about this, and no one has heard of the kernel versions causing a problem with Oracle 220.127.116.11.
Tomorrow at work I will check the kernel and also what happens when turning the tools on and off.
I have migrated about 100 VMs online using DMotion. There are some other 32-bit RHEL 3 and 32/64-bit RHEL 4 VMs running Oracle DBs. I think I will check their behaviour too.
It would also be useful to mention the build number of the installed VMware Tools.
plain 3.0.1 is 32039
patch ESX-3199476 introduced 41412 (something with e1000)
patch ESX-5095559 introduced 43424 (memory leak?)
Unfortunately, since 41412 the flexible behavior of the NICs is broken (at least in RHEL 4; we filed an SR and VMware confirmed this is a bug), but that may not be an issue for you.
So today we again had trouble with this system.
Every action involving the Oracle DB resulted in massive load and a high system CPU state.
The OS is RHEL 3 U8 x86, kernel 2.4.21-47.ELsmp, 4 GB RAM, 2 vCPUs.
Oracle is 18.104.22.168 and the VMware Tools build is 43424.
We stopped the tools but the behaviour was the same.
With sar -A we could see that the load and the system CPU values increased, for example at night when the Oracle RMAN backup was running.
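For anyone who wants to watch the system CPU share from inside the guest without sar, here is a minimal Python sketch. It is not what we ran; it only assumes the usual field order of the aggregate "cpu" line in /proc/stat (user, nice, system, idle, ...). The function names are made up for illustration.

```python
def parse_cpu_line(line):
    """Parse the aggregate 'cpu' line of /proc/stat into a list of ints."""
    fields = line.split()
    if not fields or fields[0] != "cpu":
        raise ValueError("expected the aggregate 'cpu' line")
    # Keep at most the first 8 counters (user .. steal); later fields,
    # where present, are already included in user/nice.
    return [int(x) for x in fields[1:9]]


def system_share(before, after):
    """Percent of elapsed CPU time spent in system state between two samples."""
    b = parse_cpu_line(before)
    a = parse_cpu_line(after)
    deltas = [y - x for x, y in zip(b, a)]
    total = sum(deltas)
    if total <= 0:
        return 0.0
    # Index 2 is the 'system' counter in the assumed field order.
    return 100.0 * deltas[2] / total


def read_cpu_line(path="/proc/stat"):
    """Return the first line of /proc/stat (the aggregate counters)."""
    with open(path) as f:
        return f.readline()
```

Take one sample with read_cpu_line(), wait a few seconds, take a second one and pass both to system_share(). A value that stays high while the ESX performance graph shows low utilization would match the mismatch described earlier in this thread.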
But this afternoon we decided to move that DB to a "new" dedicated server, an HP ProLiant DL380 G4 with RHEL 4.5, x86_64.
The customer urgently needs a performant database ... :o(
We did an export of about 9 GB, which took about 4 hours!!! ... and now a colleague is still working on creating the new DB and doing the import.
Tomorrow I will try to have a look at the other RHEL VMs with Oracle on ESX 3.0.1.
When we stop the tools it takes a few hours for the system to settle down and for response to go back to normal.
We even had to bounce the apps tier to get back to normal response time.
We didn't have time to do further testing, but I have kept the Oracle export, and during the next few days I will set up a fresh RHEL 4U4 or 4U5 x86_64 VM on ESX with Oracle 22.214.171.124 64-bit.
We will create the same DB as on the HP ProLiant and then we plan to do some performance testing.
good luck with your testing, hopefully it works out.
Hi all,
We're experiencing the same performance problem with Oracle. However, our installation is on 32-bit Windows.
I'm interested in any tips you can provide. I'll try disabling the VMware Tools.
Sorry for the delay. In the meantime I have created a VM with RHEL 4 U5 x86_64 and we installed Oracle 126.96.36.199 64-bit. We ran the import, the customer checked his application, and everything is running fine. So ... sorry, we didn't find the reason for that strange behaviour, but in the end we are happy to be up and running on a new VM.
good deal, good to hear everything worked out fine for you
Me again, with an update:
The problem was (and is) that we ran too many VMs in the HA/DRS cluster.
So we had memory overcommitment.
What happens then? Ballooning is active in nearly every VM. This ended up in OOM (out of memory) killer problems, because the OOM killer killed big Oracle processes.
So we disabled ballooning for these RHEL VMs using sched.mem.maxmemclt = 0.
So ballooning was inactive, but I forgot that ESX will then swap out the VM's memory instead ... At that time we didn't have the performance graphs running. Now, using VirtualCenter 2.0.1 Patch 2 or 2.0.2, we have the graphs and we could see that massive swapping was being done on this VM ... :o( ... So finally I set the memory reservation so that Min = Max.
Now we have no ballooning and no swapping ... and the Oracle DB runs with sufficient performance.
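To make the arithmetic behind this concrete, here is a rough Python sketch of the reasoning. It is a simplified model and not anything from ESX itself: it ignores per-VM overhead memory, page sharing and the service console, and the function names and the example sizes are hypothetical.

```python
def overcommit_mb(host_mem_mb, vm_mem_mb):
    """Configured guest memory beyond the host's physical memory (0 if none)."""
    return max(0, sum(vm_mem_mb) - host_mem_mb)


def reclaimable_mb(configured_mb, reservation_mb):
    """Memory the host may still take back from one VM (balloon or host swap).

    Simplified assumption: everything above the reservation is fair game.
    """
    if not 0 <= reservation_mb <= configured_mb:
        raise ValueError("reservation must be between 0 and the configured size")
    return configured_mb - reservation_mb


# Hypothetical host: 16 GB of RAM running five VMs of 4 GB each.
print(overcommit_mb(16384, [4096] * 5))   # 4096 MB must be reclaimed somewhere
print(reclaimable_mb(4096, 0))            # no reservation: all 4096 MB at risk
print(reclaimable_mb(4096, 4096))         # Min = Max: nothing left to reclaim
```

With the balloon driver capped at 0 the host can only cover the shortfall by swapping, which is what we saw in the graphs. With Min = Max there is nothing left to reclaim from this VM, so the shortfall has to come from the other VMs.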
In the meantime we also found that this VM resides on a mirrored LUN on a Hitachi SAN. The LUN is mirrored with TrueCopy, which degrades performance, and it sits on the same RAID group where big HP-UX systems are also doing heavy I/O ...
So now that we have installed some more ESX servers we no longer have memory overcommitment, and the current SAN, an HDS Thunderbird 9500, will be replaced by an HP XP 24000, for which we will do detailed planning of the LUNs and the hosts that need access to the SAN ...
But all that analysis over the last few months was very frustrating ...
In the meantime RHEL 4.5 is also supported by VMware ... and that makes me happy ;o)
Bye bye !
Are the balloon and swap parameters you used set per VM or for the entire ESX host?