Re: Oracle issues in 3.0.1

mikemast · ‎05-25-2007

I migrated a production Oracle DB server from ESX 2.5.2 to 3.0.1. We have been using this VM for years, running a 9.2.0.6 database. It ran great on 2.5.2, and runs OK on 3.0.1 with VMware Tools turned off.

I turn on VMware Tools in the afternoon, and by the next morning users are complaining that the system is slow, even though CPU and memory usage look fine.

I turn off VMware Tools and after an hour or so the users say the response time is back to normal. I have done this at least twice to verify that VMware Tools is causing the issue.

Has anyone else had this issue with VMware Tools and Oracle?

Thanks, Mike

petedr · ‎05-25-2007

I have not seen that problem in our environment, but I would also be interested to hear whether anyone has, and what the cause may be.

www.thevirtualheadline.com www.liquidwarelabs.com

marvinthebassma · ‎05-26-2007

Hi mikemast!

Very good question! We also have a 32-bit RHEL 3 VM with Oracle 9.2.0.7 that shows strange behaviour. The performance graph shows CPU utilization of about 20%, but inside the VM Linux reports 100%, most of it in the system CPU state.

The system is extremely slow.

No one knows exactly when this started. The VM came from ESX 2.5.2 and was hot-migrated to 3.0.1. Then we installed all the 3.0.1 patches.

The VMware Tools are not the current version; I need to update them.

If you have any news on this topic, please let me know! ...

mikemast · ‎05-26-2007

Sorry, I forgot.

We are also running a RHEL 3 guest OS, on an older kernel. When we try to update to a newer kernel we see some weird issues.

What kernel are you on? That might be part of the problem. We fell back to the 32 kernel and a lot of issues went away, back when we were on 2.5.2.

Since the upgrade to 3.0.1 it only happens when I start VMware Tools.

Check your kernel; we have had issues with the newer ones. I have asked many questions about this at VMware and also at Red Hat, and no one has heard of any kernel version causing an issue with Oracle 9.2.0.6.

marvinthebassma · ‎05-28-2007

Hi,

Tomorrow at work I will check the kernel and also what happens when turning the tools on and off.

I have migrated about 100 VMs online using DMotion. There are some other 32-bit RHEL 3 and 32/64-bit RHEL 4 VMs running Oracle DBs. I think I will check their behaviour too.

wally · ‎05-28-2007

It would also be interesting to know the build number of the installed VMware Tools.

plain 3.0.1 is 32039

patch ESX-3199476 introduced 41412 (something with e1000)

patch ESX-5095559 introduced 43424 (memory leak?)

Unfortunately, since 41412 the flexible behavior of the NICs is broken (at least in RHEL 4; we filed an SR and VMware confirmed this is a bug), but that may not be an issue for you.

marvinthebassma · ‎05-29-2007

Hi again,

So today we had trouble with this system again.

Every action involving the Oracle DB resulted in massive load and a high share of CPU time in the system state.

The OS is RHEL 3 U8 x86, kernel 2.4.21-47.ELsmp, 4 GB RAM, 2 vCPUs.

Oracle is 9.2.0.5 and VMware Tools are at build 43424.

We stopped the tools but the behaviour was the same.

With sar -A we could see that the load and the high system CPU values increased, for example at night while the Oracle RMAN backup was running.
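For readers who want to check the same thing without sar, here is a minimal Python sketch of how the "system" share can be derived from two samples of the aggregate `cpu` line in /proc/stat. The function names are made up for illustration, and the field order (user, nice, system, idle, then iowait, irq, softirq, steal where the kernel provides them) is assumed from the standard /proc/stat layout; older kernels expose fewer fields, which the sketch tolerates.

```python
# Field names in the order they appear on the "cpu" line of /proc/stat.
# Kernels that expose fewer fields simply yield a shorter result.
STATE_NAMES = ["user", "nice", "system", "idle",
               "iowait", "irq", "softirq", "steal"]


def read_cpu_line(path="/proc/stat"):
    """Return the aggregate 'cpu' line from /proc/stat (Linux only)."""
    with open(path) as f:
        for line in f:
            if line.startswith("cpu "):
                return line.strip()
    raise RuntimeError("no aggregate cpu line found in " + path)


def cpu_shares(before, after):
    """Percentage of CPU time per state between two 'cpu' lines.

    Only the first eight counters are used, because the guest counters
    that newer kernels append are already included in 'user'.
    """
    a = [int(x) for x in before.split()[1:9]]
    b = [int(x) for x in after.split()[1:9]]
    deltas = [y - x for x, y in zip(a, b)]
    total = sum(deltas)
    if total <= 0:
        return {}
    return {name: 100.0 * d / total for name, d in zip(STATE_NAMES, deltas)}


# Made-up counter values, 4-field format as on an old kernel.
example = cpu_shares("cpu  100 0 100 800", "cpu  110 0 180 810")
print(example["system"])  # share of time spent in kernel (system) state
```

On a live guest one would call `read_cpu_line()` twice, a few seconds apart, and pass both lines to `cpu_shares()`. A high "system" value with low "user" matches the pattern described in this thread.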

But this afternoon we decided to move that DB to a "new" dedicated server, an HP ProLiant DL380 G4 with RHEL 4.5, x86_64.

The customer urgently needs a performant database ... :o(

We did an export of about 9 GB, which took about 4 hours!!! ... and now a colleague is still working on creating the new DB and doing the import.

Tomorrow I will try to have a look at the other RHEL VMs with Oracle on ESX 3.0.1.

mikemast · ‎05-29-2007

When we stop the tools it takes a few hours for the system to settle down and for response times to go back to normal.

We even had to bounce the apps tier to get back to normal response times.

marvinthebassma · ‎05-30-2007

Hi mikemast,

We didn't have time for further testing, but I am keeping the Oracle export, and in the next few days I will set up a fresh VM with RHEL 4 U4 or 4 U5 x86_64 on ESX with 64-bit Oracle 9.2.0.8.

We will create the same DB as on the HP ProLiant and then we plan to do some performance testing.

petedr · ‎06-01-2007

Good luck with your testing; hopefully it works out.


sc_21111 · ‎06-04-2007

Hi all,

We're experiencing the same performance problem with Oracle. However, our installation is on 32-bit Windows.

I'm interested in any tips you can provide. I'll try disabling the VMware Tools.

Thanks

marvinthebassma · ‎07-16-2007

Hi guys,

Sorry for the delay. In the meantime I have created a VM with RHEL 4 U5 x86_64 and we installed 64-bit Oracle 9.2.0.8. We ran the import, the customer checked his application, and everything is running fine. So ... sorry, we didn't find the reason for that strange behaviour, but in the end we are happy to be up and running on a new VM.

Martin

petedr · ‎07-16-2007

Good deal, good to hear everything worked out fine for you.


marvinthebassma · ‎08-01-2007

Me again, with an update:

The problem was (and is) that we ran too many VMs in the HA/DRS cluster, so we had memory overcommitment.

What happens then? Ballooning is active in nearly every VM. This ended up in OOM (out of memory) killer problems, because the OOM killer killed big Oracle processes.

So we disabled ballooning for these RHEL VMs using sched.mem.maxmemctl = 0.

So ballooning was inactive, but I forgot that ESX will then swap out the VM's memory instead ... and at that time we didn't have the performance graphs running. Now, using VirtualCenter 2.0.1 Patch 2 or 2.0.2, we have the graphs and could see that massive swapping was happening on this VM ... :o( ... So finally I set the memory reservation Min = Max.

Now we have no ballooning and no swapping ... and the Oracle DB runs with sufficient performance.
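The reasoning above can be illustrated with a small Python sketch. It is a deliberately simplified model, not how ESX actually accounts for memory: it ignores virtualization overhead, page sharing, and real guest demand, and simply compares configured VM memory against host memory. The function name and all numbers are hypothetical.

```python
def memory_pressure(host_mb, vms):
    """Simplified view of overcommitment on one host.

    vms is a list of (configured_mb, reservation_mb) tuples.
    Returns how far configured memory exceeds host memory, and how much
    each VM could lose to reclamation (ballooning or host swapping).
    """
    configured = sum(c for c, _ in vms)
    reserved = sum(r for _, r in vms)
    if reserved > host_mb:
        # In this model a host cannot guarantee more than it has.
        raise ValueError("reservations exceed host memory")
    return {
        "overcommit_mb": max(0, configured - host_mb),
        # Memory above the reservation is fair game for reclamation;
        # with reservation == configured size (Min = Max) nothing is.
        "reclaimable_mb": [c - r for c, r in vms],
    }


# Hypothetical host with 16 GB and five 4 GB VMs; the first one
# (think: the Oracle VM) has its full size reserved.
result = memory_pressure(16384, [(4096, 4096)] + [(4096, 0)] * 4)
print(result["overcommit_mb"])   # memory that must be reclaimed somewhere
print(result["reclaimable_mb"])  # per-VM exposure to reclamation
```

In this toy model the fully reserved VM has nothing reclaimable, so the shortfall has to come from the other VMs. That is why setting the reservation to the configured size makes the separate balloon setting unnecessary for that VM, while the overcommitment itself remains until hosts are added.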

In the meantime we also found that this VM resides on a mirrored LUN on a Hitachi SAN. The LUN is mirrored with TrueCopy, which degrades performance, and it sits on the same RAID group where big HP-UX systems are also doing heavy I/O ...

So now that we have installed some more ESX servers we no longer have memory overcommitment, and the current SAN, an HDS Thunderbird 9500, will be replaced by an HP XP 24000, for which we will do detailed planning of the LUNs and the hosts that need access to the SAN ...

But all that analysis over the last months was very frustrating ...

In the meantime RHEL 4.5 is also supported by VMware ... and that makes me happy ;o)

Bye bye !

Marvin

KlinikenLB · ‎08-01-2007

Did you set the balloon and swap parameters per VM or for the entire ESX host?

petedr · ‎08-01-2007

Wow, you definitely went through a lot of troubleshooting, but it's good to hear you found the answers.


marvinthebassma · ‎08-01-2007

Hi,

No, not for the entire host. We have a mixed environment running about 350 VMs (different RHEL and Windows versions).

We only changed the memory reservation for some "critical" RHEL + Oracle VMs, because those were the only VMs where these problems occurred.

With Windows there were no ballooning problems, so we kept all the default values.

The parameter for disabling ballooning is not needed if I set memory reservation Min = Max.

In addition we temporarily disabled transparent memory sharing, but I cannot say whether the VM runs "better" than with memory sharing.

Marvin
