Hello, all,
We are experiencing serious problems on a virtualized Oracle RAC 10.2.0.4.0, Standard Edition, built on two Windows 2003 virtual machines hosted on a three-server physical cluster running vSphere.
The Win2003 machines are configured this way:
OS: WinX64 Workstation Release 2
4 CPU, 8 GB Ram
Two ESXI3 network adapters
a 70-gig system disk.
Each virtual machine runs on its own physical server.
The cluster disks are configured as follows:
Data disk: 400gig under SCSI 1.1, physical, permanent independent
Arc Log: 420gig under SCSI 1.2, physical, permanent independent
Redo Log: 20gig under SCSI 1.3, physical, permanent independent
1 OCR Disk (Scsi 2.1), physical, permanent, independent
5 Voting Disks, (Scsi 2.2 to Scsi 2.8), physical, permanent, independent
All of these virtual disks were created with vmkfstools, using the eagerzeroedthick option.
Disk locking is set to false and, following best practices found online, we added these extra parameters to the .vmx files:
diskLib.dataCacheMaxReadAheadSize = "0"
diskLib.dataCacheMinReadAheadSize = "0"
diskLib.dataCachePageSize = "4096"
diskLib.maxUnsyncedWrites = "0"
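For anyone who wants to compare their own .vmx files against the four values above, here is a minimal Python sketch. It is only an illustration: the function names `parse_vmx` and `check_vmx` are made up for this example, the parser assumes simple `key = "value"` lines, and keys are compared case-insensitively purely as a convenience.

```python
import re

# The four settings listed above, with the values used in this post.
EXPECTED = {
    "diskLib.dataCacheMaxReadAheadSize": "0",
    "diskLib.dataCacheMinReadAheadSize": "0",
    "diskLib.dataCachePageSize": "4096",
    "diskLib.maxUnsyncedWrites": "0",
}

# Assumption: one setting per line, in the form key = "value".
_LINE = re.compile(r'^\s*([^#\s=]+)\s*=\s*"(.*)"\s*$')


def parse_vmx(text):
    """Return a dict of lower-cased keys to values found in the vmx text."""
    settings = {}
    for line in text.splitlines():
        match = _LINE.match(line)
        if match:
            settings[match.group(1).lower()] = match.group(2)
    return settings


def check_vmx(text, expected=EXPECTED):
    """Return {key: actual value or None} for every setting that differs."""
    settings = parse_vmx(text)
    problems = {}
    for key, wanted in expected.items():
        actual = settings.get(key.lower())
        if actual != wanted:
            problems[key] = actual
    return problems
```

An empty result means all four settings are present with the expected values; a `None` value means the key is missing from the file.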
Data, Arc and Redo are used by Oracle ASM; all disk groups are configured with external redundancy. All virtual disks are placed on RAID-1 LUNs.
The Oracle stack installation was flawless, as were the ASM and DB instance creation, and we all know how tricky a RAC installation can be.
However, when I imported data from the old RAC (exported with plain old Exp), I ran into the errors below, which caused the import to fail and the instance to shut down abruptly.
-
Errors in file c:\oracle\product\10.2.0\admin\rucla\bdump\rucla1_lgwr_3020.trc:
ORA-00345: redo log write error block 247036 count 2048
ORA-00312: online log 5 thread 1: '+REDO/rucla/onlinelog/group_5.256.720556015'
ORA-27070: async read/write failed
ORA-00345: redo log write error block 247036 count 2048
ORA-00312: online log 5 thread 1: '+REDO/rucla/onlinelog/group_5.257.720556021'
ORA-27070: async read/write failed
ORA-00345: redo log write error block 249084 count 1
ORA-00312: online log 5 thread 1: '+REDO/rucla/onlinelog/group_5.256.720556015'
ORA-27070: async read/write failed
ORA-00345: redo log write error block 249084 count 1
ORA-00312: online log 5 thread 1: '+REDO/rucla/onlinelog/group_5.257.720556021'
ORA-27070: async read/write failed
Tue Jun 01 18:59:08 2010
Doing block recovery for file 7 block 329099
Tue Jun 01 18:59:08 2010
Errors in file c:\oracle\product\10.2.0\admin\rucla\bdump\rucla1_lgwr_3020.trc:
ORA-00340: IO error processing online log 5 of thread 1
ORA-00345: redo log write error block 247036 count 2048
ORA-00312: online log 5 thread 1: '+REDO/rucla/onlinelog/group_5.256.720556015'
ORA-27070: async read/write failed
ORA-00345: redo log write error block 247036 count 2048
ORA-00312: online log 5 thread 1: '+REDO/rucla/onlinelog/group_5.257.720556021'
ORA-27070: async read/write failed
ORA-00345: redo log write error block 249084 count 1
ORA-00312: online log 5 thread 1: '+REDO/rucla/onlinelog/group_5.256.720556015'
ORA-27070: async read/write failed
ORA-00345: redo log write error block 249084 count 1
ORA-00312: online log 5 thread 1: '+REDO/rucla/onlinelog/group_5.257.720556021'
ORA-27070: async read/write failed
Tue Jun 01 18:59:08 2010
LGWR: terminating instance due to error 340
Tue Jun 01 18:59:08 2010
Doing block recovery for file 7 block 329099
Tue Jun 01 18:59:08 2010
Errors in file c:\oracle\product\10.2.0\admin\rucla\bdump\rucla1_lms1_2100.trc:
ORA-00340: IO error processing online log of thread
Tue Jun 01 18:59:08 2010
Errors in file c:\oracle\product\10.2.0\admin\rucla\bdump\rucla1_lmd0_1468.trc:
ORA-00340: IO error processing online log of thread
Tue Jun 01 18:59:08 2010
Errors in file c:\oracle\product\10.2.0\admin\rucla\bdump\rucla1_lms0_2540.trc:
ORA-00340: IO error processing online log of thread
Tue Jun 01 18:59:08 2010
Errors in file c:\oracle\product\10.2.0\admin\rucla\bdump\rucla1_lmon_824.trc:
ORA-00340: IO error processing online log of thread
Tue Jun 01 18:59:08 2010
Errors in file c:\oracle\product\10.2.0\admin\rucla\bdump\rucla1_pmon_3332.trc:
ORA-00340: IO error processing online log of thread
Tue Jun 01 18:59:09 2010
Errors in file c:\oracle\product\10.2.0\admin\rucla\bdump\rucla1_ckpt_2400.trc:
ORA-00340: IO error processing online log of thread
Tue Jun 01 18:59:09 2010
Errors in file c:\oracle\product\10.2.0\admin\rucla\bdump\rucla1_j000_4060.trc:
ORA-00340: IO error processing online log of thread
Tue Jun 01 18:59:09 2010
Errors in file c:\oracle\product\10.2.0\admin\rucla\bdump\rucla1_psp0_1596.trc:
ORA-00340: IO error processing online log of thread
Tue Jun 01 18:59:09 2010
Errors in file c:\oracle\product\10.2.0\admin\rucla\bdump\rucla1_q000_3656.trc:
ORA-00340: IO error processing online log of thread
Tue Jun 01 18:59:09 2010
Errors in file c:\oracle\product\10.2.0\admin\rucla\bdump\rucla1_lck0_3088.trc:
ORA-00340: IO error processing online log of thread
Tue Jun 01 18:59:10 2010
Errors in file c:\oracle\product\10.2.0\admin\rucla\bdump\rucla1_rbal_2092.trc:
ORA-00340: IO error processing online log of thread
Tue Jun 01 18:59:10 2010
Errors in file c:\oracle\product\10.2.0\admin\rucla\bdump\rucla1_mman_844.trc:
ORA-00340: IO error processing online log of thread
Tue Jun 01 18:59:10 2010
Errors in file c:\oracle\product\10.2.0\admin\rucla\bdump\rucla1_dbw0_644.trc:
ORA-00340: IO error processing online log of thread
Tue Jun 01 18:59:11 2010
Errors in file c:\oracle\product\10.2.0\admin\rucla\bdump\rucla1_o000_3292.trc:
ORA-00340: IO error processing online log of thread
Tue Jun 01 18:59:17 2010
Errors in file c:\oracle\product\10.2.0\admin\rucla\bdump\rucla1_reco_916.trc:
ORA-00340: IO error processing online log of thread
Tue Jun 01 18:59:17 2010
Errors in file c:\oracle\product\10.2.0\admin\rucla\bdump\rucla1_smon_672.trc:
ORA-00340: IO error processing online log of thread
Tue Jun 01 18:59:18 2010
Instance terminated by LGWR, pid = 3020
-
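A long alert log excerpt like the one above is easier to read once it is summarized. The Python sketch below counts how often each ORA code appears and lists the redo log members named in ORA-00312 lines. It is only an illustration: the function names are made up for this example, and the ORA-00312 pattern assumes English-language messages in the format shown above.

```python
import re
from collections import Counter

# Any Oracle error code of the form ORA-nnnnn.
ORA_CODE = re.compile(r"ORA-\d{5}")

# ORA-00312 lines name the redo member involved (English message format).
REDO_MEMBER = re.compile(r"ORA-00312: online log \d+ thread \d+: '([^']+)'")


def count_ora_codes(log_text):
    """Count the occurrences of each ORA code in the log text."""
    return Counter(ORA_CODE.findall(log_text))


def affected_redo_members(log_text):
    """Return the distinct redo log members named in ORA-00312 lines, sorted."""
    return sorted(set(REDO_MEMBER.findall(log_text)))
```

Run against the excerpt above, `affected_redo_members` should return both members of group 5, which live in the same +REDO disk group; that may hint that the failure sits below Oracle, although it does not prove it.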
Moreover, I've spotted the same ORA-27070 error when importing a large table and creating the index on it.
-
IMP-00017: following statement failed with ORACLE error 1115:
"CREATE INDEX "CRI10" ON "CRI_ORDINI_RIGHE" ("CLICTE_CRI" , "KEY_CRI" , "ARC"
"ART_VEN_CRI" , "ARCAAC_VEN_CRI" , "ARCARD_VEN_CRI" , "ARCARS_VEN_CRI" , "AR"
"CART_PRO_CRI" , "ARCAAC_PRO_CRI" , "ARCARD_PRO_CRI" , "ARCARS_PRO_CRI" , "D"
"T_CON_CHIESTA_CRI" , "NUMERO_RIGA_CRI" , "QT_ORDINATA_CRI" ) PCTFREE 10 IN"
"ITRANS 2 MAXTRANS 255 STORAGE(INITIAL 234881024 FREELISTS 1 FREELIST GROUPS"
" 1 BUFFER_POOL DEFAULT) TABLESPACE "USERS" LOGGING"
IMP-00003: ORACLE error 1115 encountered
ORA-01115: IO error reading block from file 7 (block # 402013)
ORA-01110: data file 7: '+DATA/rucla/datafile/users.270.720554561'
ORA-27070: async read/write failed
IMP-00017: following statement failed with ORACLE error 20000:
"BEGIN DBMS_STATS.SET_INDEX_STATS(NULL,'"CRI10"',NULL,NULL,NULL,1831008,264"
"52,1831008,1,1,240554,2,6); END;"
IMP-00003: ORACLE error 20000 encountered
ORA-20000: INDEX "RUBRA"."CRI10" does not exist or insufficient privileges
ORA-06512: at "SYS.DBMS_STATS", line 2124
ORA-06512: at "SYS.DBMS_STATS", line 5473
ORA-06512: at line 1
IMP-00017: following statement failed with ORACLE error 1115:
"CREATE INDEX "CRI00" ON "CRI_ORDINI_RIGHE" ("KEY_CRI" , "ST_MODIFICA_CRI" )"
" PCTFREE 10 INITRANS 2 MAXTRANS 255 STORAGE(INITIAL 75497472 FREELISTS 1 F"
"REELIST GROUPS 1 BUFFER_POOL DEFAULT) TABLESPACE "USERS" LOGGING"
IMP-00003: ORACLE error 1115 encountered
ORA-01115: IO error reading block from file 7 (block # 402845)
ORA-01110: data file 7: '+DATA/rucla/datafile/users.270.720554561'
ORA-27070: async read/write failed
IMP-00017: following statement failed with ORACLE error 20000:
"BEGIN DBMS_STATS.SET_INDEX_STATS(NULL,'"CRI00"',NULL,NULL,NULL,2010220,880"
"8,2010220,1,1,227401,2,6); END;"
IMP-00003: ORACLE error 20000 encountered
ORA-20000: INDEX "RUBRA"."CRI00" does not exist or insufficient privileges
ORA-06512: at "SYS.DBMS_STATS", line 2124
ORA-06512: at "SYS.DBMS_STATS", line 5473
ORA-06512: at line 1
-
On physical machines, these errors are tell-tale signs of impending disk trouble: either bad sectors or a controller about to fail. But what do they mean on virtual machines? I have searched online extensively without finding an answer.
The people who set up the ESX infrastructure have been of little or no help so far. We've opened a call with the hardware manufacturer and are still waiting for an answer.
In short, we're stuck, since we followed every best practice we could lay our hands on.
Any help?
Thanks in advance,
Max Lambertini
Hi,
Welcome to the VMware Communities forum.
Have you opened a support ticket with VMware?
They will probably be of more help than the people in this forum alone.
Is this a production server, or is it still in test?
It would be helpful if you attached the vmware.log files from the two Windows virtual machines hosting the RAC cluster, as they would show whether there are visible problems at the virtualisation layer.
--
Wil
_____________________________________________________
VI-Toolkit & scripts wiki at http://www.vi-toolkit.com
Contributing author at blog www.planetvm.net
Twitter: @wilva
Hello,
This is a .vmx file from a similar cluster which, until recently, showed no sign of this behaviour. I say "until recently" because I have since received some reports of cluster failures. I also enclose a trace file of a similar error, ORA-15080, which is a failed synchronous I/O operation on a disk.
Thanks again for your support,
Max
Hi Max,
The vmx file doesn't help me much personally, nor does the stack trace; maybe they will help other people following this thread.
I'd really need a vmware.log file to see what is happening at the virtual layer. If you don't want to attach it here, you can PM it to me or, if a private message doesn't take attachments, use the email address given in my profile.
--
Wil
Hi,
In my tests installing RAC on VMware Server 2 (with Oracle 10gR1, 10gR2 and 11gR1), I used this sample vmx:
config.version = "8"
virtualHW.version = "4"
scsi0.present = "TRUE"
scsi0.virtualDev = "lsilogic"
memsize = "700"
scsi0:0.present = "TRUE"
scsi0:0.fileName = "localdisk.vmdk"
ide1:0.present = "TRUE"
ide1:0.fileName = "auto detect"
ide1:0.deviceType = "cdrom-raw"
floppy0.fileName = "A:"
Ethernet0.present = "TRUE"
displayName = "racnode1"
guestOS = "rhel4"
priority.grabbed = "normal"
priority.ungrabbed = "normal"
disk.locking = "FALSE"
diskLib.dataCacheMaxSize = "0"
scsi1.sharedBus = "virtual"
scsi1.present = "TRUE"
scsi1:0.present = "TRUE"
scsi1:0.fileName = "D:\vm\rac\sharedstorage\ocfs2disk.vmdk"
scsi1:0.mode = "independent-persistent"
scsi1:0.deviceType = "disk"
scsi1:1.present = "TRUE"
scsi1:1.fileName = "D:\vm\rac\sharedstorage\asmdisk1.vmdk"
scsi1:1.mode = "independent-persistent"
scsi1:1.deviceType = "disk"
scsi1:2.present = "TRUE"
scsi1:2.fileName = "D:\vm\rac\sharedstorage\asmdisk2.vmdk"
scsi1:2.mode = "independent-persistent"
scsi1:2.deviceType = "disk"
scsi1:3.present = "TRUE"
scsi1:3.fileName = "D:\vm\rac\sharedstorage\asmdisk3.vmdk"
scsi1:3.mode = "independent-persistent"
scsi1:3.deviceType = "disk"
scsi1.virtualDev = "lsilogic"
ide1:0.autodetect = "TRUE"
floppy0.present = "FALSE"
Ethernet1.present = "TRUE"
Ethernet1.connectionType = "hostonly"
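To see at a glance which disks in a vmx like the sample above sit on a shared SCSI bus, and in which mode, something like the following Python sketch could be used. It is only an illustration: the function name `shared_disks` is made up, and it assumes that a controller counts as shared whenever its `sharedBus` value is set to anything other than "none".

```python
import re
from collections import defaultdict

# Matches entries such as scsi1.sharedBus = "virtual" or scsi1:0.mode = "...".
ENTRY = re.compile(r'(scsi\d+(?::\d+)?)\.(\w+)\s*=\s*"([^"]*)"')


def shared_disks(vmx_text):
    """Map each present disk on a shared SCSI controller to (fileName, mode)."""
    props = defaultdict(dict)
    for device, key, value in ENTRY.findall(vmx_text):
        props[device][key] = value

    shared = {}
    for device, settings in props.items():
        if ":" not in device:
            continue  # a controller entry, not a disk
        controller = device.split(":")[0]
        bus = props.get(controller, {}).get("sharedBus", "none")
        if bus != "none" and settings.get("present", "").upper() == "TRUE":
            shared[device] = (settings.get("fileName"), settings.get("mode"))
    return shared
```

For the sample above this should list scsi1:0 through scsi1:3; any disk reported with a mode other than "independent-persistent" would be worth a second look.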
A few other remarks:
- One best practice is to have two OCR disks.
- RAID 1 doesn't perform very well with I/O-intensive applications like databases. You may have performance issues during intensive writes. Better options can be RAID 0+1, RAID 6 or 6 DP, RAID 5... listed from best to worst. Redo logs and control files, at least, should be on the fastest disks you can find. Archive logs and data files can be on RAID 5, provided the data files aren't I/O intensive. Your trace file shows "requested resource in use" as the final error, for both sync and async I/O; maybe your RAID configuration is causing write contention when you import the data (write-intensive operations such as creating indexes or gathering statistics).
Cheers,
Paulo.