VMware Cloud Community
maxlambertini
Contributor

Widespread ORA-27070 error on Oracle RAC 10.2.0.4.0 / Win2003 x64 on vSphere

Hello, all,

We are experiencing serious problems on a virtualized Oracle RAC 10.2.0.4.0, Standard Edition, built on two Windows 2003 virtual machines hosted on a three-server physical cluster running vSphere.

The Win2003 machines are configured this way:

OS: WinX64 Workstation Release 2

4 CPU, 8 GB Ram

Two ESXI3 network adapters

a 70-gig system disk.

Each virtual machine is on its own physical server.

Cluster disks are configured as follows:

Data disk: 400gig under SCSI 1.1, physical, permanent independent

Arc Log: 420gig under SCSI 1.2, physical, permanent independent

Redo Log: 20gig under SCSI 1.3, physical, permanent independent

1 OCR Disk (Scsi 2.1), physical, permanent, independent

5 Voting Disks, (Scsi 2.2 to Scsi 2.8), physical, permanent, independent

All of these virtual disks were created with vmkfstools, using the eagerzeroedthick option.

Disk locking is set to false and, following best practices found online, we added these extra parameters to the vmx files:

diskLib.dataCacheMaxReadAheadSize = "0"

diskLib.dataCacheMinReadAheadSize = "0"

diskLib.dataCachePageSize = "4096"

diskLib.maxUnsyncedWrites = "0"

Data, Arc and Redo are used by Oracle ASM; all disk groups are configured with external redundancy. All virtual disks are placed on RAID-1 LUNs.
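
One way to confirm that the parameters listed above really ended up in each node's vmx file is to parse the file and compare. What follows is only a minimal sketch: it assumes the plain `key = "value"` line format, the helper names and the `EXPECTED` table are hypothetical, the values simply mirror this post, and treating keys and values as case-insensitive is an assumption.

```python
import re

# Values copied from this post; adjust them to your own requirements.
EXPECTED = {
    "disk.locking": "FALSE",
    "diskLib.dataCacheMaxReadAheadSize": "0",
    "diskLib.dataCacheMinReadAheadSize": "0",
    "diskLib.dataCachePageSize": "4096",
    "diskLib.maxUnsyncedWrites": "0",
}

# One setting per line: key = "value"
_LINE = re.compile(r'^\s*([^\s=#]+)\s*=\s*"(.*)"\s*$')


def parse_vmx(text):
    """Return {lower-cased key: value} for every key = "value" line."""
    settings = {}
    for line in text.splitlines():
        match = _LINE.match(line)
        if match:
            settings[match.group(1).lower()] = match.group(2)
    return settings


def check_vmx(text, expected=EXPECTED):
    """Return {key: (wanted, found)} for each missing or different setting."""
    found = parse_vmx(text)
    problems = {}
    for key, want in expected.items():
        have = found.get(key.lower())
        if have is None or have.lower() != want.lower():
            problems[key] = (want, have)
    return problems


# Hypothetical excerpt with one parameter left out on purpose.
vmx_sample = '''
disk.locking = "false"
diskLib.dataCacheMaxReadAheadSize = "0"
diskLib.dataCacheMinReadAheadSize = "0"
diskLib.dataCachePageSize = "4096"
'''

print(check_vmx(vmx_sample))
# prints {'diskLib.maxUnsyncedWrites': ('0', None)}
```

Running the same check against the vmx file of every RAC node makes it easy to spot a node whose settings drifted from the others.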

The Oracle stack installation went flawlessly, as did ASM and DB instance creation, and we all know how tricky a RAC installation can be.

When I imported data from the old RAC (exported with plain old exp), I suddenly got the errors below, which led to a failed import and an abrupt instance shutdown.

-


Errors in file c:\oracle\product\10.2.0\admin\rucla\bdump\rucla1_lgwr_3020.trc:

ORA-00345: redo log write error block 247036 count 2048

ORA-00312: online log 5 thread 1: '+REDO/rucla/onlinelog/group_5.256.720556015'

ORA-27070: async read/write failed

ORA-00345: redo log write error block 247036 count 2048

ORA-00312: online log 5 thread 1: '+REDO/rucla/onlinelog/group_5.257.720556021'

ORA-27070: async read/write failed

ORA-00345: redo log write error block 249084 count 1

ORA-00312: online log 5 thread 1: '+REDO/rucla/onlinelog/group_5.256.720556015'

ORA-27070: async read/write failed

ORA-00345: redo log write error block 249084 count 1

ORA-00312: online log 5 thread 1: '+REDO/rucla/onlinelog/group_5.257.720556021'

ORA-27070: async read/write failed

Tue Jun 01 18:59:08 2010

Doing block recovery for file 7 block 329099

Tue Jun 01 18:59:08 2010

Errors in file c:\oracle\product\10.2.0\admin\rucla\bdump\rucla1_lgwr_3020.trc:

ORA-00340: IO error processing online log 5 of thread 1

ORA-00345: redo log write error block 247036 count 2048

ORA-00312: online log 5 thread 1: '+REDO/rucla/onlinelog/group_5.256.720556015'

ORA-27070: async read/write failed

ORA-00345: redo log write error block 247036 count 2048

ORA-00312: online log 5 thread 1: '+REDO/rucla/onlinelog/group_5.257.720556021'

ORA-27070: async read/write failed

ORA-00345: redo log write error block 249084 count 1

ORA-00312: online log 5 thread 1: '+REDO/rucla/onlinelog/group_5.256.720556015'

ORA-27070: async read/write failed

ORA-00345: redo log write error block 249084 count 1

ORA-00312: online log 5 thread 1: '+REDO/rucla/onlinelog/group_5.257.720556021'

ORA-27070: async read/write failed

Tue Jun 01 18:59:08 2010

LGWR: terminating instance due to error 340

Tue Jun 01 18:59:08 2010

Doing block recovery for file 7 block 329099

Tue Jun 01 18:59:08 2010

Errors in file c:\oracle\product\10.2.0\admin\rucla\bdump\rucla1_lms1_2100.trc:

ORA-00340: IO error processing online log of thread

Tue Jun 01 18:59:08 2010

Errors in file c:\oracle\product\10.2.0\admin\rucla\bdump\rucla1_lmd0_1468.trc:

ORA-00340: IO error processing online log of thread

Tue Jun 01 18:59:08 2010

Errors in file c:\oracle\product\10.2.0\admin\rucla\bdump\rucla1_lms0_2540.trc:

ORA-00340: IO error processing online log of thread

Tue Jun 01 18:59:08 2010

Errors in file c:\oracle\product\10.2.0\admin\rucla\bdump\rucla1_lmon_824.trc:

ORA-00340: IO error processing online log of thread

Tue Jun 01 18:59:08 2010

Errors in file c:\oracle\product\10.2.0\admin\rucla\bdump\rucla1_pmon_3332.trc:

ORA-00340: IO error processing online log of thread

Tue Jun 01 18:59:09 2010

Errors in file c:\oracle\product\10.2.0\admin\rucla\bdump\rucla1_ckpt_2400.trc:

ORA-00340: IO error processing online log of thread

Tue Jun 01 18:59:09 2010

Errors in file c:\oracle\product\10.2.0\admin\rucla\bdump\rucla1_j000_4060.trc:

ORA-00340: IO error processing online log of thread

Tue Jun 01 18:59:09 2010

Errors in file c:\oracle\product\10.2.0\admin\rucla\bdump\rucla1_psp0_1596.trc:

ORA-00340: IO error processing online log of thread

Tue Jun 01 18:59:09 2010

Errors in file c:\oracle\product\10.2.0\admin\rucla\bdump\rucla1_q000_3656.trc:

ORA-00340: IO error processing online log of thread

Tue Jun 01 18:59:09 2010

Errors in file c:\oracle\product\10.2.0\admin\rucla\bdump\rucla1_lck0_3088.trc:

ORA-00340: IO error processing online log of thread

Tue Jun 01 18:59:10 2010

Errors in file c:\oracle\product\10.2.0\admin\rucla\bdump\rucla1_rbal_2092.trc:

ORA-00340: IO error processing online log of thread

Tue Jun 01 18:59:10 2010

Errors in file c:\oracle\product\10.2.0\admin\rucla\bdump\rucla1_mman_844.trc:

ORA-00340: IO error processing online log of thread

Tue Jun 01 18:59:10 2010

Errors in file c:\oracle\product\10.2.0\admin\rucla\bdump\rucla1_dbw0_644.trc:

ORA-00340: IO error processing online log of thread

Tue Jun 01 18:59:11 2010

Errors in file c:\oracle\product\10.2.0\admin\rucla\bdump\rucla1_o000_3292.trc:

ORA-00340: IO error processing online log of thread

Tue Jun 01 18:59:17 2010

Errors in file c:\oracle\product\10.2.0\admin\rucla\bdump\rucla1_reco_916.trc:

ORA-00340: IO error processing online log of thread

Tue Jun 01 18:59:17 2010

Errors in file c:\oracle\product\10.2.0\admin\rucla\bdump\rucla1_smon_672.trc:

ORA-00340: IO error processing online log of thread

Tue Jun 01 18:59:18 2010

Instance terminated by LGWR, pid = 3020

-
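
Most of the alert log excerpt above is the same ORA-00340 repeated by every background process once LGWR hit the write error. A small tally makes the first reporter and the distinct error codes easier to see. This is a rough sketch, not an Oracle tool: the regular expressions only assume the `Errors in file ...<instance>_<process>_<pid>.trc:` and `ORA-nnnnn` shapes visible in this post, and the function name is made up for illustration.

```python
import re
from collections import Counter

ORA_CODE = re.compile(r"\b(ORA-\d{5})\b")
# Captures the process part of e.g. rucla1_lgwr_3020.trc
TRACE_FILE = re.compile(r"Errors in file \S*_([a-z0-9]+)_\d+\.trc:")


def summarize_alert_log(log_text):
    """Return (Counter of ORA codes, processes in order of first report)."""
    codes = Counter()
    processes = []
    for line in log_text.splitlines():
        match = TRACE_FILE.search(line)
        if match and match.group(1) not in processes:
            processes.append(match.group(1))
        codes.update(ORA_CODE.findall(line))
    return codes, processes


# A few lines taken from the excerpt in this post.
alert_sample = r'''
Errors in file c:\oracle\product\10.2.0\admin\rucla\bdump\rucla1_lgwr_3020.trc:
ORA-00345: redo log write error block 247036 count 2048
ORA-00312: online log 5 thread 1: '+REDO/rucla/onlinelog/group_5.256.720556015'
ORA-27070: async read/write failed
Errors in file c:\oracle\product\10.2.0\admin\rucla\bdump\rucla1_lms1_2100.trc:
ORA-00340: IO error processing online log of thread
'''

codes, processes = summarize_alert_log(alert_sample)
print(processes)
print(codes.most_common())
```

On the full log, the first entry in `processes` is the one worth reading closely (here LGWR); the later entries are mostly the other processes noticing the same failure.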


Moreover, I've spotted the same ORA-27070 error when importing a large table and creating its indexes.

-


IMP-00017: following statement failed with ORACLE error 1115:

"CREATE INDEX "CRI10" ON "CRI_ORDINI_RIGHE" ("CLICTE_CRI" , "KEY_CRI" , "ARC"

"ART_VEN_CRI" , "ARCAAC_VEN_CRI" , "ARCARD_VEN_CRI" , "ARCARS_VEN_CRI" , "AR"

"CART_PRO_CRI" , "ARCAAC_PRO_CRI" , "ARCARD_PRO_CRI" , "ARCARS_PRO_CRI" , "D"

"T_CON_CHIESTA_CRI" , "NUMERO_RIGA_CRI" , "QT_ORDINATA_CRI" ) PCTFREE 10 IN"

"ITRANS 2 MAXTRANS 255 STORAGE(INITIAL 234881024 FREELISTS 1 FREELIST GROUPS"

" 1 BUFFER_POOL DEFAULT) TABLESPACE "USERS" LOGGING"

IMP-00003: ORACLE error 1115 encountered

ORA-01115: IO error reading block from file 7 (block # 402013)

ORA-01110: data file 7: '+DATA/rucla/datafile/users.270.720554561'

ORA-27070: async read/write failed

IMP-00017: following statement failed with ORACLE error 20000:

"BEGIN DBMS_STATS.SET_INDEX_STATS(NULL,'"CRI10"',NULL,NULL,NULL,1831008,264"

"52,1831008,1,1,240554,2,6); END;"

IMP-00003: ORACLE error 20000 encountered

ORA-20000: INDEX "RUBRA"."CRI10" does not exist or insufficient privileges

ORA-06512: at "SYS.DBMS_STATS", line 2124

ORA-06512: at "SYS.DBMS_STATS", line 5473

ORA-06512: at line 1

IMP-00017: following statement failed with ORACLE error 1115:

"CREATE INDEX "CRI00" ON "CRI_ORDINI_RIGHE" ("KEY_CRI" , "ST_MODIFICA_CRI" )"

" PCTFREE 10 INITRANS 2 MAXTRANS 255 STORAGE(INITIAL 75497472 FREELISTS 1 F"

"REELIST GROUPS 1 BUFFER_POOL DEFAULT) TABLESPACE "USERS" LOGGING"

IMP-00003: ORACLE error 1115 encountered

ORA-01115: IO error reading block from file 7 (block # 402845)

ORA-01110: data file 7: '+DATA/rucla/datafile/users.270.720554561'

ORA-27070: async read/write failed

IMP-00017: following statement failed with ORACLE error 20000:

"BEGIN DBMS_STATS.SET_INDEX_STATS(NULL,'"CRI00"',NULL,NULL,NULL,2010220,880"

"8,2010220,1,1,227401,2,6); END;"

IMP-00003: ORACLE error 20000 encountered

ORA-20000: INDEX "RUBRA"."CRI00" does not exist or insufficient privileges

ORA-06512: at "SYS.DBMS_STATS", line 2124

ORA-06512: at "SYS.DBMS_STATS", line 5473

ORA-06512: at line 1

-
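
Both failed index builds in the import log point at data file 7, the same file number that appears in the "Doing block recovery for file 7" lines of the alert log. To check whether the read errors cluster on one file or one block range, the file and block numbers can be pulled out of the ORA-01115 / ORA-01110 pairs. The sketch below is an illustration only: the function name is hypothetical, and the patterns key on the error code and the digits, so they should not depend on the message language.

```python
import re
from collections import defaultdict

# ORA-01115: ... file <n> (... <block>)
IO_ERROR = re.compile(r"ORA-01115:.*?file (\d+)\D+(\d+)")
# ORA-01110: ... <n>: '<file name>'
DATA_FILE = re.compile(r"ORA-01110:\D*(\d+): '([^']+)'")


def failing_blocks(log_text):
    """Return {file number: (file name or None, sorted block numbers)}."""
    blocks = defaultdict(list)
    names = {}
    for line in log_text.splitlines():
        match = IO_ERROR.search(line)
        if match:
            blocks[int(match.group(1))].append(int(match.group(2)))
        match = DATA_FILE.search(line)
        if match:
            names[int(match.group(1))] = match.group(2)
    return {num: (names.get(num), sorted(b)) for num, b in blocks.items()}


# Lines taken from the import log in this post.
imp_sample = '''
ORA-01115: IO error reading block from file 7 (block # 402013)
ORA-01110: data file 7: '+DATA/rucla/datafile/users.270.720554561'
ORA-27070: async read/write failed
ORA-01115: IO error reading block from file 7 (block # 402845)
ORA-01110: data file 7: '+DATA/rucla/datafile/users.270.720554561'
'''

print(failing_blocks(imp_sample))
```

If every failure lands on the same file, that narrows the search to the disk group and virtual disk behind it rather than the whole storage stack.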


On physical machines, these errors are tell-tale signs of impending disk problems: either bad sectors or a controller about to fail. But what about virtual machines? I have googled a lot looking for help on this.

The folks who set up the ESX infrastructure have so far been of little or no help. We've opened a call with the hardware manufacturer and are still waiting for an answer.

In short, we're stuck, since we followed all of the best practices we could lay our hands on.

Any help?

Thanks in advance,

Max Lambertini

4 Replies
wila
Immortal

Hi,

Welcome to the VMware Communities forum.

Have you opened a support ticket with VMware?

They will probably be of more help than just the people here in the forum.

Is this a production server, or is it still in test?

I think it would be helpful if you attached the vmware.log files from the two Windows virtual machines hosting the RAC cluster, as that helps to see whether there are visible problems at the virtualisation layer.



--
Wil
_____________________________________________________
VI-Toolkit & scripts wiki at http://www.vi-toolkit.com

Contributing author at blog www.planetvm.net

Twitter: @wilva

| Author of Vimalin. The virtual machine Backup app for VMware Fusion, VMware Workstation and Player |
| More info at vimalin.com | Twitter @wilva
maxlambertini
Contributor

Hello,

This is a .vmx file from a similar cluster which, until recently, showed no sign of this behaviour. I say "until recently" because I have received some reports of cluster failures. I also enclose a trace file of a similar error, ORA-15080, which is a failed synchronous I/O operation on a disk.

Thanks again for your support,

Max

wila
Immortal

Hi Max,

The vmx file doesn't help me much personally, nor does that stack trace. Maybe they will help other people who are following this post?

I'd really need a vmware.log file in order to see what is happening at the virtual layer. If you don't want to attach it, you can always PM it to me or, if a private message doesn't take attachments, use the email address listed in my profile.



admin
Immortal

Hi,

In my tests installing RAC on VMware Server 2 (tested with Oracle 10gR1, 10gR2 and 11gR1), I used this sample vmx:

config.version = "8"

virtualHW.version = "4"

scsi0.present = "TRUE"

scsi0.virtualDev = "lsilogic"

memsize = "700"

scsi0:0.present = "TRUE"

scsi0:0.fileName = "localdisk.vmdk"

ide1:0.present = "TRUE"

ide1:0.fileName = "auto detect"

ide1:0.deviceType = "cdrom-raw"

floppy0.fileName = "A:"

Ethernet0.present = "TRUE"

displayName = "racnode1"

guestOS = "rhel4"

priority.grabbed = "normal"

priority.ungrabbed = "normal"

disk.locking = "FALSE"

diskLib.dataCacheMaxSize = "0"

scsi1.sharedBus = "virtual"

scsi1.present = "TRUE"

scsi1:0.present = "TRUE"

scsi1:0.fileName = "D:\vm\rac\sharedstorage\ocfs2disk.vmdk"

scsi1:0.mode = "independent-persistent"

scsi1:0.deviceType = "disk"

scsi1:1.present = "TRUE"

scsi1:1.fileName = "D:\vm\rac\sharedstorage\asmdisk1.vmdk"

scsi1:1.mode = "independent-persistent"

scsi1:1.deviceType = "disk"

scsi1:2.present = "TRUE"

scsi1:2.fileName = "D:\vm\rac\sharedstorage\asmdisk2.vmdk"

scsi1:2.mode = "independent-persistent"

scsi1:2.deviceType = "disk"

scsi1:3.present = "TRUE"

scsi1:3.fileName = "D:\vm\rac\sharedstorage\asmdisk3.vmdk"

scsi1:3.mode = "independent-persistent"

scsi1:3.deviceType = "disk"

scsi1.virtualDev = "lsilogic"

ide1:0.autodetect = "TRUE"

floppy0.present = "FALSE"

Ethernet1.present = "TRUE"

Ethernet1.connectionType = "hostonly"
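
When comparing a sample like this against your own vmx files, it can help to list only the disks that sit on a shared SCSI bus, together with their backing file and mode. The following is a hedged sketch, not a VMware utility: the function name is hypothetical, and it treats any controller that has a `sharedBus` entry as shared, which is a simplification. It scans the whole text rather than line by line, so it still works if two settings end up pasted on the same line.

```python
import re
from collections import defaultdict

# Matches scsi1.sharedBus = "virtual" and scsi1:0.mode = "independent-persistent"
ENTRY = re.compile(r'(scsi\d+)(?::(\d+))?\.(\w+)\s*=\s*"([^"]*)"')


def shared_disks(vmx_text):
    """Return [(disk id, fileName, mode)] for disks on a shared SCSI bus."""
    buses = defaultdict(dict)
    disks = defaultdict(dict)
    for bus, unit, prop, value in ENTRY.findall(vmx_text):
        if unit == "":
            buses[bus][prop] = value          # controller-level setting
        else:
            disks[(bus, int(unit))][prop] = value  # per-disk setting
    report = []
    for (bus, unit), props in sorted(disks.items()):
        if "sharedBus" in buses[bus]:
            report.append((f"{bus}:{unit}", props.get("fileName"), props.get("mode")))
    return report


# Excerpt from the sample in this post, including one line carrying two settings.
shared_sample = r'''
scsi0.present = "TRUE"
scsi0:0.fileName = "localdisk.vmdk"
scsi1.sharedBus = "virtual"
scsi1:0.fileName = "D:\vm\rac\sharedstorage\ocfs2disk.vmdk"
scsi1:0.mode = "independent-persistent"
scsi1:0.deviceType = "disk" scsi1:1.present = "TRUE"
scsi1:1.fileName = "D:\vm\rac\sharedstorage\asmdisk1.vmdk"
scsi1:1.mode = "independent-persistent"
'''

for disk in shared_disks(shared_sample):
    print(disk)
```

Running it on both nodes' vmx files and comparing the two lists is a cheap way to confirm that each node sees the same shared disks in the same mode.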

Some other points:

- One best practice is to have 2 OCRs.

- RAID 1 doesn't perform very well with I/O-intensive applications like databases, so you may see performance issues during intensive writes. Better options, from best to worst, would be RAID 0+1, RAID 6 or 6 DP, then RAID 5. Redo logs and control files, at least, should be on the fastest disks you can find. Archive logs and datafiles can go on RAID 5, provided the datafiles aren't I/O intensive. Your trace file shows "requested resource in use" as the final error for both sync and async I/O, so maybe your RAID configuration is causing contention on writes when you import the data (write-intensive operations such as creating indexes or gathering statistics).

Cheers,

Paulo.