4 Replies Latest reply on Jun 4, 2010 5:39 AM by PauloG

    Widespread ORA-27070 error on Oracle RAC 10.2.0.4.0 / Win2003 X64 on VSPhere

    maxlambertini

      Hello, all,

       

      We are experiencing some serious problems on a virtualized Oracle RAC 10.2.0.4.0, Standard Edition, built on two Windows 2003 virtual machines hosted on a 3-server physical cluster running vSphere.

       

      The Win2003 machines are configured as follows:

      - OS: WinX64 Workstation Release 2
      - 4 CPUs, 8 GB RAM
      - Two ESXI3 network adapters
      - A 70 GB system disk

       

      Each virtual machine runs on its own physical server.

       

      The cluster disks are configured as follows:

      - Data disk: 400 GB on SCSI 1.1, physical, independent-persistent
      - Arc log disk: 420 GB on SCSI 1.2, physical, independent-persistent
      - Redo log disk: 20 GB on SCSI 1.3, physical, independent-persistent
      - 1 OCR disk (SCSI 2.1), physical, independent-persistent
      - 5 voting disks (SCSI 2.2 to SCSI 2.8), physical, independent-persistent

       

      All of these virtual disks were created with vmkfstools, using the eagerzeroedthick option.

       

      Disk locking is set to FALSE and, following best practices found online, we added these extra parameters to the .vmx files:

       

      diskLib.dataCacheMaxReadAheadSize = "0"

      diskLib.dataCacheMinReadAheadSize = "0"

      diskLib.dataCachePageSize = "4096"

      diskLib.maxUnsyncedWrites = "0"
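Extra .vmx settings like these are easy to lose when a VM is re-registered or edited, so it can be worth checking them on every RAC node. Below is a minimal Python sketch. It assumes the five values quoted in this post are the ones you want. Those values come from the post, not from VMware documentation, and `parse_vmx` and `check_shared_disk_settings` are hypothetical helper names.

```python
import re
import sys

# Expected values copied from the post above. This table is an assumption
# for illustration, not a list published by VMware.
EXPECTED = {
    "disk.locking": "false",
    "disklib.datacachemaxreadaheadsize": "0",
    "disklib.datacacheminreadaheadsize": "0",
    "disklib.datacachepagesize": "4096",
    "disklib.maxunsyncedwrites": "0",
}

# Matches lines of the form: key = "value"
LINE_RE = re.compile(r'^\s*([A-Za-z0-9_.:]+)\s*=\s*"([^"]*)"\s*$')


def parse_vmx(text):
    """Return a dict mapping lower-cased .vmx keys to their unquoted values."""
    settings = {}
    for line in text.splitlines():
        match = LINE_RE.match(line)
        if match:
            settings[match.group(1).lower()] = match.group(2)
    return settings


def check_shared_disk_settings(text):
    """List missing or mismatched settings (values compared case-insensitively)."""
    settings = parse_vmx(text)
    problems = []
    for key, wanted in EXPECTED.items():
        actual = settings.get(key)
        if actual is None:
            problems.append(f"{key} is missing (expected {wanted!r})")
        elif actual.lower() != wanted:
            problems.append(f"{key} = {actual!r} (expected {wanted!r})")
    return problems


if __name__ == "__main__":
    # Usage: python check_vmx.py node1.vmx node2.vmx
    for path in sys.argv[1:]:
        with open(path, encoding="utf-8", errors="replace") as handle:
            for problem in check_shared_disk_settings(handle.read()):
                print(f"{path}: {problem}")
```

An empty result means every listed key is present with the expected value. Keys are lower-cased because .vmx option names are not consistently capitalized across files.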

       

      The Data, Arc and Redo disks are used by Oracle ASM, and all disk groups are configured with external redundancy. All virtual disks are placed on RAID-1 LUNs.

       

      The Oracle stack installation went flawlessly, as did ASM and database instance creation, and we all know how tricky a RAC installation can be.

       

      When I imported data from the old RAC (exported with plain old exp), I suddenly hit these errors, which caused the import to fail and the instance to shut down abruptly.

       

       

      -


       

      Errors in file c:\oracle\product\10.2.0\admin\rucla\bdump\rucla1_lgwr_3020.trc:

      ORA-00345: redo log write error block 247036 count 2048

      ORA-00312: online log 5 thread 1: '+REDO/rucla/onlinelog/group_5.256.720556015'

      ORA-27070: async read/write failed

      ORA-00345: redo log write error block 247036 count 2048

      ORA-00312: online log 5 thread 1: '+REDO/rucla/onlinelog/group_5.257.720556021'

      ORA-27070: async read/write failed

      ORA-00345: redo log write error block 249084 count 1

      ORA-00312: online log 5 thread 1: '+REDO/rucla/onlinelog/group_5.256.720556015'

      ORA-27070: async read/write failed

      ORA-00345: redo log write error block 249084 count 1

      ORA-00312: online log 5 thread 1: '+REDO/rucla/onlinelog/group_5.257.720556021'

      ORA-27070: async read/write failed

       

      Tue Jun 01 18:59:08 2010

      Doing block recovery for file 7 block 329099

      Tue Jun 01 18:59:08 2010

      Errors in file c:\oracle\product\10.2.0\admin\rucla\bdump\rucla1_lgwr_3020.trc:

      ORA-00340: IO error processing online log 5 of thread 1

      ORA-00345: redo log write error block 247036 count 2048

      ORA-00312: online log 5 thread 1: '+REDO/rucla/onlinelog/group_5.256.720556015'

      ORA-27070: async read/write failed

      ORA-00345: redo log write error block 247036 count 2048

      ORA-00312: online log 5 thread 1: '+REDO/rucla/onlinelog/group_5.257.720556021'

      ORA-27070: async read/write failed

      ORA-00345: redo log write error block 249084 count 1

      ORA-00312: online log 5 thread 1: '+REDO/rucla/onlinelog/group_5.256.720556015'

      ORA-27070: async read/write failed

      ORA-00345: redo log write error block 249084 count 1

      ORA-00312: online log 5 thread 1: '+REDO/rucla/onlinelog/group_5.257.720556021'

      ORA-27070: async read/write failed

       

      Tue Jun 01 18:59:08 2010

      LGWR: terminating instance due to error 340

      Tue Jun 01 18:59:08 2010

      Doing block recovery for file 7 block 329099

      Tue Jun 01 18:59:08 2010

      Errors in file c:\oracle\product\10.2.0\admin\rucla\bdump\rucla1_lms1_2100.trc:

      ORA-00340: IO error processing online log of thread

       

      Tue Jun 01 18:59:08 2010

      Errors in file c:\oracle\product\10.2.0\admin\rucla\bdump\rucla1_lmd0_1468.trc:

      ORA-00340: IO error processing online log of thread

       

      Tue Jun 01 18:59:08 2010

      Errors in file c:\oracle\product\10.2.0\admin\rucla\bdump\rucla1_lms0_2540.trc:

      ORA-00340: IO error processing online log of thread

       

      Tue Jun 01 18:59:08 2010

      Errors in file c:\oracle\product\10.2.0\admin\rucla\bdump\rucla1_lmon_824.trc:

      ORA-00340: IO error processing online log of thread

       

      Tue Jun 01 18:59:08 2010

      Errors in file c:\oracle\product\10.2.0\admin\rucla\bdump\rucla1_pmon_3332.trc:

      ORA-00340: IO error processing online log of thread

       

      Tue Jun 01 18:59:09 2010

      Errors in file c:\oracle\product\10.2.0\admin\rucla\bdump\rucla1_ckpt_2400.trc:

      ORA-00340: IO error processing online log of thread

       

      Tue Jun 01 18:59:09 2010

      Errors in file c:\oracle\product\10.2.0\admin\rucla\bdump\rucla1_j000_4060.trc:

      ORA-00340: IO error processing online log of thread

       

      Tue Jun 01 18:59:09 2010

      Errors in file c:\oracle\product\10.2.0\admin\rucla\bdump\rucla1_psp0_1596.trc:

      ORA-00340: IO error processing online log of thread

       

      Tue Jun 01 18:59:09 2010

      Errors in file c:\oracle\product\10.2.0\admin\rucla\bdump\rucla1_q000_3656.trc:

      ORA-00340: IO error processing online log of thread

       

      Tue Jun 01 18:59:09 2010

      Errors in file c:\oracle\product\10.2.0\admin\rucla\bdump\rucla1_lck0_3088.trc:

      ORA-00340: IO error processing online log of thread

       

      Tue Jun 01 18:59:10 2010

      Errors in file c:\oracle\product\10.2.0\admin\rucla\bdump\rucla1_rbal_2092.trc:

      ORA-00340: IO error processing online log of thread

       

      Tue Jun 01 18:59:10 2010

      Errors in file c:\oracle\product\10.2.0\admin\rucla\bdump\rucla1_mman_844.trc:

      ORA-00340: IO error processing online log of thread

       

      Tue Jun 01 18:59:10 2010

      Errors in file c:\oracle\product\10.2.0\admin\rucla\bdump\rucla1_dbw0_644.trc:

      ORA-00340: IO error processing online log of thread

       

      Tue Jun 01 18:59:11 2010

      Errors in file c:\oracle\product\10.2.0\admin\rucla\bdump\rucla1_o000_3292.trc:

      ORA-00340: IO error processing online log of thread

       

      Tue Jun 01 18:59:17 2010

      Errors in file c:\oracle\product\10.2.0\admin\rucla\bdump\rucla1_reco_916.trc:

      ORA-00340: IO error processing online log of thread

       

      Tue Jun 01 18:59:17 2010

      Errors in file c:\oracle\product\10.2.0\admin\rucla\bdump\rucla1_smon_672.trc:

      ORA-00340: IO error processing online log of thread

       

      Tue Jun 01 18:59:18 2010

      Instance terminated by LGWR, pid = 3020

      -
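To see which redo members and offsets failed, you can tabulate the ORA-00345/ORA-00312 pairs in an alert log like the one above. Below is a rough Python sketch. The helper names are hypothetical. The byte figures assume a 512-byte redo log block, which is the usual value on Windows, but verify it on your own system.

```python
import re
import sys
from collections import defaultdict

# Assumption: 512-byte redo log blocks, the usual size on Windows.
# Verify this for your platform before trusting the byte figures.
REDO_BLOCK_BYTES = 512

WRITE_ERR = re.compile(r"ORA-00345: redo log write error block (\d+) count (\d+)")
MEMBER = re.compile(r"ORA-00312: online log (\d+) thread (\d+): '([^']+)'")


def summarize_redo_errors(alert_text):
    """Map each redo member path to the sorted, distinct (block, count) writes that failed."""
    failures = defaultdict(set)
    pending = None  # last (block, count) seen, waiting for its ORA-00312 line
    for line in alert_text.splitlines():
        err = WRITE_ERR.search(line)
        if err:
            pending = (int(err.group(1)), int(err.group(2)))
            continue
        member = MEMBER.search(line)
        if member and pending is not None:
            failures[member.group(3)].add(pending)
            pending = None
    return {path: sorted(writes) for path, writes in failures.items()}


def to_bytes(block, count, block_bytes=REDO_BLOCK_BYTES):
    """Return (byte offset, I/O size in bytes) for a redo block/count pair."""
    return block * block_bytes, count * block_bytes


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python redo_errors.py <path to alert log>
    with open(sys.argv[1], encoding="utf-8", errors="replace") as handle:
        summary = summarize_redo_errors(handle.read())
    for path, writes in summary.items():
        for block, count in writes:
            offset, size = to_bytes(block, count)
            print(f"{path}: block {block} count {count} -> offset {offset}, {size} bytes")
```

Run against the excerpt above, both members of group 5 (`group_5.256...` and `group_5.257...`) report the same two failed writes. The first is block 247036 with count 2048, which is 1 MiB under the 512-byte assumption. The second is block 249084 with count 1, which starts exactly where the first write ends (247036 + 2048 = 249084).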


       

      I've also seen the same ORA-27070 error while importing a large table and creating its indexes.

       

      -


      IMP-00017: following statement failed with ORACLE error 1115:

      "CREATE INDEX "CRI10" ON "CRI_ORDINI_RIGHE" ("CLICTE_CRI" , "KEY_CRI" , "ARC"

      "ART_VEN_CRI" , "ARCAAC_VEN_CRI" , "ARCARD_VEN_CRI" , "ARCARS_VEN_CRI" , "AR"

      "CART_PRO_CRI" , "ARCAAC_PRO_CRI" , "ARCARD_PRO_CRI" , "ARCARS_PRO_CRI" , "D"

      "T_CON_CHIESTA_CRI" , "NUMERO_RIGA_CRI" , "QT_ORDINATA_CRI" )  PCTFREE 10 IN"

      "ITRANS 2 MAXTRANS 255 STORAGE(INITIAL 234881024 FREELISTS 1 FREELIST GROUPS"

      " 1 BUFFER_POOL DEFAULT) TABLESPACE "USERS" LOGGING"

      IMP-00003: ORACLE error 1115 encountered

      ORA-01115: IO error reading block from file 7 (block # 402013)

      ORA-01110: data file 7: '+DATA/rucla/datafile/users.270.720554561'

      ORA-27070: async read/write failed

      IMP-00017: following statement failed with ORACLE error 20000:

      "BEGIN  DBMS_STATS.SET_INDEX_STATS(NULL,'"CRI10"',NULL,NULL,NULL,1831008,264"

      "52,1831008,1,1,240554,2,6); END;"

      IMP-00003: ORACLE error 20000 encountered

      ORA-20000: INDEX "RUBRA"."CRI10" does not exist or insufficient privileges

      ORA-06512: at "SYS.DBMS_STATS", line 2124

      ORA-06512: at "SYS.DBMS_STATS", line 5473

      ORA-06512: at line 1

      IMP-00017: following statement failed with ORACLE error 1115:

      "CREATE INDEX "CRI00" ON "CRI_ORDINI_RIGHE" ("KEY_CRI" , "ST_MODIFICA_CRI" )"

      "  PCTFREE 10 INITRANS 2 MAXTRANS 255 STORAGE(INITIAL 75497472 FREELISTS 1 F"

      "REELIST GROUPS 1 BUFFER_POOL DEFAULT) TABLESPACE "USERS" LOGGING"

      IMP-00003: ORACLE error 1115 encountered

      ORA-01115: IO error reading block from file 7 (block # 402845)

      ORA-01110: data file 7: '+DATA/rucla/datafile/users.270.720554561'

      ORA-27070: async read/write failed

      IMP-00017: following statement failed with ORACLE error 20000:

      "BEGIN  DBMS_STATS.SET_INDEX_STATS(NULL,'"CRI00"',NULL,NULL,NULL,2010220,880"

      "8,2010220,1,1,227401,2,6); END;"

      IMP-00003: ORACLE error 20000 encountered

      ORA-20000: INDEX "RUBRA"."CRI00" does not exist or insufficient privileges

      ORA-06512: at "SYS.DBMS_STATS", line 2124

      ORA-06512: at "SYS.DBMS_STATS", line 5473

      ORA-06512: at line 1

      -
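Once the storage problem is resolved, the failed index builds have to be rerun. Below is a small Python sketch that pulls every IMP-00017 statement and its ORACLE error code out of an imp log. It assumes the layout shown above, where each failing statement follows the IMP-00017 line as double-quoted chunks, and the function name is hypothetical. The pattern relies only on the error number at the end of the IMP-00017 line, so it should handle both the Italian and the English message text.

```python
import re
import sys

# The error code is the number just before the final colon, so this matches
# both the English and the Italian wording of IMP-00017.
FAILED = re.compile(r"^IMP-00017:.*?(\d+):\s*$")
INDEX_NAME = re.compile(r'CREATE INDEX "([^"]+)"')


def failed_statements(imp_log):
    """Return (oracle_error, statement) pairs for every IMP-00017 entry."""
    lines = [line.strip() for line in imp_log.splitlines() if line.strip()]
    results = []
    i = 0
    while i < len(lines):
        match = FAILED.match(lines[i])
        i += 1
        if not match:
            continue
        parts = []
        # imp prints the failing statement as double-quoted chunks, one per line;
        # dropping each chunk's outer quotes and concatenating rebuilds it.
        while i < len(lines) and lines[i].startswith('"') and lines[i].endswith('"'):
            parts.append(lines[i][1:-1])
            i += 1
        results.append((int(match.group(1)), "".join(parts)))
    return results


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python imp_failures.py <imp log file>
    with open(sys.argv[1], encoding="utf-8", errors="replace") as handle:
        for code, statement in failed_statements(handle.read()):
            name = INDEX_NAME.search(statement)
            label = name.group(1) if name else statement[:40]
            print(f"ORA-{code:05d}: {label}")
```

On the log above, this would return the two CREATE INDEX statements (error 1115) and the two DBMS_STATS calls (error 20000). The latter failed only because their indexes had not been created.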


       

      On physical machines, these errors are telltale signs of impending disk trouble: either bad sectors or a controller about to fail. But what about virtual machines? I've googled a lot looking for help on this.

       

      The people who set up the ESX infrastructure have so far been of little or no help. We've opened a call with the hardware manufacturer and are still waiting for an answer.

       

      In short, we're stuck: we've already followed every best practice we could lay our hands on.

       

      Any help?

       

      Thanks in advance,

      Max Lambertini

        • 1. Re: Widespread ORA-27070 error on Oracle RAC 10.2.0.4.0 / Win2003 X64 on VSPhere
          wila

          Hi,

           

          Welcome to the VMware Communities forum.

           

          Have you opened a support ticket with VMware?

          They will probably be more helpful than just the people here in the forum.

           

          Is this a production server, or is it still in testing?

           

          It would help if you attached the vmware.log files from the two Windows virtual machines hosting the RAC cluster, as they can show whether there are visible problems at the virtualisation layer.
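Before attaching the vmware.log files, it can help to skim each one for storage-related lines. Below is a minimal Python sketch. The keyword list is a hypothetical starting point, not a list of known vmware.log messages, so tune it to what your logs actually contain.

```python
import re
import sys

# Hypothetical starting keywords, not a list of documented vmware.log
# messages; adjust them to what your own logs contain.
DEFAULT_KEYWORDS = ("scsi", "abort", "reset", "error", "timeout")


def suspicious_lines(log_text, keywords=DEFAULT_KEYWORDS):
    """Yield (line_number, line) for lines containing any keyword, ignoring case."""
    pattern = re.compile("|".join(re.escape(k) for k in keywords), re.IGNORECASE)
    for number, line in enumerate(log_text.splitlines(), start=1):
        if pattern.search(line):
            yield number, line.rstrip()


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python scan_vmware_log.py <path to vmware.log>
    with open(sys.argv[1], encoding="utf-8", errors="replace") as handle:
        for number, line in suspicious_lines(handle.read()):
            print(f"{number}: {line}")
```

This is only a filter for a first look. The full log is still what someone diagnosing the virtualisation layer will want to see.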



          --
          Wil
          _____________________________________________________
          VI-Toolkit & scripts wiki at http://www.vi-toolkit.com

          Contributing author at blog www.planetvm.net

          Twitter: @wilva

          | Author of Vimalin. The virtual machine Backup app for VMware Desktop Products
          | Vimalin : Automated backups for VMware Fusion and VMware Workstation Professional
          | More info at https://www.vimalin.com
          • 2. Re: Widespread ORA-27070 error on Oracle RAC 10.2.0.4.0 / Win2003 X64 on VSPhere
            maxlambertini

             

            Hello,

             

             

            Attached is a .vmx file from a similar cluster which, until recently, showed no sign of this behaviour. I say "until recently" because I've received some reports of cluster failures. I also enclose a trace file for a similar error, ORA-15080, which indicates a failed synchronous I/O operation to a disk.

             

             

            Thanks again for support,

             

             

            Max

             

             

            • 3. Re: Widespread ORA-27070 error on Oracle RAC 10.2.0.4.0 / Win2003 X64 on VSPhere
              wila

              Hi Max,

               

              Personally, the vmx file doesn't help me much, and neither does that stack trace; maybe they will help other people following this thread?

               

              I'd really need a vmware.log file to see what is happening at the virtualisation layer. If you don't want to attach it publicly, you can PM it to me, or, if private messages don't take attachments, use the email address listed in my profile.



              --
              Wil
              • 4. Re: Widespread ORA-27070 error on Oracle RAC 10.2.0.4.0 / Win2003 X64 on VSPhere
                PauloG

                Hi,

                 

                In my tests installing RAC on VMware Server 2 (with Oracle 10gR1, 10gR2 and 11gR1), I used this .vmx sample:

                 

                config.version = "8"
                virtualHW.version = "4"
                scsi0.present = "TRUE"
                scsi0.virtualDev = "lsilogic"
                memsize = "700"
                scsi0:0.present = "TRUE"
                scsi0:0.fileName = "localdisk.vmdk"
                ide1:0.present = "TRUE"
                ide1:0.fileName = "auto detect"
                ide1:0.deviceType = "cdrom-raw"
                floppy0.fileName = "A:"
                Ethernet0.present = "TRUE"
                displayName = "racnode1"
                guestOS = "rhel4"
                priority.grabbed = "normal"
                priority.ungrabbed = "normal"

                disk.locking = "FALSE"
                diskLib.dataCacheMaxSize = "0"
                scsi1.sharedBus = "virtual"

                scsi1.present = "TRUE"
                scsi1:0.present = "TRUE"
                scsi1:0.fileName = "D:\vm\rac\sharedstorage\ocfs2disk.vmdk"
                scsi1:0.mode = "independent-persistent"
                scsi1:0.deviceType = "disk"

                scsi1:1.present = "TRUE"
                scsi1:1.fileName = "D:\vm\rac\sharedstorage\asmdisk1.vmdk"
                scsi1:1.mode = "independent-persistent"
                scsi1:1.deviceType = "disk"

                scsi1:2.present = "TRUE"
                scsi1:2.fileName = "D:\vm\rac\sharedstorage\asmdisk2.vmdk"
                scsi1:2.mode = "independent-persistent"
                scsi1:2.deviceType = "disk"

                scsi1:3.present = "TRUE"
                scsi1:3.fileName = "D:\vm\rac\sharedstorage\asmdisk3.vmdk"
                scsi1:3.mode = "independent-persistent"
                scsi1:3.deviceType = "disk"
                scsi1.virtualDev = "lsilogic"
                ide1:0.autodetect = "TRUE"
                floppy0.present = "FALSE"
                Ethernet1.present = "TRUE"
                Ethernet1.connectionType = "hostonly"
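Writing the scsi1:N entries by hand for many shared disks is error-prone. Below is a Python sketch that generates them using the same keys as the sample. The function name and defaults are assumptions. `shared_bus` can be set to "physical" for VMs on different hosts, as in the original poster's vSphere setup. Skipping unit 7 reflects the usual reservation of that ID for the virtual SCSI controller itself.

```python
def shared_disk_entries(controller, vmdk_paths, mode="independent-persistent",
                        shared_bus="virtual", virtual_dev="lsilogic"):
    """Build .vmx lines attaching each shared VMDK to one virtual SCSI controller."""
    lines = [
        f'scsi{controller}.present = "TRUE"',
        f'scsi{controller}.virtualDev = "{virtual_dev}"',
        f'scsi{controller}.sharedBus = "{shared_bus}"',
    ]
    # Assumption: units 0-15 are addressable and unit 7 is taken by the
    # virtual SCSI controller itself, leaving 15 slots for disks.
    units = [unit for unit in range(16) if unit != 7]
    if len(vmdk_paths) > len(units):
        raise ValueError(f"at most {len(units)} disks fit on one controller")
    for unit, path in zip(units, vmdk_paths):
        prefix = f"scsi{controller}:{unit}"
        lines += [
            f'{prefix}.present = "TRUE"',
            f'{prefix}.fileName = "{path}"',
            f'{prefix}.mode = "{mode}"',
            f'{prefix}.deviceType = "disk"',
        ]
    return lines


if __name__ == "__main__":
    shared = [r"D:\vm\rac\sharedstorage\ocfs2disk.vmdk",
              r"D:\vm\rac\sharedstorage\asmdisk1.vmdk"]
    print("\n".join(shared_disk_entries(1, shared)))
```

Generating the entries for every node from one list of paths also keeps the disk-to-unit mapping identical across nodes. Shared disks need that consistency.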

                 

                A couple of other points:

                 

                - One best practice is to have two OCRs (the OCR plus an OCR mirror).

                - RAID 1 doesn't perform very well with I/O-intensive applications such as databases, so you may have performance issues under heavy write load. Better options could be RAID 0+1, RAID 6 or RAID 6 DP, and RAID 5, listed from best to worst. The redo logs and control files, at least, should be on the fastest disks you can find. Archive logs and datafiles can be on RAID 5, of course, if the datafiles aren't I/O-intensive. Your trace file shows "requested resource in use" as the final error for both synchronous and asynchronous I/O. Maybe your RAID configuration is causing write contention when you import the data, since operations such as creating indexes or gathering statistics are write-intensive.
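You can roughly estimate how a RAID layout copes with write-heavy work such as an import using the common write-penalty model, in which each front-end write costs several back-end I/Os depending on the RAID level. Below is a Python sketch. The penalty values are the textbook figures usually quoted. Controller write caches and vendor schemes such as NetApp RAID-DP change the real numbers, and the disk figures in the usage block are hypothetical.

```python
# Back-end I/Os per front-end write: commonly quoted textbook values, not
# measurements from any particular array (write caches change the picture).
WRITE_PENALTY = {
    "RAID 0": 1,
    "RAID 1": 2,
    "RAID 10": 2,
    "RAID 0+1": 2,
    "RAID 5": 4,
    "RAID 6": 6,
}


def effective_iops(raw_iops, write_fraction, raid_level):
    """Front-end IOPS a disk group can sustain for a given read/write mix."""
    if not 0.0 <= write_fraction <= 1.0:
        raise ValueError("write_fraction must be between 0 and 1")
    penalty = WRITE_PENALTY[raid_level]
    # Each read costs one back-end I/O; each write costs `penalty` back-end I/Os.
    return raw_iops / ((1.0 - write_fraction) + write_fraction * penalty)


if __name__ == "__main__":
    # Hypothetical group: 4 disks x 150 IOPS, 80% writes (an import-like mix).
    for level in WRITE_PENALTY:
        print(f"{level}: {effective_iops(4 * 150, 0.8, level):.0f} front-end IOPS")
```

With the hypothetical 600 raw IOPS and 80% writes, the model gives 600 for RAID 0, about 333 for RAID 1, RAID 10 and RAID 0+1, about 176 for RAID 5 and 120 for RAID 6. Treat these as a starting point for comparing against the array vendor's own figures.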

                 

                Cheers,

                Paulo.
