We're currently testing Hyperic HQ as a monitoring tool for our Oracle databases. We want it to scan the alert.log files periodically for ORA- errors and, whenever a new one is found, report it by mail.
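For reference, the "only report new errors" part of this requirement can be sketched outside of Hyperic HQ as a scanner that remembers its byte offset in the alert log and returns only the ORA- lines appended since the previous call. This is a minimal sketch, not how Hyperic works internally. The class name `AlertLogScanner` is hypothetical. It assumes the log is only ever appended to (a file that shrinks is treated as rotated) and keeps the offset in memory only.

```python
import re
from pathlib import Path

# Matches Oracle error codes such as "ORA-00600" or "ORA-27037".
ORA_PATTERN = re.compile(r"ORA-\d{5}")


class AlertLogScanner:
    """Hypothetical helper: remembers how far into the alert log it has read,
    so each call to scan() only returns ORA- lines added since the last call."""

    def __init__(self, path):
        self.path = Path(path)
        self.offset = 0  # byte position up to which complete lines were consumed

    def scan(self):
        size = self.path.stat().st_size
        if size < self.offset:
            # File shrank: assume it was rotated or truncated and start over.
            self.offset = 0
        with self.path.open("rb") as f:
            f.seek(self.offset)
            data = f.read()
        # Only consume complete lines; a partially written last line is
        # left in place and picked up on the next call.
        end = data.rfind(b"\n") + 1
        self.offset += end
        text = data[:end].decode("utf-8", errors="replace")
        return [line for line in text.splitlines() if ORA_PATTERN.search(line)]
```

Calling `scan()` periodically (from cron or a simple loop) yields each new ORA- line exactly once; persisting `offset` between runs would be needed if the checker itself is restarted.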
The current method we're using is this:
We set up a New Platform Service for the server hosting the DB, with Resource Type set to Process. We configured it to query the Oracle SMON process (State.Name.eq=oracle,Args.*.eq=ora_smon_INSTANCE) and enabled service.log tracking.
We then defined an alert on it:
- If Condition: Event/Log Level(ANY) and matching substring "ORA-"
- Enable Action(s): Each time conditions are met
When we enable this monitor and insert an ORA- error into the .log file, Hyperic HQ sends us a mail reporting that an ORA- error occurred and lists the -first- ORA- error that appears in the log. If we insert more errors, Hyperic only reports the -total number- of errors in the log, with no specifics, just a number:
- Triggering Condition(s): If Event/Log Level(ANY) and matching substring "ORA-" Log: /ora/app/oracle/admin/INSTANCE/bdump/alert_INSTANCE.log: Message 'ORA-' repeated 21 times
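For comparison, a mail that lists every matching line individually, instead of "Message 'ORA-' repeated 21 times", could be assembled with Python's standard `email` package. This is a hypothetical sketch: `build_ora_report` is an illustrative name, the sender and recipient addresses are placeholders, and actually sending the message (for example via `smtplib`) is left out.

```python
from email.message import EmailMessage


def build_ora_report(log_path, ora_lines, sender, recipient):
    """Hypothetical helper: build (but do not send) a mail that lists each
    ORA- line separately rather than summarizing them as a repeat count."""
    msg = EmailMessage()
    msg["Subject"] = f"{len(ora_lines)} new ORA- error(s) in {log_path}"
    msg["From"] = sender          # placeholder address
    msg["To"] = recipient         # placeholder address
    body = "\n".join(f"- {line}" for line in ora_lines)
    msg.set_content(f"New ORA- errors found in {log_path}:\n\n{body}\n")
    return msg
```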
So our question basically boils down to: is there a better (or the right) way to do what we want?
Yes, the log reports every ORA- error separately. Here are some excerpts:
    Thu Feb 7 09:44:40 2008
    Errors in file /ora/app/oracle/admin/INSTANCE/udump/instance_ora_19929.trc:
    ORA-00600: internal error code, arguments: , [0x0B6F41C08], , , , , , 
    Tue Mar 18 22:19:51 2008
    Errors in file /ora/app/oracle/admin/INSTANCE/udump/instance_ora_21413.trc:
    ORA-27037: unable to obtain file status
    Tue Mar 18 22:28:29 2008
    Thread 1 advanced to log sequence 10531
    Current log# 2 seq# 10531 mem# 0: /u01/oradata/INSTANCE/redo02m1.dbf
    Current log# 2 seq# 10531 mem# 1: /u01/oradata2/INSTANCE/redo02m2.dbf
    Wed Mar 19 07:03:05 2008
    Errors in file /ora/app/oracle/admin/INSTANCE/bdump/instance_arc1_9854.trc:
    ORA-16038: log 3 sequence# 10532 cannot be archived
    ORA-19504: failed to create file ""
    ORA-00312: online log 3 thread 1: '/u01/oradata/INSTANCE/redo03m1.dbf'
    ORA-00312: online log 3 thread 1: '/u01/oradata2/INSTANCE/redo03m2.dbf'
So for every event the log lists the date, some additional info (e.g. where the error occurred) and, if an error was logged, the error code(s) themselves, each on a separate line. There's no summary like the one Hyperic HQ sends us anywhere in the log file.