David_hyperic
Contributor

Oracle DBs: Monitoring the alert.log

We're currently testing Hyperic HQ as a monitoring tool for our Oracle databases. We want it to scan the alert.log files periodically for ORA- errors and, whenever a new one is found, report the error by mail.

The current method we're using is this:

We set up a New Platform Service for the server hosting the DB, with Resource Type Process. We configured it to query the Oracle smon process (State.Name.eq=oracle,Args.*.eq=ora_smon_INSTANCE) and enabled service.log tracking:

service.log_track.include ORA-
service.log_track.files /ora/app/oracle/admin/INSTANCE/bdump/alert_INSTANCE.log

Then we set up an alarm.

If Condition: Event/Log Level(ANY) and matching substring "ORA-"
Enable Action(s): Each time conditions are met.

When we enable this monitor and insert an ORA- error into the .log file, Hyperic HQ sends us a mail reporting that an ORA- error occurred and lists the -first- ORA- error that appears in the log. If we insert more errors, Hyperic only reports the -total number- of errors that appear in the log, with no specifics, just a count:

- Triggering Condition(s):
If Event/Log Level(ANY) and matching substring "ORA-"
Log: /ora/app/oracle/admin/INSTANCE/bdump/alert_INSTANCE.log: Message 'ORA-' repeated 21 times
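To make the behaviour we are after concrete, here is a minimal Python sketch of what we would expect from a periodic scan: each run returns every ORA- line appended since the previous run, rather than a count. This is only an illustration and not how Hyperic HQ works internally; the function name `read_new_ora_lines` and the idea of storing a byte offset between runs are our own assumptions.

```python
import os

def read_new_ora_lines(path, offset=0):
    """Return (list of ORA- lines appended since `offset`, new offset).

    `offset` is the byte position reached by the previous scan
    (hypothetical bookkeeping, kept by the caller between runs).
    """
    # If the file shrank, assume it was rotated or truncated and start over.
    if os.path.getsize(path) < offset:
        offset = 0
    with open(path, "rb") as f:
        f.seek(offset)
        data = f.read()
    # Only consume complete lines; a half-written last line waits for the next scan.
    end = data.rfind(b"\n") + 1
    lines = data[:end].decode("utf-8", errors="replace").splitlines()
    hits = [line for line in lines if "ORA-" in line]
    return hits, offset + end
```

Called once per polling interval with the offset returned by the previous call, this would yield each individual error line, which is the level of detail we would like to see in the mail.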

So, our question basically boils down to: Is there a better/right way to do what we want to do?
JohnMarkOrg
Hot Shot

Are you sure that the raw log file doesn't report this the same way? Some mail logs I've viewed will actually say "message repeated n times". Can you confirm what is listed in the raw logs?

-John Mark
David_hyperic
Contributor

Yes, the log reports every ORA- error separately. Here are some excerpts:

Thu Feb 7 09:44:40 2008
Errors in file /ora/app/oracle/admin/INSTANCE/udump/instance_ora_19929.trc:
ORA-00600: internal error code, arguments: [17182], [0x0B6F41C08], [], [], [], [], [], []
Tue Mar 18 22:19:51 2008
Errors in file /ora/app/oracle/admin/INSTANCE/udump/instance_ora_21413.trc:
ORA-27037: unable to obtain file status
Tue Mar 18 22:28:29 2008
Thread 1 advanced to log sequence 10531
Current log# 2 seq# 10531 mem# 0: /u01/oradata/INSTANCE/redo02m1.dbf
Current log# 2 seq# 10531 mem# 1: /u01/oradata2/INSTANCE/redo02m2.dbf
Wed Mar 19 07:03:05 2008
Errors in file /ora/app/oracle/admin/INSTANCE/bdump/instance_arc1_9854.trc:
ORA-16038: log 3 sequence# 10532 cannot be archived
ORA-19504: failed to create file ""
ORA-00312: online log 3 thread 1: '/u01/oradata/INSTANCE/redo03m1.dbf'
ORA-00312: online log 3 thread 1: '/u01/oradata2/INSTANCE/redo03m2.dbf'



So for every "event" it lists the date, some additional info (e.g. where the error occurred) and the error code(s) themselves (if the logged event is an error), each on a separate line. There's no summary like the one Hyperic HQ sends us anywhere in the log file.
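The structure described above can be sketched in Python as follows. It is only an illustration based on the excerpt: the timestamp pattern is inferred from the sample lines, and the names `TIMESTAMP`, `group_events` and `error_events` are made up for this example. It groups the lines under each timestamp header and keeps the ORA- codes apart from the other detail lines.

```python
import re

# Timestamp header as it appears in the excerpt, e.g. "Thu Feb 7 09:44:40 2008".
# Assumption: every event starts with a line in exactly this form.
TIMESTAMP = re.compile(
    r"^[A-Z][a-z]{2} [A-Z][a-z]{2} +\d{1,2} \d{2}:\d{2}:\d{2} \d{4}$"
)

def group_events(lines):
    """Group alert.log lines into events, one per timestamp header."""
    events = []
    current = None
    for raw in lines:
        line = raw.strip()
        if TIMESTAMP.match(line):
            current = {"timestamp": line, "details": [], "errors": []}
            events.append(current)
        elif current is not None and line:
            # ORA- codes go to "errors", everything else to "details".
            key = "errors" if line.startswith("ORA-") else "details"
            current[key].append(line)
    return events

def error_events(lines):
    """Return only the events that contain at least one ORA- line."""
    return [event for event in group_events(lines) if event["errors"]]
```

Run against the excerpt, this would give one event per timestamp with its own list of ORA- codes, which is the per-error detail we would like the notification to carry.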

Message was edited by: David