VMware Cloud Community
jasonchaffee_hy
Contributor

Oracle Auto Discovery is not working

I have the Oracle environment variables configured for the hyperic user, and $ORACLE_HOME/bin is in that user's $PATH. However, the agent is not discovering the Oracle database.

Does anyone have any idea what I need to do?
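One quick sanity check, sketched here on the assumption that the agent runs as a user named hyperic: open a login shell as that user (`sudo -u hyperic -i`), source the function below, and call `check_oracle_env`. It only confirms that ORACLE_HOME is set, points at a directory that user can reach, and that $ORACLE_HOME/bin is on PATH. It does not prove the agent process inherits the same environment, since that depends on how the agent is started.

```sh
#!/bin/sh
# Sketch: sanity-check the Oracle environment of the current shell.
# Intended to be sourced and run as the agent user (assumed to be "hyperic").

check_oracle_env() {
  # ORACLE_HOME must be set and non-empty.
  if [ -z "${ORACLE_HOME:-}" ]; then
    echo "ORACLE_HOME is not set"
    return 1
  fi
  # It must be a directory this user can reach.
  if [ ! -d "$ORACLE_HOME" ]; then
    echo "ORACLE_HOME ($ORACLE_HOME) is not a reachable directory"
    return 1
  fi
  # $ORACLE_HOME/bin must appear as a whole PATH entry.
  case ":$PATH:" in
    *":$ORACLE_HOME/bin:"*) ;;
    *)
      echo "\$ORACLE_HOME/bin is not on PATH"
      return 1
      ;;
  esac
  echo "ok: ORACLE_HOME=$ORACLE_HOME"
}
```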
jasonchaffee_hy
Contributor

I have two Linux boxes running Oracle. I finally got auto-discovery working on one of them, but not the other. I have no idea why, and no idea how to debug it.
cwitt_hyperic
Hot Shot

It might be worth examining any differences you can identify between the two boxes. Start with the big, obvious ones: Linux distro and kernel version, Oracle version, the JDK/JRE used to run the HQ Agent, etc. If nothing obvious jumps out, look at subtler configuration: the user context the HQ Agent runs under, SELinux configuration, etc.

Also, check the log/agent.log file on the agent that is failing to auto-discover Oracle; it may contain interesting data. If it doesn't, try setting agent.logLevel to DEBUG in conf/agent.properties and restarting the agent to see whether more useful output shows up.
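If you'd rather script that change, here is a minimal sketch (the function name is made up for illustration, and it assumes GNU sed for in-place editing). Run it from the agent directory as `set_agent_log_level conf/agent.properties DEBUG`, then restart the agent, for example with `bin/hq-agent.sh restart` if your agent's launcher script supports that; otherwise stop and start it.

```sh
#!/bin/sh
# Sketch: set agent.logLevel in an agent.properties file. Every existing entry
# (commented out or not) is rewritten in place; if none exists, one is appended.
# Assumes GNU sed (-i without a backup suffix).

set_agent_log_level() {
  local props="$1" level="$2"
  local pattern='^[#[:space:]]*agent\.logLevel[[:space:]]*='
  if grep -qE "$pattern" "$props"; then
    sed -i -E "s/${pattern}.*/agent.logLevel=${level}/" "$props"
  else
    printf 'agent.logLevel=%s\n' "$level" >> "$props"
  fi
}
```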
jasonchaffee_hy
Contributor

So far, I have not been able to identify any differences. That doesn't mean there aren't any; I just haven't found them yet. The boxes are the same, the OS is the same, the Oracle distribution is the same, the environment variables are the same, and the HQ Agent was installed from the same archive, so both use the same JRE. I checked the Oracle processes running on both boxes, and they appear to be the same as well.

I set the log level to DEBUG, and the log shows that it found ORACLE_HOME, but for whatever reason it cannot discover the running Oracle server. As far as I can tell, that is the only useful information in the log output.

This is very disconcerting. If I were an OPS guy and this happened, I would never use this product, because I would not trust it to detect things accurately. I happen to be on the dev side, and I am researching Hyperic HQ as a possible recommendation for the OPS team. Until I can figure this out, Hyperic HQ has a vote of no confidence from me. 😞
cwitt_hyperic
Hot Shot

Check the permissions on the directories in the ORACLE_HOME path. When you mentioned ops, I had a flashback to seeing this in one of my own deployments: it turned out to be file-system-level permissions restricting the access of the user running the HQ Agent. Worth a peek, anyway.
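A rough way to do that check in one pass, as a sketch (the function name is hypothetical): walk the ORACLE_HOME path from / down and report every directory the current user can't search. Run it as the agent user, not as root, because root passes these tests regardless of the mode bits. On most Linux systems, `namei -l "$ORACLE_HOME"` from util-linux gives a similar per-component listing of owners and modes.

```sh
#!/bin/sh
# Sketch: walk an absolute path (e.g. $ORACLE_HOME) from / downwards and report
# each directory the current user cannot search (x), plus the final directory
# if it cannot be listed (r). Run as the agent user; root passes these tests
# no matter what the mode bits say.

check_path_access() {
  local target="$1" rest dir="" part status=0
  case "$target" in
    /*) ;;
    *) echo "expected an absolute path: $target"; return 2 ;;
  esac
  rest="${target#/}"
  while [ -n "$rest" ]; do
    # Peel off the next path component.
    part="${rest%%/*}"
    case "$rest" in
      */*) rest="${rest#*/}" ;;
      *) rest="" ;;
    esac
    [ -n "$part" ] || continue   # tolerate "//" and a trailing "/"
    dir="$dir/$part"
    if [ ! -e "$dir" ]; then
      # Either it does not exist, or a parent directory blocked the lookup.
      echo "missing or unreachable: $dir"
      return 1
    fi
    if [ -d "$dir" ] && [ ! -x "$dir" ]; then
      echo "no search (x) permission: $dir"
      status=1
    fi
  done
  if [ -d "$target" ] && [ ! -r "$target" ]; then
    echo "cannot list (r): $target"
    status=1
  fi
  return "$status"
}
```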

The alternative to checking the permissions directory by directory (a bad one from a best-practices perspective, but worth at least mentioning) is to run the HQ Agent as root, even if only for a quick run to confirm that what you're seeing is a permissions problem without changing the modes on any of the directories. Note: when you're done running as root, you'll likely have to reset ownership of some of the HQ Agent files to your preferred agent user.
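For the cleanup step afterwards, a sketch along these lines may help (function names are hypothetical; `chown user:` with a trailing colon sets the group to that user's login group, as GNU chown documents). Run `find_not_owned_by /path/to/agent hyperic` first to see what changed, then `restore_ownership /path/to/agent hyperic` as root, where /path/to/agent stands in for your agent directory.

```sh
#!/bin/sh
# Sketch: after running the HQ Agent as root, find the files under the agent
# directory that are no longer owned by the intended agent user, and hand them
# back. restore_ownership normally has to run as root.

# Print every path under DIR whose owner is not USER.
find_not_owned_by() {
  local dir="$1" user="$2"
  find "$dir" ! -user "$user" -print
}

# chown those paths to USER; "USER:" (trailing colon) also sets the group to
# USER's login group.
restore_ownership() {
  local dir="$1" user="$2"
  find "$dir" ! -user "$user" -exec chown "$user:" {} +
}
```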
jasonchaffee_hy
Contributor

Bingo!

I started the agent as root, and it discovered the database. Now I will compare the permissions on the two boxes and see what the difference is.
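To make that comparison less tedious, one option (a sketch, assuming GNU find and that ORACLE_HOME is the same path on both boxes) is to dump mode, owner and group for every entry and diff the two dumps: run `perm_snapshot "$ORACLE_HOME" > perms-$(hostname).txt` as root on each box, copy one file across, and `diff` them. The parent directories of ORACLE_HOME are not included in the dump, so check those separately (for example with `ls -ld` on each).

```sh
#!/bin/sh
# Sketch: dump mode, owner, group and path for everything under a directory,
# one line per entry, sorted by path, so dumps from two hosts can be diffed.
# Uses GNU find's -printf; run as root so nothing is skipped.

perm_snapshot() {
  local root="$1"
  find "$root" -printf '%m %u %g %p\n' | LC_ALL=C sort -k4
}
```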
cwitt_hyperic
Hot Shot

Something else worth trying is invoking the Oracle plugin from the command line to see whether it reports any interesting errors.

Invoked from within the HQ Agent directory, it would look something like this:

jre/bin/java -jar bundles/agent-4.2.0-EE-1060/pdk/lib/hq-product.jar -p oracle -m discover

The versioned bundle directory in the path to hq-product.jar (agent-4.2.0-EE-1060 here) will be named differently to match your agent version.

Output looks something like this upon successful discovery:

1 servers detected

Server: pdb04.intranet.hyperic.net Oracle 11g [/sw/oracle/app/oracle/product/11.1.0]
AIID...../sw/oracle/app/oracle/product/11.1.0
config...
product..{process.ptql=State.Name.eq=oracle,Args.0.sw=ora_, jdbcUrl=jdbc:oracle:thin:@localhost:1521:ORCLTST, tnsnames=network/admin/tnsnames.ora}
metric...null
control..null
cprops...{version=11g}
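Because the bundle directory name changes with the agent version, a small wrapper can locate hq-product.jar and run the same command. This is a sketch: the function names are hypothetical, it picks the first matching bundle if several are present, and `servers_detected` assumes the summary line looks like the "1 servers detected" line in the output above. Running `run_oracle_discovery /path/to/agent | servers_detected` once from a login shell of the agent user and once as root, and comparing the two numbers, is one way to confirm a permissions problem.

```sh
#!/bin/sh
# Sketch: run the Oracle plugin's command-line discovery from an HQ Agent
# directory without hard-coding the bundle version, and extract the
# "N servers detected" count. Assumes the layout and output format shown above.

# Print the first bundles/agent-*/pdk/lib/hq-product.jar under the agent dir.
find_hq_product_jar() {
  local agent_home="$1" jar
  for jar in "$agent_home"/bundles/agent-*/pdk/lib/hq-product.jar; do
    if [ -f "$jar" ]; then
      echo "$jar"
      return 0
    fi
  done
  echo "hq-product.jar not found under $agent_home/bundles" >&2
  return 1
}

# Same command as above, run from inside the agent directory.
run_oracle_discovery() {
  local agent_home="$1" jar
  jar=$(find_hq_product_jar "$agent_home") || return 1
  (cd "$agent_home" && jre/bin/java -jar "${jar#"$agent_home"/}" -p oracle -m discover)
}

# Read discovery output on stdin; print the number from "N servers detected".
servers_detected() {
  sed -n -E 's/^([0-9]+) servers? detected.*/\1/p' | head -n 1
}
```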
cwitt_hyperic
Hot Shot

Ahh! Great. Then you can ignore my last post, or file it away for later use in debugging.
jasonchaffee_hy
Contributor

Thanks!!!
cwitt_hyperic
Hot Shot

As a side note, and as a former ops guy myself: what you encountered is one of the many reasons I like HQ. It can do only what it has permission to do, and that applies to monitoring log files, initiating Control Actions, and so on as well. In tightly controlled environments, this is reassuring.

Cheers.
jasonchaffee_hy
Contributor

I agree with you on that, provided it is clear what to do and what not to do.

Spending several days trying to figure out how to get it to discover something isn't time well spent when you have limited resources.

Even now that I know it is some sort of permissions problem, I cannot track down the exact issue. I have changed about 100 permissions so far, with no luck. I also tried adding the hyperic user to the oracle group, and that did not work either. It seems the only way I can get it to work is to run the agent as root, which is unacceptable.

As far as I can tell, the hyperic user has the same access on both machines. I am sure something is different, but it is like finding a needle in a haystack. I know my OPS team would not appreciate this at all; they would rather take a product with less functionality than spend all this time debugging why Hyperic isn't working as expected on this one box. If there were clear, concise steps that worked every time, I would have no problem with it, but I followed what little documentation there is on Oracle monitoring and I am still not having any luck.
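On the oracle group attempt, one thing worth ruling out: supplementary groups are assigned when a session or process starts, so an agent started before the hyperic user was added to the group keeps its old group list until it is restarted from a fresh login. The sketch below (hypothetical function names, Linux /proc assumed) compares what the user database says with what the running process actually has, e.g. `user_in_group hyperic oracle` versus `process_has_gid <agent pid> "$(getent group oracle | cut -d: -f3)"`. Even with the group in place, that group still needs read and search permission on the directories involved.

```sh
#!/bin/sh
# Sketch: check group membership two ways. A user added to a group (for
# example with usermod -a -G) only gets it in processes started afterwards,
# so an agent that was already running keeps its old group list.

# Does the user database say USER is in GROUP (by name)?
user_in_group() {
  local user="$1" group="$2" g
  for g in $(id -Gn "$user"); do
    [ "$g" = "$group" ] && return 0
  done
  return 1
}

# Is GID in the supplementary group list of running process PID? (Linux only)
process_has_gid() {
  local pid="$1" gid="$2" g
  for g in $(awk '/^Groups:/ { for (i = 2; i <= NF; i++) print $i }' "/proc/$pid/status"); do
    [ "$g" = "$gid" ] && return 0
  done
  return 1
}
```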
jasonchaffee_hy
Contributor

I finally did a full recursive permissions change, and it now works with the hyperic user. This really concerns me, though, because I don't like making a recursive change like this; at least it is just a test box.

Hopefully, it won't be so difficult in our production environment.
jasonchaffee_hy
Contributor

BTW, I am also willing to accept that I did something stupid in all of this that caused me more trouble than it should have.

The true test will be a clean install of the Oracle server and Hyperic, granting the permissions for the TNS Listener Ping Service appropriately, and seeing whether it works.

My test environment might have been "messed" with too much. 🙂