HP just told us yesterday that if you answer yes to the trap/management enablement in the script's interactive process, or you set the exports to yes in the hpmgmt configuration file for a silent install of the 8.0.0 agents, the issue occurs. HP has stated that trap enablement is not supported, which is odd, since the documentation says nothing about unsupported options like this. HP also stated that the only valid way to set SNMP trap sinks is to edit the snmpd.conf file. This is a new twist for us: we already do that, but we had never heard that the hpmgmt configuration file has unsupported exports. I think HP is confused, and we have asked for an explanation, to which HP responded with "we are escalating your case to the next level"... not a good response from HP, and one that implies HP is confused, again.
The solution is to restart the hpsmhd and snmpd services. This will also fix the connection timeout errors if you are getting any. Let me know how it goes. Cheers!
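For reference, that restart sequence looks something like this (a sketch; "hpasm" is the usual init-script name for the HP agents on a ProLiant Support Pack install, so adjust if yours differs):

```shell
# Bounce SNMP and the HP agents; snmpd must be back up
# before the agents and the homepage reload
service hpasm stop       # HP management agents (service name assumed)
service snmpd restart
service hpasm start
service hpsmhd restart   # System Management Homepage
```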
HP now swears that our hpmgmt.conf file is corrupt and that the file we have used from 7.2.x through 7.9.x is not functional for 8.0.0. We are trying a new hpmgmt.conf file, but we seriously doubt this is the answer. Trying it nonetheless.
Stuarty -- try adding this to your snmpd.conf file and then restarting the mgmt agents:
dlmod cmaX /usr/lib/libcmaX.so
I had a similar problem. The solution is very simple:
Edit hpmgmt.conf.example only with a Linux editor (like nano), not WordPad or another Windows editor.
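If you are not sure which kind of file you have, here is a quick check and fix from the console OS (a sketch; dos2unix may not be installed, so plain sed is used):

```shell
# A non-zero count means the file has Windows (CRLF) line endings
grep -c $'\r' hpmgmt.conf

# Strip the carriage returns in place, keeping a backup copy
sed -i.bak 's/\r$//' hpmgmt.conf
```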
HP swears our hpmgmt.conf file was corrupt, but they used our file with no issues. It also appears that if the community strings are odd -- in other words, contain characters outside the ranges A-Z and a-z -- odd things can happen, per HP. I did validate that our hpmgmt.conf was a proper UNIX/Linux file. So I still think an odd bug is in the mix here; we just have not found the right scenario to consistently generate it. We never saw this before 7.9.0, as I noted before.
Have HP resolved this issue yet? 'Cause this isn't good at all, and it needs to be working. After all, VMware are pursuing the "ESX server in production" rhetoric, and if the HW vendors cannot get the monitoring side stable, then what's the point?
VCP,MCSE NT4/W2k/W2k3, MCSA W2k3
Interestingly I have no issues with the HP Management agents. Can you give the exact error you are seeing? Are you still getting a blank HPSMH screen?
Edward L. Haletky
VMware Communities User Moderator
Author of the book 'VMWare ESX Server in the Enterprise: Planning and Securing Virtualization Servers', Copyright 2008 Pearson Education. CIO Virtualization Blog: http://www.cio.com/blog/index/topic/168354, As well as the Virtualization Wiki at http://www.astroarch.com/wiki/index.php/Virtualization
We have narrowed the issue a bit. A new install of 8.0.0a does not exhibit the issue, or at least we have yet to see it on 8.0.0a, but with only 10 or 20 installs that is not a good sample size, IMHO. Updating from 7.9.0, 7.9.1, or 8.0.0 to any version up to 8.0.0a exhibits the issue inconsistently. HP has published information to us on a similar issue that RHEL4 (or 5?) has, where a cma* process unexpectedly dies with no warning or notification of the failure. Doing ps auxwww | grep cma, and looking for daemons stuck with a status of D, shows the dead processes. We have not yet seen this RHEL4/5 issue on ESX, but it sounds similar. Both HP and we are having trouble reproducing the issue consistently; it is ESX version independent, having occurred on 3.0.2, 3.0.2 Update 1, 3.5, and 3.5 Update 1. The result of the issue is a simple timeout warning from the SMH page, and as noted in this thread, the way to clear it is to stop the HP agents, stop SNMP, restart SNMP, and restart the HP agents. Sometimes this has to be done twice.
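The ps check above can be sketched a bit more tightly; in `ps aux` output the STAT column is field 8 and the command is field 11 (those column positions are an assumption about your ps format), and a leading D means the process is stuck in uninterruptible sleep:

```shell
# Print the PID, state, and command of any cma* agent process
# whose state starts with D (stuck/uninterruptible)
ps auxwww | awk '$8 ~ /^D/ && $11 ~ /cma/ {print $2, $8, $11}'
```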
A couple of points: the OpenIPMI issue that was mentioned earlier was a bug in the 7.x agents where they didn't detect ESX as an operating system properly if OpenIPMI had to be recompiled. You can fix it by using the 8.x agents, or this is what I was doing to get around it:
sed -i".original" 's/cat \/etc\/issue | grep -e \"VMware ESX Server 3\"/cat \/proc\/vmware\/version | grep -e \"VMware ESX Server 3\"/g' /opt/hp/hp-OpenIPMI/check_install_kernel.sh
chmod +x /opt/hp/hp-OpenIPMI/check_install_kernel.sh
(Sorry about the word-wrapping there...)
I actually ended up removing the hp-OpenIPMI rpm from my blades, as it was creating an IRQ conflict by using up yet another IRQ in the limited space, which was causing performance issues. Removing it didn't seem to cause any problems, although we weren't using the agents heavily.
If you're having trouble with the SMH timing out with messages like:
"A timeout occurred while loading data for the HP System Management Homepage which may result in missing or incomplete information. See the HP System Management Homepage log for additional information."
Check your /etc/snmp/snmpd.conf -- the SMH needs access through SNMP to enumerate all the devices properly, and it does this with the default community strings. With versions 8.0.0a and 8.1.0 I had difficulty with this, and I narrowed it down to supplying a string other than "public" for the "export CMALOCALHOSTROCOMMSTR" line in the hpmgmt.conf silent install file, which then translates to the "rocommunity" line in snmpd.conf.
For example, if my snmpd.conf had:
rocommunity NOTpublic 127.0.0.1
Then the SMH would timeout - but if I changed this back to what SMH was expecting, it worked fine:
rocommunity public 127.0.0.1
I assume it also expects you to specify the RW string as "private":
rwcommunity private 127.0.0.1
After changing the snmpd.conf, run these commands to restart the SMH:
service snmpd restart
service hpsmhd restart
I'm not sure how much of a security issue this is, with HP forcing you to use the default well-known SNMP strings (after all, it is only from the localhost)... But if this is an issue for you, then just uninstall (rpm -e) the SMH rpm after the silent install has finished.
I hope that helps -- it took me ages to narrow that timeout issue down, changing snmpd.conf one line at a time!
I've installed version 8.1.0 here and used a non-standard RO community name and it works as expected.
That name needs to be in the list of global community names in HPSIM.
The 8.1.0 install also opens port 2301 in the firewall which also needs to be open for identification purposes.
And with the QLogic libraries installed the fibre connections show up nicely as well.
We don't use HPSIM... I was merely trying to browse to the server at "https://hostname:2381", and it was timing out if I didn't have the "rocommunity public 127.0.0.1" line...
It would time out even with "esxcfg-firewall --allowIncoming --allowOutgoing" set; the only way around it was to put the public string back in for the localhost. Do you know where the homepage has that public string set? Can I change it to a non-default value?
The agents do need a localhost (127.0.0.1) RO community name. But it doesn't need to be "public". I also have a RW community name for localhost. I think that's needed to be able to set some things.
I don't do anything on the agents themselves in order to be able to use something other than public. Not sure that they care.
My trapcommunity name is the same as the RO community name. Not sure if that helps.
This works the same way on my windows servers, I have an RO and RW community name for localhost. And to use HPSIM, an RO community that allows the HPSIM Server.
My snmpd.conf file looks something like (names changed to protect the innocent):
dlmod cmaX /usr/lib/libcmaX.so
rwcommunity MYRWString 127.0.0.1
rocommunity MYROString 127.0.0.1
rocommunity MYROString hpsim.ourco.nz
trapsink hpsim.ourco.nz MYROString
# VMware MIB modules. To enable/disable VMware MIB items
# add/remove the following entries.
dlmod SNMPESX /usr/lib/vmware/snmp/libSNMPESX.so
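After editing the file and restarting snmpd, you can sanity-check that it answers on a given community (this assumes the net-snmp client tools are installed on the console OS; substitute your own RO string):

```shell
# Walk the system subtree over the localhost RO community;
# a timeout here usually means the community string or snmpd.conf is wrong
snmpwalk -v1 -c MYROString 127.0.0.1 system
```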
http://h20000.www2.hp.com/bizsupport/TechSupport/SoftwareDescription.jsp?lang=en&cc=US&swItem=MTX-62350b0bf98a442fb3c1cb1154 -> version 8.1.0 of "HP Management Agents for VMware ESX Server 3.x" out on 25 July 2008
Unfortunately, the 8.1 agents are not wonderful either...
The IML can log phantom NIC events -- a ping-pong of on- then off-line reporting when the NIC is never actually offline. This issue has been around for a long time; we most often see it on the HP DL580 G5.
The System Management Homepage still suffers from timeouts at random -- this most often shows up after an uninstall and install from an older version; it does not seem to happen as often on a clean install.
CMANICD subagent can drive COS/PCPU0 to 100% load - new issue, first seen in 8.0.
CMAFCAD subagent can drive COS/PCPU0 to 100% load - long term issue since 7.1.2 agents
Most of these issues appeared in 7.9.x and continued into 8.0, 8.0.0a, and now 8.1. HP is really struggling to resolve these. Sad, really, because HP usually did very well on ESX OS with the management agents.
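For the 100% load issues, something like this makes it easy to spot which subagent is the runaway one (a sketch; the -eo output options are standard procps, but check your ps version):

```shell
# List the cma* subagents sorted by CPU usage, highest first
ps -eo pid,pcpu,comm | grep cma | sort -k2 -rn | head
```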