We have a problem with a couple of ESXi hosts that don't respond to SNMP.
They are clean installs of ESXi 5.0, build 768111.
Hardware:
Dell R910 (no Dell VIB installed; doesn't work with or without it).
Standard SNMP config via the VMware vSphere CLI:
VMware vSphere CLI>vicfg-snmp.pl --server esx01 --username root --password aaaaaaaaaaaa -c public
Firewall is open. Requests come in, but there is a high number in the Receive Queue for SNMP (port 161) when checking the connection list:
# esxcli network ip connection list
Proto Recv Q Send Q Local Address      Foreign Address    State        World ID  World Name
----- ------ ------ ------------------ ------------------ -----------  --------  ------------
tcp   0      0      127.0.0.1:8307     127.0.0.1:52518    ESTABLISHED  20681     hostd-worker
tcp   0      0      127.0.0.1:52518    127.0.0.1:8307     ESTABLISHED  20882     hostd-worker
tcp   0      0      127.0.0.1:443      127.0.0.1:60453    ESTABLISHED  20175     hostd-worker
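A quick way to spot the stuck socket is to filter the connection list for a non-zero receive queue. The pipeline below is a sketch that assumes the second column of `esxcli network ip connection list` is the receive queue; the here-doc lines are illustrative samples, not output from a real host:

```shell
# Flag connections whose receive queue (column 2) is non-zero.
# The here-doc stands in for `esxcli network ip connection list` output on a
# live host; the udp line mimics a stuck SNMP socket on port 161.
awk '$2 > 0 { print $4, "Recv-Q =", $2 }' <<'EOF'
tcp 0 0 127.0.0.1:8307 127.0.0.1:52518 ESTABLISHED 20681 hostd-worker
udp 9216 0 0.0.0.0:161 0.0.0.0:0 20990 snmpd
EOF
```

On a live host you would pipe the real command into the same `awk` filter instead of the here-doc.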
We have restarted the management agents, reset the SNMP config, and reinstalled the servers, but a couple of them still will not respond and build up this queue.
Any ideas?
Can you run and post the results of the --show command:
vicfg-snmp.pl --server esx01 --username root --password aaaaaaaaaaaa --show
You should see something like this:
===============================
Current SNMP agent settings:
Enabled : 1
UDP port : 161
Communities :
public
anothercommunity
Notification targets :
mysnmptarget1.fqdn@162/public
mysnmptarget2.fqdn@162/anothercommunity
Options :
EnvEventSource=sensors
===============================
I have found that the --enable command creates another firewall rule called "dynamicbinding", even though an SNMP rule is already created by default on a clean install of ESXi 5.0 (I would roll this back and use PowerCLI to enable it).
vicfg-snmp.pl --server esx01 --username root --password aaaaaaaaaaaa --disable
## Enable SNMP using PowerCLI (connect to host, not vCenter)
Get-VMHostSnmp | Set-VMHostSnmp -Enabled:$true
## Add Communities
Get-VMHostSnmp | Set-VMHostSnmp -ReadOnlyCommunity public,anothercommunity
## Add Targets
Get-VMHostSnmp | Set-VMHostSnmp -TargetCommunity "public" -TargetHost "mysnmptarget1.fqdn" -TargetPort 162 -AddTarget
Get-VMHostSnmp | Set-VMHostSnmp -TargetCommunity "anothercommunity" -TargetHost "mysnmptarget2.fqdn" -TargetPort 162 -AddTarget
Additionally, on the Dell hardware I have found that for the traps to be translated correctly, "EnvEventSource" needs to be changed from indications to sensors.
This will change it:
vicfg-snmp.pl --server esx01 --username root --password aaaaaaaaaaaa --hwsrc sensors
Then you can send a test trap:
vicfg-snmp.pl --server esx01 --username root --password aaaaaaaaaaaa --test
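To confirm the test trap actually arrives, you can watch for it on the notification target. This is a sketch assuming tcpdump is available on the receiver; the interface name is an example:

```shell
# On the trap receiver, watch UDP port 162 for the incoming test trap.
# Replace eth0 with the receiver's actual interface.
tcpdump -n -i eth0 udp port 162
```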
Cheers,
Jon
Thanks for the tips. It seems --hwsrc sensors did the trick. After changing to that, the queuing went away and I can snmpwalk the host and get a response. For some reason, on two of our ten Dell R910s we need to use "sensors"; on the others, "indications" works by default.
I'll test installing the Dell OMSA agent VIB now ( OM-SrvAdmin-Dell-Web-7.1.0-5304.VIB-ESX50i_A00.zip ) and see if it holds up.
We use SNMP to check the network cards' uplink status, and a Nagios script ( http://www.claudiokuenzler.com/nagios-plugins/check_esxi_hardware.php ) to get hardware status.
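For anyone following along, a minimal end-to-end check from a management workstation might look like the following. It assumes the net-snmp tools are installed; esx01 and public are the example host and community from this thread:

```shell
# A prompt reply (rather than a timeout) confirms the agent is no longer
# queuing requests; ifOperStatus covers the NIC uplink-status use case.
snmpget -v 2c -c public esx01 SNMPv2-MIB::sysDescr.0
snmpwalk -v 2c -c public esx01 IF-MIB::ifOperStatus
```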
Good news. I understand that changing it from "indications" to "sensors" uses IPMI instead of CIM, so perhaps there is a subtle configuration difference in your iDRACs? If you have the time, it would probably be worth exporting the two configurations to XML for comparison.
After installing the Dell OMSA VIB, the problem came back, both with sensors and with indications. Will test more tomorrow.
Are the results of the --show command what you would expect? Are you using the Dell vCenter plugin? I found that the Dell vCenter plugin replaced all my SNMP configuration with a default config that sent traps to the plugin instead of to OME. After reconfiguring SNMP on each host, it worked again.
No vCenter plugin from Dell.
This is the show command:
Reinstalled one of the servers again. It seems SNMP starts acting up as soon as we add the network drivers for our Emulex 10 Gbit NICs. This is before we install any Dell VIB. Will test some more.