Monitor hardware on ESXi with Python script

I wrote a script in Python to monitor my free ESXi servers.

The script was written for Nagios-oriented monitoring but you can easily translate it for another monitoring tool by re-defining exit codes.
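
As a rough illustration of what re-defining exit codes could look like, the sketch below maps the standard Nagios plugin return codes (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN) onto another tool's convention. The `OTHER_TOOL_CODES` table and the `translate_exit_code` helper are hypothetical names and are not part of the original script; the "up/down" convention is only an assumed example.

```python
# Standard Nagios plugin return codes.
NAGIOS_OK = 0
NAGIOS_WARNING = 1
NAGIOS_CRITICAL = 2
NAGIOS_UNKNOWN = 3

# Hypothetical convention for another monitoring tool that only
# distinguishes "up" (0) from "down" (1). Adjust to your own tool.
OTHER_TOOL_CODES = {
    NAGIOS_OK: 0,
    NAGIOS_WARNING: 0,
    NAGIOS_CRITICAL: 1,
    NAGIOS_UNKNOWN: 1,
}


def translate_exit_code(nagios_code):
    """Map a Nagios return code to the other tool's code.

    Any unrecognised value is treated like UNKNOWN.
    """
    return OTHER_TOOL_CODES.get(nagios_code, OTHER_TOOL_CODES[NAGIOS_UNKNOWN])
```

A wrapper could then call `sys.exit(translate_exit_code(status))` instead of exiting with the Nagios code directly.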

Successfully tested on :

- VMware ESXi 3.5 free, vSphere Hypervisor 4.1, vSphere 4.1 Standard, vSphere 4.1 Enterprise

- Dell PowerEdge servers

The script should work on almost any configuration. However, I have not updated it despite receiving various comments and improvements (shame on me). That is why I recommend taking a look at this webpage: http://www.claudiokuenzler.com/ithowtos/nagios_check_esxi_hardware.php

It is effectively a merge of all the variants spread across the web, which were based on my original script.

Comments

Works great, thanks! I just set it up on Fedora using the python-pywbem package linked here: https://bugzilla.redhat.com/245688 (should be in Fedora soon). My next step will be building python-pywbem on RHEL5 and trying it there.

i love the script, i use it from windows with whatsup. I wrote a whatsup wrapper to use it.

http://www.stephenjc.com/2009/01/whatsup-vmware-esxi-monitor-these.html

your script works great from the command line, but i ran into problems when trying to define the check command for nagios. would you mind sharing how you got it to work?

edit: i keep getting a (null) return :(

Thanks

Awesome work! Works great from a linux box. I did receive this error from a windows box:

C:\Python26\lib\site-packages\pywbem\cim_types.py:164: DeprecationWarning: object.__init__() takes no parameters

int.__init__(self, arg, base)

However the command appears to have completed successfully regardless.

It works great on Red Hat EL5 with the rpm above, or even on the ancient EL4 if you install the older pywbem-0.5 for python-2.3.

I also made a small improvement (full script available at http://staff.washington.edu/joshuadf/esxi/ ) to catch problems with EnumerateInstances. This catches AuthError for wrong password and should also work for the CIM_Memory problem described at http://communities.vmware.com/message/1069795 and http://communities.vmware.com/thread/163730

+    try:
         instance_list = wbemclient.EnumerateInstances(classe)
+    except pywbem.cim_operations.CIMError,args:
+        verboseoutput("Unknown CIM Error: %s" % args, verbose)
+    except pywbem.cim_http.AuthError,arg:
+        verboseoutput("GLobal exit set to CRITICAL", verbose)
+        GlobalStatus = ExitCritical
+        ExitMsg += "CRITICAL : AuthError: %s\n" % arg
+    else:
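
The control flow of that patch can be sketched in a self-contained form as follows. This is only an illustration: the `CIMError` and `AuthError` classes below are local stand-ins for the pywbem exceptions named in the patch, defined here so the sketch runs without pywbem installed, and `check_classes` is a hypothetical helper name. The `except ... as` form assumes Python 2.6 or later.

```python
# Local stand-ins for pywbem.cim_operations.CIMError and
# pywbem.cim_http.AuthError, so the sketch runs without pywbem.
class CIMError(Exception):
    pass


class AuthError(Exception):
    pass


EXIT_OK = 0
EXIT_CRITICAL = 2


def check_classes(client, class_names):
    """Enumerate each class, surviving per-class CIM errors.

    Returns (status, messages). An AuthError marks the run CRITICAL;
    a CIMError is only logged so the remaining classes still run.
    """
    status = EXIT_OK
    messages = []
    for name in class_names:
        try:
            instances = client.EnumerateInstances(name)
        except CIMError as err:
            messages.append("Unknown CIM Error on %s: %s" % (name, err))
        except AuthError as err:
            status = EXIT_CRITICAL
            messages.append("CRITICAL : AuthError: %s" % err)
        else:
            messages.append("%s: %d instance(s)" % (name, len(instances)))
    return status, messages
```

With the real library, `client` would be the script's `wbemclient` object and the two exception classes would be replaced by the pywbem ones.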

By the way at least on my Dell PowerEdge 2950s you can also get these:

'OMC_Fan',

'OMC_PowerSupply',

fix it!

Download pywbem 0.7 (pywbem-0.7.0.tar.gz), open its cim_types.py, copy the "# CIM integer types" section, and use it to replace the same section in your 0.6 copy at "C:\Python26\Lib\site-packages\pywbem\cim_types.py".

The warning no longer appears.

=========================================================================

Max

I'm having the same result via Nagios, (null), but from the command line I get OK, or in verbose mode see all of the checks.

Here's how I have the command defined in commands.cfg:

define command{
        command_name    check_esx_wbem
        command_line    $USER1$/check_esx_wbem.py https://$HOSTADDRESS:5989 $ARG2$ $ARG3$
        }

And the check as defined for one of my ESXi servers

# username and password masked
define service{
        use                             linux-critical-server-service
        host_name                       esx01
        service_description             ESXi Hardware Monitor
        check_command                   check_esx_wbem!readonlyuser!somepassword
        }

I modified the script by adding the try/except block and now it works through Nagios. Strangely, I didn't change any of the Nagios configuration.

Thanks to Joshua:

http://staff.washington.edu/joshuadf/esxi/check_esx_wbem.py

Wow, I just discovered this and all I can say is thank you.

A tip for users of distros without python-wbem (like Ubuntu):

Get it from http://sourceforge.net/project/showfiles.php?group_id=133883

and install it with `python setup.py install`

I just saw this script. I have a few newbie questions:

1. Where on my ESXi host do I store this script?

2. Can I setup a cron job so that it runs the script at a certain time and then send an email?

Thanks

After ESXi has been updated with this package from HP: hp-esxi4.0uX-bundle-1.1.zip (google the file name if you want to find it), extra classes have to be added to the script to also check the new storage features: http://www.intellipool.se/forum/lofiversion/index.php/t1548.html

After this update ESXi is aware of HP storage adapters and disks.

One of our servers now shows a warning in vSphere Client regarding storage (maybe a faulty battery or something), but shows OK using this script.

Any ideas anyone?

Hi,

running the current HP VMware ESXi 4.0.0 build-208167, I tried to force a storage error by pulling one disk of a mirror, and pulling the plug from one power supply.

Unfortunately only the power plug is shown as a CRITICAL error, but not the pulled disk. It is noticed, but not flagged as CRITICAL:

(output excerpt of check_esx_wbem.py verbose)

20091222 15:16:19 Check classe VMware_StorageExtent

20091222 15:16:20 Element Name = Disk 1 on HPSA1 : Port 1I Box 1 Bay 1 : 419GB : Data Disk

20091222 15:16:20 Element Name = Disk 2 on HPSA1 : Port 1I Box 1 Bay 2 : 419GB : Data Disk

20091222 15:16:20 Element Name = Disk 3 on HPSA1 : Port 1I Box 1 Bay 3 : 0GB : Data Disk : Disk Error

20091222 15:16:20 Element Name = Disk 4 on HPSA1 : Port 1I Box 1 Bay 4 : 931GB : Data Disk

20091222 15:16:20 Element Name = Disk 5 on HPSA1 : Port 2I Box 1 Bay 5 : 931GB : Data Disk

20091222 15:16:20 Element Name = Disk 6 on HPSA1 : Port 2I Box 1 Bay 6 : 931GB : Data Disk

20091222 15:16:20 Check classe VMware_Controller

20091222 15:16:20 Element Name = HP Smart Array P410i Controller : HPSA1

20091222 15:16:20 Check classe VMware_StorageVolume

20091222 15:16:20 Element Name = Logical Volume 1 on HPSA1 : RAID 1 : 419GB : Disk 1,2

20091222 15:16:20 Element Name = Logical Volume 2 on HPSA1 : RAID 1 : 931GB : Disk 3,4 : Interim Recovery

20091222 15:16:20 Element Name = Logical Volume 3 on HPSA1 : RAID 1 : 931GB : Disk 5,6

CRITICAL : Power Supply 1 Power Supply 1: Failure detected
CRITICAL : Power Supply 1

Does somebody already have a solution for this?

Cheers,

-Matthias

That's because the HP agents report disk failures in the label (the Element Name) instead of in the class status. I own Dell servers, which makes me lazy about modifying the script. Maybe someone who owns HP servers can help modify it.
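
One possible way to pick up such label-only failures is to scan the Element Name fields for known markers. The sketch below is an assumption-laden illustration, not the script's actual logic: `status_from_element_name` is a hypothetical helper, the only markers used are the two visible in the verbose output quoted above, and treating "Interim Recovery" as a WARNING rather than CRITICAL is an arbitrary choice.

```python
# Markers seen in the verbose output quoted above; a real HP system
# may report others, so treat these lists as incomplete.
CRITICAL_MARKERS = ("Disk Error",)
WARNING_MARKERS = ("Interim Recovery",)  # assumed severity

EXIT_OK, EXIT_WARNING, EXIT_CRITICAL = 0, 1, 2


def status_from_element_name(element_name):
    """Derive a Nagios-style status from an HP Element Name label.

    The label is split on " : " and each field is compared with the
    known markers, so "Data Disk" is not confused with "Disk Error".
    """
    fields = [field.strip() for field in element_name.split(" : ")]
    if any(field in CRITICAL_MARKERS for field in fields):
        return EXIT_CRITICAL
    if any(field in WARNING_MARKERS for field in fields):
        return EXIT_WARNING
    return EXIT_OK
```

The result would then be combined with the usual class status check, keeping whichever of the two is more severe.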

It's just that I have 4 other ESXi servers running on HP, which confuses me. It's only on this server that the warning shows.

Branden modified the original script in order to monitor HP's servers.

Check it out at following URL : http://snednarb.wordpress.com/2010/02/02/get-hp-array-health-from-esxi-4-0/

also wanted to add I've modified Stephen's script from above for WhatsUpGold to put the details of the results in a variable and mail it to me.

I can't seem to add attachments here so I've posted to the whatsupgold forums. Enjoy.

http://ipswitch.hivelive.com/posts/5fe10a5fc3

Hello,

I've seen there were several modified versions. I decided to combine them and add a hardware type switch (currently either HP or DELL) because some responses given by a HP server could be taken as "OK" for a Dell server and the other way around.

You can download the check_esxi_wbem.py and view a small documentation here:

http://www.claudiokuenzler.com/ithowtos/nagios_check_esxi_wbem.php

Thanks! I downloaded it and it works great on our Dell at least. On a Precision where we have ESXi 3.5 installed, it shows an "unknown CIM Error" on CIM_Memory, but that's just a warning.

On a Precision? That's a workstation, isn't it? I guess the SNMP/CIM tables look very different to the ones on the servers. Unfortunately I don't know if there are even documented CIM providers for workstations.

By the way, if you want me to change the link to your website, please give me a shout :).

Thank you guys for all the improvements. I'm currently on vacation in the USA; I promise to refresh the script with all improvements once I'm back in Europe.

Hi, i have a free ESXi system and tried to install python-wbem but would get an error when i tried the "python setup.py install" saying there was no module distutils. If i try to install distutil, then i get a python code error which doesn't really mean much to me.

The free ESXi Dell version i have doesn't seem to allow me to do any rpm, yum commands to update python that way as well. Not too sure what else i can do.

Any help is appreciated. I also successfully installed a VIB for OpenManage which works but would just not detect anything as far as storage/drives went.

Cheers!

Hey there,

I think you may be a bit mixed up. Typically what we've done in this thread is install Python on a separate (non-ESX) machine. The original poster used a Nagios monitoring machine; I personally use WhatsUp Gold. Then we tie the monitoring into the Python script.

Hi ben13,

Thanks for the reply. I am probably confused, as I thought the script could operate on the ESXi server itself. I do have a Groundworks/Nagios monitoring machine, but I guess I'm not sure how the Python script sitting on a remote non-ESX machine could monitor the ESXi machine. Unless that's not the point of the script.

Cheers!

Hey again,

Yes, this script runs on a separate box but DOES connect to your ESXi host's CIM info.

Some have had to enable CIM: go to your ESXi host using the VI client, click on the parent, click the Configuration tab, select "Advanced Settings" under Software, select UserVars, and make sure UserVars.CIMEnabled = 1, UserVars.CIMOEMProvidersEnabled = 1 and UserVars.CIMCustomProvidersEnabled = 1.

Basically the info you get in the config/Health Status area can be retrieved from an external box. Once you have that set up, you have Nagios, WhatsUp Gold or something else send you an alert based on the readings it receives. Hope that makes more sense.

Hi ben13,

Thanks for all the help so far! I see in my vSphere Client that i do indeed have info in my config/Health Status area. However, when i try from another server in the same subnet to run the check_esx_wbem.py (from my non esx server) as in ./check_esx_wbem.py remote-server username password, i get a socket error. If i sniff traffic, i see that my non-ESX linux box with the esx script is trying to connect to my ESXi box on a wbem-http port which is not listening.

Is there anything i forgot to install on my ESXi server besides the OpenManage VIB that i installed?

I do have Nagios as well and am ideally looking to get emails sent on drive failures and such.

Cheers!

Hi,

Be sure you have specified the port number in the command line :

./check_esx_wbem.py https://remote-server:5989 username password

Hello again,

couple of things. Do you have the PyWBEM module installed on the machine you are running the check from? Also can you try telnet (or netcat) into the ESXi box on port 5989 and see if it answers. Also did you specify the port as couak states.

I didn't have to set anything up in the ESX firewall as far as I remember.
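
If telnet or netcat is not at hand, the same reachability test can be approximated from Python. This is a minimal sketch assuming Python 3; `port_is_open` is a hypothetical helper name, and it only checks that a TCP connection can be opened, not that the CIM service behind it answers correctly.

```python
import socket


def port_is_open(host, port=5989, timeout=5.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        # create_connection resolves the host and connects; the
        # "with" block closes the socket again straight away.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and DNS failures.
        return False
```

For example, `port_is_open("remote-server")` returning False would point to a network or service problem rather than to the check script itself.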

Using https://host:5989 worked great. Was then able to use this in Nagios ,thanks a million!

glad its all sorted out now!

Whenever I try this, on RHEL/CentOS 5, I get:

Traceback (most recent call last):

File "./check_esx_wbem.py", line 75, in ?

instance_list = wbemclient.EnumerateInstances(classe)

File "/usr/lib/python2.4/site-packages/pywbem/cim_operations.py", line 403, in EnumerateInstances

ClassName = CIMClassName(ClassName),

File "/usr/lib/python2.4/site-packages/pywbem/cim_operations.py", line 178, in imethodcall

reply_dom = minidom.parseString(resp_xml)

File "/usr/lib64/python2.4/site-packages/_xmlplus/dom/minidom.py", line 1924, in parseString

from xml.dom import expatbuilder

File "/usr/lib64/python2.4/site-packages/_xmlplus/dom/expatbuilder.py", line 32, in ?

from xml.parsers import expat

File "/usr/lib64/python2.4/site-packages/_xmlplus/parsers/expat.py", line 4, in ?

from pyexpat import *

ImportError: /usr/lib64/python2.4/site-packages/_xmlplus/parsers/pyexpat.so: undefined symbol: XML_SetSkippedEntityHandler

I've been using this script on Nagios with ESXi 4.0 for a while -  works great, but for the new 4.1 installations I get this error.

Any suggestions? 🙂

$ ./check_esx_wbem.py https://hostname username password
Traceback (most recent call last):
  File "./check_esx_wbem.py", line 75, in <module>
    instance_list = wbemclient.EnumerateInstances(classe)
  File "/usr/lib/python2.6/site-packages/pywbem/cim_operations.py", line 404, in EnumerateInstances
    **params)
  File "/usr/lib/python2.6/site-packages/pywbem/cim_operations.py", line 219, in imethodcall
    raise CIMError(code, tt[1]['DESCRIPTION'])
pywbem.cim_operations.CIMError: (6, u'The requested object could not be found')

Hi,

I have downloaded the version from http://www.claudiokuenzler.com/ithowtos/nagios_check_esxi_wbem.php and moved it to /usr/local/nagios/libexec. I run:

root@monitor:/usr/local/nagios/libexec# ./check_esxi_wbem.py https://10.1.51.33:5989 monitor check dell verbose

and get the following errors,


20110106 14:01:17 Connection to https://10.1.51.33:5989
20110106 14:01:17 Check classe OMC_SMASHFirmwareIdentity
20110106 14:01:17 Unknown CIM Error: (0, 'Socket error: [Errno 111] Connection refused')
20110106 14:01:17 Check classe CIM_Chassis
20110106 14:01:18 Unknown CIM Error: (0, 'Socket error: [Errno 111] Connection refused')
20110106 14:01:18 Check classe CIM_ComputerSystem
20110106 14:01:19 Unknown CIM Error: (0, 'Socket error: [Errno 111] Connection refused')
20110106 14:01:19 Check classe CIM_NumericSensor
20110106 14:01:19 Unknown CIM Error: (0, 'Socket error: [Errno 111] Connection refused')
20110106 14:01:19 Check classe CIM_Memory
20110106 14:01:20 Unknown CIM Error: (0, 'Socket error: [Errno 111] Connection refused')
20110106 14:01:20 Check classe CIM_Processor
20110106 14:01:20 Unknown CIM Error: (0, 'Socket error: [Errno 111] Connection refused')
20110106 14:01:20 Check classe CIM_RecordLog
20110106 14:01:21 Unknown CIM Error: (0, 'Socket error: [Errno 111] Connection refused')
20110106 14:01:21 Check classe OMC_DiscreteSensor
20110106 14:01:21 Unknown CIM Error: (0, 'Socket error: [Errno 111] Connection refused')
20110106 14:01:21 Check classe OMC_Fan
20110106 14:01:22 Unknown CIM Error: (0, 'Socket error: [Errno 111] Connection refused')
20110106 14:01:22 Check classe OMC_PowerSupply
20110106 14:01:22 Unknown CIM Error: (0, 'Socket error: [Errno 111] Connection refused')
20110106 14:01:22 Check classe VMware_StorageExtent
20110106 14:01:22 Unknown CIM Error: (0, 'Socket error: [Errno 111] Connection refused')
20110106 14:01:22 Check classe VMware_Controller
20110106 14:01:23 Unknown CIM Error: (0, 'Socket error: [Errno 111] Connection refused')
20110106 14:01:23 Check classe VMware_StorageVolume
20110106 14:01:23 Unknown CIM Error: (0, 'Socket error: [Errno 111] Connection refused')
20110106 14:01:23 Check classe VMware_Battery
20110106 14:01:24 Unknown CIM Error: (0, 'Socket error: [Errno 111] Connection refused')
20110106 14:01:24 Check classe VMware_SASSATAPort
20110106 14:01:25 Unknown CIM Error: (0, 'Socket error: [Errno 111] Connection refused')

any idea guys? i am running this against IBM server X3550 system, is that the reason?

The correct format for Nagios commands.cfg is:

define command{

        command_name    check_esx_wbem
        command_line    $USER1$/check_esx_wbem.py https:\/\/$HOSTADDRESS$:5989 $ARG1$ $ARG2$
        }

The URL must have the \/\/.

The correct format for calling the command is:

define service{
        use                             local-service
        host_name                       hostname
        service_description             ESXi Hardware Status
        check_command                   check_esx_wbem!root!password
        }

I'm having the following errors when running the script:

ciboodle@vmserver:/usr/local/nagios/libexec$ sudo ./check_esx_wbem.py http://10.77.83.50:5989 root ********* verbose
20110215 10:20:06 Connection to http://10.77.83.50:5989
20110215 10:20:06 Check classe CIM_ComputerSystem
Traceback (most recent call last):
  File "./check_esx_wbem.py", line 75, in <module>
    instance_list = wbemclient.EnumerateInstances(classe)
  File "/usr/local/lib/python2.6/dist-packages/pywbem/cim_operations.py", line 404, in EnumerateInstances
    **params)
  File "/usr/local/lib/python2.6/dist-packages/pywbem/cim_operations.py", line 173, in imethodcall
    raise CIMError(0, str(arg))
pywbem.cim_operations.CIMError: (0, "The web server returned a bad status line: ''")

The last line is the most interesting to me, with the server returning only '' to me. I have checked my ESXi server and CIM is enabled. I've tried monitoring the packets sent between the 2 servers, but so far I've had no luck.  Can anyone help me out?

Thanks! It started working for the first few tests, but I got an error after a few:

ciboodle@vmserver:/usr/local/nagios/libexec$ sudo ./check_esx_wbem.py https://10.77.83.50:5989 root ********* verbose
20110215 10:34:29 Connection to https://10.77.83.50:5989
20110215 10:34:29 Check classe CIM_ComputerSystem
20110215 10:34:30 Element Name = demo-esx.**********.****.**
20110215 10:34:30 Element Name = Controller 0 (SAS6IR)
20110215 10:34:30 Element Op Status = 2
20110215 10:34:30 Check classe CIM_NumericSensor
20110215 10:34:30 Check classe CIM_Memory
Traceback (most recent call last):
  File "./check_esx_wbem.py", line 75, in <module>
    instance_list = wbemclient.EnumerateInstances(classe)
  File "/usr/local/lib/python2.6/dist-packages/pywbem/cim_operations.py", line 404, in EnumerateInstances
    **params)
  File "/usr/local/lib/python2.6/dist-packages/pywbem/cim_operations.py", line 219, in imethodcall
    raise CIMError(code, tt[1]['DESCRIPTION'])
pywbem.cim_operations.CIMError: (6, u'The requested object could not be found')

Should I comment out the test and see if the rest pass?

Yes, try commenting out the test that doesn't work.

what kind of hardware do you have ?

The rest of the tests work, I only had to comment out CIM_Memory.

I'm running on a Dell Precision Workstation T7400 with an Intel Xeon E5420 8x2.50GHz processor, a 260GB HDD and about 4GB of RAM.

Dear All,

Does this work on ESXi 5 and 5.1?

Thanks.

Hi couak

I tried using this script for my vmware esxi 5.1

I gave this command

[root@mum-test-vm ~]# ./check_esx_wbem.py <IPADDRESS>:5989 <user-root> <passwd>

Traceback (most recent call last):

  File "./check_esx_wbem.py", line 75, in <module>

    instance_list = wbemclient.EnumerateInstances(classe)

  File "/usr/lib/python2.6/site-packages/pywbem/cim_operations.py", line 404, in EnumerateInstances

    **params)

  File "/usr/lib/python2.6/site-packages/pywbem/cim_operations.py", line 173, in imethodcall

    raise CIMError(0, str(arg))

pywbem.cim_operations.CIMError: (0, 'Invalid URL')

I am an amateur in Python. Please help!!

Thanks.

Though I successfully executed it after using https://<IPADDRESS>

The output shows as OK

what does this mean &  where do I find my logs??
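
Several of the errors in this thread ("Invalid URL", a bad status line over http://, a missing port) come from how the host argument was written. A wrapper could normalise the argument before calling the script, along the lines of the sketch below. It assumes Python 3; `normalize_cim_url` and `DEFAULT_CIM_PORT` are hypothetical names, and IPv6 literals are not handled.

```python
from urllib.parse import urlsplit

DEFAULT_CIM_PORT = 5989  # HTTPS CIM port used throughout this thread


def normalize_cim_url(target):
    """Return target as an https://host:port URL.

    Accepts "host", "host:port" or a full http(s) URL. Plain http is
    upgraded to https, since the thread shows http failing on 5989.
    """
    if "://" not in target:
        target = "https://" + target
    parts = urlsplit(target)
    host = parts.hostname
    if not host:
        raise ValueError("no host in %r" % target)
    port = parts.port or DEFAULT_CIM_PORT
    return "https://%s:%d" % (host, port)
```

The normalised string can then be passed as the first argument of check_esx_wbem.py.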

Version history: revision 1 of 1, last updated 08-20-2008 07:53 AM