markokobal

VDR 2.0 monitoring best practices and tools

Hi,

Simple question: what are the best practices and tools for monitoring VDR 2.0? Since there are no VDR alarms in vCenter, I would like to integrate a check into Nagios so that I get notified when something goes wrong with VDR backups.

I've found these solutions:

http://www.jules.fm/Logbook/files/vdr-log-analyser.html
https://www.monitoringexchange.org/inventory/Check-Plugins/Software/Backup/VMWare-Data-Recovery
http://exchange.nagios.org/directory/Plugins/Backup-and-Recovery/VMware-Data-Recovery/details

However, none of them is really straightforward. What are your best practices, and which tools do you use to monitor VDR?

-- Kind regards, Marko. VCP5
markokobal

Hi,

I've decided to slightly modify the script found here: https://www.monitoringexchange.org/inventory/Check-Plugins/Software/Backup/VMWare-Data-Recovery. I extracted the Python code from the shell script and turned it into a standalone Python Nagios plugin (attached). The plugin parses the VDR logs and reports errors. I tested it extensively on 4 different VDR appliances (some of them had jobs with errors), and the error reporting was accurate.
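The attached plugin isn't reproduced here, but the general idea (scan log lines for a job, then map what was found to a Nagios exit code) can be sketched roughly as below. This is NOT the attached plugin: the log line format, the "error" pattern and the names `check_lines`/`main` are assumptions for illustration only, and the sketch assumes Python 3, which may not be what the appliance ships with.

```python
import re

# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3
STATE_NAMES = {OK: "OK", WARNING: "WARNING", CRITICAL: "CRITICAL", UNKNOWN: "UNKNOWN"}

# Hypothetical error pattern: adjust it to the real VDR log format.
ERROR_RE = re.compile(r"\berror", re.IGNORECASE)
LOG_FILE = "/var/log/messages"


def check_lines(lines, job_name):
    """Return (exit_code, message) based on the log lines that mention job_name."""
    job_lines = [line for line in lines if job_name in line]
    if not job_lines:
        return UNKNOWN, "no log lines found for '%s'" % job_name
    errors = [line for line in job_lines if ERROR_RE.search(line)]
    if errors:
        return CRITICAL, "%d error line(s) for '%s'" % (len(errors), job_name)
    return OK, "no errors for '%s'" % job_name


def main(argv):
    """Print one Nagios status line and return the matching exit code."""
    if len(argv) != 2:
        print("UNKNOWN - usage: %s JOB_NAME" % argv[0])
        return UNKNOWN
    try:
        with open(LOG_FILE, errors="replace") as log:
            code, msg = check_lines(log, argv[1])
    except OSError as exc:
        code, msg = UNKNOWN, "cannot read %s: %s" % (LOG_FILE, exc)
    print("%s - %s" % (STATE_NAMES[code], msg))
    return code

# A real plugin would end with:  sys.exit(main(sys.argv))
```

Returning UNKNOWN when no lines match at all (rather than OK) is a deliberate choice in this sketch, so that a job that never ran doesn't look healthy.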

Here are short install instructions (you'll have to hack the VDR appliance a little, but the hacks are independent of the appliance itself and should survive future appliance updates and upgrades):

I.) VMware Data Recovery appliance configuration:

1. install the basic Nagios NRPE service:
rpm -ivh http://download.fedora.redhat.com/pub/epel/5/x86_64/epel-release-5-4.noarch.rpm
yum -y install nrpe
yum -y install nagios-common nagios-plugins

2. configure NRPE to start at boot, then start the service:
chkconfig nrpe on
service nrpe start

3. open the firewall for the NRPE service (TCP port 5666):
vi /etc/sysconfig/iptables

add this rule just after the "--dport 22" rule:
-A VDR-Firewall-1-INPUT -p tcp -m tcp --dport 5666 -j ACCEPT

service iptables restart
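To confirm the new firewall rule from the Nagios server side, a plain TCP connection test to port 5666 is enough. A minimal sketch (the function name `port_open` and the example IP are hypothetical):

```python
import socket


def port_open(host, port=5666, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Refused, filtered (timeout) or unresolvable host.
        return False

# Example (hypothetical appliance address):
# print(port_open("192.0.2.10"))
```

Note that this only proves the firewall lets the connection through; NRPE's own `allowed_hosts` setting in nrpe.cfg still decides whether the Nagios server is allowed to run commands.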

4. copy check_vmware_data_recovery.py (attached) into /usr/lib64/nagios/plugins and make it executable:

chmod +x /usr/lib64/nagios/plugins/check_vmware_data_recovery.py

5. test the plugin
/usr/lib64/nagios/plugins/check_vmware_data_recovery.py "Integrity Check"
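When testing the plugin by hand, what matters to Nagios is the exit code (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN) plus the first output line. A small helper can show both at once; this is a sketch assuming Python 3.7+ is available wherever you run it, and `run_plugin` is a hypothetical name:

```python
import subprocess

# Nagios plugin API: the exit code carries the service state.
NAGIOS_STATES = {0: "OK", 1: "WARNING", 2: "CRITICAL", 3: "UNKNOWN"}


def run_plugin(cmd, timeout=60):
    """Run a plugin (argument list) and return (state_name, first_output_line)."""
    try:
        proc = subprocess.run(cmd, capture_output=True, text=True, timeout=timeout)
    except (OSError, subprocess.TimeoutExpired) as exc:
        return "UNKNOWN", str(exc)
    lines = proc.stdout.splitlines()
    first_line = lines[0] if lines else ""
    # Codes outside 0-3 are reported as UNKNOWN in this sketch.
    return NAGIOS_STATES.get(proc.returncode, "UNKNOWN"), first_line

# Example, using the path from the step above:
# print(run_plugin(["/usr/lib64/nagios/plugins/check_vmware_data_recovery.py",
#                   "Integrity Check"]))
```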

6. make sure the nrpe user is able to read /var/log/messages, using one of these two options:

6.1: this is quite ugly, but easy:

# chmod 0644 /var/log/messages

or

6.2: allow nrpe to run the plugins as root via sudo:

# visudo

comment out the line "Defaults requiretty" and add this line at the bottom:

nrpe ALL=(ALL) NOPASSWD: /usr/lib64/nagios/plugins/
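To see whether step 6 worked, you can check from the nrpe user's point of view whether the log is readable and whether the "others" read bit is set. A minimal sketch (`log_access_report` is a hypothetical helper; run it as the nrpe user, e.g. via `sudo -u nrpe`, for a meaningful `readable` result):

```python
import os
import stat


def log_access_report(path="/var/log/messages"):
    """Report whether the current user can read path and whether 'others' have read permission."""
    try:
        mode = os.stat(path).st_mode
    except OSError as exc:
        return {"exists": False, "readable": False, "world_readable": False, "error": str(exc)}
    return {
        "exists": True,
        "readable": os.access(path, os.R_OK),          # effective check for this user
        "world_readable": bool(mode & stat.S_IROTH),    # the bit option 6.1 sets
        "mode": oct(stat.S_IMODE(mode)),
    }
```

With option 6.2 the plugin runs as root through sudo, so `world_readable` may legitimately stay False.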

7. add the command to the NRPE configuration (the /usr/bin/sudo prefix relies on option 6.2; with option 6.1, drop it):

vi /etc/nagios/nrpe.cfg

command[check_vmware_data_recovery]=/usr/bin/sudo /usr/lib64/nagios/plugins/check_vmware_data_recovery.py "Integrity Check"
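The name inside `command[...]` must match what check_nrpe asks for with `-c`. A quick way to list the commands nrpe.cfg actually defines is a small parser; this is a sketch (`parse_nrpe_commands` is hypothetical, and it only handles the simple `command[name]=command line` form shown above):

```python
import re

# Matches lines such as: command[name]=/path/to/plugin args
COMMAND_RE = re.compile(r"^\s*command\[([^\]]+)\]\s*=\s*(.+?)\s*$")


def parse_nrpe_commands(text):
    """Return {command_name: command_line} for uncommented command[...] lines."""
    commands = {}
    for line in text.splitlines():
        if line.lstrip().startswith("#"):
            continue  # skip commented-out definitions
        match = COMMAND_RE.match(line)
        if match:
            commands[match.group(1)] = match.group(2)
    return commands

# Example:
# with open("/etc/nagios/nrpe.cfg") as fh:
#     print(parse_nrpe_commands(fh.read()))
```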

8. reload the NRPE service

service nrpe reload

II.) Nagios server configuration:

1. define a new command on your Nagios server

define command{
    command_name    check_vmware_data_recovery
    command_line    $USER1$/check_nrpe -H $HOSTADDRESS$ -c check_vmware_data_recovery
    }
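At check time Nagios replaces `$USER1$` (normally defined in resource.cfg as the plugin directory) and `$HOSTADDRESS$` (the host's address) in `command_line`. A simplified model of that substitution, useful for predicting the exact check_nrpe call; `expand_macros` is hypothetical, and real Nagios supports many more macros and escaping rules:

```python
import re

# $NAME$ style macros, e.g. $USER1$ or $HOSTADDRESS$.
MACRO_RE = re.compile(r"\$([A-Z0-9_]+)\$")


def expand_macros(command_line, macros):
    """Replace $NAME$ macros using the macros dict; unknown macros are left unchanged."""
    return MACRO_RE.sub(lambda m: macros.get(m.group(1), m.group(0)), command_line)

# Example (assumed values):
# expand_macros("$USER1$/check_nrpe -H $HOSTADDRESS$ -c check_vmware_data_recovery",
#               {"USER1": "/usr/lib/nagios/plugins", "HOSTADDRESS": "192.0.2.10"})
```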
   
2. add a service check for your VMware Data Recovery appliance:

define service{
        use                             generic-service
        host_name                       vdr-01
        service_description             vmware data recovery
        check_command                   check_vmware_data_recovery
        }
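If you run several VDR appliances (I tested with 4), you may prefer to generate one service object per appliance instead of copying the block by hand. A minimal sketch; the helper names and the host names such as vdr-02 are hypothetical:

```python
def service_definition(host_name,
                       description="vmware data recovery",
                       check_command="check_vmware_data_recovery",
                       template="generic-service"):
    """Render one Nagios service object in the same layout as the definition above."""
    fields = [
        ("use", template),
        ("host_name", host_name),
        ("service_description", description),
        ("check_command", check_command),
    ]
    body = "".join("        %-32s%s\n" % (key, value) for key, value in fields)
    return "define service{\n" + body + "        }\n"


def service_definitions(host_names):
    """Render one service object per appliance, separated by blank lines."""
    return "\n".join(service_definition(name) for name in host_names)

# Example:
# print(service_definitions(["vdr-01", "vdr-02"]))
```

Alternatively, Nagios accepts a comma-separated list in `host_name`, so a single service definition can cover all appliances.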
       
3. manually test the NRPE command from the Nagios server:

/usr/lib/nagios/plugins/check_nrpe -H [your VDR appliance IP] -c check_vmware_data_recovery
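check_nrpe passes on the remote plugin's exit code and output. Per the Nagios plugin guidelines, anything after a `|` on the first output line is performance data; whether this VDR plugin emits any is not known here. A small sketch for splitting the two (`parse_plugin_output` is hypothetical):

```python
def parse_plugin_output(output):
    """Split the first output line into (status_text, perfdata)."""
    lines = output.splitlines()
    first_line = lines[0] if lines else ""
    # Everything after the first '|' is performance data.
    text, _, perfdata = first_line.partition("|")
    return text.strip(), perfdata.strip()
```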

---

That's it. If you have Nagios in your environment, this should be an easy, straightforward and good-enough tool for monitoring the health of your VMware Data Recovery appliances.

---

EDIT@2012-12-03: I've uploaded a new version of the plugin

-- Kind regards, Marko. VCP5