VMware Cloud Community
JoeAtJefferies
Contributor

Alerting woes - Who can do it better?

I've had a great deal of trouble with VMware alerting over the last year.  Many of my alarms need to be "reset" by going into the alarm, changing something small, clicking OK, then changing it back before the alarm will alert us properly.  This is an official workaround from VMware for Datastore Utilization, which is very important to us.  Host disconnects are a problem too.  When a host disconnects from vCenter for a moment (which happens once or twice a day since I have 70+ hosts) and comes back within seconds, many times we don't receive a notice that the machine has come back.  This is a huge problem when I or someone from my team is on call.  We'll see that the host has disconnected from vCenter (and could have gone down), but we don't receive the follow-up alert since it reconnected within seconds.  The host connection state alert will NOT send an update if the host reconnects in less than one minute.  I believe it's a limitation of the alarm itself.  And oftentimes, even if a host has been disconnected longer, you will not receive an email that it has reconnected, even though the alarm has been configured to do so.

So, my question is... Who can do it better?  I've been looking at Veeam Monitor but it seems a bit cheesy.  Not a lot of options, but it does alert when a host comes back online.  I wish there was a product that would hold off on sending out an alarm unless the condition has persisted for at least a set time (for example, a host that has been offline for 1 minute).  The ability to modify the subject line and body of the email to be more to the point would also be a great help.  What do the big guys (or medium guys) use besides vCenter native alerting?

I'm in desperate need of a reliable alerting product.  We don't have any other alerting product that we can just buy an add-on for.  It's up to each department to have their own alerting system at my company.  Any suggestions would be greatly appreciated.

Thanks!

Joe

3 Replies
AndreTheGiant
Immortal

You can disable all alarm notifications for one node or inventory object.

Veeam Monitor could be better... IMHO it could be simpler :)

It can also provide some alarms that vCenter does not, such as % of guest disk space.

Andre

Andrew | http://about.me/amauro | http://vinfrastructure.it/ | @Andrea_Mauro
JoeAtJefferies
Contributor

It appears Veeam can NOT do it better.  There's actually a 20 second polling delay that keeps you slightly more in the dark, in my opinion.  The entire layout isn't very friendly.  Plus, you need to have their 3rd party console window open to see what's going on for the less critical things that you configure not to email.  I like monitoring datastore space availability via a folder structure in vCenter.  There is no concept of datastore folders in Veeam Monitor.  Plus, it seems like there's a separate database for every Veeam product: Monitor, Reporter, and Business View.  It took a long time to get all my departments on board to do a POC of their product.  Actually, by the time everything was configured my 30 day license had 5 days left.  They issued me a new one, but still...  By that time I had worked through all of my alerting problems!

As for the host connection state alarm not alerting when an ESX host comes back online... That is fixed.  I had "Repeat triggered alarm every 1 minutes."  I'm not sure why it wasn't set to 0, but once it was, it would email if the status changed even 1 second later.  Possibly it was a default setting, but I thought you couldn't set it to 0, until I tried.  Doh!  Now I'm receiving the "all clear" alerts I was missing.  Pretty obvious, really.

The datastore usage alarms that were not working were addressed by VMware support.  I was told that I need to disable and re-enable each storage usage alarm to get them working again after my scheduled weekly vCenter reboot.  I was able to implement a PowerCLI PowerShell script that does this.  It's below.  (Thankfully you can use wildcards such as * if your alarm names follow a standard, as mine do.)  As you can see, I'm also resetting a snapshot alarm that seemed to be problematic.

disable_enable_alarm.ps1

# Ensure PowerCLI is installed locally so that
# PowerShell recognizes the VMware cmdlets.
# Place this file in an auto-start script and
# make sure this script starts last.

# Load the PowerCLI snap-in and authenticate to vCenter
Add-PSSnapin VMware.VimAutomation.Core
Connect-VIServer vcenterserver.yourdomain.com -User vcenterusername -Password "passwordhere"

# Get each alarm definition and disable it
Get-AlarmDefinition -Name "Datastore 97*" | Set-AlarmDefinition -Enabled:$false
Get-AlarmDefinition -Name "*Boot LUN Usage Monitored" | Set-AlarmDefinition -Enabled:$false
Get-AlarmDefinition -Name "*Datastore Usage and Connectivity Monitored" | Set-AlarmDefinition -Enabled:$false
Get-AlarmDefinition -Name "*Running on Snapshot" | Set-AlarmDefinition -Enabled:$false
# Wait 20 seconds for the tasks to complete
Start-Sleep -Seconds 20

# Re-enable the same alarms (patterns must match the ones disabled above)
Get-AlarmDefinition -Name "Datastore 97*" | Set-AlarmDefinition -Enabled:$true
Get-AlarmDefinition -Name "*Boot LUN Usage Monitored" | Set-AlarmDefinition -Enabled:$true
Get-AlarmDefinition -Name "*Datastore Usage and Connectivity Monitored" | Set-AlarmDefinition -Enabled:$true
Get-AlarmDefinition -Name "*Running on Snapshot" | Set-AlarmDefinition -Enabled:$true

# Disconnect from vCenter without prompting
Disconnect-VIServer -Confirm:$false

I have it kicking off from a batch file via a scheduled task 2 minutes after the vCenter service starts following a reboot.

disable_enable_alarms.bat

C:\WINDOWS\system32\windowspowershell\v1.0\powershell.exe c:\scripts\disable_enable_alarm.ps1
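
For anyone who would rather not rely on a fixed delay, here is a rough sketch of a wrapper that waits until the vCenter service reports Running before calling the reset script. This is not from the setup described above: it assumes the wrapper runs on the vCenter server itself, that the Windows service is named "vpxd" (check with Get-Service on your own system), and the 600 second timeout is an arbitrary, hypothetical value.

```powershell
# Sketch only. Assumptions: runs on the vCenter server itself, and the
# vCenter Windows service is named "vpxd" (verify with Get-Service).
$serviceName    = "vpxd"   # assumed service name, not from the original post
$timeoutSeconds = 600      # hypothetical upper bound on how long to wait
$waited         = 0

# Poll every 10 seconds until the service is Running or the timeout is reached
while ((Get-Service -Name $serviceName).Status -ne "Running" -and $waited -lt $timeoutSeconds) {
    Start-Sleep -Seconds 10
    $waited += 10
}

if ((Get-Service -Name $serviceName).Status -eq "Running") {
    # Keep the same 2 minute grace period, then run the reset script
    Start-Sleep -Seconds 120
    & "c:\scripts\disable_enable_alarm.ps1"
} else {
    Write-Warning "Service $serviceName did not start within $timeoutSeconds seconds; alarms were not reset."
}
```

The scheduled task would then call this wrapper instead of the reset script directly. Whether the service being in the Running state is enough for the alarm reset to take effect is untested here, which is why the 2 minute pause is kept.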

paugie
Contributor

I came across this discussion while searching for whether there is a known bug with vCenter alerting, specifically on datastores.  You mention that the alarms have to be disabled and re-enabled.  Was this just for datastore-based alarms, or all of them?  vCenter again failed to inform us of a filling datastore last night, so I'm looking to get this fixed somehow.

Did support indicate this as a known issue / fixed in vSphere 5 / just broken?
