The "Host storage status" alarm is not triggering.
One of my servers (ESXi 4.1, HP DL360 G5 with the HP management agent components loaded) recently had a drive failure. Under the "Hardware Status" tab I can see the warning for the Storage sensor, but the "Host storage status" alarm is not triggered, even though it is enabled and the events that trigger it are configured correctly.
When you use Advanced options within the triggers, they will not work unless they are configured 100% correctly. Has this alarm ever worked? Have you confirmed with VMware that the advanced options are correct?
This is the first time I have had a hardware-related issue, so I can't tell whether they worked before.
I do know that these hardware status alarms are supposed to work out of the box, since they come pre-configured with the vCenter install. The advanced options seemed fine as well (status and conditions for each event). I have also set up actions for all states (send email), with no success. Reference here: look for "Alarms with health status trigger conditions are not migrated to vSphere 4.0".
I have been trying to configure a test alarm with different options, even just the advanced condition for "Hardware Health Changed" matching any yellow state, but that did not work either.
Any other thoughts?
Like I said, the alarm in its default state should work. However, when you start adding advanced triggers, you can break the alarm altogether. My suggestion would be to talk to VMware Support, because out of the box you will not be able to trigger on specific components such as an HDD failure.
You can however just trigger for "Hardware Health Changed". But after you get the alert, you would have to look into the Health Status Tab to see if a specific component is degraded/failed.
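To illustrate why a single "Hardware Health Changed" alert still sends you to the Health Status tab, here is a minimal Python sketch. It assumes each sensor group (Storage, Power, etc.) reports green/yellow/red and that the top-line status is simply the worst of them; the `summarize` function and the severity ordering are hypothetical and are not how vCenter is documented to compute it.

```python
from typing import Dict, List, Tuple

# Hypothetical severity ordering (assumption for this sketch, not vCenter's documented logic).
SEVERITY = {"green": 0, "yellow": 1, "red": 2}


def summarize(groups: Dict[str, str]) -> Tuple[str, List[str]]:
    """Return the worst state across all sensor groups, plus the groups that are not green."""
    # The top-line status keeps only the worst state...
    overall = max(groups.values(), key=lambda state: SEVERITY[state], default="green")
    # ...so which component is degraded has to be recovered separately.
    degraded = sorted(name for name, state in groups.items() if state != "green")
    return overall, degraded


if __name__ == "__main__":
    print(summarize({"Storage": "yellow", "Power": "green", "Fan": "green"}))
    # ('yellow', ['Storage'])
```

The generic alert only carries the `overall` value; the `degraded` list is the part you end up looking up by hand.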
Thanks for the response.
Just to clarify, the only thing I changed in the default alarms was adding the action to send an email for all states, nothing else. (I know it's best practice not to mess with the defaults, which is why I created a new dummy alarm to experiment with.)
That's exactly what I mean: I just want to be notified of the Storage status change, and that is not working. I am not expecting a detailed alarm telling me which drive failed.
With the dummy alarm I created, I expected at least to see an alert fire simply because there's a warning in the server's hardware status, regardless of which component has failed, but this is not working either.
I was first checking the communities to see whether anyone else is seeing this behavior before going to VMware support.
Has there been a fix for this? I'm also having an issue with this.
I used the default alarm for host power, and added my email address for all status changes.
I also tried creating a new alert for Storage following the pattern of the power alert.
I can't get any of those to send an email.
I also tried a new alarm for status "unset" with the advanced conditions group equal to Storage and newValue not equal to green. That did not work either.
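Here is a minimal Python sketch of how an event trigger with advanced conditions like the one above might be evaluated, assuming the conditions are ANDed and compared as exact strings. The argument names `group` and `newValue` are taken from the post; the real event argument names in vCenter may differ, and `Condition` and `trigger_fires` are hypothetical. It shows the point made earlier in the thread: with one wrong argument name, the alarm silently never fires.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Condition:
    """One hypothetical 'advanced' condition: argument, operator, expected value."""
    argument: str
    operator: str  # "equal" or "notEqual" in this sketch
    value: str

    def matches(self, event_args: Dict[str, str]) -> bool:
        actual = event_args.get(self.argument)
        if actual is None:
            # An argument the event doesn't carry never matches, so nothing fires.
            return False
        if self.operator == "equal":
            return actual == self.value
        if self.operator == "notEqual":
            return actual != self.value
        raise ValueError(f"unknown operator: {self.operator}")


def trigger_fires(conditions: List[Condition], event_args: Dict[str, str]) -> bool:
    # Assumption: every condition must hold (AND semantics).
    return all(c.matches(event_args) for c in conditions)


storage_not_green = [
    Condition("group", "equal", "Storage"),
    Condition("newValue", "notEqual", "green"),
]

if __name__ == "__main__":
    print(trigger_fires(storage_not_green, {"group": "Storage", "newValue": "yellow"}))
    # An event that names its argument differently never matches:
    print(trigger_fires(storage_not_green, {"group": "Storage", "newState": "yellow"}))
```

Exact string comparison is a choice of this sketch; whether vCenter treats "green" and "Green" as equal is not something the thread establishes.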
The support call I logged is still being looked at by VMware. I'll update this thread as soon as there's something to report.
They have admitted that this is a bug, but there doesn't appear to be a solution or workaround yet.
I hope so! I have another call with VMware that has been open since June last year, for a problem where my VMs would BSOD because of a bug in the VMtools VGA driver. We were running 4.0 Update 1 at the time, but it's still in 4.1. They've said it'll be fixed in the next update (4.0 U3, 4.1 U1), so hopefully I can get both of these support calls closed soon.
FYI - the same applies to the Power group alert definition.
I can pull one of the power cords, and the Health Status goes to yellow, then red in the Power group on the Health Status page, but the alarm is not triggered at the vCenter level.
What is odd to me is the vSphere client connected to the vCenter does not even have the Health Status page available - you have to connect the client directly to the host to even see that there is a problem.
Veeam's monitor app seems to see the Health Status, but only reports on the top-line health, i.e., I get an email saying that the health status has changed, not the specific cause, like a failed disk or power supply.
I created an alarm definition for a host entering Maintenance Mode, with an email action, and that works from vCenter (I get an email). So it seems to be isolated to certain kinds of alarm definitions, maybe just the Hardware Health ones?
Hope that helps someone narrow it down...
Yes, it may affect Hardware Health alarms in general: I tested in different ways with other monitored hardware and got the same results.
Perhaps I am wrong, but VMware should have the same policy as other companies: when you find a bug, they give you your support ticket back and, if possible, develop a fix/patch before including it in an update.
So, as usual, there's no choice but to wait until VMware hopefully has a fix in a future update/release.
Just to throw a +1 on this...
I've seen the same thing. Pull a drive or pull a power cord (or both), and no alert is sent.
vCenter is very slow to display the degradation as well, though, in most cases, a drive/power failure isn't something that needs to be resolved immediately. I'd be happy with notification within a few hours, or even a day. Surely better than NO notification.
If you look at http://communities.vmware.com/thread/295229, there is an alternative solution, at least for an IBM server with the IMM/RSA module. No additional software is needed.
I would prefer to also have VMware alert me to keep all the monitoring and alerting in a central application, but in the meantime, I can at least be alerted if a disk or power supply decides to "check out".
I had tried using Veeam Monitor for this, but when I pulled a power cord, I kept getting emails from Veeam every 10 minutes even after I replaced the power cord and the Health Status on the host went back to green. I had to remove the Datacenter from Veeam Monitor to kill the emails.
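One way to avoid repeated emails like that is to alert only when a component's state changes (edge-triggered), rather than on every poll. Below is a minimal Python sketch, assuming some poller feeds in per-host, per-group health states. `TransitionNotifier` is hypothetical, the `sent` list stands in for outgoing emails, and none of this describes how Veeam Monitor actually works.

```python
from typing import Dict, List, Optional, Tuple


class TransitionNotifier:
    """Hypothetical edge-triggered notifier: one message per state change."""

    def __init__(self) -> None:
        self._last: Dict[Tuple[str, str], str] = {}
        self.sent: List[str] = []  # stands in for outgoing emails

    def observe(self, host: str, group: str, state: str) -> Optional[str]:
        key = (host, group)
        # Assumption: a group we haven't seen yet is treated as healthy.
        previous = self._last.get(key, "green")
        self._last[key] = state
        if state == previous:
            return None  # no change since the last poll: stay quiet
        message = f"{host} {group}: {previous} -> {state}"
        self.sent.append(message)
        return message


if __name__ == "__main__":
    notifier = TransitionNotifier()
    # Pull a power cord, leave it out for a few polls, then plug it back in.
    for state in ["green", "yellow", "yellow", "red", "red", "green"]:
        notifier.observe("esx01", "Power", state)
    print(notifier.sent)
```

Six polls produce three messages: degraded, failed, and recovered. The recovery message also closes the loop, so there's nothing left to keep re-sending once the host is green again.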