VMware Cloud Community
conradsia
Hot Shot

HBA has failed alarm?

Does anyone have a way of getting an alert when an HBA fails?

The client's fiber switch doesn't send alerts and no SNMP monitoring is configured, so I'm looking for a way in ESX or VC to be notified when I lose an HBA connection from the server to the switch. The same goes for a network card failure.

Anyone?

Thanks,

Conrad


Accepted Solutions
grasshopper
Virtuoso

If a path fails (or there is a failover to a secondary HBA), you may see some or all of the following:

"Manual switchover to path vmhbax:y:z begins"

"Changing active path to vmhbax.y.z"

"Manual switchover to vmhbax.y.z completed successfully"

-or-

"Delaying failover to path vmhbax.y.z"

"Manual switchover to path vmhbax.y.z begins."

"Manual switchover to vmhbax.y.z completed successfully."

-or, our worst nightmare-

"Manual switchover to path vmhbax.y.z begins."

"Did not switchover to vmhbax.y.z"

(where x.y.z are the adapter.target.lun, of course)

Further, if ESX 3.x detects a hardware failure of the HBA, it may also log the following text:

"Unrecoverable hardware error : Adapter being marked offline"

Where to look? If it has already been written to the logs, you can check for it here:

/var/log/vmkernel

Or, for real-time access to logging that hasn't been written to disk yet:

/proc/vmware/log

What to look for? The key word here is 'switchover'
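
The phrases quoted above could be turned into a simple log filter. What follows is a minimal Python sketch, not a tested ESX tool: the names `classify_line`, `scan`, and `scan_file` and the severity labels are hypothetical, the matched strings are only the ones quoted in this post, and it assumes a Python 3 interpreter is available wherever the log is read.

```python
# Phrases quoted in the post above, mapped to hypothetical severity labels.
# Order matters: the most serious phrases are checked first.
PATTERNS = [
    ("Adapter being marked offline", "critical"),
    ("Did not switchover", "critical"),
    ("Delaying failover", "warning"),
    ("Changing active path", "warning"),
    ("switchover", "warning"),  # catch-all for the key word
]


def classify_line(line):
    """Return a severity label for a log line, or None if nothing matches."""
    lowered = line.lower()
    for phrase, severity in PATTERNS:
        if phrase.lower() in lowered:
            return severity
    return None


def scan(lines):
    """Return (severity, line) pairs for every line that matches a phrase."""
    hits = []
    for line in lines:
        severity = classify_line(line)
        if severity is not None:
            hits.append((severity, line.rstrip("\n")))
    return hits


def scan_file(path):
    """Scan a whole log file, e.g. the /var/log/vmkernel path named above."""
    with open(path, errors="replace") as log:
        return scan(log)
```

Calling `scan_file("/var/log/vmkernel")` would return the matching lines with their labels; how those results get delivered (mail, syslog, pager) is left open here.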

9 Replies
bretti
Expert

There is probably something in /var/log/vmkwarning that you could monitor for, or you could use some kind of syslog script.

VTorque
Contributor

Or could your SAN controllers alert you?

conradsia
Hot Shot

I can get alarms from the SAN controllers but I would really like to get an alert from ESX as well.

bretti
Expert

I was thinking about this some more and was wondering if you are running dual HBAs and dual NICs on your host server?

You could try pulling one of the HBA cables and see what shows up in the logs. If you found a key phrase to search on in a particular log file, maybe DISK or VMHBA, that would help.

Also, I was wondering if you were looking for real time notification or if a daily notification was ok? If a daily is ok, there may be a script you can throw together that will look for new entries in the log files and alert you if there are any errors.
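
A script of the kind described here mainly needs to remember how far it read last time, so that each run only sees new entries. The following is a minimal Python sketch under stated assumptions: `read_new_lines`, `find_errors`, and the state file are hypothetical names, the keywords are placeholders for whatever the cable-pull test turns up, and Python 3 is assumed to be available where the log is read.

```python
import os


def read_new_lines(log_path, state_path):
    """Return lines appended to log_path since the previous call.

    The byte offset reached last time is kept in state_path (a hypothetical
    bookkeeping file). If the log is now shorter than the saved offset, it is
    assumed to have been rotated and is read from the start.
    """
    offset = 0
    if os.path.exists(state_path):
        with open(state_path) as state:
            text = state.read().strip()
            offset = int(text) if text.isdigit() else 0

    if os.path.getsize(log_path) < offset:
        offset = 0  # log was rotated or truncated

    with open(log_path, "rb") as log:
        log.seek(offset)
        data = log.read()
        new_offset = log.tell()

    # Remember where this run stopped so the next run starts from there.
    with open(state_path, "w") as state:
        state.write(str(new_offset))

    return data.decode("utf-8", errors="replace").splitlines()


def find_errors(lines, keywords):
    """Keep only lines containing one of the keywords (case-insensitive)."""
    wanted = [k.lower() for k in keywords]
    return [ln for ln in lines if any(k in ln.lower() for k in wanted)]
```

Run once a day from cron this gives a daily digest; run every minute or so it gets close to real time. Sending the result (mail, SNMP trap, and so on) is deliberately left out of the sketch.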

jparnell
Hot Shot

What hardware are you using?

We've got HP DL585s. We have installed the HP Insight agents on each of our hosts, and then set them up to send SNMP traps to HP OpenView Operations when things like HBA or NIC connections fail.

I'm sure there are similar agents for Dell and IBM hardware.

conradsia
Hot Shot

Thanks everyone, it's all very helpful. We are using IBM x366 servers, and yes, I am looking for a real-time notification to let me know an HBA has failed.

Grasshopper, your input was very helpful, thanks.

I can look through logs myself, but I really don't want to have to do that. I'm looking into what damage would be done by loading a Director agent onto ESX. The last time I loaded it, it used Java, which I really didn't like.

I don't get anything on the HBAs from SNMP or in the normal Director agent, but I'm going to load in an RSA card and see if that does anything for me.

#conrad

bretti
Expert

Unfortunately, I don't think Director or an RSA card will alert you on failed HBAs.

I agree with you that loading Director is not a good idea. My opinion is that the Director software causes more trouble than good. The RSAs are really nice, but it seems that both Director and RSAs are limited in their ability to report on HBA problems.

How are you at shell scripts? :)

conradsia
Hot Shot

I'm starting to get the impression that the only way is going to be with a shell script. We've loaded the RSA card but so far it doesn't look too promising.

Thanks for everyone's input. If lightning strikes in the middle of the night and you get some genius idea, please let me know.

If I script something that works :) I'll be more than happy to post it.

Conrad
