I am experiencing a vmnic down issue on vSphere hosts:
Why does it show up in Syslog and in the Host's Events in the vSphere Client, but not as a Fault in vCops?
Is there a reference for which Faults vCops gathers?
Thanks,
-MattG
I think it should cause a degradation in some areas:
1. loss of redundancy (less severe)
2. loss of network connectivity (more severe)
I think link state down should set the fault badge for the ESX host to 100, but other objects may not be affected as long as there is redundancy (additional NICs on the vSwitch).
It doesn't show up as a Fault, so it doesn't get calculated into the Health score.
-MattG
Can you determine the exact type of fault this is in vCenter? You might be able to get it from the MOB, the DB, or elsewhere.
Specifically, we would need to see it in a format like this: esx.problem.net.vmnic.linkstate.down
If it's a unique one ... or different than we had expected, it might not fault correctly.
Is everything else for that ESX host reporting just fine?
Also, what are the build numbers of vCops, vCenter and ESX? Just to calibrate to your specific instance.
vCops 5.03
vCenter 5.0 - 804277
vSphere 5.0 - 821916
-MattG
I'm currently double checking this.
esx.problem.net.dvport.redundancy.lost
I think we should report a fault based on this error, and it should have a fault score of about 70. I'm not sure if it needs an associated vm nic error to degrade the health status as well. Note that the fault badge is the MAX of all current fault scores ... so a 100 would trump this.
Regardless, it's not happening in your environment.
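To make the MAX rule above concrete, here is a minimal sketch. It is not vCops code; the function name and the convention of returning 0 when there are no active faults are assumptions for illustration only.

```python
from typing import Iterable


def fault_badge(active_fault_scores: Iterable[int]) -> int:
    """Return the badge score as the maximum of the active fault scores.

    Assumption: an object with no active faults scores 0.
    """
    return max(active_fault_scores, default=0)


print(fault_badge([70]))       # redundancy lost only -> 70
print(fault_badge([70, 100]))  # a 100 trumps the 70 -> 100
print(fault_badge([]))         # no active faults -> 0
```

Because the badge is a maximum rather than a sum, adding a lower-scored fault next to a 100 does not change the badge.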
vob.net.dvport.uplink.transition.down
I don't think we report based on this exact name ... and I tried a couple of substrings.
Same here ... it's not tracked by this name. We can get esx.problem.net.vmnic.linkstate.down
Note: I think there was an extra decimal in there ".."
Just confirmed that my previous statements were true. We don't fault on the vob events, but we should on the 3rd.
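As a rough illustration of why the exact event name matters, the sketch below checks an event ID against a set of tracked names after collapsing accidental double dots. The set contains only the two esx.problem IDs mentioned in this thread; the real list inside vCops is not shown here, and the function names are hypothetical.

```python
import re

# Assumption: only the two event IDs named in this thread.
# The actual list of events that vCops faults on is not known here.
TRACKED_FAULT_EVENTS = {
    "esx.problem.net.vmnic.linkstate.down",
    "esx.problem.net.dvport.redundancy.lost",
}


def normalize_event_id(event_id: str) -> str:
    """Trim whitespace and collapse runs of dots ('a..b' becomes 'a.b')."""
    return re.sub(r"\.{2,}", ".", event_id.strip())


def is_tracked_fault(event_id: str) -> bool:
    """True if the normalized event ID is one of the tracked fault events."""
    return normalize_event_id(event_id) in TRACKED_FAULT_EVENTS


print(is_tracked_fault("esx.problem.net.vmnic.linkstate.down"))
print(is_tracked_fault("esx.problem.net..vmnic.linkstate.down"))
print(is_tracked_fault("vob.net.dvport.uplink.transition.down"))
```

An exact-match lookup like this would explain why a vob event or a slightly different name produces no fault even though the event itself is visible.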
If you really want to know why the Fault badge isn't adjusting ... we should proceed via a support request.
If for some reason collections failed during the specific interval when the fault event was created, it's possible that it was missed. Alternatively, if the link state was restored ... we should automatically resolve the fault event. Faults are only active while the event is active ... not once it is restored (resolved) in vCenter.
For the host in question, can you navigate to the Events tab under Operations and select 'Events on self' (the target icon thing) and tell us if you see the redundancy lost event show up there?
If it is restored, which it is in my case, would I still be able to see the alert inside of vCops as an alert, but not a Fault (since it was resolved)?
Even though it is resolved it should be noted as an event of some sort so that it can be correlated with performance.
-MattG
Yes, you should be able to see the alert by filtering for inactive alerts.
You should also be able to see the event in the event tab.
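The active versus resolved behavior discussed above can be sketched as follows. This is only an illustration of the idea, not how vCops stores alerts; the FaultAlert class and its fields are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class FaultAlert:
    event_id: str
    score: int
    active: bool  # assumed to flip to False once the condition is resolved in vCenter


def inactive_alerts(alerts: List[FaultAlert]) -> List[FaultAlert]:
    """Alerts that were raised earlier but have since been resolved."""
    return [a for a in alerts if not a.active]


def fault_badge(alerts: List[FaultAlert]) -> int:
    """Only active alerts count toward the badge; 0 if none are active."""
    return max((a.score for a in alerts if a.active), default=0)


history = [
    # Link came back, so this alert is resolved but still on record.
    FaultAlert("esx.problem.net.dvport.redundancy.lost", 70, active=False),
]

print(fault_badge(history))           # 0
print(len(inactive_alerts(history)))  # 1
```

The resolved alert no longer affects the score, but it stays available for correlation when you filter for inactive alerts.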
They do! I just assumed they would be faults.
Thanks,
-Matt
I am able to see the alerts under Operations -> Events. Do the events that come in from vCenter get classified by severity? Also, why does a NIC down show up as a Change Event? Shouldn't it be an alert?
Thanks,
-MattG
The NIC down is treated as both a change and a fault event. The reason is that there can be situations where users completely turn off faults. In those situations, we would still like to display this event as a change event.
If you see the change event for the NIC down, you should also see a fault alert in the alerts tab for the host in question (even if it has been cancelled because the issue was resolved). If you don't, it could be because:
- Faults were turned off for that host (Configuration or group or globally)
- The host may have been in maintenance mode in vCenter