VMware Cloud Community
MattG
Expert

What vSphere Faults should be reported on in vCops?

I am experiencing a vmnic down issue on vSphere hosts:

  • vmnic vmnic4 linkstate down

Why does it show up in syslog and in the host's Events in the vSphere Client, but not as a Fault in vCops?

Is there a reference listing which Faults vCops gathers?

Thanks,

-MattG

-MattG If you find this information useful, please award points for "correct" or "helpful".
14 Replies
IamTHEvilONE
Immortal

I think it should cause a degradation in some areas:

1. loss of redundancy (less severe)

2. loss of network connectivity (more severe)

I think link state down should set the fault badge for the ESX host to 100, but other objects may not be affected as long as there is redundancy (additional NICs on the vSwitch).

MattG
Expert

It doesn't show up as a Fault, so it isn't included in the Health score calculation.

-MattG

IamTHEvilONE
Immortal

Can you determine the exact type of fault this is in vCenter? You might be able to get it from the MOB, the database, or elsewhere.

Specifically, we would need to see it in a format like this: esx.problem.net.vmnic.linkstate.down

If it's a unique one ... or different than we had expected, it might not fault correctly.

Is everything else for that ESX host reporting just fine?

IamTHEvilONE
Immortal

Also, what are the build numbers of vCops, vCenter and ESX? Just to calibrate to your specific instance.

MattG
Expert

vCops 5.03

vCenter 5.0 - 804277

vSphere 5.0 - 821916

-MattG

MattG
Expert

These are the three events that alert when the issue occurs but that vCops does not report on:

Thanks,

-MattG

IamTHEvilONE
Immortal

I'm currently double checking this.

esx.problem.net.dvport.redundancy.lost

I think we should report a fault based on this event, with a fault score of about 70. I'm not sure if it needs an associated vmnic error to degrade the health status as well. Note that the fault badge is the MAX of all current fault scores, so a 100 would trump this.

Regardless, it's not happening in your environment.
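
To illustrate the MAX rule mentioned above, here is a minimal Python sketch. The function name `fault_badge` is hypothetical, the scores of 70 and 100 are only the rough figures suggested in this thread, and treating "no active faults" as 0 is an assumption.

```python
def fault_badge(active_fault_scores):
    """Return the badge score as the maximum of all active fault scores.

    Assumption: a host with no active faults gets a badge score of 0.
    """
    return max(active_fault_scores, default=0)


# Hypothetical scores: redundancy lost (~70) and link state down (100).
print(fault_badge([70, 100]))  # 100
print(fault_badge([70]))       # 70
print(fault_badge([]))         # 0
```

So a redundancy-lost fault on its own would drive the badge, but any concurrent 100-score fault would mask it.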

vob.net.dvport.uplink.transition.down

I don't think we report based on this exact name, and I tried a couple of substrings.

vob.net.vmnic.linkstate.down

Same here: it's not tracked by this name. We can get esx.problem.net.vmnic.linkstate.down

Note: I think there was an extra decimal in there ".."
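
As a rough way to check event names like the ones above, here is a small Python sketch. It collapses a doubled dot and then compares the ID against the names mentioned in this thread. The set `TRACKED_IDS` is not an official list of what vCops tracks, only the two esx.problem names quoted here, and both function names are hypothetical.

```python
import re

# Assumption: only the two esx.problem IDs quoted in this thread.
# This is NOT an official list of events that vCops faults on.
TRACKED_IDS = {
    "esx.problem.net.dvport.redundancy.lost",
    "esx.problem.net.vmnic.linkstate.down",
}


def normalize_event_id(event_id):
    """Collapse runs of dots (e.g. 'net..vmnic') into a single dot."""
    return re.sub(r"\.{2,}", ".", event_id.strip())


def is_tracked_as_fault(event_id):
    """Return True if the normalized ID is one of the tracked names."""
    return normalize_event_id(event_id) in TRACKED_IDS


for raw in (
    "esx.problem.net.dvport.redundancy.lost",
    "vob.net.dvport.uplink.transition.down",
    "vob.net..vmnic.linkstate.down",  # hypothetical ID with a doubled dot
):
    print(normalize_event_id(raw), is_tracked_as_fault(raw))
```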

IamTHEvilONE
Immortal

Just confirmed that my previous statements were true. We don't fault on the vob events, but should on the 3rd.

If you really want to know why the Fault badge isn't adjusting ... we should proceed via a support request.

If for some reason collections failed during the specific interval when the fault event was created, it's possible that it was missed. Alternatively, if the link state was restored, we should automatically resolve the fault event. Faults are only active while the event is active, that is, until it is restored (resolved) in vCenter.
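
The lifecycle described above can be sketched roughly as follows. This is only an illustration of the behavior, not how vCops is implemented. The class `HostFaults`, its method names, and the score of 100 are all hypothetical.

```python
class HostFaults:
    """Track which fault events are currently active on one host."""

    def __init__(self):
        self.active = {}    # event ID -> fault score
        self.resolved = []  # event IDs that were raised and later cleared

    def raise_fault(self, event_id, score):
        self.active[event_id] = score

    def resolve_fault(self, event_id):
        # Only an active fault can be resolved; unknown IDs are ignored.
        if self.active.pop(event_id, None) is not None:
            self.resolved.append(event_id)

    def badge(self):
        # Assumption: no active faults means a badge score of 0.
        return max(self.active.values(), default=0)


host = HostFaults()
host.raise_fault("esx.problem.net.vmnic.linkstate.down", 100)
print(host.badge())   # 100 while the link is down
host.resolve_fault("esx.problem.net.vmnic.linkstate.down")
print(host.badge())   # 0 once the link state is restored
print(host.resolved)  # the cleared fault is still on record
```

If the link comes back before anyone looks at the badge, the fault has already dropped out of the active set.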

rchandran
VMware Employee

For the host in question, can you navigate to the Events tab under Operations, select 'Events on self' (the target icon), and tell us whether the redundancy lost event shows up there?

MattG
Expert

If it is restored, which it is in my case, would I still be able to see the alert inside of vCops as an alert, but not a Fault (since it was resolved)?

Even though it is resolved it should be noted as an event of some sort so that it can be correlated with performance.

-MattG

rchandran
VMware Employee

Yes, you should be able to see the alert by filtering for inactive alerts.

You should also be able to see the event in the event tab.

MattG
Expert

They do! I just assumed they would be faults.

Thanks,

-Matt

MattG
Expert

I am able to see the alerts under Operations -> Events. Do the events that come in from vCenter get classified by severity? Also, why does a NIC down show up as a Change Event? Shouldn't it be an alert?

Thanks,

-MattG

rchandran
VMware Employee

The NIC down is treated as both a change event and a fault event. The reason is that there can be situations where users completely turn off faults. In those situations, we would still like to display this event as a change event.

If you see the change event for the NIC down, you should also see a fault alert in the alerts tab for the host in question (even if it has been cancelled because the issue was resolved). If you don't, it could be because:

- Faults were turned off for that host (under Configuration, either for a group or globally)

- The host may have been in maintenance mode in vCenter
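
The two conditions above can be summarized in a short Python sketch. The function and parameter names are hypothetical. It also assumes the change event is always recorded, which this thread only confirms for the case where faults are turned off.

```python
def records_for_nic_down(faults_enabled, in_maintenance_mode):
    """Return the kinds of records expected for a NIC-down event.

    Assumption: the change event is always recorded, while the fault
    alert is skipped when faults are disabled for the host or the host
    is in maintenance mode in vCenter.
    """
    records = ["change event"]
    if faults_enabled and not in_maintenance_mode:
        records.append("fault alert")
    return records


print(records_for_nic_down(True, False))   # ['change event', 'fault alert']
print(records_for_nic_down(False, False))  # ['change event']
print(records_for_nic_down(True, True))    # ['change event']
```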
