VMware Cloud Community
Leffingwell
Contributor

Two faults - how to interpret?

I should preface this by saying that everything appears to be reporting data properly. I also started at this company only three days ago, and the faults are a bit old, so I don't have all the background on them.

vCenter Service overall health changed from `not available` to `red`. Generated by vCenter Server.
   Alert Information
      Fault criticality:  Critical
      Resource type:  vCenter Server
      Resource name:  SGCVMVCS01
      Parent name:  World
      Event time:  Mar 19, 2012 2:57:23 PM
   Event Source Details
      Event source:  SGCVMVCS01
      Source event object name:  SGCVMVCS01
      Source event name:  vCenter Service Overall Health Changed
      Source event status:  vCenter  Service overall health changed from 'not available' to 'red' . vCenter Operations Manager message: Make sure that the vCenter  management web service is running, and use the vSphere Client to access  the vCenter Service Status icon for more information.

That is the first one. It looks like it has changed several times: from red to green, green to red, red to unavailable, and back. Here is the second fault, which I'm seeing across several hosts:

Lost connection to NFS server
   Alert Information
      Fault criticality:  Critical
      Resource type:  Host
      Resource name:  sgcsvesx04.us.planview.world
      Parent name:  SG-Engineering
      Event time:  Mar 13, 2012 5:54:05 PM
   Event Source Details
      Event source:  SGCVMVCS01
      Source event object name:  sgcsvesx04.us.planview.world
      Source event name:  Lost connection to NFS server
      Source event status:  Lost connection to NFS server
[ce6ced02-1a151e76-0000-000000000000, /vol/sgcnscnt02_dev2, sgcnscnt02, sgcnscnt02_dev2]
[7283806d-8bad585b-0000-000000000000, /vol/sgcnscnt01_dev1, sgcnscnt01, sgcnscnt01_dev1]

How do I verify whether these errors are still issues? I think I read another discussion on how to suppress a fault, so if these are just old artifacts I should be able to manage that. Finally, are there any resources on these error messages and what they mean? Are there any books yet, or other resources you've found useful for making the most of vCOPS? Thanks in advance, all!

Accepted Solutions
gradinka
VMware Employee

"Faults" in VCOps are problems reported directly by VC.

i.e., they are not calculated by any logic. It's VC which says, "hey, see, there is this problem that I have right now."

So log into the VI client and see what's going on.

1) For the first fault (health changed from X to Y), in the VI client go to 'Home' -> 'vCenter Service Status'.
   That health value (in faults) is calculated by VC from that service, and you should be able to see clearly what the exact issue is.

2) For the second issue, check the storage/NFS connection on the host mentioned.
   It may already have been restored.
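
For the NFS check in step 2, a rough first pass can be scripted. The sketch below is only an illustration, not a VMware tool: it pulls the bracketed entries out of the fault text and tries a TCP connection to each NFS server on port 2049. The field order (UUID, remote path, server, datastore name) is inferred from the fault quoted in this thread, and the function names `parse_nfs_fault`, `nfs_port_open`, and `report` are made up for the example. Note that it probes from the machine running the script, not from the ESX host itself, so a success here does not prove the host has remounted the datastore.

```python
import re
import socket

# One bracketed entry from the fault's "Source event status" field.
# ASSUMPTION: field order is UUID, remote path, server, datastore name,
# inferred from the example in this thread rather than from documentation.
_MOUNT_RE = re.compile(r"\[([^,\]]+),\s*([^,\]]+),\s*([^,\]]+),\s*([^,\]]+)\]")


def parse_nfs_fault(text):
    """Return one dict per bracketed mount entry found in the fault text."""
    keys = ("uuid", "remote_path", "server", "datastore")
    return [
        dict(zip(keys, (field.strip() for field in match.groups())))
        for match in _MOUNT_RE.finditer(text)
    ]


def nfs_port_open(server, port=2049, timeout=3.0):
    """True if a TCP connection to the NFS port succeeds from this machine."""
    try:
        with socket.create_connection((server, port), timeout=timeout):
            return True
    except OSError:  # covers refused connections, timeouts, and DNS failures
        return False


def report(text, probe=nfs_port_open):
    """Build one status line per mount; `probe` is swappable for testing."""
    lines = []
    for mount in parse_nfs_fault(text):
        state = "reachable" if probe(mount["server"]) else "NOT reachable"
        lines.append(
            f'{mount["datastore"]} ({mount["server"]}:{mount["remote_path"]}): '
            f"{state} on TCP 2049"
        )
    return lines


# The two entries from the fault quoted earlier in this thread.
fault_text = """
[ce6ced02-1a151e76-0000-000000000000, /vol/sgcnscnt02_dev2, sgcnscnt02, sgcnscnt02_dev2]
[7283806d-8bad585b-0000-000000000000, /vol/sgcnscnt01_dev1, sgcnscnt01, sgcnscnt01_dev1]
"""
```

Calling `print("\n".join(report(fault_text)))` from a machine that can resolve the storage server names prints one line per datastore. The real confirmation is still that the datastore shows as accessible on the host in the VI client.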

Faults are cancellable: once you've checked that all is OK, go to the 'Alerts' screen in VCOps, select the fault(s), and cancel them using the corresponding button.

critical3rr0r
Enthusiast

I would start with your normal troubleshooting steps. Drill down into the server that is having the issue. Look at the history of alerts in vCOps. Use the root cause analysis feature to get leads on what might be causing the communication issues with the vCenter server. Check Event Viewer on the server in question. The two issues may or may not be related.

As for documentation, you can read the VMware vCenter Operations Manager Enterprise Getting Started Guide and the VMware vCenter Operations Manager Enterprise Administration Guide for more general knowledge of managing the product and its alerts.

"All you touch and all you see is all your life will ever be."

Leffingwell
Contributor

Thanks again for the precise feedback, G :). You too, crit, for the tactical problem-solving ideas. I got everything sorted out and am looking down the barrel of green health badges for the moment. Take care, you two.
