Suiname
Enthusiast

VMware View alerts / monitoring

I set up a VMware View environment that we've now had running for several months, and everything has seemed to run smoothly. The services have gone down a couple of times so far, but each time either a system service had stopped or the web console had gone down. I have SCOM alerts set up for both, so I was alerted and fixed the problem immediately.

Today, however, we had the first problem that concerns me, mainly because nothing appeared to be down. One of the people using a zero client told me they couldn't connect, so I went into the admin console on the VMware connection server, and everything looked fine on the dashboard. The event log showed session connection attempts timing out, but still gave no indication of what was wrong. I pointed my web browser at the View security server and again everything looked fine: it opened my View client, but when I entered my credentials it sat at the connecting screen forever and never gave me a session. I then logged into the View security server and looked at the services, and they were all up. I tried connecting a few more times to make sure I wasn't crazy, and I wasn't; no connections could be made. Finally I restarted the VMware View Security Server service, and that fixed it. Connections can be made again.

This concerns me because none of the monitoring I currently have set up (all Windows services and all web services on both the connection and security servers) will alert us to this scenario, even though the service is not working. Short of actually connecting a client, there didn't seem to be any way to detect that the service was down. Does anyone have experience with similar issues, or a way to monitor this service short of having someone actually try to connect with a client? If I imported the VMware View management pack into SCOM, would it have detected the failure in this scenario? I'm not sure it would, since none of the services appeared to be in an unhealthy state on the servers themselves. Thanks in advance.
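
To make the idea of "connecting a client" as a check concrete, here is a rough sketch of a timeout-bounded synthetic probe. It only shows the detection logic: run some end-to-end login attempt and treat "never returns" as a failure in its own right, since a hung service neither stops nor errors. The `attempt` callable is a hypothetical placeholder for whatever can actually perform a login in your environment; I am not assuming any particular View API here, and the status names are my own.

```python
import threading

def probe(attempt, timeout_s):
    """Run attempt() and classify the outcome as OK, FAILED, or HUNG.

    attempt is a hypothetical, caller-supplied function that performs one
    end-to-end login and raises an exception if the login is rejected.
    """
    outcome = {}

    def worker():
        try:
            attempt()
            outcome["ok"] = True
        except Exception as exc:
            outcome["error"] = str(exc)

    # Daemon thread: a truly hung attempt must not keep the monitor alive.
    thread = threading.Thread(target=worker, daemon=True)
    thread.start()
    thread.join(timeout_s)

    if thread.is_alive():
        # Still waiting after the deadline: the "connecting forever" case.
        return "HUNG", f"no result within {timeout_s}s"
    if "error" in outcome:
        return "FAILED", outcome["error"]
    return "OK", "session established"
```

A scheduled task could call `probe(...)` every few minutes and raise an alert on anything other than `OK`. The point of the separate `HUNG` status is that it is exactly the state the service and web monitors above cannot see.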

3 Replies
chillware1
Enthusiast

To me, it doesn't sound like a service issue, but more like a networking issue, or even a problem with AD or the agents on the VMs. Has anything changed? Have you run any ESX updates? Have you checked your servers' firewalls?

Suiname
Enthusiast

No, nothing changed at all. Restarting the security server service in services.msc on the security server fixed the problem, so clearly something was amiss with the service (even though Windows and SCOM were not reporting the service as stopped or problematic).

Suiname
Enthusiast

Bumping this thread to the top. I had another problem today which was not caught by any of the monitoring. I have a load-balanced set of VMware View security servers, one of which was not working correctly this morning. The client would connect to the server and prompt for credentials, but after you entered them it would sit forever at the "authenticating..." stage and never connect. I was able to remove this server from the NLB and restart both the security and connection server, and now everything is fine. However, this is troubling to me for a few reasons:

1) The admin console showed the status of all servers as fine.

2) The web server components of both the failed security server and the connection server were also functioning fine (I connected with a browser to make sure before restarting), so they did not trigger the alert from my web monitor.

3) The Windows services that run VMware View also indicated no problems, so again none of the service monitors I have set up were triggered.

The only way I could have detected the failure I just ran into was someone trying to log in and not being able to. Does anyone have a monitoring solution for VMware View that detects this kind of scenario? This service needs to be available pretty much 24/7, as we are using it for our public computing here, so I need to be able to alert our on-call staff if something needs to be fixed. Please help if you can.
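
Since the failure this time was a single member behind the NLB, one way to frame the check is per node rather than through the load-balanced address, with an alert only after a couple of consecutive failures so one slow login does not page anyone. A minimal sketch of that bookkeeping follows. The node names, the threshold, and the `check` function are all hypothetical; `check(node)` stands in for whatever login test can be pointed at one specific security server and return True or False.

```python
from collections import defaultdict

class NodeAlerter:
    """Track consecutive failures per node and decide when to alert."""

    def __init__(self, threshold=2):
        self.threshold = threshold          # failures in a row before alerting
        self.failures = defaultdict(int)    # node name -> current failure streak

    def record(self, node, ok):
        """Record one check result; return True if an alert should fire now."""
        if ok:
            self.failures[node] = 0
            return False
        self.failures[node] += 1
        # Fire once, at the moment the streak reaches the threshold.
        return self.failures[node] == self.threshold

def sweep(nodes, check, alerter):
    """Check every node directly and return the nodes that need an alert."""
    return [node for node in nodes if alerter.record(node, check(node))]
```

For example, with hypothetical nodes `["sec1", "sec2"]` and a `check` that fails only for `sec2`, the first sweep returns nothing and the second returns `["sec2"]`, which is the point at which on-call staff would be notified and the node pulled from the NLB.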
