I have a freshly built Horizon 7.2 environment running the non-FIPS Unified Access Gateway 3.0. This morning, users were unable to connect externally. I validated that direct connections to the View Connection Server worked, and I logged into the admin interface on the UAG to confirm all of the Horizon services were up and in the "Green" status. Everything looked good. However, users were getting "connection timed out" errors when they tried connecting to the UAG. Rebooting the UAG resolved the issue. I found another user mention this issue with 3.0 in the comments on Carl Stalhood's blog. Has anyone else run into this with 3.0?
We experienced this a few months ago on UAG 3.0 with two NICs. The strange thing was that we have two of them configured identically for HA (the only differences are the IP and cert), and only one of them would consistently lock up. Support asked us to redeploy the UAG several times, and after a few days it would always lock up again. Working with support, we eventually determined the issue was our default route: we were unnecessarily specifying one, even though the routing table on the UAG (route -n) looked identical whether we specified it or not. We were also asked to move our load balancer health monitoring from the management NIC to the Internet NIC. With those two changes our UAGs were stable for two months. Unfortunately, both of them decided to lock up last night, so I'll be opening another case with support.
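Since the routing tables "looked identical", a quick way to compare them mechanically is to count the default routes and the interface each one points at. This is just a sketch against sample `route -n` output (the table below is made up, not from a real appliance); paste in your own output from the console.

```shell
#!/bin/sh
# Sketch: verify the UAG has exactly one default route, on the expected NIC.
# ROUTE_TABLE below is sample data standing in for real `route -n` output.
ROUTE_TABLE='Destination     Gateway         Genmask         Flags Metric Ref Use Iface
0.0.0.0         192.168.1.1     0.0.0.0         UG    0      0     0 eth0
192.168.1.0     0.0.0.0         255.255.255.0   U     0      0     0 eth0
10.0.0.0        0.0.0.0         255.255.0.0     U     0      0     0 eth1'

# Default routes have destination 0.0.0.0; field 8 is the interface.
# More than one default route, or one on the wrong NIC, is the kind of
# ambiguity we were chasing with support.
DEFAULTS=$(printf '%s\n' "$ROUTE_TABLE" | awk '$1 == "0.0.0.0" {print $8}')
COUNT=$(printf '%s\n' "$DEFAULTS" | grep -c .)
echo "default routes: $COUNT (via: $DEFAULTS)"
```

Running it on both appliances and diffing the one-line summaries makes a config drift between the "good" and "bad" UAG obvious at a glance.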
I've seen this a few times in different customer environments, too. It happened with versions 2.8.1 and 3.0, in one- and two-NIC deployments, with and without a load balancer.
I have no idea what causes it to stop responding. One of my newest deployments went in last week, and the first outage came just two days later.
It seems to always happen at night. Maybe something to do with backups?
That's an interesting thought. This environment worked fine on Monday, and when the users came in the next day, nobody could connect. I do have a VDP job backing up the VM nightly.
I installed the same UAG at another client (but in two-NIC mode) around the same time. They don't run any backups and haven't hit this yet. They also haven't been actively using the UAG; it's just me hitting it to see if it's still accepting connections.
I also noticed this same issue. It would work fine for a time and then just stop responding until the appliance was rebooted. I tried redeploying the appliance multiple times, all with the same result.
I logged a support call with VMware to troubleshoot; after a few weeks of log collection they couldn't find an answer, and I ended up moving back to the Windows Security Server, as our user base was getting too frustrated with all the downtime.
If anyone tries 3.1 and it's stable I'd be interested to know.
Had this happen on 3.0 at several customer sites. Totally random: days, weeks, sometimes no hangs for many weeks, then poof. I never could find a pattern. The load balancer health check sees no issues, so it keeps selecting the UAG; it never drops out of the pool as far as the load balancer is concerned.
I never noticed anything strange either. Users simply said they could no longer log in. A reboot was the only way to bring it back.
In the meantime I've replaced some of the 3.0 appliances with 3.1. Let's see how that goes.
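On the health-check point: a hung UAG often still accepts TCP on 443, so a bare port monitor never fails. A deeper probe that expects an actual HTTP 200 back through the proxy catches the hang. A minimal sketch (the hostname is a placeholder; adjust the URL path to whatever your load balancer team wants to monitor):

```shell
#!/bin/sh
# probe: request a small static resource through the UAG and return the
# HTTP status code. -m 5 caps the wait so a hung proxy shows up as a
# timeout (curl prints 000) instead of a monitor that never fails.
probe() {
  # Replace uag.example.com (placeholder) with your UAG's external name.
  curl -sk -o /dev/null -m 5 -w '%{http_code}' "https://$1/favicon.ico"
}

# healthy: only a real 200 counts; 000 (timeout/refused) or 5xx means
# the load balancer should pull this UAG from the pool.
healthy() { [ "$1" = "200" ]; }

# Usage (not run here, needs a live UAG):
#   code=$(probe uag.example.com); healthy "$code" || echo "UAG unhealthy"
```

Most load balancers can do the equivalent natively with an HTTP monitor that checks the response code, which is preferable to an external script if yours supports it.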
We were on 3.0 and were stable for 3-4 months. We made some adjustments to the cipher/TLS settings, and the next day both of our UAGs were locked up; a simple reboot fixed the issue. We left one of the UAGs in the broken state and worked with support, but they couldn't figure it out. In our case we would see this error logged in esmanager.log when a user tried to connect. We've decided that any configuration change should be done by redeploying the UAG, or at a minimum be immediately followed by a reboot.
09/19 16:42:53,117[nioEventLoopGroup-34-4]INFO processor.ViewSession[terminateSession: 297]: Horizon session terminated due to expiration - Current Session count:1986
09/19 16:42:53,119[nioEventLoopGroup-34-4]ERROR proxy.HttpsProxyRequestHandler[write: 172]: Unexpected exception: bsg.unavailable.UNCONNECTED
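If you want to catch this before users call in, a cron job can count those UNCONNECTED errors and alert on any hits. Sketch below; the sample lines stand in for the real log (on our appliance esmanager.log lives under /opt/vmware/gateway/logs, but verify the path on your build):

```shell
#!/bin/sh
# Sketch: count UNCONNECTED proxy errors in esmanager.log. A temp file with
# two sample lines stands in for the real log so this runs anywhere; point
# LOG at the actual file on the appliance.
LOG=$(mktemp)
cat > "$LOG" <<'EOF'
09/19 16:42:53,117[nioEventLoopGroup-34-4]INFO processor.ViewSession[terminateSession: 297]: Horizon session terminated due to expiration - Current Session count:1986
09/19 16:42:53,119[nioEventLoopGroup-34-4]ERROR proxy.HttpsProxyRequestHandler[write: 172]: Unexpected exception: bsg.unavailable.UNCONNECTED
EOF

# Any nonzero count in a monitoring window is worth an alert; in our
# experience the UAG never recovers from this on its own.
ERRORS=$(grep -c 'bsg.unavailable.UNCONNECTED' "$LOG")
echo "UNCONNECTED errors: $ERRORS"
rm -f "$LOG"
```

In a cron job you'd typically only grep lines newer than the last run (or use logtail/journal tooling) so you alert on new occurrences rather than the whole file.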
Thanks for the info! When it happened I didn't really know what to look for, so I'll check the esmanager.log file in the future. It's only happened once so far, but we've only had the environment up for five or six weeks.