I have a freshly built Horizon 7.2 environment running the non-FIPS Unified Access Gateway 3.0. This morning, users were unable to connect externally. I validated that direct connections to the View Connection Server worked, and I logged into the admin interface on the UAG to confirm all of the Horizon services were up and showing "Green" status. Everything looked good. However, users were getting connection timeouts when they tried connecting to the UAG. Rebooting the UAG resolved the issue. I found another user mentioning this issue with 3.0 in the comments on Carl Stalhood's blog. Has anyone else run into this with 3.0?
We've had exactly the same problem twice now with UAG 2.9 (non-FIPS). Only a reboot would resolve it.
We experienced this a few months ago on a UAG 3.0 with two NICs. The strange thing was that we have two of them configured identically for HA (the only difference is the IP and cert), and only one of them would consistently lock up. Support asked us to redeploy the UAG several times, and after a few days it would always lock up again. Working with support, we eventually determined the issue was with our default route: we were unnecessarily specifying one, even though the config on the UAG (route -n) looked identical whether we specified it or not. We were also asked to move our load balancer health monitoring from the management NIC to the Internet NIC. With those two changes our UAGs had been stable for the past two months. Unfortunately, both of them decided to lock up last night, so I'll be opening another case with support.
Hi Guys,
I've hit this a few times in different customer environments, too. It happened with version 2.8.1 and also 3.0, in one- and two-NIC deployments, with and without a load balancer.
I have no idea what causes it to stop responding. One of my new deployments went in last week, and the first outage came only two days later.
It looks like it always happens at night. Maybe something to do with backups?
That's an interesting thought. This environment worked fine on Monday, and when the users came in to work the next day, nobody could connect. I do have a VDP job running nightly to back up the VM.
I installed the same UAG at another client (but in two-NIC mode) around the same time. They don't have any backups running and haven't hit this yet. They also haven't been actively using the UAG; it's just me hitting it to see if it's still accepting connections.
We typically see the issue crop up overnight, but we have had it in the middle of the day too. We don't back up any of our UAGs since we consider them disposable.
A new version of the UAG has been available since last night --> version 3.1
Release Notes for VMware Unified Access Gateway 3.1
Nothing in the release notes about resolved issues
Give it a try???
I've been on 3.1 for a few days now. Time will tell if they managed to fix this bug...
I also noticed this same issue. It would work fine for a time and then just stop responding until the appliance was rebooted. I tried redeploying the appliance multiple times, all with the same result.
I logged a support call with VMware to troubleshoot; after a few weeks of log collection they couldn't find an answer for it, and I ended up moving back to the Windows Security Server, as our userbase was getting too frustrated with all the downtime.
If anyone tries 3.1 and it's stable I'd be interested to know.
For now it's still running w/o problems. *knockonwood* It will take some more weeks though to really tell if there's a difference.
At least 3.1 doesn't seem to be worse than 2.9.
Had this happen on 3.0 at several customer sites. Totally random: days, weeks, sometimes no hangs for many weeks, then poof. I never could find a pattern. The load balancer health check sees no issues, so it keeps selecting the UAG; as far as the load balancer is concerned, the appliance never drops out.
I never noticed anything strange either. Users simply said they could no longer log in. A reboot was the only option to bring it back.
I've replaced some of the 3.0s with 3.1 appliances in the meantime. Let's see how that goes.
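Since a hung appliance apparently still completes the TCP handshake, a health check that only tests the handshake will never take it out of rotation. Here's a minimal sketch of an HTTP-level probe that should catch the hang; the URL, timeouts, and status check are illustrative assumptions, not something from this thread, so adapt them to your monitor:

```shell
#!/bin/sh
# Hypothetical HTTP-level health probe for a UAG appliance. A hung UAG
# can still accept TCP connections, so we require an actual HTTP 200
# from the portal page within a hard deadline.
probe_uag() {
    url="$1"
    # -k because a lab UAG cert may be self-signed; --max-time catches
    # the "connects but never answers" hang described above.
    code=$(curl -sk --connect-timeout 3 --max-time 5 \
                -o /dev/null -w '%{http_code}' "$url")
    [ "$code" = "200" ]
}

# Example (hostname is a placeholder): fail the node out of rotation
# when the Horizon portal stops answering.
probe_uag "https://uag.example.com/portal/webclient/index.html" \
    || echo "UAG not responding - take it out of rotation"
```

Per the support advice earlier in the thread, point the monitor at the Internet-facing NIC rather than the management NIC.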
Well, 3.1 is stable unless you're using a three-NIC configuration with RADIUS.
When you hit the issue, you can run "supervisorctl restart all" to restart the UAG's services instead of rebooting the appliance.
We just experienced the same issue with one of our UAGs last night (also running 3.0). Has anyone confirmed whether 3.1 fixes the issue?
We've been using 3.1 for two months now without issues. I don't recall it ever running that long on 2.9, so I guess they fixed it. *knockonwood*
We were on 3.0 and were stable for 3-4 months. We made some adjustments to the cipher/TLS settings, and the next day both of our UAGs were locked up; a simple reboot fixed the issue. We left one of the UAGs in a broken state and worked with support, but they couldn't figure it out. In our case we would see this error logged in esmanager.log when a user tried to connect. We've decided that any configuration change should be done by redeploying the UAG, or at minimum be immediately followed by a reboot.
09/19 16:42:53,117[nioEventLoopGroup-34-4]INFO processor.ViewSession[terminateSession: 297][]: Horizon session terminated due to expiration - Current Session count:1986
09/19 16:42:53,119[nioEventLoopGroup-34-4]ERROR proxy.HttpsProxyRequestHandler[write: 172][]: Unexpected exception: bsg.unavailable.UNCONNECTED
java.lang.IllegalStateException: bsg.unavailable.UNCONNECTED
at com.vmware.euc.gateway.products.view.bsg.BsgManager.assertAvailability(BsgManager.java:501)
at com.vmware.euc.gateway.products.view.bsg.BsgManager.removeConnection(BsgManager.java:347)
at com.vmware.euc.gateway.products.view.interceptor.processor.ViewSession.closeConnections(ViewSession.java:435)
at com.vmware.euc.gateway.products.view.interceptor.processor.ViewSession.terminateSession(ViewSession.java:303)
at com.vmware.euc.gateway.products.view.interceptor.processor.ViewSession$1.sessionExpired(ViewSession.java:280)
at com.vmware.euc.gateway.edgeservice.sdk.session.Session.notifyExpirationListeners(Session.java:120)
at com.vmware.euc.gateway.edgeservice.sdk.session.SessionManager.expireSessions(SessionManager.java:322)
at com.vmware.euc.gateway.edgeservice.sdk.session.SessionManager.setExpirationFuture(SessionManager.java:296)
at com.vmware.euc.gateway.edgeservice.sdk.session.SessionManager.create(SessionManager.java:156)
at com.vmware.euc.gateway.networkcore.proxy.HttpsProxyRequestHandler.write(HttpsProxyRequestHandler.java:116)
at io.netty.channel.AbstractChannelHandlerContext.invokeWrite0(AbstractChannelHandlerContext.java:724)
at io.netty.channel.AbstractChannelHandlerContext.invokeWrite(AbstractChannelHandlerContext.java:716)
at io.netty.channel.AbstractChannelHandlerContext.write(AbstractChannelHandlerContext.java:802)
at io.netty.channel.AbstractChannelHandlerContext.write(AbstractChannelHandlerContext.java:709)
at com.vmware.euc.gateway.networkcore.proxy.HttpRequestAggregator.write(HttpRequestAggregator.java:115)
at io.netty.channel.AbstractChannelHandlerContext.invokeWrite0(AbstractChannelHandlerContext.java:724)
at io.netty.channel.AbstractChannelHandlerContext.invokeWriteAndFlush(AbstractChannelHandlerContext.java:787)
at io.netty.channel.AbstractChannelHandlerContext.write(AbstractChannelHandlerContext.java:800)
at io.netty.channel.AbstractChannelHandlerContext.writeAndFlush(AbstractChannelHandlerContext.java:780)
at io.netty.channel.AbstractChannelHandlerContext.writeAndFlush(AbstractChannelHandlerContext.java:817)
at io.netty.channel.DefaultChannelPipeline.writeAndFlush(DefaultChannelPipeline.java:1011)
at io.netty.channel.AbstractChannel.writeAndFlush(AbstractChannel.java:289)
at com.vmware.euc.gateway.networkcore.HttpsRequestRouter.write(HttpsRequestRouter.java:243)
at com.vmware.euc.gateway.networkcore.HttpsRequestRouter.channelRead(HttpsRequestRouter.java:144)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:334)
at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:326)
at com.vmware.euc.gateway.networkcore.session.AuthenticatorHandler.channelRead(AuthenticatorHandler.java:42)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:334)
at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:326)
at com.vmware.euc.gateway.networkcore.session.SessionRequestHandler.channelRead(SessionRequestHandler.java:72)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:334)
at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:326)
at com.vmware.euc.gateway.networkcore.LocalContentHandler.channelRead(LocalContentHandler.java:91)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:334)
at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:326)
at com.vmware.euc.gateway.networkcore.DoSPreventionHandler.channelRead(DoSPreventionHandler.java:232)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:334)
at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:326)
at io.netty.handler.codec.ByteToMessageDecoder.fireChannelRead(ByteToMessageDecoder.java:293)
at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:267)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:334)
at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:326)
at io.netty.handler.ssl.SslHandler.unwrap(SslHandler.java:1070)
at io.netty.handler.ssl.SslHandler.decode(SslHandler.java:904)
at io.netty.handler.codec.ByteToMessageDecoder.callDecode(ByteToMessageDecoder.java:411)
at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:248)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:334)
at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:326)
at io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1320)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:334)
at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:905)
at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:123)
at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:563)
at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:504)
at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:418)
at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:390)
at io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:742)
at io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorator.run(DefaultThreadFactory.java:145)
at java.lang.Thread.run(Thread.java:748)
Thanks for the update!
Thanks for the info! When it happened I didn't really know what to look for, so I'll check out the esmanager.log file in the future. This is the first time we've had it happen, but we've only had the environment up for 5 or 6 weeks.
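For anyone else who wants to watch for this signature before users notice, here's a sketch of a check you could run periodically on the appliance. The log path is the usual UAG location, but treat it as an assumption and verify it on your build:

```shell
#!/bin/sh
# Hypothetical watchdog: print a warning if the BSG error from the
# stack trace above starts showing up in esmanager.log.
check_uag_log() {
    if grep -q 'bsg.unavailable.UNCONNECTED' "$1" 2>/dev/null; then
        echo "BSG errors in $1 - consider restarting services"
    fi
}

# Assumed standard UAG log location - adjust if yours differs.
check_uag_log /opt/vmware/gateway/logs/esmanager.log
```

You could cron this and pair it with the "supervisorctl restart all" recovery mentioned earlier, though a restart may still be worth confirming with support first.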