pchapman
Hot Shot

Unified Access Gateway 3.0 stops responding after a few weeks?

I have a freshly built Horizon 7.2 environment running the non-FIPS Unified Access Gateway 3.0. This morning, users were unable to connect externally. I validated that direct connections to the View Connection Server worked, and I logged into the admin interface on the UAG to confirm that all of the Horizon services were up and showing "Green" status. Everything looked good. However, users were getting "connection timed out" errors when they tried connecting through the UAG. Rebooting the UAG resolved the issue. I found another user mentioning this issue with 3.0 in the comments on Carl Stalhood's blog. Has anyone else run into this issue with 3.0?

17 Replies
RyanHardy
Enthusiast

We've had exactly the same problem twice now with UAG 2.9 (non-FIPS). Only a reboot fixed it. :(

BenFB
Commander

We experienced this a few months ago on a UAG 3.0 with two NICs. The strange thing was that we have two of them configured identically for HA (the only differences are the IP and cert), and only one of them would consistently lock up. Support asked us to redeploy the UAG several times, and after a few days it would always lock up again. Working with support, we eventually determined that the issue was with our default route: we were unnecessarily specifying one, even though the routing table on the UAG (route -n) looked identical whether we specified it or not. We were also asked to move our load balancer health monitoring from the Management NIC to the Internet NIC. With those two changes our UAGs had been stable for the past two months. Unfortunately both of them locked up last night, so I'll be opening another case with support.

Erossman
Enthusiast

Hi Guys,

I've seen this a few times in different customer environments, too. It happened with versions 2.8.1 and 3.0, in one- and two-NIC deployments, with and without a load balancer.

I have no idea what causes it to stop responding. One of my new deployments went in last week, and the first outage came just 2 days later.

It seems to always happen at night. Maybe it has something to do with backups?

pchapman
Hot Shot

That's an interesting thought. This environment worked fine on Monday, and when the users came in to work the next day, nobody could connect. I do have a VDP job running nightly to back up the VM.

I installed the same UAG at another client (but in two-NIC mode) around the same time. They do not have any backups running and haven't had this happen yet. They also haven't been actively using the UAG; it's just me hitting it to see if it's still accepting connections.

BenFB
Commander

We do typically see the issue crop up overnight, but we have had it in the middle of the day too. We don't back up any of our UAGs since we consider them disposable.

Erossman
Enthusiast

A new version of the UAG, 3.1, has been available since last night.

Release Notes for VMware Unified Access Gateway 3.1

Nothing in the release notes about resolved issues :(

Give it a try???

RyanHardy
Enthusiast

I've been on 3.1 for a few days. Time will tell if they managed to fix this bug...

robgj821
Contributor

I also noticed this same issue. The appliance would work fine for a time and then just stop responding until it was rebooted. I tried redeploying the appliance multiple times, all with the same result.

I logged a support call with VMware to troubleshoot. After a few weeks of log collection they couldn't find an answer for it, and I ended up moving back to the Windows Security Server, as our user base was getting too frustrated with all the downtime.

If anyone tries 3.1 and it's stable I'd be interested to know.

RyanHardy
Enthusiast

For now it's still running w/o problems. *knockonwood* It will take some more weeks though to really tell if there's a difference.

At least 3.1 doesn't seem to be worse than 2.9. ;)

srodenburg
Expert

Had this happen on 3.0 at several customer-sites. Totally random. Days, weeks, sometimes no hangs for many weeks, then poof. Never could find a pattern. Loadbalancer health-check sees no issues, thus keeps selecting the UAG as it never drops out as far as it is concerned.
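
For anyone who wants a second opinion alongside the load balancer monitor, here is a minimal sketch of an end-to-end probe that completes a full HTTPS request with a timeout, run from a machine outside the UAG. It assumes Python 3; `probe` is a hypothetical helper, not part of any VMware tooling, and the URL you pass in is whatever your users actually connect to. Whether it would have caught the hangs described in this thread is untested.

```python
import time
import urllib.error
import urllib.request


def probe(url, timeout=5.0, opener=urllib.request.urlopen):
    """Return (healthy, detail) for one complete HTTPS request to url.

    `opener` is injectable so the function can be exercised without a network.
    """
    start = time.monotonic()
    try:
        # A full request covers TCP connect, TLS handshake and HTTP response,
        # so a service that accepts connections but never answers times out here.
        with opener(url, timeout=timeout) as response:
            status = response.status
    except urllib.error.HTTPError as exc:
        # The server answered, but with a 4xx/5xx status.
        return False, f"HTTP {exc.code}"
    except OSError as exc:
        # Covers URLError, socket timeouts and TLS errors (all OSError subclasses).
        return False, f"no response: {exc}"
    elapsed = time.monotonic() - start
    return status == 200, f"HTTP {status} in {elapsed:.2f}s"
```

Calling `probe("https://uag.example.com/")` (a placeholder hostname) from a scheduled job and alerting on a `False` result would give an independent signal when users can no longer connect.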

I never noticed anything strange either. Users simply said they could no longer log in. A reboot was the only way to bring it back.

I've replaced some of the 3.0 appliances with 3.1 in the meantime. Let's see how that goes.

ChuckS42
Contributor

Well, 3.1 is stable if you aren't using a 3-nic configuration and using RADIUS.

Thomas3528
Contributor

When you hit the issue, you can run "supervisorctl restart all" to restart the UAG's services.

tjbailey
Enthusiast

We just experienced the same issue with one of our UAGs last night (also running 3.0). Has anyone confirmed whether 3.1 fixes the issue?

RyanHardy
Enthusiast

We've been using 3.1 for two months now without issues. I don't recall it ever running that long with 2.9, so I guess they fixed it. *knockonwood*

BenFB
Commander

We were on 3.0 and were stable for 3-4 months. We made some adjustments to the cipher/TLS settings, and the next day both of our UAGs were locked up; a simple reboot fixed the issue. We left one of the UAGs in a broken state and worked with support, but they couldn't figure it out. We've decided that any configuration change should be done by redeploying the UAG or, at a minimum, immediately followed by a reboot. In our case we would see this error logged in esmanager.log when a user tried to connect:

09/19 16:42:53,117[nioEventLoopGroup-34-4]INFO  processor.ViewSession[terminateSession: 297][]: Horizon session terminated due to expiration - Current Session count:1986
09/19 16:42:53,119[nioEventLoopGroup-34-4]ERROR proxy.HttpsProxyRequestHandler[write: 172][]: Unexpected exception: bsg.unavailable.UNCONNECTED
java.lang.IllegalStateException: bsg.unavailable.UNCONNECTED
    at com.vmware.euc.gateway.products.view.bsg.BsgManager.assertAvailability(BsgManager.java:501)
    at com.vmware.euc.gateway.products.view.bsg.BsgManager.removeConnection(BsgManager.java:347)
    at com.vmware.euc.gateway.products.view.interceptor.processor.ViewSession.closeConnections(ViewSession.java:435)
    at com.vmware.euc.gateway.products.view.interceptor.processor.ViewSession.terminateSession(ViewSession.java:303)
    at com.vmware.euc.gateway.products.view.interceptor.processor.ViewSession$1.sessionExpired(ViewSession.java:280)
    at com.vmware.euc.gateway.edgeservice.sdk.session.Session.notifyExpirationListeners(Session.java:120)
    at com.vmware.euc.gateway.edgeservice.sdk.session.SessionManager.expireSessions(SessionManager.java:322)
    at com.vmware.euc.gateway.edgeservice.sdk.session.SessionManager.setExpirationFuture(SessionManager.java:296)
    at com.vmware.euc.gateway.edgeservice.sdk.session.SessionManager.create(SessionManager.java:156)
    at com.vmware.euc.gateway.networkcore.proxy.HttpsProxyRequestHandler.write(HttpsProxyRequestHandler.java:116)
    at io.netty.channel.AbstractChannelHandlerContext.invokeWrite0(AbstractChannelHandlerContext.java:724)
    at io.netty.channel.AbstractChannelHandlerContext.invokeWrite(AbstractChannelHandlerContext.java:716)
    at io.netty.channel.AbstractChannelHandlerContext.write(AbstractChannelHandlerContext.java:802)
    at io.netty.channel.AbstractChannelHandlerContext.write(AbstractChannelHandlerContext.java:709)
    at com.vmware.euc.gateway.networkcore.proxy.HttpRequestAggregator.write(HttpRequestAggregator.java:115)
    at io.netty.channel.AbstractChannelHandlerContext.invokeWrite0(AbstractChannelHandlerContext.java:724)
    at io.netty.channel.AbstractChannelHandlerContext.invokeWriteAndFlush(AbstractChannelHandlerContext.java:787)
    at io.netty.channel.AbstractChannelHandlerContext.write(AbstractChannelHandlerContext.java:800)
    at io.netty.channel.AbstractChannelHandlerContext.writeAndFlush(AbstractChannelHandlerContext.java:780)
    at io.netty.channel.AbstractChannelHandlerContext.writeAndFlush(AbstractChannelHandlerContext.java:817)
    at io.netty.channel.DefaultChannelPipeline.writeAndFlush(DefaultChannelPipeline.java:1011)
    at io.netty.channel.AbstractChannel.writeAndFlush(AbstractChannel.java:289)
    at com.vmware.euc.gateway.networkcore.HttpsRequestRouter.write(HttpsRequestRouter.java:243)
    at com.vmware.euc.gateway.networkcore.HttpsRequestRouter.channelRead(HttpsRequestRouter.java:144)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:334)
    at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:326)
    at com.vmware.euc.gateway.networkcore.session.AuthenticatorHandler.channelRead(AuthenticatorHandler.java:42)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:334)
    at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:326)
    at com.vmware.euc.gateway.networkcore.session.SessionRequestHandler.channelRead(SessionRequestHandler.java:72)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:334)
    at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:326)
    at com.vmware.euc.gateway.networkcore.LocalContentHandler.channelRead(LocalContentHandler.java:91)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:334)
    at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:326)
    at com.vmware.euc.gateway.networkcore.DoSPreventionHandler.channelRead(DoSPreventionHandler.java:232)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:334)
    at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:326)
    at io.netty.handler.codec.ByteToMessageDecoder.fireChannelRead(ByteToMessageDecoder.java:293)
    at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:267)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:334)
    at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:326)
    at io.netty.handler.ssl.SslHandler.unwrap(SslHandler.java:1070)
    at io.netty.handler.ssl.SslHandler.decode(SslHandler.java:904)
    at io.netty.handler.codec.ByteToMessageDecoder.callDecode(ByteToMessageDecoder.java:411)
    at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:248)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:334)
    at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:326)
    at io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1320)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:334)
    at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:905)
    at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:123)
    at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:563)
    at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:504)
    at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:418)
    at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:390)
    at io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:742)
    at io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorator.run(DefaultThreadFactory.java:145)
    at java.lang.Thread.run(Thread.java:748)
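
If you want to check your own esmanager.log for the same signature, something like the following might help. It is a minimal sketch in Python 3 that assumes log lines follow the prefix format shown in the excerpt above (timestamp, thread in brackets, level, message); `find_unconnected_errors` is a hypothetical helper, and the log's location on the appliance is not stated in this thread, so the function takes already-read lines.

```python
import re

# Prefix format taken from the excerpt above, e.g.
# "09/19 16:42:53,119[nioEventLoopGroup-34-4]ERROR proxy....: message"
LINE_RE = re.compile(
    r"^(?P<stamp>\d{2}/\d{2} \d{2}:\d{2}:\d{2},\d{3})"
    r"\[(?P<thread>[^\]]+)\]"
    r"(?P<level>[A-Z]+)\s+"
    r"(?P<message>.*)$"
)
MARKER = "bsg.unavailable.UNCONNECTED"


def find_unconnected_errors(lines):
    """Return the timestamps of ERROR entries whose message mentions MARKER."""
    hits = []
    for line in lines:
        match = LINE_RE.match(line.strip())
        # Stack-trace lines have no timestamp prefix, so they never match.
        if match and match["level"] == "ERROR" and MARKER in match["message"]:
            hits.append(match["stamp"])
    return hits
```

Fed the two log entries and the exception line from the excerpt, it returns the timestamp of the single ERROR entry. Only the header line of each error is counted, so a long stack trace does not inflate the result.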

tjbailey
Enthusiast

Thanks for the update!

tjbailey
Enthusiast

Thanks for the info! When it happened I didn't really know what to look for, so I'll check the esmanager.log file in the future. This is the first time we've had it happen, but we've only had the environment up for 5 or 6 weeks.
