VMware Horizon Community
nettech1
Expert

VDPCONNECT_NETWORK_FAILURE Agent Unreachable

Hi,

We are seeing a small percentage of VMs fail after users log in and start opening work apps. When the failure occurs, the affected VM shows Agent Unreachable in Horizon Admin and the user gets disconnected from the session. Looking at the VM in vCenter after the user is kicked out, the VM shows no signs of life: SMB and RDP connections time out, although pings still get responses. The problem is not host or pool specific; we have seen this behavior on all hosts and all pools. We are on Windows 10 (1909) with Horizon Agent 7.11 and VMware Tools 11.0.0. The parent image has been optimized with build b1151 of the optimizer tool.
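To confirm the "pings respond but SMB and RDP time out" symptom from another machine, a short TCP probe can help. This is only a rough sketch: it assumes the standard SMB (445) and RDP (3389) ports, and the host name in the usage note is a made-up placeholder.

```python
import socket


def tcp_port_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers timeouts, refused connections, and name resolution failures.
        return False


def probe_vm(host, ports=(445, 3389), timeout=3.0):
    """Probe the given TCP ports (default: SMB and RDP) and map port -> reachable."""
    return {port: tcp_port_open(host, port, timeout) for port in ports}
```

For example, `probe_vm("vdi-desktop-042")` (hypothetical host name) returning `{445: False, 3389: False}` while ping still answers would match the state described above.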


What we see as a common pattern when the failure occurs is high CPU utilization in the debug log:

DEBUG (09F4-0A3C) [ws_perfmon] System CPU use: 99%

INFO  (09F4-0A3C) [ws_perfmon] CPU-ALARM: cpu use = 99%

INFO  (09F4-0A3C) [ws_perfmon] High CPU use: pid=1288, sessionId=1, name=dwm, user=Window Manager\DWM-1, percent=45.7

INFO  (09F4-0A3C) [ws_perfmon] High CPU use: pid=5872, sessionId=1, name=explorer, user=CORP\vshang, percent=11.6

INFO  (09F4-0A3C) [ws_perfmon] High CPU use: pid=5260, sessionId=0, name=WmiPrvSE, user=NT AUTHORITY\NETWORK SERVICE, percent=8.1

INFO  (09F4-0A3C) [ws_perfmon] High CPU use: pid=2372, sessionId=0, name=svchost, user=NT AUTHORITY\NETWORK SERVICE, percent=4.4
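To compare failures across many VMs, the `High CPU use` entries can be pulled out of the debug log with a small script. This is a sketch that assumes the line format is exactly as in the excerpt above; the function name and the choice to report the peak percentage per process name are my own.

```python
import re

# Matches the ws_perfmon "High CPU use" lines as they appear in the excerpt.
HIGH_CPU = re.compile(
    r"\[ws_perfmon\] High CPU use: pid=(?P<pid>\d+), sessionId=(?P<session>\d+), "
    r"name=(?P<name>[^,]+), user=(?P<user>.+), percent=(?P<percent>[\d.]+)"
)


def peak_cpu_by_process(lines):
    """Return (process name, peak percent) pairs, highest peak first."""
    peaks = {}
    for line in lines:
        match = HIGH_CPU.search(line)
        if match is None:
            continue  # not a "High CPU use" entry
        name = match.group("name")
        percent = float(match.group("percent"))
        peaks[name] = max(percent, peaks.get(name, 0.0))
    return sorted(peaks.items(), key=lambda item: item[1], reverse=True)
```

Feeding it the lines of a debug log (for example `peak_cpu_by_process(open(path, errors="replace"))`, where `path` is wherever the agent debug log was collected) gives a quick view of which processes dominate when the CPU alarm fires.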

and a network failure in the Blast log:

[INFO ] 0x03c0 bora::Log: [VVCSessionManager] BlastSocketGetSessionMapEntry: SessionMap does have entry for vAuth:SX8SUc*****, sessionWrapper->vAuth:SX8SUc*****.

[INFO ] 0x03c0 bora::Log: [VVCSessionManager] BlastSocketHandleNetworkFailure: cookie is present for session: 1

[INFO ] 0x03c0 bora::Log: [Authentication] BlastSocketDropCookie: Cookie:7fKrF2***** dropped. Ref Count = 0.

[INFO ] 0x03c0 bora::Log: VVC: Closing the NCDeclined Channel, name: blast-mks, channelId: 7

[INFO ] 0x03c0 bora::Log: VVC: AbortChannel for channel blast-mks almost done.

[INFO ] 0x03c0 bora::Log: VVC: Closing the NCDeclined Channel, name: blast-audio, channelId: 9

[INFO ] 0x03c0 bora::Log: VVC: AbortChannel for channel blast-audio almost done.

[INFO ] 0x03c0 BlastSessionMgr::HandleConnectionClose: Handle BlastSocket connection close, vAuth:SX8SUc*****, vvcSessionId:1, Reason:VDPCONNECT_NETWORK_FAILURE (4)

[INFO ] 0x03c0 BlastSessionMgr::HandleConnectionClose: Shutdown OnConnectionClose for cnx:1

[INFO ] 0x03c0 BlastSessionMgr::DestroySession: cnx:1 closeReason:VDPCONNECT_NETWORK_FAILURE (4) flags:0x0
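When collecting Blast logs from several failed VMs, it can be useful to tally the close reasons rather than read each log by hand. The sketch below assumes the `DestroySession` line format shown in the excerpt above; the function name is hypothetical.

```python
import re
from collections import Counter

# Matches the BlastSessionMgr::DestroySession lines as they appear in the excerpt.
DESTROY_SESSION = re.compile(
    r"BlastSessionMgr::DestroySession: cnx:(?P<cnx>\d+) "
    r"closeReason:(?P<reason>\w+) \((?P<code>\d+)\) flags:(?P<flags>0x[0-9a-fA-F]+)"
)


def count_close_reasons(lines):
    """Count session teardowns per (close reason, flags) pair."""
    counts = Counter()
    for line in lines:
        match = DESTROY_SESSION.search(line)
        if match is not None:
            counts[(match.group("reason"), match.group("flags"))] += 1
    return counts
```

A tally dominated by `("VDPCONNECT_NETWORK_FAILURE", "0x0")` on the failing VMs, and absent on healthy ones, would support treating these disconnects as one common failure.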

Any assistance with troubleshooting is greatly appreciated.

Thank you

6 Replies
Shreyskar
VMware Employee

Is this issue reproducible internally as well, or does it happen only for external connections?

Are you using vGPU-based pools? Does it happen with all desktop pools, or only with the Windows 10 1909 pool?

Can you reproduce the issue with PCoIP as well, or does it happen only with Blast?

How many vCPUs have you assigned to the VMs? Please note that 2 or more vCPUs are recommended for the Blast protocol.

flags:0x0 indicates that the agent should wait for the client to attempt a reconnect rather than terminate the session.

To troubleshoot further, first isolate the issue per the suggestions above, then provide both the blast-worker and blast-service log files from the problematic VM.

nettech1
Expert

The issue happens with both external and internal connections. It's not easy to reproduce, as it happens to random users in all pools on all hosts, but we are seeing an average failure rate of 5%. We moved to 1909 well before the issue surfaced. We don't have any vGPUs. We are not using PCoIP; our thin clients don't support it. All VMs have 2 vCPUs.

We are starting to lean toward a bug in in-house or third-party software as the cause. We suspect a deadlock but don't have any concrete evidence.

Shreyskar
VMware Employee

Okay. Make sure your thin client is listed in the official VMware compatibility matrix:

https://www.vmware.com/resources/compatibility/pdf/vi_view_guide.pdf

cabli01
Contributor

Same problem here,

Windows 10 1809 (LTSC)

Horizon 7.12

VMware Tools 11.1.0

VCSA + ESXi are on 6.7

The problem appears with or without optimizations. We are using Liquidware ProfileUnity, and the problem occurs just after the login process.

nettech1
Expert

In our case the lockup was caused by a poorly coded driver that was creating a deadlock.

https://www.quora.com/How-are-deadlocks-and-RAM-related

nettech1
Expert

cabli01

How big is your pool?

We have 800 VMs spread across 16 UCS blades. Our average failure rate was around 15%.
