VMware Horizon Community
beachITguy
Contributor

Users randomly getting disconnected

Hello,
My environment is experiencing issues with users getting randomly disconnected from their VDI desktops.
I have experienced this myself, and I can tell you what happened to my session when I got disconnected.

I am currently using Horizon View Client
Version: 2111
Build: 8.4.0 (189968194)

A disconnect happened to me this morning, and all I had open application-wise was Edge, Word, MS Teams, and Notepad++.

I was typing in Notepad++ and the screen just froze, no longer accepting inputs, and I could not change screens.
It stays this way for about 30 seconds to a minute, and then I get booted to the View client screen to log back into the desktop pool.
I am able to get back into the desktop after about a minute or two, and it logs me back into the exact same desktop with all the windows I had open still open.

I am the network engineer, and I can confirm that we have the necessary ports open on the firewall to allow traffic through. This started happening about two weeks ago; before that, everything worked normally.

I am not in charge of the configuration of the connection servers or the desktops, so I would not be able to answer any questions related to those.

But like I said, this happens randomly to random people throughout the day. We cannot replicate it at all, which is probably one of the most frustrating aspects of troubleshooting this.

I have gotten the pcoip_server logs from the VDI desktop I was logged into and have attached them here. I had to remove/replace some info that was specific to my org (FQDN and IPs), but nothing else was redacted.

Any help anyone can offer would be appreciated.

18 Replies
nimzobob
Contributor

I am seeing the same thing with the same version.

Found this in one of the logs - none of the other logs show anything as far as I can tell.

vmware-crtbora-9076.log

2022-03-10T10:11:59.464Z In(05) crtbora crt::common::MKS::GetConnectionStateReason(): remote mks disconnect reason code is 29.
2022-03-10T10:11:59.464Z In(05) crtbora crt::common::MKS::SetConnectionState: MKS connection state changes from 2 to 1.
2022-03-10T10:11:59.464Z In(05) crtbora crt::common::MKS::GetConnectionStateReason(): remote mks disconnect reason code is 29.
2022-03-10T10:11:59.469Z In(05) crtbora crt::common::MKS::OnConnectionStateChanged: remote mks set disconnect reason 29, so attempting to reconnect with retry count = 1 and duration = 2 sec.
2022-03-10T10:11:59.484Z In(05) crtbora crt::win32::MainMKSWindow::SetLockedDPI: Customized DPI :0 is set.
2022-03-10T10:11:59.485Z In(05) crtbora crt::win32::MainMKSWindow::SetLockedDPI: Customized DPI :0 is set.
2022-03-10T10:12:04.567Z In(05) crtbora crt::common::MKS::GetConnectionStateReason(): remote mks disconnect reason code is 29.
2022-03-10T10:12:04.567Z In(05) crtbora crt::common::MKS::SetConnectionState: MKS connection state changes from 1 to 1.
2022-03-10T10:12:04.567Z In(05) crtbora crt::common::MKS::OnConnectionStateChanged: remote mks set disconnect reason 29, so attempting to reconnect with retry count = 2 and duration = 4 sec.
2022-03-10T10:12:11.674Z In(05) crtbora crt::common::MKS::GetConnectionStateReason(): remote mks disconnect reason code is 29.
2022-03-10T10:12:11.674Z In(05) crtbora crt::common::MKS::SetConnectionState: MKS connection state changes from 1 to 1.
2022-03-10T10:12:11.674Z In(05) crtbora crt::common::MKS::OnConnectionStateChanged: remote mks set disconnect reason 29, so attempting to reconnect with retry count = 3 and duration = 8 sec.
2022-03-10T10:12:17.812Z In(05) crtbora crt::win32::MainMKSWindow::SetLockedDPI: Customized DPI :0 is set.
2022-03-10T10:12:22.772Z In(05) crtbora crt::common::MKS::GetConnectionStateReason(): remote mks disconnect reason code is 29.
2022-03-10T10:12:22.772Z In(05) crtbora crt::common::MKS::SetConnectionState: MKS connection state changes from 1 to 1.
2022-03-10T10:12:22.772Z In(05) crtbora crt::common::MKS::OnConnectionStateChanged: remote mks set disconnect reason 29, so attempting to reconnect with retry count = 4 and duration = 8 sec.
2022-03-10T10:12:32.795Z In(05) crtbora crt::win32::MainMKSWindow::SetLockedDPI: Customized DPI :0 is set.
2022-03-10T10:12:33.878Z In(05) crtbora crt::common::MKS::GetConnectionStateReason(): remote mks disconnect reason code is 29.
2022-03-10T10:12:33.878Z In(05) crtbora crt::common::MKS::SetConnectionState: MKS connection state changes from 1 to 1.
2022-03-10T10:12:33.878Z In(05) crtbora crt::common::MKS::OnConnectionStateChanged: remote mks set disconnect reason 29, so attempting to reconnect with retry count = 5 and duration = 8 sec.
2022-03-10T10:12:37.838Z In(05) crtbora crt::win32::MainMKSWindow::SetLockedDPI: Customized DPI :0 is set.
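For anyone trying to correlate these entries across several machines, a minimal Python sketch like the one below can tally the MKS disconnect reason codes in a vmware-crtbora log. It assumes the message wording matches the excerpt above; the file name passed in is just an example.

import re
import sys
from collections import Counter

# Matches lines like:
#   2022-03-10T10:11:59.464Z In(05) crtbora ... remote mks disconnect reason code is 29.
#   2022-03-10T10:11:59.469Z In(05) crtbora ... remote mks set disconnect reason 29, so attempting to reconnect ...
PATTERN = re.compile(
    r"^(?P<ts>\S+)\s.*remote mks (?:set )?disconnect reason(?: code is)?\s+(?P<code>\d+)"
)

def summarize(path):
    """Print every disconnect-reason hit, then a count per reason code."""
    codes = Counter()
    with open(path, errors="replace") as fh:
        for line in fh:
            m = PATTERN.match(line)
            if m:
                codes[m.group("code")] += 1
                print(f"{m.group('ts')}  reason code {m.group('code')}")
    for code, count in codes.most_common():
        print(f"reason {code}: {count} occurrence(s)")

if __name__ == "__main__":
    summarize(sys.argv[1])   # e.g. vmware-crtbora-9076.log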

kvmw2130
VMware Employee

The logs are from the 1st of Feb. I see there were network drops, and the ping timer expired, causing the disconnect:

Ping Timer Expiry

Line 886: 2022-02-01T07:38:58.590-05:00> LVL:1 RC: 0 SERVER :InputDevTap_GetKeyboardState @ timer: LEDs = 0x00 ==> 0x02
Line 1680: 2022-02-01T08:04:51.322-05:00> LVL:2 RC:-500 MGMT_IMG :Imaging Timer expiry.
Line 1696: 2022-02-01T08:05:19.489-05:00> LVL:1 RC:-504 MGMT_PCOIP_DATA :Unable to communicate with peer on PCoIP media channels (data manager ping timer expired)

Network Drops: 

Line 1352: 2022-02-01T07:50:52.085-05:00> LVL:1 RC: 0 VGMAC :Stat frms: R=000000/000000/020468 T=002160/029226/007581 (A/I/O) Loss=0.00%/0.17% (R/T)
Line 1379: 2022-02-01T07:51:52.348-05:00> LVL:1 RC: 0 VGMAC :Stat frms: R=000000/000000/022954 T=002160/032741/008292 (A/I/O) Loss=0.00%/0.09% (R/T)
Line 1414: 2022-02-01T07:52:52.563-05:00> LVL:1 RC: 0 VGMAC :Stat frms: R=000000/000000/025104 T=002160/035998/008817 (A/I/O) Loss=0.00%/0.74% (R/T)

Line 1618: 2022-02-01T08:01:53.461-05:00> LVL:1 RC: 0 VGMAC :Stat frms: R=000000/000000/043349 T=002445/053790/014877 (A/I/O) Loss=0.00%/0.10% (R/T)
Line 1639: 2022-02-01T08:02:53.671-05:00> LVL:1 RC: 0 VGMAC :Stat frms: R=000000/000000/045438 T=002445/058048/015507 (A/I/O) Loss=0.00%/0.06% (R/T)
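For anyone checking their own pcoip_server logs for the same pattern, a small Python sketch like this will pull the loss figures out of the VGMAC stat lines and flag the spikes. It assumes the lines look like the ones quoted above; the 0.5% threshold is arbitrary and the file name is illustrative.

import re
import sys

# Matches the VGMAC frame-statistics lines quoted above, e.g.
#   2022-02-01T07:52:52.563-05:00> LVL:1 RC: 0 VGMAC :Stat frms: ... Loss=0.00%/0.74% (R/T)
STAT = re.compile(
    r"(?P<ts>\d{4}-\d{2}-\d{2}T[^\s>]+)>.*VGMAC\s*:Stat frms:.*"
    r"Loss=(?P<rx>[\d.]+)%/(?P<tx>[\d.]+)%"
)

def flag_loss(path, threshold=0.5):
    """Print each stats sample and mark any above the loss threshold (percent)."""
    with open(path, errors="replace") as fh:
        for line in fh:
            m = STAT.search(line)
            if not m:
                continue
            rx, tx = float(m.group("rx")), float(m.group("tx"))
            marker = "  <-- above threshold" if max(rx, tx) >= threshold else ""
            print(f"{m.group('ts')}  loss R={rx:.2f}%  T={tx:.2f}%{marker}")

if __name__ == "__main__":
    flag_loss(sys.argv[1])   # path to the pcoip_server log file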

 

 

 

beachITguy
Contributor

Thank you for the reply,

Is there anything that can be done to correct this? 
Like I have said, this only happens to a few people when there are multiple people connected, and there is no way I can replicate the issue. I have looked at our network configs, and there does not appear to be anything we can change on the network gear that would correct this.

TechMassey
Hot Shot

Based on the logs, you have two very nice 4k monitors ;). 

Due to Log4j, many companies, including my own, had to rush to Horizon 2111. The first issues we encountered were graphical in nature, typically due to older Horizon 5.x clients.

I agree that this isn't a networking issue; the PCoIP logs indicate no high RTT latency or packet loss. The behavior, though, can indicate the VM itself is freezing in vSphere, either due to a large VM CPU spike or constrained vSphere cluster resources.

However, I actually faced this exact issue a few months ago. Newer versions of Horizon and the Horizon client just don't offer any love for multiple 4K monitors. In the logs, you will see multiple entries for "unsupported display types/resolution." Instead, uninstall Horizon Client 2111 and drop in 2103.


Should be smooth going from there unless there are resource constraints in the datacenter.


Please help out! If you find this post helpful and/or the correct answer, mark it! It helps recognize contributions to the VMTN community, and, well, me too 🙂
beachITguy
Contributor

Thank you for the reply.

I will have a select few repeat offenders downgrade their client and test, and I will let you know in a few days.

 

beachITguy
Contributor

One of my users who downgraded their client just got back to me stating that they were just disconnected.

I got their pcoip_server log file and have attached it. Again, I only redacted the FQDN and IPs.

Circling back to what you said about the resource constraints, how would I go about finding that out? Like I said, I do not have access to the VMware server or connection server; I would have to relay this information over to them. But they say that the servers are configured properly and it is not their issue, which is why I am trying to track it down.

Also, circling back to what another person told me in a reply above: a ping timer expired, and that is the reason the session was dropped. Is there a way to increase that timeout, as detailed below?

2022-03-24T13:44:25.554-04:00> LVL:1 RC:-504 MGMT_PCOIP_DATA :Unable to communicate with peer on PCoIP media channels (data manager ping timer expired)

Is this controlled via settings on the server itself, or is it network related (switch, FW, router)?

TechMassey
Hot Shot

It is unfortunate that the vSphere team won't at least provide an exported PNG graph of the cluster or virtual machine. The one thing you can do is leverage Perfmon to record basic resource metrics on the VM.
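If a Perfmon data collector set is awkward to schedule, a rough alternative is a small recorder like the sketch below, assuming Python and the third-party psutil package can be installed on the guest. It writes a CSV you can line up against the disconnect timestamps.

import csv
import time
from datetime import datetime

import psutil   # third-party: pip install psutil

def record(path="vdi_metrics.csv", interval=15, samples=240):
    """Sample CPU and memory every `interval` seconds (defaults give about an hour of data)."""
    psutil.cpu_percent(None)   # prime the CPU counter so the first sample is meaningful
    with open(path, "w", newline="") as fh:
        writer = csv.writer(fh)
        writer.writerow(["timestamp", "cpu_percent", "mem_percent", "mem_used_mb"])
        for _ in range(samples):
            time.sleep(interval)
            mem = psutil.virtual_memory()
            writer.writerow([
                datetime.now().isoformat(timespec="seconds"),
                psutil.cpu_percent(None),
                mem.percent,
                round(mem.used / (1024 * 1024)),
            ])
            fh.flush()   # keep the file current in case the session drops mid-run

if __name__ == "__main__":
    record()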

It is also unfortunate that the slightly older 2103 client did not help. The issue impacted both PCoIP and Blast in my recent experience.

On the timeout setting, I'm not familiar with it, but as an alternate test you should be able to try a non-Teradici device, if allowed, on a company workstation/laptop with the Horizon Client or HTML Access.

One last item: there are valuable logs located on both the connection server and the VDI desktop. They are specified in this link and are great for correlating timestamps with the client logs.

VMware Horizon Client Log Locations - Location of Horizon (VDM) log files (1027744) (vmware.com)

 

One final item: I'm investigating additional symptoms around this issue occurring in the last 24 hours. It may be the same issue or a new variation; I'll post back here.


Please help out! If you find this post helpful and/or the correct answer, mark it! It helps recognize contributions to the VMTN community, and, well, me too 🙂
beachITguy
Contributor

Unfortunately, we are not able to try a non-Teradici device in the org.

I was able to get the debug logs from the connection server, though, and have attached them here. Again, I have redacted the IPs and FQDN; nothing else has been modified.

I want to thank you for the help you have given me, and I look forward to trying to figure this out. If there is anything else I can do or try, I am open to suggestions.

IboIboIbo
Contributor

We have a similar issue. Did you figure out your issue or find a solution?

skocatt
Contributor

We have an identical issue in our environment, and so far even VMware could not find the root cause.

Have you guys found something? Anything we could try?

Thanks

skocatt
Contributor

Guys, did any of you find a solution?

jmacdaddy
Enthusiast

Any chance that DRS is vMotioning the desktops and the stun duration is long enough to cause a disconnect? I have seen this in a number of my Horizon deployments. You should be able to check the VM's logs in vCenter and see if a migration is occurring at the same time the user is reporting the disconnection.
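For anyone with read access to vCenter, a sketch along these lines can check whether a migration event landed near a reported disconnect. It uses the third-party pyVmomi library, and the vCenter address, credentials, and desktop VM name below are hypothetical placeholders.

import ssl
from datetime import datetime, timedelta

from pyVim.connect import SmartConnect, Disconnect   # third-party: pip install pyvmomi
from pyVmomi import vim

def migrations_near(vcenter, user, pwd, vm_name, around, window_minutes=10):
    """List migration events recorded for a VM within +/- window_minutes of `around`."""
    ctx = ssl._create_unverified_context()   # lab-style cert handling; tighten for production
    si = SmartConnect(host=vcenter, user=user, pwd=pwd, sslContext=ctx)
    try:
        content = si.RetrieveContent()
        view = content.viewManager.CreateContainerView(
            content.rootFolder, [vim.VirtualMachine], True)
        vm = next(v for v in view.view if v.name == vm_name)

        spec = vim.event.EventFilterSpec(
            entity=vim.event.EventFilterSpec.ByEntity(entity=vm, recursion="self"),
            time=vim.event.EventFilterSpec.ByTime(
                beginTime=around - timedelta(minutes=window_minutes),
                endTime=around + timedelta(minutes=window_minutes)),
            eventTypeId=["VmMigratedEvent", "DrsVmMigratedEvent", "VmBeingMigratedEvent"])
        for ev in content.eventManager.QueryEvents(spec):
            print(ev.createdTime, type(ev).__name__, ev.fullFormattedMessage)
    finally:
        Disconnect(si)

if __name__ == "__main__":
    # Hypothetical values; substitute your vCenter, credentials, desktop VM name,
    # and the timestamp the user reported the disconnect.
    migrations_near("vcenter.example.local", "readonly@vsphere.local", "********",
                    "VDI-DESKTOP-042", datetime(2022, 3, 24, 13, 44))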

mbrundage
Contributor

Did anyone figure out a resolution for this?

adamabel
Enthusiast

I have been having a similar issue with Horizon server 8.0 when clients were using client 8.0 and the web client. 

I upgraded to the 8.4 server about a month ago; clients are using 8.0 in some cases and 8.8 in others for the full client, and some are on the web client. It doesn't seem to matter what client or server I am running. Also, the issue is very intermittent and seems to only affect a small subset of users. At the moment, since 99% of my users connect from remote locations across the US and Canada, I'm inclined to point the finger at their ISPs or issues on the internet.

Today, a client on one VM has reported disconnects by about 8 different users. No other clients have reported issues.

Reviewing the machine, there have been some CPU spikes.

There are about 20 users connected, and only 2-5 lose their connection at a time; the others stay connected.

There is no DRS/vMotion happening on the machine either.

NLAVOIE
Contributor

I have the same problem here. We use Blast. Users are disconnected after approximately 30 minutes. I don't know where to look to find the problem.

Thanks.

Wintersblack666
Contributor

My environment is still running Horizon version 7.10 (working towards an upgrade), and we are experiencing the same issue. However, users are reporting their screens turning black for a few seconds and then reconnecting back into their session. It is the same issue, with a slight pause before the disconnect occurs at random times some days, but all users seem to experience it simultaneously from what we have been monitoring.

It is only affecting users connecting from the internet. One difference is that we have provided all remote users with full Windows PCs with the Horizon client installed for access to the VDI. We verified all users have at least 25 Mbps internet connections, and all are using the Blast protocol. I was also inclined to blame ISPs, but now I am not so sure.

Our issue started around the October 2022 time frame, and we have had VMware support involved as well.

Has anyone had relief, and can you share what was done?

manishbhatia26
Contributor

Did anyone find a fix for this issue? We are on Horizon 8 (2111.1) and are also getting reports that users are getting disconnected randomly.

Debug logs capturing the VDI disconnect event are below:

2023-08-22T14:16:00.042-04:00 INFO (0E0C-2BD8) <vmware-usbd> [vmware-remotemks] mmfw_PipeRead: called (client @ 23433E1EF80)
2023-08-22T14:16:00.042-04:00 INFO (0E0C-2BD8) <vmware-usbd> [vmware-remotemks] mmfw_PipeRead: connection closed, status=mmfw_Status_EOF,
2023-08-22T14:16:00.042-04:00 INFO (0E0C-2BD8) <vmware-usbd> [vmware-remotemks] Service client 0000023433E1EF80 dead, release client info
2023-08-22T14:16:00.042-04:00 DEBUG (0E0C-2BD8) <vmware-usbd> [vmware-remotemks] Disconnect desktop with name = cn=107df0e6-127d-4b18-8da5-cdcf152c6daa,ou=entitlements,dc=vdiglobal,dc=vmware,dc=int@2204@3596
2023-08-22T14:16:00.042-04:00 DEBUG (0E0C-2BD8) <vmware-usbd> [vmware-remotemks] Abort get reconnect ticket on close
2023-08-22T14:16:00.042-04:00 DEBUG (0E0C-2BD8) <vmware-usbd> [vmware-remotemks] UsbDisconnectDesktopDevices: desktop=cn=107df0e6-127d-4b18-8da5-cdcf152c6daa,ou=entitlements,dc=vdiglobal,dc=vmware,dc=int@2204@3596, deviceCount=0, final=1
2023-08-22T14:16:00.047-04:00 DEBUG (0E0C-2BD8) <vmware-usbd> [vmware-remotemks] Removed desktop with name = cn=107df0e6-127d-4b18-8da5-cdcf152c6daa,ou=entitlements,dc=vdiglobal,dc=vmware,dc=int@2204@3596
2023-08-22T14:16:00.047-04:00 DEBUG (0E0C-2BD8) <vmware-usbd> [vmware-remotemks] Removed client with context = 2204
2023-08-22T14:16:00.047-04:00 INFO (0E0C-2BD8) <vmware-usbd> [vmware-remotemks] mmfw_CloseConnectionOnServer: Try to close pipe E40
2023-08-22T14:16:00.070-04:00 DEBUG (1850-1E98) <SharedMemReaderThread> [MessageFrameWork] SharedMem reader opt, reader 0x000001884081E760 for channel 0x00000188407CEEA0 detaching from thread 0x0000018840810FB0 handleCount 5
2023-08-22T14:16:00.070-04:00 DEBUG (1850-1E98) <SharedMemReaderThread> [MessageFrameWork] SharedMem reader opt, peer is dead - reader 0x000001884081E760 in thread 0x0000018840810FB0
2023-08-22T14:16:00.070-04:00 DEBUG (1850-1E98) <SharedMemReaderThread> [MessageFrameWork] MessageFrameWork Worker Shutdown OnChannelDelete, Name=VmwHorizonClientUI2204
2023-08-22T14:16:00.070-04:00 DEBUG (1850-1E98) <SharedMemReaderThread> [MessageFrameWork] CORE::AuthChannelInt::~AuthChannelInt(): Closed incoming SharedMemory channel from machine LTUS172816.cts.com, user CTS\156070, channel 00000188407CEEA0
2023-08-22T14:16:00.073-04:00 DEBUG (1850-1E98) <SharedMemReaderThread> [MessageFrameWork] CORE::MessageChannel::~MessageChannel(): Channel Horizon Client UI (0x00000188407CEEA0): DELETED
2023-08-22T14:16:00.135-04:00 DEBUG (0E0C-483C) <18492> [vmware-remotemks] Posting Shutdown message to the system queue.
2023-08-22T14:16:00.135-04:00 DEBUG (0E0C-35F8) <MessageFrameWorkDispatch> [vmware-remotemks] System::Shutdown
2023-08-22T14:16:00.135-04:00 INFO (0E0C-35F8) <MessageFrameWorkDispatch> [vmware-remotemks] vmware-view-usbd received shutdown signal
2023-08-22T14:16:00.136-04:00 DEBUG (0E0C-35F8) <MessageFrameWorkDispatch> [vmware-remotemks] viewusb_op_notif_ceipdata: there is no valid service client.
2023-08-22T14:16:00.136-04:00 DEBUG (0E0C-2BD8) <vmware-usbd> [vmware-remotemks] Poll ExitCallback called
2023-08-22T14:16:00.136-04:00 INFO (0E0C-2BD8) <vmware-usbd> [vmware-remotemks] out poll loop
2023-08-22T14:16:00.136-04:00 DEBUG (0E0C-2BD8) <vmware-usbd> [vmware-remotemks] cdk::usb::UsbDeviceManager::UsbDisconnectAllDevices called.
2023-08-22T14:16:00.137-04:00 DEBUG (0E0C-2BD8) <vmware-usbd> [vmware-remotemks] MessageFrameWork Worker Shutdown, Name=UsbDeviceManager, Channel=0000000000000000
2023-08-22T14:16:00.137-04:00 DEBUG (0E0C-2BD8) <vmware-usbd> [vmware-remotemks] coregate 'KeyVault' deleted
2023-08-22T14:16:00.174-04:00 INFO (0E0C-2BD8) <vmware-usbd> [vmware-remotemks] Exiting.

 

mbrundage
Contributor

We are still facing this issue on 2212, running View client 2303. For some of our clients we can attribute it to a memory utilization issue on the thin client, and adding more RAM has bought us some time. We had to look at the PID table and system logs to see that the View client, or a component of it, was being killed by OOM on these devices. What contributes to the high usage is still to be determined. We even tried to run with no client redirections and all plugins disabled, but it still happens.
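For what it's worth, on Linux-based thin clients the kernel logs OOM kills, so a small Python sketch like this can list which processes were killed. It assumes dmesg is readable (which may require root) and that the message format matches your kernel; adjust the pattern if yours differs.

import re
import subprocess

# Typical kernel OOM-killer lines look like:
#   Out of memory: Killed process 1234 (vmware-view) total-vm:... anon-rss:...
OOM = re.compile(r"Out of memory: Kill(?:ed)? process (\d+) \(([^)]+)\)")

def oom_victims():
    """Return (pid, process name) pairs reported by the kernel OOM killer in dmesg."""
    out = subprocess.run(["dmesg", "--ctime"], capture_output=True, text=True).stdout
    return OOM.findall(out)

if __name__ == "__main__":
    for pid, name in oom_victims():
        print(f"OOM-killed: pid={pid} process={name}")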