VMware Horizon Community
ZyBeR
Contributor
Contributor

PCoIP clients disconnections during high cpu load

I'm having issues with my View environment, we have 80 clients running Zero clients with PCoIP.

View 5.1 and ESXi 4.1.

What happens is the desktop flickers and goes black and disconnects the session, this happens ~10 times per day (over 80 clients) and I think it always happens when the vCPU is maxed out. At least everything I look the vCPU have been at 100% or close to.

I've been trying to find a solution, but I can't really find anything.

I just extracted logs from on of the cdesktops after it disconnected.

Here are some of the info I found interesting.

09:05:27,896 ERROR <theTopicPublishingManager> [TopicPublishingManager] TopicPublishingManager general exception com.vmware.vdi.agent.messageserver.TopicPublishingManager.run(SourceFile:273)
com.swiftmq.jms.ConnectionLostException: Connection closed
at com.swiftmq.jms.ExceptionConverter.convert(Unknown Source)
at com.swiftmq.jms.v750.MessageProducerImpl.processSend(Unknown Source)
at com.swiftmq.jms.v750.MessageProducerImpl.send(Unknown Source)
at com.swiftmq.jms.v750.TopicPublisherImpl.publish(Unknown Source)
at com.vmware.vdi.agent.messageserver.TopicPublishingManager.run(SourceFile:265)
at java.lang.Thread.run(Thread.java:619)
09:07:08,961 INFO  <Main Thread> [wsnm_desktop] Session DISCONNECTED: id=0, user EB\mikael.berglin, client=(null), connectionId=BB454829974AFBE64828632856252914, userDn=cn=s-1-5-21-1277182115-1003060436-3752233372-1146,cn=foreignsecurityprincipals,dc=vdi,dc=vmware,dc=int
09:07:11,665 INFO  <5608> [winlogon] Program 'winlogon' started, version=, pid=5084, buildtype=release, usethread=0, closeafterwrite=1
09:07:12,045 INFO  <5608> [winlogon] Authentication Manager successfully installed.
02/26/2013, 09:05:44.499> LVL:2 RC:   0           AGENT :MBX_CON_CLOSED
02/26/2013, 09:05:44.499> LVL:2 RC:   0           AGENT :tera_agent_disconnect: agent close code: 3, disconnect reason: 0
02/26/2013, 09:05:45.200> LVL:2 RC:   0           AGENT :tera_agent_disconnect: connection_closed 3
A complete set of logs from the vm is attached, it' conly logs from today so they should all be relevant.

If anyone could give me some pointers it would be super helpful.

0 Kudos
5 Replies
mittim12
Immortal
Immortal

It might be best to have someone from Teradici or VMware look at the logs but below is what stands out to me.

TERA_MGMT_SYS_SESS_EVENT_CLOSED with disconnect cause (0x400) which translates to  Zero client and View 4.5 and earlier: PCoIP session terminated due to network issues on TCP/SSL connection (keepalive ping timeout).   You can view disconnect codes at this KB http://kb.vmware.com/kb/2012101

Also noticed a few lines similar to this GMT_PCOIP_DATA :Tx thread info: bw limit = 608, plateau = 515.9, avg tx = 0.9, avg rx = 0.8 (KBytes/s)
02/26/2013, 09:05:22.808> LVL:1 RC:   0           VGMAC :Stat frms: R=000000/000000/160005  T=004999/131027/059559 (A/I/O) Loss=0.00%/0.00% (R/T)
02/26/2013, 09:05:24.200> LVL:2 RC:   0 MGMT_PCOIP_DATA :Tx thread info: round trip time (ms) = 4384, variance = 8006, rto = 12490

Is this connection via the WAN or LAN?    How long is the max CPU usage? 

0 Kudos
ZyBeR
Contributor
Contributor

It's on the LAN and CPU spikes can continiue for some time, up to a few minutes for stupid vlookup in Excel or similar cpu intensive operations.

But it seams like the high load is disturbing the View Agent somehow, looking at the event log I see several SMB disconnect events occuring at the same time as these problems.

We are not using SSL btw.

0 Kudos
jslabaugh
Enthusiast
Enthusiast

I don't yet have a solution, but I am seeing the same problem with the same errors in my logs, View 5.1.2, both 4.6 and 5.1.1 agents, Windows XP, ESXi 5, on a LAN. My users are reporting that it occurs when they have several applications open at once.  The relevant part of my logs is below:

02/28/2013, 09:34:45.406> LVL:1 RC:   0 MGMT_PCOIP_DATA :BW: Decrease (overrun) old = 289.2762 new =  85.4560, old plat = 201.7952, current=135.144, avg=109
02/28/2013, 09:34:45.500> LVL:1 RC:   0 MGMT_PCOIP_DATA :BW: Decrease (loss) old =  85.4560 new =  73.2785
02/28/2013, 09:34:45.500> LVL:2 RC:   0        MGMT_IMG :rcv nak  2 seq_id 172 disp 0 fsp  4 f_ref 214
02/28/2013, 09:34:45.500> LVL:2 RC:   0        MGMT_IMG :SW_HOST_IPC: NAK for fsp 4 seq 214. (ref id=37) Ack ref available, recode from input
02/28/2013, 09:34:45.531> LVL:2 RC:   0        MGMT_IMG :SW_HOST_IPC: IPLP: recode from input: fsp=4 refvld=1 seq=216
02/28/2013, 09:34:46.125> LVL:1 RC:   0 MGMT_PCOIP_DATA :BW: Decrease (loss) old = 112.5000 new =  96.4688
02/28/2013, 09:34:46.125> LVL:2 RC:   0        MGMT_IMG :rcv nak  2 seq_id 185 disp 0 fsp  4 f_ref 215
02/28/2013, 09:34:46.125> LVL:2 RC:   0        MGMT_IMG :SW_HOST_IPC: Encoder clearing recode for fsp 4 seq 216 (ref id=88)
02/28/2013, 09:34:49.828> LVL:2 RC:   0       MGMT_SSIG :Session closed remotely by peer (192.168.7.69, 00-80-64-86-3C-7D, PRI: 0)
02/28/2013, 09:34:49.828> LVL:2 RC:   0       MGMT_SSIG :Received from peer: BYE disconnect reason cause (0x400)
02/28/2013, 09:34:49.828> LVL:2 RC:   0       MGMT_SSIG :Session closure reason cause (device:internal: SSIG keepalive failure)
02/28/2013, 09:34:49.937> LVL:1 RC:   0 MGMT_PCOIP_DATA :BW: Decrease (loss) old = 114.4877 new = 182.6912
02/28/2013, 09:34:49.937> LVL:2 RC:   0        MGMT_IMG :rcv nak  2 seq_id 247 disp 0 fsp 10 f_ref  52
02/28/2013, 09:34:49.937> LVL:2 RC:   0        MGMT_IMG :SW_HOST_IPC: NAK for fsp 10 seq 52. (ref id=2) Ack ref available, recode from input
02/28/2013, 09:34:49.937> LVL:2 RC:   0        MGMT_IMG :rcv nak  2 seq_id 248 disp 0 fsp 11 f_ref   2
02/28/2013, 09:34:49.937> LVL:2 RC:   0        MGMT_IMG :SW_HOST_IPC: NAK for fsp 11 seq 2. (ref id=75) Ack ref available, recode from input
02/28/2013, 09:34:49.937> LVL:2 RC:   0        MGMT_IMG :rcv nak  3 seq_id 250 disp 0 fsp 10 f_ref  53
02/28/2013, 09:34:49.937> LVL:2 RC:   0        MGMT_IMG :rcv nak  3 seq_id 251 disp 0 fsp 11 f_ref   3
02/28/2013, 09:34:50.031> LVL:2 RC:   0        MGMT_IMG :SW_HOST_IPC: IPLP: recode from input: fsp=10 refvld=1 seq=62
02/28/2013, 09:34:50.031> LVL:2 RC:   0        MGMT_IMG :SW_HOST_IPC: IPLP: recode from input: fsp=11 refvld=1 seq=13
02/28/2013, 09:34:50.296> LVL:2 RC:   0        MGMT_IMG :Resetting encoder (reset_type=0)
02/28/2013, 09:34:50.296> LVL:2 RC:   0 MGMT_PCOIP_DATA :mgmt_pcoip_data_set_media_activation: URBoIP is used. OHCI & EHCI media channel is turned OFF
02/28/2013, 09:34:50.296> LVL:1 RC:-500           VGMAC :tera_sock_recv() failed - Interrupted system call (10004)!
02/28/2013, 09:34:50.296> LVL:2 RC:   0           VGMAC :PCoIP UDP RX thread exiting
02/28/2013, 09:34:50.296> LVL:1 RC:-500 MGMT_PCOIP_DATA :INIT: Peer has reset our PCoIP connection.  Aborting session ...
02/28/2013, 09:34:50.296> LVL:2 RC:   0      MGMT_VCHAN :app_reset: Session has been dropped. Forcing all channels to close!
02/28/2013, 09:34:50.296> LVL:1 RC:   0        MGMT_DDC :mgmt_ddc_fcc_callback: queuing TERA_MGMT_FCC_EVENT_RESET [pri=0]
02/28/2013, 09:34:51.296> LVL:0 RC:   0    IMG_FRONTEND :close_displays: reset SVGADevTap.
02/28/2013, 09:34:51.296> LVL:2 RC:   0        MGMT_IMG :CODEC: State change from CODEC_RUNNING to CODEC_DISABLED
02/28/2013, 09:34:51.296> LVL:2 RC:   0          COMMON :TERA_PCOIP: SESSION_EVENT=TERA_MGMT_SYS_SESS_EVENT_CLOSED, disconnect cause (0x400)
02/28/2013, 09:34:51.296> LVL:2 RC:   0          SERVER :server main: cb_notify_session_status called 20 with tera_disconnect_cause(0x400)
02/28/2013, 09:34:51.296> LVL:2 RC:   0          SERVER :server main: TERA_MGMT_SYS_SESS_EVENT_CLOSED with disconnect cause (0x400).
02/28/2013, 09:34:51.296> LVL:1 RC:   0          SERVER :map_tera_to_agent_close_code: TERA_DISCONNECT_CAUSE_DEVICE_INTERNAL_SSIG_KEEPALIVE_FAILURE -> PCOIP_AGENT_CLOSE_CODE_NETWORK_INTERRUPTION
02/28/2013, 09:34:51.296> LVL:2 RC:   0          SERVER :connection_closed: code = 3.
02/28/2013, 09:34:51.296> LVL:1 RC:   0           VGMAC :Session stats: Average TX=0.112354 average RX=0.00894231 (Mbps) Loss=1.75%/0.71% (R/T)
02/28/2013, 09:34:51.296> LVL:2 RC:   0        MGMT_SYS :Session closed remotely!
02/28/2013, 09:34:51.296> LVL:2 RC:   0        MGMT_SYS :********************************************************
02/28/2013, 09:34:51.296> LVL:2 RC:   0        MGMT_SYS :********************************************************
02/28/2013, 09:34:51.296> LVL:2 RC:   0        MGMT_SYS :********************************************************
02/28/2013, 09:34:51.296> LVL:2 RC:   0        MGMT_SYS :********************************************************
02/28/2013, 09:34:51.296> LVL:2 RC:   0          SERVER :server mailbox: MBX_SHUTDOWN with agent disconnect reason code (0x0)
02/28/2013, 09:34:51.296> LVL:2 RC:   0          SERVER :server mailbox: Stopping mailbox message loop
02/28/2013, 09:34:51.296> LVL:2 RC:   0          SERVER :server main: exiting
02/28/2013, 09:34:51.296> LVL:2 RC:   0          SERVER :server cleanup: tearing down pcoip with agent disconnect reason code(0) and waiting
02/28/2013, 09:34:51.296> LVL:1 RC:   0          SERVER :map_agent_to_tera: DISCONNECT_GENERIC -> TERA_DISCONNECT_CAUSE_HOST_BROKER_GENERIC
02/28/2013, 09:34:51.296> LVL:2 RC:10038            PERF :tera_perf_monitor_send: Failure during message write (header).
02/28/2013, 09:34:51.296> LVL:1 RC:-500            PERF :tera_perf_unregister: Sending remove_instance failed.
02/28/2013, 09:34:51.296> LVL:0 RC:   0       MGMT_SESS :Tearing down the session
02/28/2013, 09:34:51.296> LVL:2 RC:   0      MGMT_VCHAN :VChanPluginExit: Closing plugin 'VMware_Server'.
02/28/2013, 09:34:51.343> LVL:2 RC:   0      MGMT_VCHAN :VChanPluginExit: Plugin 'VMware_Server' is closed.
02/28/2013, 09:34:51.343> LVL:2 RC:   0      MGMT_VCHAN :VChanPluginExit: Closing plugin 'mksvchanserver'.
02/28/2013, 09:34:51.343> LVL:2 RC:   0      MGMT_VCHAN :VChanPluginExit: Plugin 'mksvchanserver' is closed.
02/28/2013, 09:34:51.343> LVL:1 RC:   0      MGMT_VCHAN :=> Successfully exited all the VChan plugins
02/28/2013, 09:34:51.343> LVL:2 RC:   0          SERVER :server cleanup: detaching inputdevtap
02/28/2013, 09:34:51.343> LVL:2 RC:   0          SERVER :server cleanup: closing mailboxes
02/28/2013, 09:34:51.343> LVL:2 RC:   0          SERVER :server cleanup: freeing argument copy

I will keep you posted as I troubleshoot.

0 Kudos
ZyBeR
Contributor
Contributor

Any updates mate?

0 Kudos
jslabaugh
Enthusiast
Enthusiast

We found out yesterday that the issue with the client I posted the log from was physical networking-related, not CPU.  However, the disconnect during high CPU load issue is still ongoing with a client of mine in a different building, tried upgrading her virtual hardware last night to match my current version but still was able to reproduce the issue this morning.  Any updates from you, Zyber?

0 Kudos