Hello-
We have one particular client that is experiencing a large number of partial session disconnects. By this I mean that the screens go black, sometimes flash a bit, and then come back up. The session appears to be in the same state that it was prior to the "blackout" - as the client has termed it. It does not matter if the client user is sitting idle, typing away in Word, Outlook or some business application. The "blackouts" seem totally random.
The configuration looks like this:
Note that we have many other clients using the same exact end points connected to the same exact infrastructure across AT&T metroE lines that are terminated by Cisco 28xx or 29xx routers. We do not have this issue anywhere except with this one particular client.
Here is what we've done so far:
This is a snippet from a log file from a DVM that is experiencing blackouts....note that things appear to be chugging along just fine, and then there is an entry that says "MGMT_IMG :Imaging Timer expiry." This is followed by "MGMT_IMG :Resetting encoder (reset_type=1)" and then "IMG_FRONTEND :close_displays: reset SVGADevTap." I'm not a PCoIP expert, but things like Resetting Encoder and Close_Displays sure seem bad. Surprisingly, there are no noticeable bandwidth issues or packet loss logged prior to the blackout.
Here is the complete log snippet:
06/09/2011, 10:42:44.631> LVL:2 RC: 0 MGMT_PCOIP_DATA :Tx thread info: bw limit = 2500, plateau = 2500.0, avg tx = 14.1, avg rx = 7.8 (KBytes/s)
06/09/2011, 10:42:44.631> LVL:1 RC: 0 VGMAC :Stat frms: R=000000/000000/331763 T=009795/247648/098768 (A/I/O) Loss=0.00%/0.00% (R/T)
06/09/2011, 10:42:45.740> LVL:2 RC: 0 MGMT_PCOIP_DATA :Tx thread info: round trip time (ms) = 4, variance = 1, rto = 105
06/09/2011, 10:43:11.334> LVL:2 RC: 0 MGMT_PCOIP_DATA :Tx thread info: round trip time (ms) = 4, variance = 1, rto = 105
06/09/2011, 10:43:12.599> LVL:2 RC: 0 MGMT_IMG :log: cur_s 0 max_s 30 tbl 0 bwc 0.00 bwt 17.58 fps 0.27 fl_ps 1.57
06/09/2011, 10:43:12.599> LVL:2 RC: 0 MGMT_IMG :log: chg pix: 792768, chg pix not motion: 792768
06/09/2011, 10:43:12.599> LVL:2 RC: 0 MGMT_IMG :log: delta bits encoded: 356216, delta build bits encoded: 55864.
06/09/2011, 10:43:12.599> LVL:2 RC: 0 MGMT_IMG :log: enc bits/pixel - 0.45, enc bits/sec - 11869.54, enc MPix/sec - 0.03, decode rate est (MBit/sec) - 0.00
06/09/2011, 10:43:33.302> LVL:2 RC:-500 MGMT_IMG :Imaging Timer expiry.
06/09/2011, 10:43:33.302> LVL:2 RC: 0 MGMT_IMG :Resetting encoder (reset_type=1)
06/09/2011, 10:43:34.302> LVL:0 RC: 0 IMG_FRONTEND :close_displays: reset SVGADevTap.
06/09/2011, 10:43:34.302> LVL:2 RC: 0 MGMT_IMG :CODEC: State change from CODEC_RUNNING to CODEC_DISABLED
06/09/2011, 10:43:34.302> LVL:0 RC: 0 IMG_FRONTEND :Calling open display in Tera1 mode.
06/09/2011, 10:43:34.302> LVL:0 RC: 0 IMG_FRONTEND :configure_displays: 2 display(s) initially reported!
06/09/2011, 10:43:34.318> LVL:0 RC: 0 IMG_FRONTEND :configure_display[0]--* id: 5 mon_id: 0 pos: (0,0) w: 1280 h: 960
06/09/2011, 10:43:34.318> LVL:0 RC: 0 IMG_FRONTEND :configure_display[0]--bpp: 32 --pitch: 5120 --map size: 2416 --fb size: 4915200
06/09/2011, 10:43:34.318> LVL:0 RC: 0 IMG_FRONTEND :configure_displays[0]--rot: 0 --pre-rot pitch: 5120 --frontend motion is enabled
06/09/2011, 10:43:34.318> LVL:0 RC: 0 IMG_FRONTEND :configure_display[1]--* id: 6 mon_id: 1 pos: (1280,0) w: 1280 h: 960
06/09/2011, 10:43:34.318> LVL:0 RC: 0 IMG_FRONTEND :configure_display[1]--bpp: 32 --pitch: 5120 --map size: 2416 --fb size: 4915200
06/09/2011, 10:43:34.318> LVL:0 RC: 0 IMG_FRONTEND :configure_displays[1]--rot: 0 --pre-rot pitch: 5120 --frontend motion is enabled
06/09/2011, 10:43:34.349> LVL:0 RC: 0 MGMT_IMG :Image Engine detected display #0 (1280x960) with offset (0x0)
06/09/2011, 10:43:34.349> LVL:0 RC: 0 MGMT_IMG :Image Engine detected display #1 (1280x960) with offset (1280x0)
06/09/2011, 10:43:34.349> LVL:2 RC: 0 MGMT_IMG :CODEC: State change from CODEC_DISABLED to CODEC_DMT_EXCHANGE
06/09/2011, 10:43:34.709> LVL:2 RC: 0 MGMT_IMG :CODEC: Processing MGMT_IMG_APDU_TYPE_DMT_ACK. [pri = 0]
06/09/2011, 10:43:34.709> LVL:2 RC: 0 IPC :Allocated 284 slice ref descriptors
06/09/2011, 10:43:34.709> LVL:0 RC: 0 IPC :cSW_HOST_IPC: New sub-session ID is 3
06/09/2011, 10:43:34.709> LVL:2 RC: 0 MGMT_IMG :CODEC: State change from CODEC_DMT_EXCHANGE to CODEC_RUNNING
06/09/2011, 10:43:34.724> LVL:2 RC: 0 MGMT_IMG :SW_HOST_IPC: IPLP: recode from scratch: fsp=0 refvld=0 seq=0
06/09/2011, 10:43:34.724> LVL:2 RC: 0 MGMT_IMG :SW_HOST_IPC: IPLP: recode from scratch: fsp=1 refvld=0 seq=0
06/09/2011, 10:43:34.740> LVL:2 RC: 0 MGMT_IMG :SW_HOST_IPC: IPLP: recode from scratch: fsp=2 refvld=0 seq=0
06/09/2011, 10:43:34.740> LVL:2 RC: 0 MGMT_IMG :SW_HOST_IPC: IPLP: recode from scratch: fsp=3 refvld=0 seq=0
06/09/2011, 10:43:34.756> LVL:2 RC: 0 MGMT_IMG :SW_HOST_IPC: Encoder clearing recode for fsp 0 seq 0 (ref)
06/09/2011, 10:43:34.756> LVL:2 RC: 0 MGMT_IMG :SW_HOST_IPC: Encoder clearing recode for fsp 1 seq 0 (ref)
06/09/2011, 10:43:34.756> LVL:2 RC: 0 MGMT_IMG :SW_HOST_IPC: IPLP: recode from scratch: fsp=4 refvld=0 seq=0
06/09/2011, 10:43:34.771> LVL:2 RC: 0 MGMT_IMG :SW_HOST_IPC: Encoder clearing recode for fsp 2 seq 0 (ref)
06/09/2011, 10:43:34.771> LVL:2 RC: 0 MGMT_IMG :SW_HOST_IPC: Encoder clearing recode for fsp 3 seq 0 (ref)
06/09/2011, 10:43:34.771> LVL:2 RC: 0 MGMT_IMG :SW_HOST_IPC: IPLP: recode from scratch: fsp=5 refvld=0 seq=0
06/09/2011, 10:43:34.771> LVL:2 RC: 0 MGMT_IMG :SW_HOST_IPC: IPLP: recode from scratch: fsp=6 refvld=0 seq=0
06/09/2011, 10:43:34.771> LVL:2 RC: 0 MGMT_IMG :SW_HOST_IPC: IPLP: recode from scratch: fsp=7 refvld=0 seq=0
06/09/2011, 10:43:34.771> LVL:2 RC: 0 MGMT_IMG :SW_HOST_IPC: IPLP: recode from scratch: fsp=8 refvld=0 seq=0
06/09/2011, 10:43:34.787> LVL:2 RC: 0 MGMT_IMG :SW_HOST_IPC: Encoder clearing recode for fsp 4 seq 0 (ref)
06/09/2011, 10:43:34.802> LVL:2 RC: 0 MGMT_IMG :SW_HOST_IPC: Encoder clearing recode for fsp 5 seq 0 (ref)
06/09/2011, 10:43:34.802> LVL:2 RC: 0 MGMT_IMG :SW_HOST_IPC: IPLP: recode from scratch: fsp=9 refvld=0 seq=0
06/09/2011, 10:43:34.802> LVL:2 RC: 0 MGMT_IMG :SW_HOST_IPC: Encoder clearing recode for fsp 6 seq 0 (ref)
06/09/2011, 10:43:34.802> LVL:2 RC: 0 MGMT_IMG :SW_HOST_IPC: IPLP: recode from scratch: fsp=10 refvld=0 seq=0
06/09/2011, 10:43:34.802> LVL:2 RC: 0 MGMT_IMG :SW_HOST_IPC: IPLP: recode from scratch: fsp=11 refvld=0 seq=0
06/09/2011, 10:43:34.802> LVL:2 RC: 0 MGMT_IMG :SW_HOST_IPC: IPLP: recode from scratch: fsp=12 refvld=0 seq=0
06/09/2011, 10:43:34.802> LVL:2 RC: 0 MGMT_IMG :SW_HOST_IPC: Encoder clearing recode for fsp 7 seq 0 (ref)
06/09/2011, 10:43:34.802> LVL:2 RC: 0 MGMT_IMG :SW_HOST_IPC: Encoder clearing recode for fsp 8 seq 0 (ref)
06/09/2011, 10:43:34.802> LVL:2 RC: 0 MGMT_IMG :SW_HOST_IPC: IPLP: recode from scratch: fsp=13 refvld=0 seq=0
06/09/2011, 10:43:34.802> LVL:2 RC: 0 MGMT_IMG :SW_HOST_IPC: IPLP: recode from scratch: fsp=14 refvld=0 seq=0
06/09/2011, 10:43:34.818> LVL:2 RC: 0 MGMT_IMG :SW_HOST_IPC: Encoder clearing recode for fsp 9 seq 0 (ref)
06/09/2011, 10:43:34.834> LVL:2 RC: 0 MGMT_IMG :SW_HOST_IPC: IPLP: recode from scratch: fsp=0 refvld=0 seq=0
06/09/2011, 10:43:34.849> LVL:2 RC: 0 MGMT_IMG :SW_HOST_IPC: Encoder clearing recode for fsp 10 seq 0 (ref)
06/09/2011, 10:43:34.849> LVL:2 RC: 0 MGMT_IMG :SW_HOST_IPC: Encoder clearing recode for fsp 11 seq 0 (ref)
06/09/2011, 10:43:34.849> LVL:2 RC: 0 MGMT_IMG :SW_HOST_IPC: IPLP: recode from scratch: fsp=1 refvld=0 seq=0
06/09/2011, 10:43:34.849> LVL:2 RC: 0 MGMT_IMG :SW_HOST_IPC: IPLP: recode from scratch: fsp=2 refvld=0 seq=0
06/09/2011, 10:43:34.849> LVL:2 RC: 0 MGMT_IMG :SW_HOST_IPC: IPLP: recode from scratch: fsp=3 refvld=0 seq=0
06/09/2011, 10:43:34.849> LVL:2 RC: 0 MGMT_IMG :SW_HOST_IPC: IPLP: recode from scratch: fsp=4 refvld=0 seq=0
06/09/2011, 10:43:34.849> LVL:2 RC: 0 MGMT_IMG :SW_HOST_IPC: IPLP: recode from scratch: fsp=5 refvld=0 seq=0
06/09/2011, 10:43:34.849> LVL:2 RC: 0 MGMT_IMG :SW_HOST_IPC: IPLP: recode from scratch: fsp=6 refvld=0 seq=0
06/09/2011, 10:43:34.849> LVL:2 RC: 0 MGMT_IMG :SW_HOST_IPC: IPLP: recode from scratch: fsp=7 refvld=0 seq=0
06/09/2011, 10:43:34.849> LVL:2 RC: 0 MGMT_IMG :SW_HOST_IPC: IPLP: recode from scratch: fsp=8 refvld=0 seq=0
06/09/2011, 10:43:34.865> LVL:2 RC: 0 MGMT_IMG :SW_HOST_IPC: Encoder clearing recode for fsp 12 seq 0 (ref)
06/09/2011, 10:43:34.865> LVL:2 RC: 0 MGMT_IMG :SW_HOST_IPC: Encoder clearing recode for fsp 13 seq 0 (ref)
06/09/2011, 10:43:34.865> LVL:2 RC: 0 MGMT_IMG :SW_HOST_IPC: Encoder clearing recode for fsp 14 seq 0 (ref)
06/09/2011, 10:43:34.865> LVL:2 RC: 0 MGMT_IMG :SW_HOST_IPC: IPLP: recode from scratch: fsp=9 refvld=0 seq=0
06/09/2011, 10:43:34.865> LVL:2 RC: 0 MGMT_IMG :SW_HOST_IPC: IPLP: recode from scratch: fsp=10 refvld=0 seq=0
06/09/2011, 10:43:34.881> LVL:2 RC: 0 MGMT_IMG :SW_HOST_IPC: Encoder clearing recode for fsp 0 seq 0 (ref)
06/09/2011, 10:43:34.881> LVL:2 RC: 0 MGMT_IMG :SW_HOST_IPC: Encoder clearing recode for fsp 1 seq 0 (ref)
06/09/2011, 10:43:34.881> LVL:2 RC: 0 MGMT_IMG :SW_HOST_IPC: IPLP: recode from scratch: fsp=11 refvld=0 seq=0
06/09/2011, 10:43:34.881> LVL:2 RC: 0 MGMT_IMG :SW_HOST_IPC: Encoder clearing recode for fsp 2 seq 0 (ref)
06/09/2011, 10:43:34.881> LVL:2 RC: 0 MGMT_IMG :SW_HOST_IPC: Encoder clearing recode for fsp 3 seq 0 (ref)
06/09/2011, 10:43:34.881> LVL:2 RC: 0 MGMT_IMG :SW_HOST_IPC: Encoder clearing recode for fsp 4 seq 0 (ref)
06/09/2011, 10:43:34.881> LVL:2 RC: 0 MGMT_IMG :SW_HOST_IPC: Encoder clearing recode for fsp 5 seq 0 (ref)
06/09/2011, 10:43:34.881> LVL:2 RC: 0 MGMT_IMG :SW_HOST_IPC: IPLP: recode from scratch: fsp=12 refvld=0 seq=0
06/09/2011, 10:43:34.881> LVL:2 RC: 0 MGMT_IMG :SW_HOST_IPC: Encoder clearing recode for fsp 6 seq 0 (ref)
06/09/2011, 10:43:34.881> LVL:2 RC: 0 MGMT_IMG :SW_HOST_IPC: IPLP: recode from scratch: fsp=13 refvld=0 seq=0
06/09/2011, 10:43:34.881> LVL:2 RC: 0 MGMT_IMG :SW_HOST_IPC: Encoder clearing recode for fsp 7 seq 0 (ref)
06/09/2011, 10:43:34.881> LVL:2 RC: 0 MGMT_IMG :SW_HOST_IPC: Encoder clearing recode for fsp 8 seq 0 (ref)
06/09/2011, 10:43:34.881> LVL:2 RC: 0 MGMT_IMG :SW_HOST_IPC: IPLP: recode from scratch: fsp=14 refvld=0 seq=0
06/09/2011, 10:43:34.896> LVL:2 RC: 0 MGMT_IMG :SW_HOST_IPC: Encoder clearing recode for fsp 9 seq 0 (ref)
06/09/2011, 10:43:34.896> LVL:2 RC: 0 MGMT_IMG :SW_HOST_IPC: Encoder clearing recode for fsp 10 seq 0 (ref)
06/09/2011, 10:43:34.896> LVL:2 RC: 0 MGMT_IMG :SW_HOST_IPC: Encoder clearing recode for fsp 11 seq 0 (ref)
06/09/2011, 10:43:34.896> LVL:2 RC: 0 MGMT_IMG :SW_HOST_IPC: Encoder clearing recode for fsp 12 seq 0 (ref)
06/09/2011, 10:43:34.896> LVL:2 RC: 0 MGMT_IMG :SW_HOST_IPC: Encoder clearing recode for fsp 13 seq 0 (ref)
06/09/2011, 10:43:34.959> LVL:2 RC: 0 MGMT_IMG :SW_HOST_IPC: Encoder clearing recode for fsp 14 seq 0 (ref)
06/09/2011, 10:43:36.865> LVL:2 RC: 0 MGMT_PCOIP_DATA :Tx thread info: round trip time (ms) = 4, variance = 1, rto = 105
06/09/2011, 10:43:42.599> LVL:2 RC: 0 MGMT_IMG :log: cur_s 1 max_s 30 tbl 0 bwc 1.01 bwt 17.58 fps 1.43 fl_ps 3.33
06/09/2011, 10:43:42.599> LVL:2 RC: 0 MGMT_IMG :log: chg pix: 5854592, chg pix not motion: 5854592
06/09/2011, 10:43:42.599> LVL:2 RC: 0 MGMT_IMG :log: delta bits encoded: 8660416, delta build bits encoded: 1015832.
06/09/2011, 10:43:42.599> LVL:2 RC: 0 MGMT_IMG :log: enc bits/pixel - 1.48, enc bits/sec - 288531.76, enc MPix/sec - 0.20, decode rate est (MBit/sec) - 0.00
06/09/2011, 10:43:47.943> LVL:2 RC: 0 MGMT_PCOIP_DATA :Tx thread info: bw limit = 2500, plateau = 2500.0, avg tx = 110.5, avg rx = 4.8 (KBytes/s)
06/09/2011, 10:43:47.943> LVL:1 RC: 0 VGMAC :Stat frms: R=000000/000000/334601 T=009795/249542/099574 (A/I/O) Loss=0.00%/0.00% (R/T)
So once the displays resync, there are a slew of recodes (to be expected) and then things go back to moving along just fine with a 0% packet loss and normal responsiveness reported by the user.
VMware tech support seems to be at a complete loss.
Does ANYONE have any ideas?
Rick
You mentioned you have other clients using the same end points and same infrastructure, but I assume since they are different clients they are likely using different images, but I could be wrong. Have you ensured there are no issues with the image, such as VMware Tools being installed AFTER the View Agent, or some kind of Antivirus running that is causing issues?
Earl-
Thanks for the response. Yes, we have tried the agent and tools uninstall/reinstall dance a few times. As to the antivirus, this particular client is using Sophos and we have several other clients using the same.
We are literally pulling our hair out on this one...
It just seems as though the endpoint randomly decides to "rediscover" the screens.
Rick
It seems you are using a zero client. Have you tried with another client to see if you can rule that out?
We have tracked several similar cases. I know of four that were closed a month ago and the root cause in each case was a faulty network switch or GBIC. In each case large amounts of errors on network ports was a key hint.
Just trying to give you options to explore.
WP
Warren, we have tried both soft and hard clients, and have used a variety of Teradici firmware including 3.2, 3.3, 3.3.1, and now 3.4. The problem seems to manifest regardless.
There are no errors on switch ports and the router interfaces are clean as a whistle.
As I mentioned we did some MTR testing over the course of several days, during which 30-40 session disconnects occurred, and we had a total packet loss of 0.0019% which is 50x less than the level we are told that PCoIP begins to notice things. (0.1%)
The case has recently been escalated within VMware Tech Support and an engineer asked if we were using CEF (Cisco Express Forwarding). We are. I looked for any references to CEF + PCoIP being a bad combination but could not find any. We have not yet tested without CEF but that will be the first step next week when the client returns to work.
Rick
I know it's been a while, but did you ever find a cause or resolution for this issue? We have a customer with exactly the same issue now. If you have any tips to start us looking in the right direction, I'd really appreciate it.
Thanks,
Dave
This is what Teradici says about CEF:
Can I use Cisco Express Forwarding per-packet load sharing with PCoIP Protocol? (15134-469)
Answer:
No, Cisco Express Forwarding (CEF) per-packet load sharing causes out-of-order packets which is not compatible with real-time protocols such as PCoIP Protocol. Cisco has similar recommendations against CEF per-packet load sharing when using Voice-over-IP and Video-over-IP applications.
(CEF) can perform per-packet load sharing which almost always results in out-of-order packets which can severely impact applications that are sensitive to out-of-order packet delivery such as real-time applications including VOIP, video streaming and PCoIP protocol.
Resolution:
As an alternative, consider using per-destination (source/destination IP address pair) load sharing.
Gunnar,
That explanation applies only when CEF is actually being used to load balance across multiple links. A more common use of CEF, for us anyway, is to enable it on edge routers which are connected to a Metro-E network. In this case there is only a single path, so no load balancing takes place.
We use CEF to force the routers to use fast-switching which occurs down in the ASIC rather than process switching which consumes CPU time. For example, a 20mbps Metro-E line will easily swamp most 2800 series routers if the CPU is forced to get involved in every packet routing decision - this is especially true for protocols like PCoIP which blast out streams of tiny packets. Using CEF can mean the difference between 95-100% CPU and 3-4% CPU on the router - at least in the configuration I've described above.
Rick
Dave-
This was prior to View 5 shipping, but after weeks of pulling our hair out, Teradici released to us a special build of the View Agent software. It immediately resolved our issues. There were apparently a handful of others with similar issues and they were able to isolate the issue.
From what I have been told, the changes in this special patch (not released publicly) were incorporated into View 5.
Rick
I have a specific regional site of 30 users that are also experiecing this exact issue!
the log files show exactly the same info.
we have view 5.1.1 and are using NC221 thinclients. ( i have also tested the 5.3 agent)
No other regional site is effected, the same base image is in use across the board.
Was CEF the issue did you disable this? or was there anything else?
This has been going on since implimetnation to VDI 12 weeks ago.
any help would be great.
Bit of a long shot, but also look at your client firewall. I had the same issue with 4.6 whereby the client screens would blink for a second or so. The culprit turned out to be f-secure client security dropping some of the PCoIP packets.
**Problem is now resolved for us**
We found the following in the Pcoip server logs and found many many out of order packets of UDP/pcoip traffic.:
IPC :Received ACK on fsp 6 for unknown flight seq 62
IPC :Received ACK on fsp 11 for unknown flight seq 58
We then switched the Local sites switch stack (Master) switch to Slave and allowed 1 of the other switches to become (Master)
And immediately there was no "out of order" packets and the Pcoip log files did not show the "unknown flight" errors i pasted above anymore.
As a result we replaced the origonal master switch and have a very happy office.
hope this helps someone else.
Regards
Question for you. Did you eventually get this resolved? We are using TERA2140 4x DVI Port Client (Buffalo) Zero Client from IO Corp and the person using this is having the disconnect issues and screen that goes black. Did you have to remove CEF and have to do packet forwarding a different way? What finally worked for you?