We are in the process of upgrading from 4.5 to 5.1 and we have noticed that PCoIP is not performing as well as it should. Specifically, login times with PCoIP are more than double what they are with RDP going to the same pool with the same user account. I have also noticed that when logging off with PCoIP, View Manager shows the desktop in a connected state for about a minute after we have been disconnected. Our VMs are linked-clone XP SP3 with 1 vCPU and 1 GB of RAM, and I am testing from my Windows 7 64-bit laptop on a wired connection. I have tried uninstalling and reinstalling the Agent and the Tools on the VM and the Client on my laptop, with no change. I have also uninstalled the AV on the VM to make sure that isn't the issue. I have verified that the connection server is set up for direct connection and not tunneled. I attempted to use the PCoIP Log Viewer tool, but it said that the logs on the VM were invalid. Has anyone else had a similar issue, or does anyone have suggestions on how to fix it? On a side note, I had a consultant from VMware Professional Services sitting next to me helping me troubleshoot, and he was as stumped as I am.
Can you give the login times to help determine whether they are similar to what others are seeing?
What is the performance difference between logging on with PCoIP in the 4.5 environment vs. the 5.1 environment? Here is a good link on troubleshooting PCoIP performance issues: http://myvirtualcloud.net/?p=751. Also check out this link: http://kb.vmware.com/kb/1030695.
So after some further troubleshooting, it seems like this problem is somehow related to our physical endpoints. Some of our devices have no issues logging in with PCoIP (actually it's quite quick), but others are brutally slow. This is not related to accounts, permissions, GPO, or any of the other usual suspects. We have a case open with VMware and had them remoted in to our system for a few hours, and they were unable to provide a resolution. Anyone have any ideas on what to check?
What is the memory footprint of the physical machines that are slow? Is it possible that the client-side caching features are impacting performance?
Thought I'd follow up on this in case anyone was interested. What we found was that the setup of the endpoint you use to connect to a virtual desktop with PCoIP matters a whole lot more than it did with RDP, specifically the network configuration. We don't have it all worked out yet, but what we have found in our environment is that we needed to turn off the QoS, IPv6, and Link Layer Topology settings on the network adapter of the endpoint. We also uninstalled QoS from the network adapter on the VM. On a few endpoints we also had to change the NIC binding order to make sure that PCoIP was using the right network adapter all the time. After making these changes we were able to get fairly consistent login times from most of our endpoints, but it was still slower than expected. After working with Professional Services and VMware support, we checked the box to use a secure PCoIP gateway on our connection servers and saw a significant improvement in login times. They're not sure why that is the case, as that should technically slow things down, but it seems to be working fairly well.
Our guy from VMware Professional Services thinks that, because PCoIP requires a handshake between the agent and the client, NIC settings that aren't just right could slow things down a lot. If only we could streamline our GPO now, things would be just about perfect...
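One rough way to compare endpoints before and after NIC changes like these is to time how long each one takes to open a TCP connection to the desktop. The sketch below is only an illustration: it assumes the desktop is reachable directly on TCP 4172 (the port PCoIP uses), and the function names are made up for this example. It measures TCP setup only, not the full PCoIP session negotiation or the Windows logon, so treat the numbers as a hint rather than a diagnosis.

```python
import socket
import time

PCOIP_PORT = 4172  # PCoIP uses TCP and UDP 4172; only TCP setup is timed here


def time_tcp_connect(host, port=PCOIP_PORT, timeout=5.0):
    """Return the seconds taken to open (and then close) a TCP connection."""
    start = time.perf_counter()
    with socket.create_connection((host, port), timeout=timeout):
        return time.perf_counter() - start


def summarize(samples):
    """Return (min, mean, max) of a non-empty list of timings in seconds."""
    return min(samples), sum(samples) / len(samples), max(samples)
```

For example, calling `summarize([time_tcp_connect("desktop-address") for _ in range(10)])` on a fast endpoint and on a slow one (where `desktop-address` is a placeholder for one of your desktops) shows whether the slowness is already visible at connection setup.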
Thanks for sharing your findings Eric.
Additional follow-up: After changing the network adapter settings on our test machines we were still seeing poor performance from a good chunk of our endpoints, so we continued to dig at this problem. Yesterday we found that on our distributed switch for the virtual desktops, the Load Balancing setting under Teaming and Failover was set to "Route based on IP hash". When we changed it back to the default of "Route based on originating virtual port", the system became much more stable. We are now seeing consistent login times of under a minute from all endpoints, internal and external, including the application of group policy. When persona management is enabled, those times are cut roughly in half. Apparently, with our Cisco UCS hosts having multiple vNICs, this load balancing setting was causing massive communication issues. Another case of why it is a good idea to leave everything at the default setting unless there is a very specific reason to change it.
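To illustrate the difference between the two policies, here is a simplified model of how each one picks an uplink. This is a sketch, not the actual vSwitch code: the IP-hash function follows the commonly documented "XOR the source and destination IP, then take the modulo of the uplink count" description, the port-ID function is a stand-in that just maps a port number to an uplink, and all addresses and port numbers are made up.

```python
import ipaddress


def uplink_by_ip_hash(src_ip, dst_ip, n_uplinks):
    """Simplified IP-hash teaming: XOR the two IPv4 addresses, mod uplink count."""
    src = int(ipaddress.ip_address(src_ip))
    dst = int(ipaddress.ip_address(dst_ip))
    return (src ^ dst) % n_uplinks


def uplink_by_port_id(port_id, n_uplinks):
    """Simplified port-ID teaming: the uplink depends only on the VM's virtual port."""
    return port_id % n_uplinks


if __name__ == "__main__":
    desktop = "10.0.0.50"                  # hypothetical virtual desktop
    clients = ["10.0.1.10", "10.0.1.11"]   # hypothetical endpoints
    for client in clients:
        # With IP hash the same desktop can leave on a different uplink per client.
        print(client, "ip-hash uplink:", uplink_by_ip_hash(desktop, client, 2))
        # With port ID the desktop always uses the same uplink, whoever it talks to.
        print(client, "port-id uplink:", uplink_by_port_id(17, 2))
```

In this model one desktop talking to two clients ends up on two different uplinks under IP hash, while under port ID it stays on one. If the uplinks go to separate upstream switches that are not aggregated together, that may be the kind of situation that caused the communication issues described above.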
Any reason you wouldn't want that set to "Route based on physical NIC load" (I'm assuming you have multiple physical NICs on the vDS.)?
One of the really cool things about the distributed switch is that when it's configured this way it checks the load on each NIC every 30 seconds and automatically rebalances for best throughput. I'm actually surprised this isn't the default when you have more than one physical NIC on a vDS, but it's not.
That's actually a really good suggestion and we're going to take a look at it. We're not sure how that setting will work with our UCS hosts, as the hosts don't technically have "physical" NICs. They have vNICs that are bound to one of two fabric interconnects on the back of the chassis, so we're not sure whether the chassis is already doing something similar to what you are talking about. We are going to reach out to Cisco to clarify, and I will let you know how it turns out.
We received word back from our Cisco engineer on using the "Route based on physical NIC load" setting, and his response is posted below. Not sure that it's the be-all and end-all on the matter, but we're going to stick with the default setting for now based on the performance we are seeing and Cisco's opinion.
From Cisco Support:
"I had recommended “route based on originating port ID” to you over the phone because of its static, rather than dynamic, nature. A MAC being learned and relearned on two different upstream switch can produce some very undesired results. And it sounds like the based on physical NIC load option could very well see this behavior. One traffic stream might flow over the B side NIC while the next flows over the A side. From the perspective of the upstream switches, the VM MAC is being relearned. That doesn’t sound attractive to me. In the originating port ID scenario, traffic flows from the VM, once powered on, are persistent across one fabric – only moving to the other fabric in the event of a failure."
Thanks for the update Eric. I'm surprised relearning the MAC address would be a cause for concern, but I also appreciate not messing with something that's working!
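For anyone wondering what the MAC relearning in the Cisco reply looks like, here is a toy model of an upstream MAC table. It is only a sketch under simple assumptions: each frame is reduced to a (MAC, fabric) pair, the fabric names "A" and "B" and the MAC address are made up, and real switches do far more than this.

```python
def count_mac_moves(frames):
    """Count how often a MAC address is relearned on a different fabric.

    frames is an iterable of (mac, fabric) pairs in arrival order.
    """
    table = {}   # mac -> fabric where it was last seen
    moves = 0
    for mac, fabric in frames:
        previous = table.get(mac)
        if previous is not None and previous != fabric:
            moves += 1   # same MAC now seen on the other side: a relearn
        table[mac] = fabric
    return moves


if __name__ == "__main__":
    vm = "00:50:56:aa:bb:cc"   # hypothetical VM MAC address
    pinned = [(vm, "A")] * 4                                   # port-ID style: one fabric
    flipping = [(vm, "A"), (vm, "B"), (vm, "A"), (vm, "B")]    # flows alternate fabrics
    print("pinned moves:", count_mac_moves(pinned))
    print("flipping moves:", count_mac_moves(flipping))
```

In this model traffic pinned to one fabric never triggers a relearn, while traffic that alternates between fabrics triggers one on nearly every frame, which is the behavior the Cisco engineer wanted to avoid.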