I'm surprised no one else is experiencing these issues. We now have a load balancer in place and we're still having problems. I'm guessing this has to do with how the pods are configured, although we had tickets opened to validate our environment and everything is configured the way it should be.
The connection servers and CPA are configured like this:
Connection server 1 - primary on Pod 1
Connection server 2 - primary on Pod 2
Connection server 3 - replica of Connection server 1 on Pod 1
Connection server 4 - replica of Connection server 2 on Pod 2
We have exactly the same pools on each environment in Pod 1 and 2.
"Hospital x", "Hospital y", etc., and they have the same number of virtual machines available, all configured as non-persistent linked clones.
We have a pod federation with 2 pods, "Pod 1" and "Pod 2". Both pods are part of the same site, "Hospitals".
The global entitlements for each pool are configured like this:
Connection server restrictions: None
Scope: All sites
User assignment: Floating
Use Home site: Disabled
Automatically clean up redundant sessions: Enabled
Allow users to reset/restart their machines: Disabled
Allow user to initiate separate sessions from different client devices: No
Client Restrictions: Disabled.
The main issue is that when a user tries to reconnect to a disconnected session, the user sometimes ends up starting a new session on the other pod instead of reconnecting to the original one, and so ends up with one session on each pod. Are the connection servers in Pod 1 and Pod 2 talking to each other at all, or are they just brokering connections to virtual machines within their own pod?
I have some CPA experience, but we are still in the PoC stage, so I haven't seen enough sessions to experience the issue you are describing.
When you set up the environment, did you originally have just 1 connection server per pod and then add the second connection server in each pod after Cloud Pod was initialized? There are manual steps to add additional connection servers when CPA is configured.
Connection servers within a pod replicate all of their information through the ADLDS (ADAM) database (VMwareVDMDS). When you enable CPA a second global ADLDS (ADAM) database (VMwareVDMDSG) is created that replicates between all of the connection servers in all of the connected pods. This replicates the global entitlement and session information.
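To make the expected behavior concrete, here is a minimal Python sketch of a broker that consults shared session data before allocating a new desktop. This is only a toy model under stated assumptions: `Pod`, `GlobalSessionTable`, and `broker` are hypothetical names, and the logic is not taken from Horizon's actual implementation. It only illustrates why a connection server that cannot see the replicated global session data would start a second session on its own pod.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class Pod:
    """Hypothetical pod with a simple count of free desktops."""
    name: str
    free_desktops: int


@dataclass
class GlobalSessionTable:
    """Hypothetical stand-in for the session data replicated between pods."""
    sessions: Dict[Tuple[str, str], str] = field(default_factory=dict)

    def find(self, user: str, entitlement: str) -> Optional[str]:
        # Returns the name of the pod hosting the user's session, if any.
        return self.sessions.get((user, entitlement))

    def record(self, user: str, entitlement: str, pod_name: str) -> None:
        self.sessions[(user, entitlement)] = pod_name


def broker(user: str, entitlement: str, local_pod: Pod,
           pods: List[Pod], table: GlobalSessionTable) -> str:
    """Return the name of the pod that should serve this request."""
    existing = table.find(user, entitlement)
    if existing is not None:
        # A session is already known: reconnect to it, wherever it lives.
        return existing
    # No known session: prefer the local pod, then fall back to the others.
    candidates = [local_pod] + [p for p in pods if p is not local_pod]
    for pod in candidates:
        if pod.free_desktops > 0:
            pod.free_desktops -= 1
            table.record(user, entitlement, pod.name)
            return pod.name
    raise RuntimeError("no free desktops in any pod")
```

In this model, a user who logs on through Pod 1 and later lands on a Pod 2 connection server is sent back to Pod 1 as long as both brokers share the same table. If the Pod 2 broker works from an empty or stale table, it allocates a fresh desktop locally, which matches the "one session on each pod" symptom described above.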
Yes, we did add the replicas later on. We initially started with 1 connection server on each pod, when we were running version 7.2. When we upgraded to 7.3.2, we had to undo the CPA and redo it after the upgrade. But I don't remember whether we added the replicas before or after we redid the CPA.
From what I understand in the KB you posted, I should add the Windows account of the replicated instance as a member of "CN=Administrators,CN=Roles" on the primary connection server?
I looked on the 2 primary connection servers we have and didn't see the Windows accounts of our replicas as a member.
Check the list of installed programs on the servers. Do you have both the VMwareVDMDS and VMwareVDMDSG components installed on all of the connection servers? If not that could explain your issue since not all of the connection servers are aware of the global session status.
Yes, both VMwareVDMDS and VMwareVDMDSG are installed on all connection servers.
Also, just to clarify, we had this issue even before we added the replicas. We had it when we were running only 2 primary connection servers, one on each pod.
Here's the result of the "vdmadmin -X -lpinfo" command:
I think at this point it would be best to work with VMware support. Just keep pushing for an escalation if they can't figure it out.
Thank you for your help, BenFB. We still haven't figured out where this issue is coming from, and it's a bit frustrating considering we have opened multiple tickets with VMware over the last 2 years and worked with our TAM at the beginning of this project. I just wish we had a lab to test and try different things with CPA. If I find anything else on this, I'll make sure to post it here.
What's the connection like between the two pods (Speed, distance and latency)?
Is there a firewall between the pods with logging that you can review to validate if any ports are being blocked? We found a few ports that are needed for CPA that are not documented.
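If you don't have firewall logs handy, a quick TCP reachability check from each connection server to the ones in the other pod can narrow things down. Below is a rough Python sketch. The port list (22389, 22636, 8472) is what I recall from the CPA documentation and should be treated as an assumption to verify against the docs for your Horizon version. It does not cover the undocumented ports mentioned above, and the host names in the usage comment are placeholders.

```python
import socket

# Assumed CPA inter-pod ports; verify against the documentation for your version.
CPA_PORTS = {
    22389: "global LDAP instance (assumed)",
    22636: "global LDAP instance over TLS (assumed)",
    8472: "inter-pod API / VIPA (assumed)",
}


def check_port(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def check_hosts(hosts, ports=CPA_PORTS, probe=check_port):
    """Probe every host/port pair and return {(host, port): reachable}."""
    return {(host, port): probe(host, port) for host in hosts for port in ports}


# Example usage (placeholder host names, run from a connection server in Pod 1):
#   results = check_hosts(["cs2.example.local", "cs4.example.local"])
#   for (host, port), ok in sorted(results.items()):
#       print(host, port, CPA_PORTS[port], "open" if ok else "BLOCKED")
```

A successful TCP connect only shows the port is reachable; it says nothing about whether replication itself is healthy, so this complements rather than replaces the firewall log review.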
The two pods are connected with 10GB FC in the same datacenter. We're going to physically move one of the pods to another datacenter eventually. There is no firewall between the pods at the moment.
At this point I think VMware may be your best resource. Do you have a VMware TAM or EUC SE? I'd be talking to them to make sure your SRs are being prioritized and to clarify the expected behavior versus what you are experiencing (e.g. should CPA evenly distribute sessions across pods, or is what you are seeing expected?).
We've got the same issue in our environment.
We have the following configuration:
One global entitlement
Site1 = 2 Pods
Site2 = 2 Pods
The balance between the sites is 25% - 75%.
Did you get a solution from VMware Support in the meantime?
We have the same issue, but within a site.
4 PODs across 2 sites. An F5 LB in front of the sites load balances equally between sites and PODs.
POD1 - 150 sessions
POD2 - 100 sessions
POD3 - 125 sessions
POD4 - 125 sessions
My fear is as we onboard 2000 seats this gets really skewed and my clusters run hot.
Open a ticket if you haven't. Going from memory, the F5 uses least connections, so maybe there were 150 on both and then 50 people disconnected. The other possibility is that the monitors are only barely timing out, just enough to have the F5 direct traffic to one of the pods.
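To show how a least-connections policy can produce that kind of skew, here is a small Python sketch. It is a simplified model, not the F5's actual algorithm: `pick_least_connections` and `simulate` are made-up helper names, and ties are broken by name purely to keep the result deterministic.

```python
def pick_least_connections(counts):
    """Return the member with the fewest active connections (ties: first by name)."""
    return min(sorted(counts), key=lambda name: counts[name])


def simulate(counts, new_logins):
    """Send each new login to the least-loaded member and return the new counts."""
    counts = dict(counts)  # work on a copy so the caller's dict is untouched
    for _ in range(new_logins):
        counts[pick_least_connections(counts)] += 1
    return counts


# Both pods start at 150 connections, then 50 users disconnect from POD2.
after_disconnects = {"POD1": 150, "POD2": 100}

# The next 30 logins all land on POD2 until the counts level out again.
print(simulate(after_disconnects, 30))  # {'POD1': 150, 'POD2': 130}
```

If the disconnected desktops on POD2 are still occupied by those sessions, the balancer sees POD2 as lightly loaded while the pod itself is carrying both the old disconnected sessions and the new logins. That is one way the session counts per pod can drift apart even though the balancer is behaving as configured.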
Just to give an update concerning this issue. We haven't figured out with VMware support what the issue was.
Since then we have stopped using App Volumes and updated UEM to DEM 9.9, but we are still running Horizon 7.3.2. We tested with a load balancer and were still seeing the same behavior.
We destroyed the CPA and merged our clusters into one environment and we've been running the environment this way for the last year and a half.
We're looking into updating our infrastructure this year with vSAN-ready servers and also using CPA. If these issues are still happening with the latest Horizon versions, I don't know what we're going to do.
Which version of Horizon are you using in your environment?