VMware Horizon Community
Mach6
Contributor

Issues in 2 pods Horizon environment

We're currently running a Horizon View 7.3.2 environment using Cloud Pod Architecture (2 pods), UEM 9.2.1, App Volumes 2.13, Imprivata OneSign 5.4 SP2 HF7, and Teradici LG Tera2 zero clients (latest firmware, 5.5.1) to connect to this environment.

On each pod, we're running 1x vCenter 6.5, 1x SQL Server 2012 Enterprise (AlwaysOn), 2x VCS (View Connection Servers), 2x App Volumes Managers, and 1x OneSign appliance.

For now we have 4 pools in production, using global entitlements and around 150-200 concurrent sessions. Our goal is to have 1400 concurrent sessions running by the end of the year.

We never had the issues I'm about to describe below when we were running a single environment without CPA.

1) Session balancing: Most sessions in our environment end up on our 2nd pod. We could have 80 sessions on the 2nd pod and only 15 on the first one (and some of these are double sessions, another issue). Since we didn't have any load balancers, we bought 2x Kemp load balancers and are currently testing them. We hope this will resolve the imbalance.

2) Double sessions: Even though our environment is configured to allow only 1 session per user per pool, a user can end up with 2 sessions, one on each pod. The user logs on to a machine, disconnects, and comes back later only to be connected to a new machine on the other pod, while their first session is still disconnected on the first pod. Could this be related to not having a load balancer, or is it something else? Over the last few months, we opened tickets with VMware to make sure our CPA and Connection Server configurations were correct, and apparently they are.

Anyone else experiencing these kinds of issues in a CPA environment?

Mach6
Contributor

I'm surprised no one else is experiencing these issues. We now have a load balancer in place and we're still having problems. I'm guessing this has to do with how the pods are configured, although we had tickets opened to validate our environment and everything is configured the way it should be.

The connection servers and CPA are configured like this:

Connection server 1 - primary on Pod 1

Connection server 2 - primary on Pod 2

Connection server 3 - replica of Connection server 1 on Pod 1

Connection server 4 - replica of Connection server 2 on Pod 2

We have exactly the same pools in each environment in Pod 1 and Pod 2:

"Hospital x", "Hospital y", etc., and they have the same number of virtual machines available, all configured as non-persistent linked clones.

We have a pod federation with 2 pods, "Pod 1" and "Pod 2". Both pods are part of the same site, "Hospitals".
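
In case it helps with comparing setups, the federation topology can also be dumped from any connection server with the lmvutil tool (on our servers it should be under C:\Program Files\VMware\VMware View\Server\tools\bin, if I remember the path right; the account below is just a placeholder):

lmvutil --authAs admin --authDomain MYDOMAIN --authPassword "*" --listPods
lmvutil --authAs admin --authDomain MYDOMAIN --authPassword "*" --listSites

Both pods show up as members of the "Hospitals" site no matter which connection server we run this from.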

The global entitlements for each pool are configured like this (a command-line way to double-check these settings is sketched after the list):

Connection server restrictions: None

Scope: All sites

User assignment: Floating

Use Home site: Disabled

Automatically clean up redundant sessions: Enabled

Allow users to reset/restart their machines: Disabled

Allow users to initiate separate sessions from different client devices: No

Client Restrictions: Disabled.
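
To rule out a mismatch between what the Admin console shows and what the global data layer actually has, the entitlements can also be listed with lmvutil. If I remember the flag names right, "All sites" corresponds to a scope of ANY and "Automatically clean up redundant sessions" to --multipleSessionAutoClean, but treat those mappings as from memory:

lmvutil --authAs admin --authDomain MYDOMAIN --authPassword "*" --listGlobalEntitlements

Running this on a connection server in each pod should return the same entitlements with the same properties; if it doesn't, the global ADLDS instances aren't in sync.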

The main issue is that when a user tries to reconnect to his disconnected session, sometimes the user ends up starting a new session on the other pod instead of connecting to the one he had in the first place. So he ends up with one session on each pod. Are the connection servers from pod 1 and 2 talking to each other at all, or are they just starting up connections to virtual machines within their own pod?

BenFB
Virtuoso

I have some CPA experience, but we are still in the PoC phase, so I haven't seen enough sessions to experience the issue you are describing.

When you set up the environment, did you originally have just 1 connection server per pod and then add the second connection server in each pod after Cloud Pod was initialized? There are manual steps for adding additional connection servers once CPA is configured.

Setting up the Cloud Pod Architecture feature on a replicated VMware View Connection Server instance...

Connection servers within a pod replicate all of their information through the ADLDS (ADAM) database (VMwareVDMDS). When you enable CPA, a second, global ADLDS (ADAM) database (VMwareVDMDSG) is created that replicates between all of the connection servers in all of the connected pods. This is what replicates the global entitlement and session information.
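
If you want to confirm that both instances are replicating cleanly, repadmin works against AD LDS as well (it's usually present on connection servers along with the AD LDS tools). Assuming the usual ports, 389 for the local instance and 22389 for the global one, run this in an elevated prompt on each connection server:

repadmin /showrepl localhost:389
repadmin /showrepl localhost:22389

Replication errors or stale "last successful sync" timestamps on the 22389 instance would point at the global data layer that carries the session information.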

Mach6
Contributor

Yes, we did add the replicas later on. We initially started with 1 connection server on each pod, back when we were running version 7.2. When we upgraded to 7.3.2, we had to undo the CPA and redo it after the upgrade. But I don't remember whether we added the replicas before or after we redid the CPA.
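
For anyone following along, the undo/redo is done with lmvutil on the connection servers; a rough sketch of the sequence (server name and account are placeholders, and the flags are from memory, so verify them against the documentation for your version):

(on a Pod 2 connection server, to leave the federation)
lmvutil --authAs admin --authDomain MYDOMAIN --authPassword "*" --unjoin

(on a Pod 1 connection server, to tear down the now single-pod federation)
lmvutil --authAs admin --authDomain MYDOMAIN --authPassword "*" --uninitialize

(on a Pod 1 connection server, to recreate the federation after the upgrade)
lmvutil --authAs admin --authDomain MYDOMAIN --authPassword "*" --initialize

(on a Pod 2 connection server, pointing at a Pod 1 connection server, to rejoin)
lmvutil --authAs admin --authDomain MYDOMAIN --authPassword "*" --join --joinServer cs1-pod1.example.local --userName MYDOMAIN\admin --password "*"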

From what I understand in the KB you posted, I should add the Windows account of the replicated instance as a member of "CN=Administrators,CN=Roles" on the primary connection server?

I looked on the 2 primary connection servers we have and didn't see the Windows accounts of our replicas as members.
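
If it helps, that role's membership can also be dumped without ADSI Edit by exporting it with ldifde against the global instance. The port (22389) and the DC=vdiglobal,DC=vmware,DC=int naming context are what I believe the global ADLDS instance uses, so treat both as assumptions and adjust if your instance differs:

ldifde -f globalAdmins.ldf -s localhost -t 22389 -d "CN=Administrators,CN=Roles,DC=vdiglobal,DC=vmware,DC=int" -l member

Then open globalAdmins.ldf and check whether the replica servers' computer accounts appear in the member attribute.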

BenFB
Virtuoso

Check the list of installed programs on the servers. Do you have both the VMwareVDMDS and VMwareVDMDSG components installed on all of the connection servers? If not, that could explain your issue, since not all of the connection servers would be aware of the global session status.
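
A quick way to check without going through Add/Remove Programs is to query the ADLDS instance services. I'm assuming here that they run under the usual ADAM_<instance> service names:

sc query ADAM_VMwareVDMDS
sc query ADAM_VMwareVDMDSG

Both should report RUNNING on every connection server in the federation.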

Mach6
Contributor

Yes, both VMwareVDMDS and VMwareVDMDSG are installed on all connection servers.

Also, just to clarify, we had this issue even before we added the replicas, when we were running only 2 primary connection servers, one on each pod.

Here's the result of the "vdmadmin -X -lpinfo" command:

[screenshot of the vdmadmin -X -lpinfo output]

BenFB
Virtuoso

I think at this point it would be best to work with VMware support. Just keep pushing for an escalation if they can't figure it out.

Mach6
Contributor

Thank you for your help, BenFB. 🙂 We still haven't figured out where this issue is coming from, and it's a bit frustrating, considering we opened multiple tickets with VMware over the last 2 years and worked with our TAM at the beginning of this project. I just wish we had a lab to test and try different things with CPA. If I find anything else on this, I'll make sure to post it here.

BenFB
Virtuoso

What's the connection like between the two pods (speed, distance, and latency)?

Is there a firewall between the pods with logging that you can review to validate whether any ports are being blocked? We found a few ports needed for CPA that are not documented.
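
For reference, the inter-pod ports I'm aware of (from the CPA documentation, so verify against your version) are 22389 and 22636 for global ADLDS replication, 8472 for the inter-pod API, and 4100/4101 for JMS inter-router traffic. A rough check from a connection server in one pod toward a connection server in the other (the host name is a placeholder, and the telnet client has to be installed):

netstat -ano | findstr ":22389 :22636 :8472 :4100 :4101"
telnet cs1-pod2.example.local 22389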

Mach6
Contributor

The two pods are connected with 10 Gb FC in the same datacenter. We're going to physically move one of the pods to another datacenter eventually. There is no firewall between the pods at the moment.

BenFB
Virtuoso

At this point I think VMware may be your best resource. Do you have a VMware TAM or EUC SE? I'd be talking to them to make sure your SRs are being prioritized and to clarify the expected behavior vs. your experience (e.g., should CPA evenly distribute sessions across pods, or is what you are seeing expected?).

rossi0209
Contributor

Hi

We've got the same issue in our environment.

We have the following configuration:

2 sites

One global entitlement

Site 1 = 2 pods

Site 2 = 2 pods

The session balance between the sites is 25% / 75%.

Did you get a solution from VMware Support in the meantime?

Regards

Marcus

DanofreIqor
Enthusiast

We have the same issue, but within a single site.

4 PODs across 2 sites, with F5 LBs in front of the sites, equally load balancing between sites and PODs.

For example:

Site A:

POD1 - 150 sessions

POD2 -  100 sessions

Site B:

POD3 - 125 sessions

POD4 - 125 sessions

My fear is that as we onboard 2,000 seats this gets really skewed and my clusters run hot.

sjesse
Leadership

Open a ticket if you haven't. Going from memory, the F5 monitors use least connections; maybe there were 150 sessions on both pods and then 50 people disconnected. The other possibility is that the monitors are barely timing out, just enough to have the F5 direct traffic to only one of the pods.

Mach6
Contributor

Just to give an update on this issue: we never figured out with VMware support what the cause was.

Since then we stopped using App Volumes and updated UEM to DEM 9.9, but we're still running Horizon 7.3.2. We tested with a load balancer and were still seeing the same behavior.

We destroyed the CPA and merged our clusters into one environment and we've been running the environment this way for the last year and a half.

We're looking into updating our infrastructure this year with vSAN-ready servers and also using CPA. If these issues are still happening with the latest Horizon versions, I don't know what we're going to do.

Which version of Horizon are you using in your environment?

Tompous
Contributor

Hi,

Same problem for me with the latest version of Horizon (8 - 2111): 2 pods, 2 connection servers per pod, one global catalogue, and users spread 50%/50%.
I just finished the first setup of this kind of infrastructure (CPA) and we are currently migrating our customer's users from the old VMware Horizon infrastructure without CPA to the new one with CPA.

Today one of the first migrated users got a double session on Pod 2 instead of reconnecting to his disconnected session on Pod 1, and received an error because his FSLogix profile was already locked by the other VM (no differential disk, direct access).

I'm a little bit afraid to see that this can be a "common problem". Did anyone find a solution for this behaviour?

Regards,

Tompous
Contributor

I confirm the problem. I got the same behaviour with my test user.

Did anyone encounter the same issue and find a solution?

 

Thanks,

Tompous
Contributor

The problem persists.

It's like the Horizon global data layer forgets that some sessions already exist in a disconnected state and creates a new session in the other datacenter.

I'm surprised that nobody else encounters the same behavior.
