VMware Horizon Community
vXav
Expert

Horizon View 7.4 issues after vCenter connectivity wobbles

Note 1: We use instant clones, with Horizon View 7.4 and vCenter 6.5.

Note 2: We had something similar a few weeks ago. During a firewall upgrade, the vCenters of both our VDI environments lost network connectivity, which left the VDI infrastructure in a weird state. We had to reboot all the Connection Servers and vCenter servers to bring it back to normal.

Now, after a network wobble over the weekend, our monitoring picked up about 5 minutes of downtime on the vCenter of our VDI environment. In a normal environment you would see hosts disconnect and reconnect, nothing too bad. I expected Horizon View to be resilient enough to survive such a small issue.
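For anyone who wants to measure such an outage from their own probe data, here is a minimal sketch that turns a series of reachability samples into outage windows. It assumes the monitoring can export (timestamp, reachable) pairs in time order; the function name `outage_windows` and the sample data are made up for illustration and are not part of any VMware tooling.

```python
from datetime import datetime, timedelta


def outage_windows(samples):
    """Return (start, end) pairs for each outage in time-ordered samples.

    samples: iterable of (datetime, bool) where the bool is True when the
    probe reached vCenter. start is the first failed probe, end is the
    first successful probe after it, or None if the outage is still open.
    """
    windows = []
    start = None
    for when, reachable in samples:
        if not reachable and start is None:
            start = when                      # outage begins
        elif reachable and start is not None:
            windows.append((start, when))     # outage ends
            start = None
    if start is not None:
        windows.append((start, None))         # never recovered in the data
    return windows


if __name__ == "__main__":
    # Made-up data: one probe per minute, probes 1 to 5 fail.
    t0 = datetime(2018, 1, 1, 2, 0)
    probes = [(t0 + timedelta(minutes=i), not 1 <= i <= 5) for i in range(8)]
    for begin, end in outage_windows(probes):
        print(begin, end, end - begin)
```

The measured length is only as precise as the probe interval, so a "5 minute" outage seen with one-minute probes could be a bit shorter or longer in reality.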

I remembered what happened last time, so I went to check, and there it was: the same issue again.

The issue is rather weird: there is connectivity between the Connection Servers (CS) and vCenter, but it seems "some" things don't work.

  • Users will be able to log in as long as there are available desktops.
  • When they log off, the VM is deleted from vCenter but not from Horizon View.
  • The desktop goes into an error state with the following status:

25-Jun-2018 09:09:10 CEST: Failed to delete VM - Caught exception deleting VM XXXXXXX. Timed out waiting for operation to complete. Total time waited 5 mins

Pairing state:In pairing...

Configured by:<connection server #1> <connection server #2> ...

Attempted theft by:

(Screenshot: 2018-06-25_10h40_58.jpg)

  • After that, the events show "Provisioning error occurred for Machine XXXXXX: Unable to remove Machine from inventory", and then fill up with:

Automatic error recovery for Pool YYYYYY: attempting recovery for Machine XXXXXXX

(Screenshot: 2018-06-25_10h43_02.jpg)

The recovery of course never succeeds. I could clean up the stale entries with ViewDbChk, but that would not fix the underlying problem, so it's no use.

  • No new desktops can be created, meaning it is a slow death: as users sign out, desktops go into error until none are left (until I reboot everything).

The creation of a new pool also fails.
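To get a quick count of how many machines are stuck, the two error messages quoted above can be matched in an exported event list. This is only a sketch: the message texts are taken from this post, but the assumption that events are available as one line of text each is mine, and `stuck_machines` is a made-up helper name.

```python
import re

# Message texts copied from the errors quoted in this post. The export
# format (one event per line of plain text) is an assumption.
DELETE_FAILED = re.compile(
    r"Failed to delete VM - Caught exception deleting VM (\S+?)\.\s"
)
REMOVE_FAILED = re.compile(
    r"Provisioning error occurred for Machine (\S+): "
    r"Unable to remove Machine from inventory"
)


def stuck_machines(lines):
    """Return the sorted, de-duplicated machine names found in either error."""
    names = set()
    for line in lines:
        for pattern in (DELETE_FAILED, REMOVE_FAILED):
            match = pattern.search(line)
            if match:
                names.add(match.group(1))
    return sorted(names)
```

A growing list between two runs would confirm the "slow death" pattern without clicking through the admin console.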

(Screenshot: 2018-06-25_10h48_42.jpg)

As you can imagine, having a network wobble break the VDI infra and having to restart everything is not sustainable for production, especially as more and more users rely on it.

I'm new to Horizon View so any idea will be greatly appreciated.

1 Solution

Accepted Solutions
BenFB
Virtuoso

We were provided a hotfix for 7.4.0 that has helped with this.

VMware-viewconnectionserver-x86_64-7.4.0-8215536.exe


7 Replies
sjesse
Leadership

"downtime of around 5 minute"

That is a long time. On the back end, Horizon uses an LDAP database that tries to mimic the Horizon infrastructure. When you delete a VM, it is deleted from the LDAP database and also from vCenter. When that connection is interrupted, the Connection Servers are no longer synchronized, and the way to fix it is to restart the Connection Servers. I don't know of any application that can survive a 5-minute network outage. You should read this:

VMware Knowledge Base

paulmike3
Enthusiast

We're experiencing similar issues since upgrading to Horizon 7.4 from 7.2.0 about a month ago.

We are seeing that if a vCenter goes down (for a maintenance reboot, Windows patching, etc.), View logs that it's unavailable for the duration that it's down, and reconnects when it's back up, but has severe issues until a Connection Server service restart takes place on one or all of the Connection Servers in the pod.

We have power-on policies on all of our VM pools (persistent, full-clone VMs), where if a VM is shut down by a user, View tells vSphere to power it back up almost immediately. This operation stops working once the vSphere unavailability described above takes place. We also have provisioning problems (failed customization, "Error", "Missing", etc.).

This has happened at least 3 times over the last 2 weeks, and it's beginning to impact production (VMs powered off so users can't access them). We have had to constantly monitor the inventory and manually power them on, sometimes dozens at a time (we have a lot of VMs).

It's been resolved for us each time by recycling the Connection Server services overnight when fewer users are connecting (or by taking them out of the load balancers one by one and recycling the services), but that is getting old and not sustainable.
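The one-by-one approach can be written down as a small rolling-restart loop. This is a sketch of the ordering only: `drain`, `restart`, `is_healthy` and `enable` are hypothetical callables you would have to implement against your own load balancer and servers, and none of them are Horizon or vSphere APIs.

```python
def rolling_restart(servers, drain, restart, is_healthy, enable):
    """Restart servers one at a time and stop at the first unhealthy one.

    All four callables are hypothetical hooks supplied by the caller.
    Returns the servers that were restarted and put back in rotation.
    """
    done = []
    for server in servers:
        drain(server)        # take the server out of the load balancer
        restart(server)      # recycle the Connection Server service
        if not is_healthy(server):
            break            # leave it drained and stop the rollout
        enable(server)       # put it back in the load balancer
        done.append(server)
    return done
```

Stopping at the first failed health check keeps the remaining servers untouched, so a bad restart never takes more than one Connection Server out of rotation.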

We're opening a BCS case shortly, but I wanted to add that since we upgraded to Horizon 7.4 we're seeing the same lack of resiliency, which we didn't see on previous 7.x releases.

Horizon 7.4

vCenter Server 6.0.0 Build 7462484

BenFB
Virtuoso

We were provided a hotfix for 7.4.0 that has helped with this.

VMware-viewconnectionserver-x86_64-7.4.0-8215536.exe

vXav
Expert

Thanks both for your input, that is very interesting stuff indeed.

It seems wrong that a mere network interruption breaks the whole thing.

paulmike3, please let us know how your SR evolves.

I opened one myself, we'll see how it goes. I'll keep you posted.

vXav
Expert

Quick follow-up on this.

I opened a case and they gave me VMware-viewconnectionserver-x86_64-7.4.0-8741716.exe

I initiated a VCHA failover to simulate an outage and the VDI still worked when vCenter came back.

Hard to be 100% sure that this was the fix, but for now it'll have to do.

paulmike3
Enthusiast

Thanks for the follow-up.  We were given the new installer as well, but we haven't pulled the trigger and applied it yet.

Edit: The exe we received was VMware-viewconnectionserver-x86_64-7.4.0-8215536.exe. Looks like there's a newer build (8741716) others have received.
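To compare the installers mentioned in this thread, the version and build can be read straight from the file name. A small sketch, assuming the naming pattern seen here holds and that a higher build number means a newer build (an assumption, not something support confirmed); `parse_installer` and `newest` are made-up helper names.

```python
import re

# Pattern based only on the two file names quoted in this thread.
INSTALLER = re.compile(
    r"VMware-viewconnectionserver-x86_64-(\d+\.\d+\.\d+)-(\d+)\.exe$"
)


def parse_installer(name):
    """Return ((major, minor, patch), build) parsed from an installer name."""
    match = INSTALLER.search(name)
    if match is None:
        raise ValueError(f"unrecognised installer name: {name!r}")
    version = tuple(int(part) for part in match.group(1).split("."))
    return version, int(match.group(2))


def newest(names):
    """Pick the name with the highest (version, build) pair.

    Assumes a higher build number means a newer build.
    """
    return max(names, key=parse_installer)
```

Comparing on the (version, build) pair rather than on the raw string avoids surprises if a later build number has a different number of digits.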

We asked for additional info on it (what else changes, can it be applied manually, how many customers have applied it, any known issues, etc.).  With multiple View pods and close to 2 dozen Connection Servers, this is not a simple activity for us so we need to understand the risk before moving forward.

Is anyone here aware of any issues with the unreleased build?

vXav
Expert

As far as I can tell, it is a very minor version bump (excuse the wording). The support guys refer to it as a "hotpatch", though I can't say much about what exactly is in it.

The potential benefit easily made it worth the risk for me, as I can't tolerate a network blip taking down the whole thing and forcing me to ask everyone to log off so I can shut down all the Connection Servers. It is a major and concerning flaw in my opinion.

Check with support, but maybe you could apply the patch on the Connection Servers of one pod to start with and see how it behaves? The install does not require a reboot, but it will kill the services for a good 2 minutes, so mind that if you have servers in tunneling mode.

Let us know how it goes.

PS: The second time I had the issue (pre-hotfix), I only cycled the Connection Servers and left the vCenter alone. It worked OK.
