VMware Cloud Community
CraigTompkins1
Contributor

Reliability?

I'd like to get some input from other VMC on AWS users on reliability.

We currently have a small 3 node cluster that we run Horizon 7.9 on with 120 concurrent users.  We've been running since February and until last month had not had any real issues.  Then we had:

  • A host failure on a Monday. All VMware will tell us is that it was a hardware issue; they can't give us an RCA.
  • 2 days later we had another host failure, which VMware said was the same hardware issue, but they still can't give us an RCA. We actually lost one of our Connection broker servers with this outage; glad it wasn't both.
  • 1 week later we had another host failure. VMware will not give an RCA since it was a hardware issue, but they tell us it was a different hardware issue than the last 2. Since I saw it in action and saw vSAN alerts, I assume it was an SSD issue, but maybe not.
  • Last week VMC updated vCenter to 6.9 and we lost connectivity. vCenter reported that we no longer had permissions, so our service account inside of Horizon was not able to create or delete any desktop VMs, which caused us to run out of desktops for our users when they started showing up. It took about 2 hours to resolve and we were told this was a known issue. I asked why they didn't take better precautions with a known issue like that but didn't get a straight answer.

So while we were happy with this engagement, we are clearly having second thoughts now.  I doubt VMC on Azure would be any different, but I'd like to hear any feedback there too.

Additionally, we are finding some restrictions that are adding up:

  • We had an issue and the VMware KB said to shut down Horizon connection servers, reboot vCenter and turn the connection servers back on.  Guess what, we can't reboot vCenter.  Guess what else, tech support can't reboot vCenter either.  So we can't follow VMware's own KB.
  • We can't set custom permissions.  On-prem I have a permission level where my helpdesk can use the Console to view desktops and assist users. The only way to give that permission in VMC on AWS is to make that user a cloud Admin, and that's not going to fly.
  • You can't set anti-affinity rules.  So with Horizon we have 2 connection brokers and I can't force them to different hosts.  I'm sure it's just coincidence but it seems like DRS likes to keep those servers together.  I'm constantly migrating one of them off.

I would love to hear feedback on what others are seeing.  Are we the only ones having issues?

TIA

Craig

4 Replies
hsherwin09
Enthusiast

Hey Craig,

I can't help with a lot of your points, but I can help with one:

VM-VM Anti-Affinity in VMware Cloud on AWS: Can you please confirm that this documentation doesn't work for you? Create or Delete a VM-VM Anti-Affinity Policy

Cheers,

Harrison

CraigTompkins1
Contributor

I do have that set up already, but it does not seem to be honored all the time.  When talking with Support I was told that they don't allow anti-affinity rules, and I took that to mean this doesn't really work either.  Maybe the tech was actually talking about DRS rules and not compute policies, and I need to follow up.

A13xxx
Enthusiast

Hi Craig, Welcome to this nightmare!!!

I know the feeling. They have a very small engineering team that has higher access than those you speak to when you log a call. The lead time on response and resolution is days/weeks!

If you are unlucky enough to have a host failure (like we have had), the world gets turned upside down and you, like the front line, are left in limbo, just watching the extremely slow automated script fail VMs over without any guarantee that HA will restart VMs. You have no alerting, nothing.

The worst thing you can experience is if the vCenter or the VMware Cloud appliances develop an issue. NO ONE apart from that small engineering team, who are likely to be based on a planet far far away, is able to help. Neither you nor support has access. The only thing you can do is either wait for the SDDC or appliances to be updated, which triggers a restart (which the majority of the time fixes things), or wait for engineering to get back to you.

There are plenty of new and hidden bugs which only a handful of people know about. This can lead to you having an issue open for weeks, with meeting after meeting and the horrible task of being asked to gather information, logs, etc. (while waiting for engineering, who have the access, to help you).

Hopefully things will improve...

vakeem
Contributor

We can't set custom permissions.  On-prem I have a permission level where my helpdesk can use the Console to view desktops and assist users. The only way to give that permission in VMC on AWS is to make that user a cloud Admin, and that's not going to fly.

Custom permissions work just fine on VMC.

On VMC's vCenter

Menu -> Administration -> Single Sign On -> Configuration -> Identity Sources

Add your on-prem LDAP server. This gives the cloud vCenter access to your users/groups.

Menu -> Administration -> Roles -> +

Create a new role and specify the privileges for the role.

Menu -> Administration -> Access Control -> Global Permissions (this can also be done from the vCenter object)

Select the +

User = "on prem domain"

Role = "new role you just defined"

Now when that user logs on they will have the limitations specified in the role.
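As a rough illustration of what the role ends up doing, here is a minimal Python sketch that models a role as a set of privilege IDs and checks an action against it. It does not talk to vCenter. The role name `HelpdeskConsole` is hypothetical, and the privilege IDs are what I believe the vSphere IDs to be for console interaction and power operations; verify them against the role editor in your own vCenter before relying on them.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Role:
    """Toy model of a vCenter role: a name plus a set of privilege IDs."""
    name: str
    privileges: frozenset


def is_allowed(role: Role, privilege: str) -> bool:
    """Return True if the role grants the given privilege ID."""
    return privilege in role.privileges


# Hypothetical helpdesk role: read-only visibility plus console access.
# Privilege IDs are assumptions; check them in your environment.
helpdesk = Role(
    name="HelpdeskConsole",
    privileges=frozenset({
        "System.View",
        "System.Read",
        "VirtualMachine.Interact.ConsoleInteract",
    }),
)

can_open_console = is_allowed(helpdesk, "VirtualMachine.Interact.ConsoleInteract")
can_power_off = is_allowed(helpdesk, "VirtualMachine.Interact.PowerOff")
print(f"console: {can_open_console}, power off: {can_power_off}")
```

The point is that the helpdesk user gets the console and nothing else, which is the limitation the original post was after without handing out cloud Admin.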

You can't set anti-affinity rules.  So with Horizon we have 2 connection brokers and I can't force them to different hosts.  I'm sure it's just coincidence but it seems like DRS likes to keep those servers together.  I'm constantly migrating one of them off.

Yes you can. The short version is:

Tag a VM

Menu -> Policies and Profiles -> Compute Policies -> Add

Full Docs Here: Create or Delete a VM-Host Affinity Policy
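Since Craig mentioned the policy does not always seem to be honored, a small check can at least flag when the two connection brokers land on the same host. The sketch below is plain Python over a VM-to-host mapping; the VM and host names are made up, and how you obtain the mapping (PowerCLI export, API call, etc.) is left out and assumed.

```python
from collections import defaultdict


def anti_affinity_violations(placement, group):
    """Return {host: [vms]} for every host running more than one VM from group.

    placement: dict mapping VM name -> host name (how it is collected is assumed)
    group:     iterable of VM names that should be kept on separate hosts
    """
    by_host = defaultdict(list)
    for vm in group:
        host = placement.get(vm)
        if host is not None:  # skip VMs that are not in the placement data
            by_host[host].append(vm)
    return {host: sorted(vms) for host, vms in by_host.items() if len(vms) > 1}


# Hypothetical inventory: both connection brokers ended up on the same host.
placement = {"cs-01": "esx-a", "cs-02": "esx-a", "desktop-001": "esx-b"}
brokers = ["cs-01", "cs-02"]

violations = anti_affinity_violations(placement, brokers)
for host, vms in violations.items():
    print(f"Anti-affinity violated on {host}: {', '.join(vms)}")
```

Running something like this on a schedule would also give support concrete timestamps for when the policy was not enforced.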
