I'd like to get some input from other VMC on AWS users on reliability.
We currently have a small 3-node cluster that we run Horizon 7.9 on with 120 concurrent users. We've been running since February and until last month had not had any real issues. Then we had:
- A host failure on a Monday. All VMware will tell us is that they can't give us an RCA, but that it was a hardware issue.
- 2 days later we had another host failure, which VMware said was the same hardware issue, but they still can't give us an RCA. We actually lost one of our Connection Broker servers in this outage. Glad it wasn't both.
- 1 week later we had another host failure. VMware again will not give an RCA since it was a hardware issue, but they tell us it was a different hardware issue than the last 2. I watched this one happen and saw vSAN alerts, so I assume it was an SSD issue, but maybe not.
- Last week VMC updated vCenter to 6.9 and we lost connectivity. vCenter reported that we no longer had permissions, so our service account inside Horizon was not able to create or delete any desktop VMs, and we ran out of desktops as users started showing up. It took about 2 hours to resolve and we were told this was a known issue. I asked why they hadn't taken better precautions with a known issue like that but didn't get a straight answer.
So while we were happy with this engagement, we are clearly having second thoughts now. I doubt VMC on Azure would be any different, but I'd like to hear any feedback there too.
Additionally, we are finding some restrictions that are adding up:
- We had an issue and the VMware KB said to shut down the Horizon Connection Servers, reboot vCenter, and turn the Connection Servers back on. Guess what, we can't reboot vCenter. Guess what else, tech support can't reboot vCenter either. So we can't follow VMware's own KB.
- We can't set custom permissions. On-prem I have a permission level where my helpdesk can use the console to view desktops and assist users. The only way to give that permission in VMC on AWS is to make that user a cloud admin, and that's not going to fly.
- You can't set anti-affinity rules. With Horizon we have 2 connection brokers and I can't force them onto different hosts. I'm sure it's just coincidence, but it seems like DRS likes to keep those servers together. I'm constantly migrating one of them off.
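For the last point, here is a minimal sketch of flagging the co-location automatically instead of spotting it by eye. It is plain Python over a VM-to-host mapping. The VM and host names are made up, and how the mapping is pulled from vCenter (an inventory export, a script, etc.) is assumed and left out.

```python
from collections import defaultdict

def colocated(vm_to_host, watched):
    """Return {host: [vms]} for every host running more than one watched VM."""
    by_host = defaultdict(list)
    for vm in watched:
        host = vm_to_host.get(vm)
        if host is not None:  # skip watched VMs missing from the inventory
            by_host[host].append(vm)
    return {host: sorted(vms) for host, vms in by_host.items() if len(vms) > 1}

# Hypothetical placement snapshot: VM name -> ESXi host it currently runs on.
placement = {
    "horizon-cs-01": "esx-01",
    "horizon-cs-02": "esx-01",
    "desktop-001": "esx-02",
}

# Hypothetical names for the two connection brokers.
brokers = ["horizon-cs-01", "horizon-cs-02"]

print(colocated(placement, brokers))
# {'esx-01': ['horizon-cs-01', 'horizon-cs-02']}
```

An empty result means the brokers are on separate hosts. Anything else is a cue to migrate one of them.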
I would love to hear feedback on what others are seeing. Are we the only ones having issues?