I'm currently troubleshooting an issue where one of our vCenters has lost connectivity to our View Connection servers (Horizon 7). Our environment consists of two vCenter 6.5 servers, one running on Windows 2012 R2 and the other running the vCenter Server Appliance (VCSA). These two vCenters support our VDI environment running Horizon 7. Unfortunately, the Windows vCenter is the one that houses the VDI virtual machines, while the VCSA vCenter houses the servers (database, connection, vCenters, etc.).
Upon logging into Horizon, we noticed that there were zero sessions running. Within the Summary tab of our desktop pool, we noticed two errors stating:
Error during Provisioning: Unable to automatically connect to VC. It is possibly down.
Datastores used for the desktop pool are low on disc space.
After navigating to the Inventory tab of the desktop pool, I noticed the majority of the VMs are in maintenance mode (I'm also wondering how Horizon is able to receive this data, since it cannot connect to the vCenter holding those VMs). I restarted the vCenter server and shortly afterwards started seeing sessions populate. We continued testing by logging into one of our own zero clients, but were greeted with the error:
The assigned desktop source for this desktop is not currently available.
So as of now, we have a handful of active sessions, but the majority of people still cannot access the pool. We then opened the vSphere Web Client on our VCSA vCenter and noticed that the other vCenter (Windows) is not shown (usually, both clusters are shown) and an error is displayed across the top:
Cannot connect to one or more vCenter Server Systems: https://vCenterFQDN:443/sdk
I attempted the following KB articles with no success
At this point, I'm trying to get the Windows vCenter connected with the rest of the environment. Any help would be greatly appreciated.
Which credentials are used for connecting to the vCenter? Are they from the vSphere SSO domain or from Active Directory?
Also, Horizon Connection Servers must always have a permanent network connection on TCP port 443 to the vCenter Server. Is there a firewall or anything else between the Windows-based vCenter server and the connection server?
Finally, what is your desktop pool type (Full Clone, Linked Clone, Instant Clone, RDS)? Is there an RDSH or Composer server in your VDI environment?
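One way to rule out the network path is to test TCP port 443 from the Connection Server towards the vCenter. The following is only a minimal Python sketch, not a VMware tool: the function name can_connect and the host name in the usage note are hypothetical, and a successful result only shows that a TCP handshake completes, not that the vCenter SDK endpoint is healthy.

```python
import socket


def can_connect(host, port=443, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection resolves the name and performs the TCP handshake
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures
        return False
```

For example, calling can_connect("vcenter.example.local") from the Connection Server (with your own vCenter FQDN in place of the placeholder) returns False if a firewall drops the traffic or if nothing is listening on 443.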
After running the service-control --status --all command, the following services are running/stopped:
Running: VMwareAfdService, VMwareCertificateService, VMwareDirectoryService, VMwareComponentManager, VMwareDNSService, VMwareIdentityMgmtService, VMwareSTS, VServiceManager, rhttpproxy, vimPBSM, vmon, vmonapi, vmsyslogcollector, vmware-cis-config, vmware-license, vmware-psc-client, vmwareServiceControlAgent, vpxd-svcs, vsan-health
Stopped: Esxagentmanager, VMwarecomservice, content-library, mbcs, vapiendpoint, VMware.autodeploy-waiter, VMware-imagebuilder, VMware-network-coredump, VMware-perfcharts, vpxd, vsphere-ui, vspherewebclientsvc
I ran the following command to try to get the Web Client back up, with no success:
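To compare the service state before and after each fix attempt, the two lists can be parsed into a small structure. This is a rough Python sketch that assumes the status has been written down in the "Label: name, name, ..." form used in this post (the raw service-control output may be laid out differently); parse_status is a hypothetical helper name.

```python
def parse_status(text):
    """Parse lines like 'Running: a, b, c' into {'running': ['a', 'b', 'c']}."""
    result = {}
    for line in text.splitlines():
        label, sep, names = line.partition(":")
        if not sep:
            continue  # skip lines that have no 'Label:' prefix
        # Accept commas and/or whitespace between service names
        result[label.strip().lower()] = names.replace(",", " ").split()
    return result


sample = "Running: vmon, vmonapi\nStopped: vpxd, vsphere-ui, vspherewebclientsvc"
status = parse_status(sample)
print("Stopped services:", status["stopped"])
```

Keeping one parsed snapshot per attempt makes it easy to see which services changed state after a restart.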
service-control --start vspherewebclientsvc
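When a start attempt fails, it helps to capture the exit code and whatever the command prints. The sketch below wraps the same service-control --start command in Python; start_service is a hypothetical helper, and the assumption that service-control can be found on PATH may not hold on the Windows vCenter, where the full path to the script may need to be passed as executable.

```python
import subprocess


def start_service(name, executable="service-control", runner=subprocess.run):
    """Run `<executable> --start <name>`; return (exit code, combined output)."""
    cmd = [executable, "--start", name]
    # capture_output collects stdout and stderr instead of printing them
    proc = runner(cmd, capture_output=True, text=True)
    return proc.returncode, (proc.stdout or "") + (proc.stderr or "")
```

A non-zero exit code together with the captured text gives something concrete to search for in the service's own log files.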
UPDATE: On Friday, I attempted to revert the Windows vCenter to a previous snapshot from back in March (taken before we ran into errors accessing the Web Client on that vCenter server). Now, unfortunately, we have also lost access to our VCSA vCenter's Web Client, which leaves us with no vSphere Web Client to access our servers. Is there any other way to access vCenter outside of the Web Client? Luckily, I had locked my account before taking off on Friday, so I at least still have VMware Remote Console sessions into both the Windows vCenter and the VCSA if I need to do anything within the servers. I don't see how reverting one vCenter to a previous snapshot would affect access to the Web Client on the other vCenter.
After reviewing the system logs on the Windows vCenter, I found several errors, including SCCM kicking off and failing to install updates, a failed trust relationship with the primary domain controller, and failed time sync with the domain controllers. I tested the trust relationship by attempting to log in with a service account (in Active Directory) and received the following error:
Trust relationship between this workstation and the primary domain failed.
I'm hesitant to take it "off" the domain and back on again because I'd lose connection to the Remote Console but I know I'll have to at some point when adding it back to the domain.
Again, I don't understand why all of this would prevent us from accessing the Web Client on our other vCenter server (VCSA).
The credentials being used are those of the SSO domain account. There is no firewall or other appliance between the servers. We're using Linked Clones, and both our vCenters are also Composer servers. I posted an update on my current situation in my previous response.
Did you check whether the credentials set for the Composer server have sufficient permissions in Active Directory? Check the following link for the required permissions:
Also check the guest OS of the deployed VMs (if deployment has finished but you still cannot connect to them): are the VMware Horizon Agent and View Composer Agent started and connected? For the list of related agents, check the following link:
Many of the primary services of the VCSA, such as VMwareDirectoryService, have started, except vpxd and VMwarecomservice, and there are still errors on vsphere-ui and vspherewebclientsvc, so you need to look at the log files in more detail when you try to start those services.
The trust relationship problem occurred because time is not synchronized between the DCs and the clients. Regardless of the time settings, if you deployed your systems with SCCM, check the SID of the template machine (the problem can be related to SID duplication). You should Sysprep the reference system before deploying new machines from it.
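For the time sync part, Kerberos in an Active Directory domain tolerates a clock difference of 5 minutes by default, so the skew between the vCenter and a DC is worth measuring. A small Python sketch of that comparison follows; the two timestamps are made-up values, and the 5-minute limit should be replaced if the domain policy was changed.

```python
from datetime import datetime, timedelta

# Default Kerberos clock tolerance in Active Directory (assumed unchanged here)
MAX_SKEW = timedelta(minutes=5)


def skew_exceeds(local_time, dc_time, limit=MAX_SKEW):
    """Return True if the two clocks differ by more than the allowed limit."""
    return abs(local_time - dc_time) > limit


# Hypothetical readings taken at the same moment on the vCenter and on a DC
vcenter_clock = datetime(2019, 1, 14, 9, 0, 0)
dc_clock = datetime(2019, 1, 14, 9, 7, 30)
print("Skew too large:", skew_exceeds(vcenter_clock, dc_clock))
```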
We utilize a service account to log into Horizon that does have AD permissions.
As far as the two services you mentioned, the only Horizon service I found on the Windows vCenter was VMware Horizon 7 Composer. Then, I checked the services using the service-control utility and did not find the services you mentioned there either. Are they named differently?
I started leaning towards the theory that the local service accounts (local user accounts) on my Windows vCenter have passwords that have either expired or been corrupted in some way. I then set a new password for all of the service/user accounts whose associated service was stopped. Next, I noticed that "User must change password at next logon" was checked, so I unchecked it and checked the "Password never expires" box so we can avoid password expiration issues in the future (probably against best practices). After this, I restarted all the vCenter services (with service-control) and was able to get the content-library and vapi-endpoint services up and running. However, the following services are still stopped:
The vpxd service is able to start, but after a couple of minutes it stops again, no matter how many times I restart it. I combed through the event logs for the vpxd service and found the path to the vCenter logs. One error I saw reported that the config.ini and settings.ini files could not be found.
I'm thinking of bringing down my entire VDI environment ("gracefully") and bringing it up again to see if that will get all the components working together again.
No need to bring down your VDI environment. I think the problem is still related to your vCenter server components; check them again before rebuilding the Horizon connection servers. (You can test with a freshly deployed connection server, but the cause seems to be the vCenter Server.)
Unfortunately, I can't pull the vpxd logs because the server is located on a classified network (I might have to just write them down), but I did find a couple of SQL database errors that are causing the vpxd service to stop. The log states that the SQL database is over the error threshold (currently at 97% capacity). I'm pretty sure this is the cause of most of my problems.
I tried using the following VMware Knowledge Base article to purge old data, but it turns out most of the steps are already in place, such as configuring the event.maxAge values. I ran the stored procedure mentioned in the article, which completed successfully, but the database is still at 97%.
Currently, I'm trying to find other articles to clean up the database since I know we have logs that are way too old. For example, our vpxd log thread number is at 147 (don't know if this is normal).
The log number is okay. In vSphere 6.5 and above, a new mechanism was introduced whereby, if the database is more than 95% full, the vCenter service stops automatically until usage drops below 95%.
You can check the events and tasks table
check vpx_text_array table VMware Knowledge Base
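The threshold described above comes down to a percentage check. A tiny Python sketch, using the 95% figure from this reply and made-up sizes (the function names and the megabyte values are hypothetical, and this is not how vCenter itself implements the check):

```python
def db_usage_percent(used_mb, max_mb):
    """Return how full the database is, as a percentage of its maximum size."""
    if max_mb <= 0:
        raise ValueError("max_mb must be positive")
    return 100.0 * used_mb / max_mb


def over_threshold(used_mb, max_mb, threshold=95.0):
    """Return True if usage is above the threshold at which vpxd stops."""
    return db_usage_percent(used_mb, max_mb) > threshold


# Hypothetical sizes matching the 97% reported earlier in the thread
print(db_usage_percent(9700, 10000), over_threshold(9700, 10000))
```

With these numbers the database would need to shrink by a little over 200 MB before the service could stay up.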
I was able to purge the transaction log in my vCenter database by switching to the Simple recovery model and then switching back to Full. Now our database is far enough below the threshold that it is no longer a concern. Also, a structured backup plan is being created to keep our transaction logs from filling up again in the future.
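For anyone following along, the recovery model switch can be written out as a handful of T-SQL statements. The Python sketch below only builds and prints those statements for review; it does not connect to SQL Server. The database name VCDB, the logical log file name VCDB_log, and the shrink target are placeholders, and the DBCC SHRINKFILE step is an assumption about how the log file was actually reduced, since switching models alone does not shrink the file on disk.

```python
def quote_ident(name):
    """Bracket-quote a SQL Server identifier, escaping any closing bracket."""
    return "[" + name.replace("]", "]]") + "]"


def recovery_cycle_statements(database, log_file, target_mb=1024):
    """Build T-SQL to switch to SIMPLE, shrink the log file, and return to FULL."""
    db = quote_ident(database)
    log = log_file.replace("'", "''")  # escape quotes inside the string literal
    return [
        f"ALTER DATABASE {db} SET RECOVERY SIMPLE;",
        f"USE {db};",
        "CHECKPOINT;",
        f"DBCC SHRINKFILE (N'{log}', {int(target_mb)});",
        f"ALTER DATABASE {db} SET RECOVERY FULL;",
    ]


# Placeholder names; substitute the real database and logical log file names
for statement in recovery_cycle_statements("VCDB", "VCDB_log"):
    print(statement)
```

After switching back to Full, a full database backup is needed to restart the log backup chain, which fits in with the backup plan mentioned above.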
I was able to start up the vpxd service and it is stable again.
So far I:
Now, I'm unable to access the Web Client for either the Windows vCenter or the Server Appliance vCenter. I'm not too familiar with Photon OS, so I'll look up some articles on how to start the services within the server appliance.
Aside from that, with the database and service accounts taken care of, I don't know what else could be preventing the vCenters from communicating with each other or with the Horizon connection servers (they can all ping each other).