Just upgraded to vSphere 7 Update 1, and in the VMs and Templates view I see it created the folder for vCLS. I only have one cluster, and DRS is complaining about the unhealthy state of the vSphere Cluster Service... which makes sense, as no vCLS VMs have been created. Out of curiosity I created a new cluster and moved some hosts into it, but I'm getting the same issue.
In vCenter, when I go to Admin -> vCenter Server Extensions and look at the vSphere ESX Agent Manager, I see both clusters have alerts with the same message: "Cluster agent VM is missing in the cluster". That makes sense, since none exist. It's nice that there is a Resolve All Issues button, but it doesn't resolve any of my issues.
I am poking around trying to find logs that help pinpoint the exact issue but haven't been successful yet. Has anyone seen this before, or can anyone point me to the logs that would show why the vCLS VMs are not getting created?
All ESXi hosts have been patched to 7.0.1 and have the same CPU make / model (Intel).
** Update: Ok found in the EAM log "can't provision VM for ClusterAgent due to lack of suitable datastore". All of my stores have 100 or more GBs free....but will start down that path **
I had the exact same issue and just fixed it by running fixsts.sh :))
My problem was a little "bigger".
Before running fixsts, we created a test cluster and moved ESXi into it; vCenter didn't create a vCLS VM there either.
After running fixsts, vCenter tried to create vCLS in the existing cluster and immediately deleted it (every minute!), while in the test cluster it created it without any problems. So fixsts alone didn't help us.
However, we tried the procedure described in https://kb.vmware.com/s/article/80472, and after disabling and re-enabling vCLS, the existing cluster recovered, created vCLS correctly, and DRS is fully functional.
Thanks to All for your help!
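For anyone following KB 80472: the disable/enable toggle is a vCenter advanced setting keyed by the cluster's managed object ID. Here is a minimal sketch of how that key is put together, assuming the config.vcls.clusters.<domain ID>.enabled naming from the KB. The helper name and the example ID domain-c8 are only illustrative; read your own ID from the cluster's URL in the vSphere Client.

```python
import re


def vcls_retreat_key(cluster_moid: str) -> str:
    """Build the advanced-setting name that toggles vCLS for one cluster.

    cluster_moid is the cluster's managed object ID, e.g. "domain-c8"
    (illustrative value, not taken from this thread).
    """
    if not re.fullmatch(r"domain-c\d+", cluster_moid):
        raise ValueError(f"unexpected cluster ID: {cluster_moid!r}")
    return f"config.vcls.clusters.{cluster_moid}.enabled"


if __name__ == "__main__":
    # Setting this key to false removes the vCLS VMs; true lets vCenter redeploy them.
    print(vcls_retreat_key("domain-c8"))
```

Per the KB, you set the value to false, wait for the vCLS VMs to be removed, then set it back to true.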
Thanks for the info / results. I was able to get them to deploy using the lsdoctor tool.
Ran "python lsdoctor.py -l"... it told me there was an SSL Trust Mismatch
Ran "python lsdoctor.py -t"... it corrected the mismatch
vCenter immediately created and powered up the vCLS machines and DRS appears to be happy again.
Anyone hear any updates yet? Is there a 17004997_7.0.1.00100_vcsa floating about, or an updated lsdoctor?
So I swore at this issue for a day or so too, after migrating from VCSA 6.7 to 7.0.1 (build 17327586) yesterday. In my case, my 6.7 VCSA (whatever the most recent version of 6.7 was on January 24, 2021) had been migrated from version to version over the years, starting from I want to say VCSA 5 (but maybe it was 5.5). It has also had an AD Certificate Authority issued certificate on it for many years (the cert says it's valid from May 2015 to July 2024, so it's been around for a while). Eventually (after I opened a ticket with VMware Support 3+ hrs ago, which I haven't gotten a response to yet) I stumbled onto this thread, which led me to this route of resolution.
These are the steps I took, in the order I took them.
By the time I made a pitstop for coffee, got the Chrome cache cleared, and managed to get logged back into VC, all the vCLS were finally deployed.
Incidentally, the EAM.log indicated this prior to the fixsts.sh:
FAILED: com.vmware.eam.sso.exception.TokenNotAcquired: Couldn't acquire token due to: Signature validation failed
Caused by: com.vmware.vapi.std.errors.ServiceUnavailable: ServiceUnavailable (com.vmware.vapi.std.errors.service_unavailable)
Can't provision VM for ClusterAgent(ID: 'Agent:48c988c8-570a-43d6-a12a-XXXXXXXXXX:null') due to lack of suitable datastore.
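If you want to check quickly whether your own EAM log shows the same pattern, here is a rough sketch that counts the two messages quoted above. The substrings come straight from those log lines; the hint text is only my summary of what posters in this thread reported, not VMware guidance, and the function name is made up for illustration.

```python
from typing import Dict, Iterable

# Substrings from the EAM log lines quoted in this thread, each mapped to a
# short hint summarizing what posters here reported (not official guidance).
SIGNATURES = {
    "Signature validation failed": "token signing problem; fixsts.sh helped in this thread",
    "lack of suitable datastore": "vCLS placement failed; check the errors logged before it",
}


def count_signatures(lines: Iterable[str]) -> Dict[str, int]:
    """Count how many log lines contain each known substring."""
    counts = {sig: 0 for sig in SIGNATURES}
    for line in lines:
        for sig in SIGNATURES:
            if sig in line:
                counts[sig] += 1
    return counts


if __name__ == "__main__":
    # Demo on two of the lines quoted above instead of a real log file.
    sample = [
        "FAILED: com.vmware.eam.sso.exception.TokenNotAcquired: "
        "Couldn't acquire token due to: Signature validation failed",
        "Can't provision VM for ClusterAgent due to lack of suitable datastore.",
    ]
    for sig, n in count_signatures(sample).items():
        print(f"{n}  {sig}  ->  {SIGNATURES[sig]}")
```

To run it against a real log, pass an open file handle to count_signatures(). On my appliance I would expect the log at /var/log/vmware/eam/eam.log, but treat that path as an assumption and confirm it on your system.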
dcolpitts - that process worked for me. I had previously gone through the lsdoctor script, but it didn't resolve the issue. It was fixsts that was needed, as I had 3 root certs within the system. This vCenter has also been upgraded from previous versions. Thanks for sharing the solution.
I've just spent hours trying to fix this issue, running all the scripts / commands from VMware.
Your guide worked for me! Thanks so much for the help! 🙂
dcolpitts, I opened a support ticket with VMware and the technician and I ended up using steps 2-7 of your solution. Thanks. He also sends his Kudos to you.
I raised a ticket for the same problem, pointed out this post to the support engineer, and still ended up waiting hours for them to drip feed me the steps themselves.
Thank you dcolpitts
Slight update on my original instructions. Getting the scripts onto the vCenter is a pain, so I now just use curl to pull them down. The overall steps are still the same...
SSH to the vCenter appliance with PuTTY, log in as root, and then cut and paste these commands down to the first "--stop--". Then apply each command / fix as required for your environment. Note that the curl links were valid at the time I created this post (2021.05.17).
--start cut & paste below here--
curl https://kb.vmware.com/sfc/servlet.shepherd/version/download/0685G00000NxYfZQAV -o /root/configure_retreat_mode.py
curl https://kb.vmware.com/sfc/servlet.shepherd/version/download/0685G00000S5Q77QAF -o /root/lsdoctor.zip
curl https://kb.vmware.com/sfc/servlet.shepherd/version/download/068f400000HW9InAAL -o /root/checksts.py
curl https://kb.vmware.com/sfc/servlet.shepherd/version/download/068f400000JAn50AAD -o /root/fixsts.sh
unzip /root/lsdoctor.zip -d /root
chmod +x /root/fixsts.sh
python /root/lsdoctor-master/lsdoctor.py -l
python /root/lsdoctor-master/lsdoctor.py --stalefix
python /root/lsdoctor-master/lsdoctor.py --trustfix
service-control --stop --all
service-control --start --all
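Since those download links can go stale, it is worth checking that curl actually saved the scripts rather than an error page before you run anything. Below is a small sketch of such a check. The function names are made up for this post, and the "starts with HTML" test is only a heuristic I am assuming is good enough to catch an error or login page.

```python
import os
import sys
import zipfile


def looks_like_html(path: str) -> bool:
    """True if the file starts like an HTML page (e.g. an error or login page)."""
    with open(path, "rb") as fh:
        head = fh.read(512).lstrip().lower()
    return head.startswith(b"<!doctype html") or head.startswith(b"<html")


def check_download(path: str) -> str:
    """Return 'ok' or a short reason why the download looks wrong."""
    if not os.path.isfile(path):
        return "missing"
    if os.path.getsize(path) == 0:
        return "empty"
    if looks_like_html(path):
        return "html page, not the expected file"
    if path.endswith(".zip") and not zipfile.is_zipfile(path):
        return "not a zip archive"
    return "ok"


if __name__ == "__main__":
    # Pass the downloaded files as arguments, e.g. /root/lsdoctor.zip /root/fixsts.sh
    for arg in sys.argv[1:]:
        print(f"{arg}: {check_download(arg)}")
```

Anything other than "ok" means the link has probably changed and you should grab the file from the KB article by hand.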
Step 3-7 resolved this for me. I had 3 root certs.
Thank you for the guide.
Running a fresh install of 7.0.2U2 and encountered this issue, but none of the solutions mentioned worked for me. I tailed the eam.log file and noticed errors stating that it was not able to connect to the database and 'service still initializing' type messages.
I found this KB article: https://kb.vmware.com/s/article/2112577
The steps in it worked for me.
I'm facing a similar issue where the vCLS VMs can't be created. Error: Can't provision VM for ClusterAgent(ID: 'Agent::null') due to lack of suitable datastore.
None of the steps from this thread helped.
After some troubleshooting with VMware, they pointed out that SRM is the issue, as per https://docs.vmware.com/en/Site-Recovery-Manager/8.4/com.vmware.srm.admin.doc/GUID-531FB787-8B30-401...
Unfortunately SRM also broke after the 7.0.2 upgrade, so work is still in progress to fix SRM first and then unprotect one datastore for vCLS.
fixsts.sh fixed my problem too ... thank you !