I'm not sure whether this problem is related to VCF or to VRSLCM, but let's see how it goes.
VCF 4.5.0 is complaining about accounts/passwords:
"19 accounts have been disconnected. Visit Password Management page to take action"
However, the passwords appear to be OK: I can log on to the various components, e.g. I can log on to VRSLCM using root and vcfadmin@local (which is being complained about).
Any ideas what's causing this warning?
SSH to SDDC Manager and run lookup_passwords to retrieve the VRSLCM credentials.
Then try remediating the password in the SDDC Manager UI with the VRSLCM password (the lookup_passwords result).
Did anyone change the password directly on the component, bypassing SDDC Manager?
Please check the lookup_passwords output and confirm whether all the passwords match. If they do not, that explains why the accounts got disconnected.
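The comparison suggested above can be done by eye, but here is a minimal sketch of the idea, assuming you have copied the lookup_passwords results into one mapping and the credentials you can actually log on with into another. The function name, the mappings, and all values are hypothetical placeholders, not real output from SDDC Manager.

```python
def find_mismatches(sddc_records, working_credentials):
    """Return accounts whose password stored in SDDC Manager differs
    from the password that actually works on the component."""
    mismatched = []
    for account, stored_password in sddc_records.items():
        working = working_credentials.get(account)
        # Accounts that were never tested by hand are skipped, not flagged.
        if working is not None and working != stored_password:
            mismatched.append(account)
    return sorted(mismatched)


# Hypothetical example data; keys are (component, username) pairs.
sddc_records = {
    ("VRSLCM", "root"): "stored-pw-1",
    ("VRSLCM", "vcfadmin@local"): "stored-pw-2",
}
working_credentials = {
    ("VRSLCM", "root"): "stored-pw-1",
    ("VRSLCM", "vcfadmin@local"): "changed-by-hand",
}

for component, user in find_mismatches(sddc_records, working_credentials):
    print(f"{component} {user}: stored password differs, remediate in SDDC Manager")
```

Any account reported here is a candidate for REMEDIATE with the password that currently works on the component.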
Hi @GaryJBlake, hi @AughBT, it happened to me twice: the first time only with the NSX-T root account (first 4 screenshots), and the second time, like you @AughBT, with a lot of accounts.
But it was in a lab. It would be interesting to know whether you were in a lab as well, or whether you can confirm CPU/I/O contention like I showed in the last picture.
Here is how i handled it:
1) ESXi service accounts
Steps to recover expired Service Accounts in VMware Cloud Foundation (83615)
SSH into each of the 4 nested ESXi hosts:
[root@vcf-m01-esx01:~] passwd svc-vcf-vcf-m01-esx01
Changing password for svc-vcf-vcf-m01-esx01
Enter new password:
Re-type new password:
passwd: password updated successfully
(Note: I didn't do the reset-failed-login part.)
In SDDC Manager, on the ESXi svc accounts -> 3 dots -> REMEDIATE with this newly created password
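Since the same passwd step is repeated on every host, here is a small sketch that lists which account to reset where. It assumes the service account name is always "svc-vcf-" followed by the short hostname, which is inferred from the single example above and not confirmed by the KB; the hostnames other than vcf-m01-esx01 are also assumptions.

```python
def service_account(esxi_host):
    """Build the service account name for a host.

    Assumption: the pattern is 'svc-vcf-' + short hostname, as seen in
    the one example 'svc-vcf-vcf-m01-esx01'. Verify on your own hosts.
    """
    short_name = esxi_host.split(".")[0]  # drop a domain suffix if present
    return "svc-vcf-" + short_name


# Hypothetical host list; only vcf-m01-esx01 appears in the post above.
hosts = ["vcf-m01-esx01", "vcf-m01-esx02", "vcf-m01-esx03", "vcf-m01-esx04"]

for host in hosts:
    # Print the command to run by hand in an SSH session on each host.
    print(f"[root@{host}:~] passwd {service_account(host)}")
```

This only prints the commands as a checklist; the password change itself is still done interactively on each host.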
2) NSX-T Manager root, admin, and audit accounts
REMEDIATE using same password used in the deployment script
3) PSC - KB: Password rotation for email@example.com causes issues when multiple VMware Cloud Foundation instances share a single SSO domain (85485)
We must be logged in as another SSO user with the ADMIN role
to be able to click REMEDIATE on PSC firstname.lastname@example.org
I think a proper SSO ADMIN user like email@example.com, as illustrated in the KB, is the way to go in production.
In my case, since it was a lab, I found an existing SSO account and promoted it to the admin role.
Disclaimer: I do not know whether that is supported, even though:
from the remediate password window we learn that the service account will be rotated after the remediation,
and we can remove the admin role from this service account afterwards.
Using the UI, it's easily done instead of via the API:
a) SDDC Manager UI as firstname.lastname@example.org -> Single Sign On -> +USERS AND GROUPS -> Search User: svc, Refine search by: Single User, Domain: vsphere.local
select the user svc-vcf-m01-nsx01-vcf-m01-vc01 -> Choose Role: ADMIN (note: this can also be done from vCenter)
b) vCenter UI as email@example.com -> Licensing -> Single Sign On -> Users and Groups -> Users -> Domain: vsphere.local, Find: svc -> EDIT: Password, Confirm Password
c) SDDC Manager UI as firstname.lastname@example.org -> Security -> Password Management -> PSC -> email@example.com -> REMEDIATE again using the same original password
optionally d) redo a), but select the 3 dots and remove the admin role from this service SSO user.
I found the root cause to be specific to a nested lab environment, or more generally CPU/I/O contention on the hosts,
occurring during a task towards the end of the bringup called "Configure Base Install Image Repository on SDDC Manager",
which copies the VCSA ISO and NSX OVA to an NFS share on the vSAN datastore of the 4 nested ESXi hosts.
That sent the CPU through the roof, and consequently the three VMs (vCenter, NSX, and SDDC Manager) had their kernels get stuck once or multiple times.
Looking deeper into it, I think the subsequent tasks might have had issues with the VMs whose kernels were stuck (I feel there may be missing pieces to understand it all ...).
I was monitoring while that contention happened, then took screenshots of CPU and I/O usage for 2 SDDC bringups at the time of that copy task to illustrate:
one when that whole issue occurred, with 4 nested ESXi hosts
one with 1 nested ESXi host, using the FTT=0 trick given by William Lam https://williamlam.com/2023/02/vmware-cloud-foundation-with-a-single-esxi-host-for-management-domain...
Using fewer vCPUs (8 instead of 4x8) and a faster NVMe SSD (PCIe 4.0 instead of 3.0) confirmed that without the kernels getting stuck, all is well 😁.
I think that on real hardware this should not happen.
Looking back at your issue and mine there is a catch:
When we hover the mouse over the ⓘ, a bubble informs us that no more than 24h should pass between syncs.
So mine falls within the expected result, because I didn't give it a chance to sync and refresh after the deployment (less than 24h).
But looking at your picture, there were more than five months between syncs,
from 13 June to 22 November.
Since you created this issue on 22 November,
if it self-healed for you a day later, please keep us informed, thanks.
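To put numbers on that gap, here is a minimal sketch comparing the two sync dates from the screenshot against the 24h interval mentioned in the tooltip. The year is not stated in the thread, so 2022 is an assumption (the day count is the same for any year), and the names below are illustrative only.

```python
from datetime import date, timedelta

# Interval from the tooltip: syncs should be no more than 24h apart.
MAX_SYNC_INTERVAL = timedelta(hours=24)

# Dates from the screenshot; the year 2022 is assumed, not given.
previous_sync = date(2022, 6, 13)
latest_sync = date(2022, 11, 22)

gap = latest_sync - previous_sync
overdue = gap > MAX_SYNC_INTERVAL

print(f"Gap between syncs: {gap.days} days")
print(f"Exceeds the 24h interval: {overdue}")
```

A gap this far beyond 24h suggests the periodic sync was not running at all during that period, which is a different situation from a fresh deployment that simply has not reached its first sync yet.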