Hello,
A month ago I clicked "Refresh with vCenter certificate" in the Actions drop-down for STS_CERT under vCenter -> Administration -> Certificate Management. This was probably a bad idea: since then, the automatic backup configured in vcsa:5480 has not run, and there is no indication in the :5480 GUI that it has failed either.
I did the STS refresh because I was reconfiguring certificate management with our own CA, but refreshing it was probably unnecessary.
Anyhow, digging through the logs in /var/log/vmware/applmgmt, I found these lines, which get logged when the backup is executed:
vmware.appliance.extensions.authentication.authentication_sso.AuthenticationError: One or more certificates cannot be verified.
2022-04-25T00:00:02 AM CEST [2211]ERROR:vmware.appliance.vapi.auth:Could not parse HOK Token
Traceback (most recent call last):
File "/usr/lib/applmgmt/vapi/py/vmware/appliance/vapi/auth.py", line 243, in authenticate
username = token.username
File "/usr/lib/applmgmt/lib/extensions/py/vmware/appliance/extensions/authentication/authentication_sso.py", line 486, in username
return self.get_name_id().value
File "/usr/lib/applmgmt/lib/extensions/py/vmware/appliance/extensions/authentication/authentication_sso.py", line 939, in get_name_id
'//saml2:Subject/saml2:NameID', self.reference)
File "/usr/lib/applmgmt/lib/extensions/py/vmware/appliance/extensions/authentication/authentication_sso.py", line 477, in reference
self.validate()
File "/usr/lib/applmgmt/lib/extensions/py/vmware/appliance/extensions/authentication/authentication_sso.py", line 1169, in validate
reference = super(HolderOfKeyToken, self).validate()
File "/usr/lib/applmgmt/lib/extensions/py/vmware/appliance/extensions/authentication/authentication_sso.py", line 505, in validate
signing_chain = self.validate_certificate()
File "/usr/lib/applmgmt/lib/extensions/py/vmware/appliance/extensions/authentication/authentication_sso.py", line 685, in validate_certificate
'One or more certificates cannot be verified.')
vmware.appliance.extensions.authentication.authentication_sso.AuthenticationError: One or more certificates cannot be verified.
I've searched the KBs, and all I can find are solutions for dealing with expired STS certs, which is not the case for me.
The KB https://kb.vmware.com/s/article/76448 seems to describe the symptoms accurately, but the suggested Resolution does not seem to apply, since I cannot see any stale or invalid certificate chains in Administration -> Certificates -> Certificate Management.
Do you have any suggestions for what steps to take from here to get out of this situation? Manual backups work fine, and I've also tried "turning it off and on again".
With kind regards,
Martin
Hi Martin,
I think you are hitting the issue below. Please check this KB and follow the steps.
https://kb.vmware.com/s/article/76448
Please press Kudos and mark it as the solution if this works.
Thanks,
Pramod Ashnal
Hello @pashnal
Thank you for your feedback on this. As I wrote in my original post, I have read that KB, and while the symptoms are the same, I'm not able to verify this using the recommended Resolution. The screenshot in the KB does not look like my attached screenshot, so I'm unable to identify stale or invalid certificate chains.
Is the screenshot in the KB perhaps from the old Flash-based web client?
Best regards,
Martin
Can you run lsdoctor -l using the KB below and share the output?
https://kb.vmware.com/s/article/80469
Hello @Ajay1988
Thank you for taking the time to respond here.
Below is the output you requested from lsdoctor:
root@vcenter [ /tmp/lsdoctor-master ]# python lsdoctor.py -l
ATTENTION: You are running a reporting function. This doesn't make any changes to your environment.
You can find the report and logs here: /var/log/vmware/lsdoctor
2022-04-29T12:56:15 INFO main: You are reporting on problems found across the SSO domain in the lookup service. This doesn't make changes.
2022-04-29T12:56:16 INFO live_checkCerts: Checking services for trust mismatches...
2022-04-29T12:56:16 INFO generateReport: Listing lookup service problems found in SSO domain
2022-04-29T12:56:16 INFO generateReport: No issues detected in the lookup service entries for vcenter.REDACTED (VC 7.0 or CGW).
2022-04-29T12:56:16 INFO generateReport: Report generated: /var/log/vmware/lsdoctor/vcenter.REDACTED-2022-04-29-125615.json
Attached to this post is also the complete lsdoctor report.
With kind regards,
Martin
OK, that's good. So it seems something is not right with the VAMI certs. Please follow the steps below.
Back up the cert file below:
/etc/applmgmt/appliance/server.pem
Then restart the VAMI service:
/sbin/service vami-lighttp restart
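If it helps, the backup step could be sketched in Python along these lines. The helper name backup_file and the ".bak-<timestamp>" naming are my own illustrative choices, not anything the KBs prescribe. It only makes a copy and leaves the original file in place.

```python
import shutil
import time
from pathlib import Path


def backup_file(path, stamp=None):
    """Copy `path` to a sibling file named <name>.bak-<stamp> and return the new path.

    `stamp` defaults to the current local time; the naming scheme is an
    arbitrary convention chosen for this sketch.
    """
    src = Path(path)
    if stamp is None:
        stamp = time.strftime("%Y%m%d-%H%M%S")
    dest = src.with_name(f"{src.name}.bak-{stamp}")
    # copy2 also preserves timestamps and permission bits where possible
    shutil.copy2(src, dest)
    return dest


# Example call on the appliance (path taken from the post above):
# backup_file("/etc/applmgmt/appliance/server.pem")
```

Run it as root before restarting vami-lighttp, so that the copy exists if the file has to be restored.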
Hello @Ajay1988
I did as you instructed. I was a bit unsure whether it would have any effect, since I've restarted the vCenter server a few times lately, which should have the same effect (?). After restarting the VAMI with /sbin/service vami-lighttp restart, I changed my backup schedule in :5480 to a few minutes ahead and tailed the logs:
backupSchedulerCron.log:
2022-04-29 13:46:01,701 49532 Issuing the scheduled backup request for schedule: default.
backupScheduler.log:
2022-04-29T13:46:02.379 [0] [MainProcess:PID-49532] [Scheduler::ExecScheduleRun:Scheduler.py:138] ERROR: Failed to issue the Schedules.run request. Exception: {challenge : None, messages : [LocalizableMessage(id='vapi.security.authentication.invalid', default_message='Unable to authenticate user', args=[], params=None, localized=None)], data : None, error_type : UNAUTHENTICATED}
Traceback (most recent call last):
File "/usr/lib/applmgmt/backup_restore/py/vmware/appliance/backup_restore/Scheduler.py", line 133, in ExecScheduleRun
status = svc_handle.run(scheduleId, comment='SCHEDULED')
File "/usr/lib/applmgmt/pyclient/applmgmt_client-1.0-py2.7.egg/com/vmware/appliance/recovery/backup_client.py", line 1171, in run
'comment': comment,
File "/usr/lib/applmgmt/vapi/lib/vapi_runtime-2.100.0-py2.py3-none-any.whl/vmware/vapi/bindings/stub.py", line 345, in _invoke
return self._api_interface.native_invoke(ctx, _method_name, kwargs)
File "/usr/lib/applmgmt/vapi/lib/vapi_runtime-2.100.0-py2.py3-none-any.whl/vmware/vapi/bindings/stub.py", line 298, in native_invoke
self._rest_converter_mode)
com.vmware.vapi.std.errors_client.Unauthenticated: {challenge : None, messages : [LocalizableMessage(id='vapi.security.authentication.invalid', default_message='Unable to authenticate user', args=[], params=None, localized=None)], data : None, error_type : UNAUTHENTICATED}
and applmgmt.log:
2022-04-29T13:46:02 PM CEST [2211]ERROR:vmware.appliance.vapi.auth:Could not parse HOK Token
Traceback (most recent call last):
File "/usr/lib/applmgmt/vapi/py/vmware/appliance/vapi/auth.py", line 243, in authenticate
username = token.username
File "/usr/lib/applmgmt/lib/extensions/py/vmware/appliance/extensions/authentication/authentication_sso.py", line 486, in username
return self.get_name_id().value
File "/usr/lib/applmgmt/lib/extensions/py/vmware/appliance/extensions/authentication/authentication_sso.py", line 939, in get_name_id
'//saml2:Subject/saml2:NameID', self.reference)
File "/usr/lib/applmgmt/lib/extensions/py/vmware/appliance/extensions/authentication/authentication_sso.py", line 477, in reference
self.validate()
File "/usr/lib/applmgmt/lib/extensions/py/vmware/appliance/extensions/authentication/authentication_sso.py", line 1169, in validate
reference = super(HolderOfKeyToken, self).validate()
File "/usr/lib/applmgmt/lib/extensions/py/vmware/appliance/extensions/authentication/authentication_sso.py", line 505, in validate
signing_chain = self.validate_certificate()
File "/usr/lib/applmgmt/lib/extensions/py/vmware/appliance/extensions/authentication/authentication_sso.py", line 685, in validate_certificate
'One or more certificates cannot be verified.')
vmware.appliance.extensions.authentication.authentication_sso.AuthenticationError: One or more certificates cannot be verified.
2022-04-29T13:46:09 PM CEST [2211]INFO:vmware.appliance.vapi.auth:Authorization request for service_id: com.vmware.appliance.recovery.backup.job.details, operation_id: list
2022-04-29T13:46:09 PM CEST [2211]DEBUG:vmware.vherd.base.authorization_local:Verify privileges user (root) privilege ['ModifyConfiguration']
2022-04-29T13:46:09 PM CEST [2211]DEBUG:root:Validated user privileges in localstore or SSO
2022-04-29T13:46:09 PM CEST [2211]DEBUG:vmware.appliance.update.update_state:In State._get using state file /etc/applmgmt/appliance/software_update_state.conf
So unfortunately it didn't seem to make any difference in this case.
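For anyone repeating this check, here is a rough sketch of how the log could be scanned for the two error strings instead of reading the tail by eye. The strings are copied from the applmgmt.log excerpts above. The function name and the exact log file path (applmgmt.log inside /var/log/vmware/applmgmt) are assumptions on my part.

```python
from pathlib import Path

# Error strings taken from the applmgmt.log excerpts in this thread.
SIGNATURES = (
    "Could not parse HOK Token",
    "One or more certificates cannot be verified",
)


def find_token_errors(log_text):
    """Return (line_number, line) pairs for lines containing a known signature."""
    hits = []
    for number, line in enumerate(log_text.splitlines(), start=1):
        if any(sig in line for sig in SIGNATURES):
            hits.append((number, line.strip()))
    return hits


if __name__ == "__main__":
    # Assumed location, based on the log directory mentioned earlier in the thread.
    log_path = Path("/var/log/vmware/applmgmt/applmgmt.log")
    if log_path.exists():
        for number, line in find_token_errors(log_path.read_text(errors="replace")):
            print(f"{number}: {line}")
```

If nothing is printed after a scheduled backup has run, neither string was logged. That alone does not prove the backup succeeded, so backupScheduler.log is still worth checking.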
Best regards,
Martin
Then for sure you have more than one trusted STS chain. Do you have an SR with support? If not, open one and upload the VC logs and the LDIF for vmdird.
Sorry to say that I have a SUBSCRIPTION ONLY entitlement, so I'm not able to get support 😞
Best regards,
Martin
Ahh.
Can you run the command below and share the file? Let me see what I can do here.
/opt/likewise/bin/ldapsearch -b "dc=vsphere,dc=local" -s sub -D "cn=Administrator,cn=Users,dc=vsphere,dc=local" -W > vCSA-name.ldif
Hello @Ajay1988
Ok. Attached is the output.
Thank you very much for the time and effort you are putting into this.
Kind regards,
Martin
As suspected, you have 2 TrustedCertChain entries for STS.
Did you use the latest fixsts script from https://kb.vmware.com/s/article/76719? As I recall, this issue was fixed in the script.
Download JXplorer from https://kb.vmware.com/s/article/2146046. Connect and check as below. In my lab I have one, but you should have 2.
Take a snapshot, delete TrustedCertChain-2, reboot the vCSA, and check, please.
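If you would rather check the exported LDIF than browse with JXplorer, a minimal sketch like the one below could list the numbered entries. It assumes the entries appear as "dn: cn=TrustedCertChain-N,..." and "dn: cn=TenantCredential-N,..." in the ldapsearch output, and that DNs are not base64-encoded. The function names are mine, and the script only reads the file.

```python
import re
from collections import defaultdict
from pathlib import Path

# Assumed DN shape: the first RDN is cn=TrustedCertChain-N or cn=TenantCredential-N.
ENTRY_RE = re.compile(
    r"^dn:\s*cn=(TrustedCertChain|TenantCredential)-(\d+),", re.IGNORECASE
)


def unfold(ldif_text):
    """Join LDIF continuation lines (a line starting with one space continues the previous one)."""
    return re.sub(r"\r?\n ", "", ldif_text)


def count_sts_entries(ldif_text):
    """Map entry kind (lower-cased) to the sorted list of numeric suffixes found."""
    found = defaultdict(list)
    for line in unfold(ldif_text).splitlines():
        match = ENTRY_RE.match(line)
        if match:
            found[match.group(1).lower()].append(int(match.group(2)))
    return {kind: sorted(nums) for kind, nums in found.items()}


if __name__ == "__main__":
    # File name as used in the ldapsearch command earlier in the thread.
    ldif_path = Path("vCSA-name.ldif")
    if ldif_path.exists():
        for kind, nums in count_sts_entries(ldif_path.read_text(errors="replace")).items():
            print(f"{kind}: {nums}")
```

More than one number listed for a kind would point to the duplicate described above. Take the snapshot before deleting anything.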
Thank you!
I have yet to try this, but I noticed that there is also a duplicate under
-> TenantCredential-1
-> TenantCredential-2
Should both of those be kept, or should TenantCredential-2 be removed as well?
Regards,
Martin
That could be as well.
/usr/lib/vmware-vmafd/bin/vecs-cli entry list --store TRUSTED_ROOTS --text > /var/core/trusted-certs.txt
Send this file over. But still try that out without deleting TenantCredential-2. I will check this tomorrow.
Hello,
Attached is trusted-certs.txt
I tried, as suggested earlier, deleting TrustedCertChain-2 while keeping TenantCredential-1 and TenantCredential-2 intact.
After rebooting the VCSA I got "no healthy upstreams" for about 10 minutes, until I reverted to my snapshot and rebooted into the original state.
The attached trusted-certs.txt was taken with the duplicate TrustedCertChain and TenantCredential entries still in place.
Regards,
Martin
Hello again @Ajay1988
I wanted to provide you with an update this morning. I've deleted both
TenantCredential-2
TrustedCertChain-2
and restarted vCenter. Now the scheduled backup runs fine, and I no longer see any of the previous errors in
applmgmt.log, backupSchedulerCron.log, backupScheduler.log
I hope this did the trick. I'll keep my fingers crossed for the next couple of days and hope it keeps working.
Thank you for your useful tips and your time.
With kind regards,
Martin