VMware Cloud Community
cendioMartin
Contributor

We refreshed our STS_CERT and now scheduled backup of VCSA won't run

Hello,

A month ago I clicked "Refresh with vCenter certificate" in the Actions drop-down for STS_CERT under vCenter -> Administration -> Certificate Management. This was probably a bad idea: since then, the automatic backup configured in vcsa:5480 has not run, and there is no indication in the :5480 GUI that it has failed either.

I refreshed the STS certificate because I was reconfiguring certificate management to use our own CA, but the refresh was probably unnecessary.

Anyhow, digging through the logs in /var/log/vmware/applmgmt, I found these lines, which get logged when the backup is executed:

vmware.appliance.extensions.authentication.authentication_sso.AuthenticationError: One or more certificates cannot be verified.
2022-04-25T00:00:02 AM CEST [2211]ERROR:vmware.appliance.vapi.auth:Could not parse HOK Token
Traceback (most recent call last):
  File "/usr/lib/applmgmt/vapi/py/vmware/appliance/vapi/auth.py", line 243, in authenticate
    username = token.username
  File "/usr/lib/applmgmt/lib/extensions/py/vmware/appliance/extensions/authentication/authentication_sso.py", line 486, in username
    return self.get_name_id().value
  File "/usr/lib/applmgmt/lib/extensions/py/vmware/appliance/extensions/authentication/authentication_sso.py", line 939, in get_name_id
    '//saml2:Subject/saml2:NameID', self.reference)
  File "/usr/lib/applmgmt/lib/extensions/py/vmware/appliance/extensions/authentication/authentication_sso.py", line 477, in reference
    self.validate()
  File "/usr/lib/applmgmt/lib/extensions/py/vmware/appliance/extensions/authentication/authentication_sso.py", line 1169, in validate
    reference = super(HolderOfKeyToken, self).validate()
  File "/usr/lib/applmgmt/lib/extensions/py/vmware/appliance/extensions/authentication/authentication_sso.py", line 505, in validate
    signing_chain = self.validate_certificate()
  File "/usr/lib/applmgmt/lib/extensions/py/vmware/appliance/extensions/authentication/authentication_sso.py", line 685, in validate_certificate
    'One or more certificates cannot be verified.')
vmware.appliance.extensions.authentication.authentication_sso.AuthenticationError: One or more certificates cannot be verified.
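
Because the :5480 GUI shows no sign of the failure, one way to spot it is to scan the log for ERROR lines. The following is a minimal Python sketch, not part of any VMware tooling: the function name `summarize_errors` and the regular expression are illustrative assumptions, and the pattern is inferred only from the line layout visible in the excerpt above.

```python
import re
from collections import Counter

# Layout inferred from the quoted excerpt, e.g.
# "2022-04-25T00:00:02 AM CEST [2211]ERROR:vmware.appliance.vapi.auth:Could not parse HOK Token"
# Lines in any other layout (tracebacks, blank lines) simply do not match.
LINE_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}) (?:AM|PM) \S+ "
    r"\[(?P<pid>\d+)\](?P<level>[A-Z]+):(?P<logger>[^:]+):(?P<msg>.*)$"
)


def summarize_errors(lines):
    """Count ERROR records per (logger, message) pair; skip unmatched lines."""
    counts = Counter()
    for line in lines:
        match = LINE_RE.match(line.strip())
        if match and match.group("level") == "ERROR":
            counts[(match.group("logger"), match.group("msg"))] += 1
    return counts
```

It could be fed the lines of the applmgmt log file, for example `summarize_errors(open(path))`, where `path` points at the log under /var/log/vmware/applmgmt.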

 

I've searched the KBs, and all I can find are solutions for dealing with expired STS certs, which is not the case for me.

The KB https://kb.vmware.com/s/article/76448 seems to describe the symptoms accurately, but the suggested Resolution does not seem to apply, since I cannot see any stale or invalid certificate chains in Administration -> Certificates -> Certificate Management.

Screenshot from 2022-04-25 11-01-23.png

Do you have any suggestions for what steps to take from here to get out of this situation? Manual backups work fine, and I've also tried "turning it off and on again".

With kind regards,

Martin

 

pashnal
Enthusiast

Hi Martin , 

I think you are hitting the issue below. Please check this KB and follow the steps.

https://kb.vmware.com/s/article/76448

Please press Kudos and mark this as the solution if it works.

Thanks,

Pramod Ashnal

cendioMartin
Contributor

Hello @pashnal 

Thank you for your feedback on this. As I wrote in my original post, I have read that KB, and while the symptoms are the same, I'm not able to verify this using the recommended Resolution. The screenshot in the KB does not look like my attached screenshot, so I'm unable to identify any stale or invalid certificate chains.

Is that screenshot in the KB from the old Flash-based web client, perhaps?

Best regards,

Martin

Ajay1988
Expert

Can you run lsdoctor -l using the KB below and share the output?
https://kb.vmware.com/s/article/80469

If you think your queries have been answered
Mark this response as "Correct" or "Helpful".

Regards,
AJ

cendioMartin
Contributor

Hello @Ajay1988 

Thank you for taking the time to respond here.

Below is the output you requested from lsdoctor:

root@vcenter [ /tmp/lsdoctor-master ]# python lsdoctor.py -l

ATTENTION: You are running a reporting function. This doesn't make any changes to your environment.
You can find the report and logs here: /var/log/vmware/lsdoctor

2022-04-29T12:56:15 INFO main: You are reporting on problems found across the SSO domain in the lookup service. This doesn't make changes.
2022-04-29T12:56:16 INFO live_checkCerts: Checking services for trust mismatches...
2022-04-29T12:56:16 INFO generateReport: Listing lookup service problems found in SSO domain
2022-04-29T12:56:16 INFO generateReport: No issues detected in the lookup service entries for vcenter.REDACTED (VC 7.0 or CGW).
2022-04-29T12:56:16 INFO generateReport: Report generated: /var/log/vmware/lsdoctor/vcenter.REDACTED-2022-04-29-125615.json


Attached to this post is also the complete lsdoctor report.

With kind regards,
Martin 

Ajay1988
Expert

OK, that's good. So it seems something is not right with the VAMI certs. Please follow the steps below.

Back up the cert file below:

/etc/applmgmt/appliance/server.pem

Then restart the VAMI service:

/sbin/service vami-lighttp restart
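
For the backup step, a timestamped copy keeps the original file recoverable. This is only a minimal Python sketch under stated assumptions: `backup_file` is a hypothetical helper, not a VMware utility, and the `.bak-<timestamp>` naming is an arbitrary choice.

```python
import shutil
import time
from pathlib import Path


def backup_file(path):
    """Copy `path` to a timestamped sibling file and return the backup path."""
    src = Path(path)
    stamp = time.strftime("%Y%m%d-%H%M%S")
    dst = src.with_name(f"{src.name}.bak-{stamp}")
    shutil.copy2(src, dst)  # copy2 also preserves permissions and timestamps
    return dst
```

On the appliance it would be called as root, for example `backup_file("/etc/applmgmt/appliance/server.pem")`, before restarting the service.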

Regards,
AJ

cendioMartin
Contributor

Hello @Ajay1988 

I did as you instructed. I was a bit unsure whether that would have any effect, since I've restarted the vCenter server a few times lately, which should have had the same effect (?). After restarting the VAMI with /sbin/service vami-lighttp restart, I changed my backup schedule in :5480 to a few minutes ahead and tailed the logs:

backupSchedulerCron.log:

2022-04-29 13:46:01,701 49532 Issuing the scheduled backup request for schedule: default.

backupScheduler.log:

 2022-04-29T13:46:02.379 [0] [MainProcess:PID-49532] [Scheduler::ExecScheduleRun:Scheduler.py:138] ERROR: Failed to issue the Schedules.run request. Exception: {challenge : None, messages : [LocalizableMessage(id='vapi.security.authentication.invalid', default_message='Unable to authenticate user', args=[], params=None, localized=None)], data : None, error_type : UNAUTHENTICATED}
Traceback (most recent call last):
  File "/usr/lib/applmgmt/backup_restore/py/vmware/appliance/backup_restore/Scheduler.py", line 133, in ExecScheduleRun
    status = svc_handle.run(scheduleId, comment='SCHEDULED')
  File "/usr/lib/applmgmt/pyclient/applmgmt_client-1.0-py2.7.egg/com/vmware/appliance/recovery/backup_client.py", line 1171, in run
    'comment': comment,
  File "/usr/lib/applmgmt/vapi/lib/vapi_runtime-2.100.0-py2.py3-none-any.whl/vmware/vapi/bindings/stub.py", line 345, in _invoke
    return self._api_interface.native_invoke(ctx, _method_name, kwargs)
  File "/usr/lib/applmgmt/vapi/lib/vapi_runtime-2.100.0-py2.py3-none-any.whl/vmware/vapi/bindings/stub.py", line 298, in native_invoke
    self._rest_converter_mode)
com.vmware.vapi.std.errors_client.Unauthenticated: {challenge : None, messages : [LocalizableMessage(id='vapi.security.authentication.invalid', default_message='Unable to authenticate user', args=[], params=None, localized=None)], data : None, error_type : UNAUTHENTICATED}

and applmgmt.log:

2022-04-29T13:46:02 PM CEST [2211]ERROR:vmware.appliance.vapi.auth:Could not parse HOK Token
Traceback (most recent call last):
  File "/usr/lib/applmgmt/vapi/py/vmware/appliance/vapi/auth.py", line 243, in authenticate
    username = token.username
  File "/usr/lib/applmgmt/lib/extensions/py/vmware/appliance/extensions/authentication/authentication_sso.py", line 486, in username
    return self.get_name_id().value
  File "/usr/lib/applmgmt/lib/extensions/py/vmware/appliance/extensions/authentication/authentication_sso.py", line 939, in get_name_id
    '//saml2:Subject/saml2:NameID', self.reference)
  File "/usr/lib/applmgmt/lib/extensions/py/vmware/appliance/extensions/authentication/authentication_sso.py", line 477, in reference
    self.validate()
  File "/usr/lib/applmgmt/lib/extensions/py/vmware/appliance/extensions/authentication/authentication_sso.py", line 1169, in validate
    reference = super(HolderOfKeyToken, self).validate()
  File "/usr/lib/applmgmt/lib/extensions/py/vmware/appliance/extensions/authentication/authentication_sso.py", line 505, in validate
    signing_chain = self.validate_certificate()
  File "/usr/lib/applmgmt/lib/extensions/py/vmware/appliance/extensions/authentication/authentication_sso.py", line 685, in validate_certificate
    'One or more certificates cannot be verified.')
vmware.appliance.extensions.authentication.authentication_sso.AuthenticationError: One or more certificates cannot be verified.
2022-04-29T13:46:09 PM CEST [2211]INFO:vmware.appliance.vapi.auth:Authorization request for service_id: com.vmware.appliance.recovery.backup.job.details, operation_id: list
2022-04-29T13:46:09 PM CEST [2211]DEBUG:vmware.vherd.base.authorization_local:Verify privileges user (root) privilege ['ModifyConfiguration']
2022-04-29T13:46:09 PM CEST [2211]DEBUG:root:Validated user privileges in localstore or SSO
2022-04-29T13:46:09 PM CEST [2211]DEBUG:vmware.appliance.update.update_state:In State._get using state file /etc/applmgmt/appliance/software_update_state.conf

 

So unfortunately it didn't seem to make any difference in this case.

Best regards,

Martin

Ajay1988
Expert

Then you have more than one trusted STS chain for sure. Do you have an SR with support? If not, open one and upload the VC logs and an LDIF export of vmdird.

Regards,
AJ

cendioMartin
Contributor

Sorry to say that I have a SUBSCRIPTION ONLY entitlement, so I'm not able to get support 😞

Best regards,
Martin

Ajay1988
Expert

Ahh.

Can you run the command below and share the file? Let me see what I can do here.

/opt/likewise/bin/ldapsearch -b "dc=vsphere,dc=local" -s sub -D "cn=Administrator,cn=Users,dc=vsphere,dc=local" -W > vCSA-name.ldif

Regards,
AJ

cendioMartin
Contributor

Hello @Ajay1988 

Ok. Attached is the output.

Thank you very much for the time and effort you have put into this.

Kind regards,

Martin

Ajay1988
Expert
Accepted solution

As suspected, you have two TrustedCertChain entries for STS.

Did you use the latest fixsts script from https://kb.vmware.com/s/article/76719? As I recall, this issue was fixed in the script.

Download JXplorer from https://kb.vmware.com/s/article/2146046. Connect and check as shown below. In my lab I have one, but you should have two.

Take a snapshot, delete TrustedCertChain-2, reboot the vCSA, and check, please.

Ajay1988_0-1651239852484.png
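
As a cross-check alongside browsing with JXplorer, the LDIF dump produced by the earlier ldapsearch command can be scanned for numbered entries that share a parent. This is a minimal Python sketch, not a VMware tool: the function names are hypothetical, and it assumes the entries appear in the dump with a first RDN of the form cn=TrustedCertChain-N or cn=TenantCredential-N, which matches the entry names discussed in this thread but has not been verified against a real dump here.

```python
import base64
import re
from collections import defaultdict


def iter_dns(ldif_text):
    """Yield each entry's DN from LDIF text, handling folded and base64 DNs."""
    # LDIF folds long lines: a continuation line starts with a single space.
    unfolded = re.sub(r"\r?\n ", "", ldif_text)
    for line in unfolded.splitlines():
        if line.startswith("dn:: "):
            yield base64.b64decode(line[5:]).decode("utf-8")
        elif line.startswith("dn: "):
            yield line[4:]


def numbered_siblings(ldif_text, names=("TrustedCertChain", "TenantCredential")):
    """Return {(name, parent DN): [numbers]} for names with more than one entry."""
    alternatives = "|".join(re.escape(name) for name in names)
    rdn_re = re.compile(rf"^cn=({alternatives})-(\d+),(.*)$", re.IGNORECASE)
    groups = defaultdict(list)
    for dn in iter_dns(ldif_text):
        match = rdn_re.match(dn)
        if match:
            # DNs are case-insensitive, so the grouping key is lowercased.
            key = (match.group(1).lower(), match.group(3).lower())
            groups[key].append(int(match.group(2)))
    # Keep only parents that hold more than one numbered entry.
    return {key: sorted(nums) for key, nums in groups.items() if len(nums) > 1}
```

Calling `numbered_siblings(open("vCSA-name.ldif").read())` returns an empty dict when no such duplicates are present in the dump; any key it does return names the entry type and the parent under which more than one numbered entry exists.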

 

Regards,
AJ

cendioMartin
Contributor

Thank you!

I have yet to try this, but I noticed there is also a duplicate pair:
 -> TenantCredential-1
 -> TenantCredential-2

Should both of those be kept, or should I remove TenantCredential-2 as well?

Regards,
Martin

Ajay1988
Expert

That could well be the case too. Please run the following:
/usr/lib/vmware-vmafd/bin/vecs-cli entry list --store TRUSTED_ROOTS --text > /var/core/trusted-certs.txt

Send this file over. But still try the earlier suggestion without deleting TenantCredential-2. I will check this tomorrow.

Regards,
AJ

cendioMartin
Contributor

Hello,

Attached is trusted-certs.txt.
As suggested earlier, I deleted TrustedCertChain-2 and kept TenantCredential-1 and TenantCredential-2 intact.
After rebooting the VCSA I got "no healthy upstreams" for about 10 minutes, until I reverted to my snapshot and rebooted into the original state.

The attached trusted-certs.txt was generated with the duplicate TrustedCertChain and TenantCredential entries still in place.


Regards,
Martin

cendioMartin
Contributor

Hello again @Ajay1988 

I wanted to provide you with an update this morning. I've deleted both

TenantCredential-2
TrustedCertChain-2

and restarted vCenter. The scheduled backup now runs fine, and I don't see any of the previous errors in
applmgmt.log, backupSchedulerCron.log, or backupScheduler.log.

I hope this did the trick. I'll keep my fingers crossed for the next couple of days and hope it keeps working.

Thank you for your useful tips and your time.

With kind regards,
Martin