VMware Cloud Community
TimR26
Enthusiast

Auto Deploy 6.7 U1b - certificate issues at waiter.tgz

I'm setting up Auto Deploy and I've gotten to the point where the ESXi installation begins but stalls when trying to install waiter.tgz. I checked the /var/log/vmware/rbd/rbd-cgi.log file and found this:

beacon:Adding etc/vmware/autodeploy/waiternotify.json

cache_tables:cache request ....

sslutil:cert files are missing from <UID> (autodeployhost.contoso.com)

sslcert:Generating SSL cert for <UID> (autodeployhost.contoso.com)

ERROR:vmcacertutil:Could not generate certificates for: autodeployhost.contoso.com

rc:0

out: b'Error: 5, VMCASignedCertificatePrivate() failedError Code: 5\nMessage: UNKNOWN\n'

I'm not sure what other info to provide to help determine the issue. Any suggestions or guidance?

7 Replies
msripada
Virtuoso

Error 5 is access denied. I am unsure whether we are actually hitting an access-denied condition here, but I don't see a reason for one in Auto Deploy.

Have you tried restarting the rbd service and then deploying the ESXi host again?

Can you check vpxd.log and rbd.log?

Thanks,

MS

matthewingram
Contributor

Did you find a resolution for this issue?

dbuenoparedes
Enthusiast

I have the same problem and see the same log entries in the /var/log/vmware/rbd/rbd-cgi.log file. I was trying to re-deploy the host using Auto Deploy after upgrading VCSA from 6.0U3 to 6.7U2c. I even tried removing the host from the vCenter Server inventory, but the problem remains: it gets stuck downloading waiter.tgz.

There seems to be a problem with the new certificate that Auto Deploy tries to issue when the host gets re-deployed. I checked with certool using the following command and saw that there is still a certificate for that host, which wasn't deleted when I removed the host from the inventory:

/usr/lib/vmware-vmca/bin/certool --enumcert --filter=all | less

Edit: I've tried re-deploying the host after restarting the Auto Deploy waiter service. I also rebooted the VCSA once after removing the ESXi host from the inventory, but it still gets stuck at the same step of the deployment.

Vijay2027
Expert

This requires assigning the necessary permissions to the waiter user, which has to be done by connecting to the vmdird DB with an LDAP browser (JXplorer).

As the steps involved require modifying the vmdird DB, file an SR with GSS to get this sorted.

dbuenoparedes
Enthusiast

Thanks for the reply, Vijay2027, you nailed it. I ended up opening a ticket with VMware support. They checked these log files:

  • /var/log/vmware/rbd/rbd-cgi.log (VCSA)
  • /var/log/vmware/vmcad/vmcad-syslog.log (PSC)

We have an external PSC deployment in our environment. The key was in the following lines of the vmcad-syslog.log file:

2019-10-18T18:27:47.942203+00:00 warning vmcad  t@140271253645056: error code: 0x00000005

2019-10-18T18:27:47.942370+00:00 warning vmcad  t@140271253645056: error code: 0x00000005

2019-10-18T18:27:47.942537+00:00 warning vmcad  t@140271253645056: error code: 0x00000005

2019-10-18T18:28:08.373709+00:00 info vmcad  t@140271253645056: VMCACheckAccessKrb: Authenticated user waiter-d0cef9c5-5f40-4671-83f7-f611d19354cb@vsphere.local

2019-10-18T18:28:08.380445+00:00 info vmcad  t@140271253645056: Checking upn: cn=CAAdmins,cn=Builtin,dc=vsphere,dc=local against CA admin group: waiter-d0cef9c5-5f40-4671-83f7-f611d19354cb@vsphere.local

2019-10-18T18:28:08.380970+00:00 warning vmcad  t@140271253645056: error code: 0x00000005

2019-10-18T18:28:08.381299+00:00 warning vmcad  t@140271253645056: error code: 0x00000005

2019-10-18T18:28:08.381563+00:00 warning vmcad  t@140271253645056: error code: 0x00000005

2019-10-18T18:28:09.205803+00:00 info vmcad  t@140271253645056: VMCACheckAccessKrb: Authenticated user waiter-d0cef9c5-5f40-4671-83f7-f611d19354cb@vsphere.local

2019-10-18T18:28:09.210938+00:00 info vmcad  t@140271253645056: Checking upn: cn=CAAdmins,cn=Builtin,dc=vsphere,dc=local against CA admin group: waiter-d0cef9c5-5f40-4671-83f7-f611d19354cb@vsphere.local

What support ended up doing was connecting via LDAP (with JXplorer) to the PSC and creating the waiter-d0cef9c5-5f40-4671-83f7-f611d19354cb@vsphere.local user that was missing from the CAAdmins group. After this user was created, I was able to re-deploy the ESXi host without any issue. There were 2 other waiter users with a different string of characters after "waiter-", but for some reason Auto Deploy was looking for this one specifically, and it was missing from that group.
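To find out which waiter account vmcad is actually checking, the account names can be pulled out of the log. This is only a minimal sketch, assuming the "VMCACheckAccessKrb: Authenticated user ..." line format shown above; the function name extract_waiter_upns is made up for illustration, and `grep -o` is assumed to be available (GNU and BSD grep both support it).

```shell
#!/bin/sh
# Print the unique waiter accounts that vmcad authenticated.
# Reads log text on stdin; the line format is assumed from the excerpt above.
extract_waiter_upns() {
    grep 'VMCACheckAccessKrb: Authenticated user' \
        | grep -oE 'waiter-[0-9a-f-]{36}@[A-Za-z0-9.-]+' \
        | sort -u
}

# Example usage on the PSC (log path as given in this thread):
# extract_waiter_upns < /var/log/vmware/vmcad/vmcad-syslog.log
```

Each name it prints is a candidate to compare against the members of the CAAdmins group.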

I hope this helps.

Vijay2027
Expert

Right. Sometimes we end up re-creating the ID using dir-cli if the user doesn't exist.

Ivanuci
Contributor

Thanks to all.

We had the same problem, with Auto Deploy stopping with "Fatal error: 15". After finding this page I checked rbd-cgi.log and vmcad-syslog.log:

root@vc1 [ ~ ]# cat /var/log/vmware/rbd/rbd-cgi.log | grep -E "rror|ERROR"
2021-03-03T09:21:23.536 [6150]ERROR:vmcacertutil:Could not generate certificates for: 10.2.2.1
out: b'Error: 5, VMCAGetSignedCertificatePrivate() failedError Code : 5\nMessage :UNKNOWN\n'
2021-03-03T09:21:23.553 [6150]ERROR:pluginmaster:exception:rbdplugins.sslcert.vmwWaiterTgz -- 0:b'Error: 5, VMCAGetSignedCertificatePrivate() failedError Code : 5\nMessage :UNKNOWN\n':b"Operation Failed: exception <class 'vmca.vmca_exception'> not a BaseException subclass"
Exception: 0:b'Error: 5, VMCAGetSignedCertificatePrivate() failedError Code : 5\nMessage :UNKNOWN\n':b"Operation Failed: exception <class 'vmca.vmca_exception'> not a BaseException subclass"
2021-03-03T09:21:23.554 [6150]WARNING:waitertgz:retrying waiter tgz because of rc: [None, None, None], except: [Exception('0:b\'Error: 5, VMCAGetSignedCertificatePrivate() failedError Code : 5\\nMessage :UNKNOWN\\n\':b"Operation Failed: exception <class \'vmca.vmca_exception\'> not a BaseException subclass"',)]


root@vc1 [ ~ ]# tail /var/log/vmware/vmcad/vmcad-syslog.log
2021-03-03T09:34:52.765706+01:00 info vmcad t@140664742344338: VMCACheckAccessKrb: Authenticated user waiter-a67cf497-3462-48bb-868d-866c983aa484@vsphere.local
2021-03-03T09:34:52.770946+01:00 info vmcad t@140664742344338: Checking upn: cn=CAAdmins,cn=Builtin,dc=vsphere,dc=local against CA admin group: waiter-a67cf497-3462-48bb-868d-866c983aa484@vsphere.local
2021-03-03T09:34:52.771150+01:00 warning vmcad t@140664742344338: error code: 0x00000005
2021-03-03T09:34:52.771329+01:00 warning vmcad t@140664742344338: error code: 0x00000005
2021-03-03T09:34:52.771497+01:00 warning vmcad t@140664742344338: error code: 0x00000005


Using dir-cli in the vCenter shell, I checked the members of the CAAdmins group and found that two waiter accounts are there, but the one from vmcad-syslog.log (waiter-a67cf497-3462-48bb-868d-866c983aa484) is missing.


root@vc1 [ ~ ]# /usr/lib/vmware-vmafd/bin/dir-cli group list --name CAAdmins
Enter password for administrator@vsphere.local:
cn=Administrator,cn=Users,dc=vsphere,dc=local
cn=DCAdmins,cn=Builtin,dc=vsphere,dc=local
cn=DCClients,cn=Builtin,dc=vsphere,dc=local
CN=waiter 0af35be1-fc4b-427a-8181-1a25dbaa1270,cn=users,dc=vsphere,dc=local
CN=waiter 5a882302-063f-4bb1-9eac-6cbd662d5130,cn=users,dc=vsphere,dc=local
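The comparison described above (is the waiter account from vmcad-syslog.log among the CAAdmins members?) can be sketched as a small shell check. This is a hedged example, not an official procedure: waiter_in_group and the file name caadmins.txt are hypothetical. Note that the listing prints members as "CN=waiter <uuid>" with a space, while the log uses "waiter-<uuid>", so the sketch matches on the UUID and accepts either separator.

```shell
#!/bin/sh
# Exit 0 if the member list on stdin contains the waiter account with the
# given UUID, non-zero otherwise. Accepts both "waiter-<uuid>" and
# "waiter <uuid>", and ignores case (the listing shows both "cn=" and "CN=").
waiter_in_group() {
    uuid=$1
    grep -qiE "^cn=waiter[ -]${uuid},"
}

# Example usage, assuming the group listing was saved to caadmins.txt first
# (dir-cli prompts for the administrator password, so saving the output to a
# file may be simpler than piping it):
# if ! waiter_in_group a67cf497-3462-48bb-868d-866c983aa484 < caadmins.txt; then
#     echo "waiter account is missing from CAAdmins"
# fi
```

Run against the listing above with the UUID from the log, the check fails, which matches the access-denied (0x00000005) warnings in vmcad-syslog.log.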

 

I looked for this particular user in other groups (Users, Administrators ...), hoping to find it somewhere, but I did not. So I tried to create it and found out that it actually already exists:

root@vc1 [ ~ ]# /usr/lib/vmware-vmafd/bin/dir-cli user create --account waiter-a67cf497-3462-48bb-868d-866c983aa484 --first-name waiter --last-name a67cf497-3462-48bb-868d-866c983aa484 --user-password 'testpass'
Enter password for administrator@vsphere.local:
dir-cli failed. Error 9706: Possible errors:
LDAP error: Already exists
Win Error: Operation failed with error ERROR_TOO_MANY_NAMES (68)

 

Great, because I had no idea what password to give the new user. Now I just had to add the existing user to the CAAdmins group:

root@vc1 [ ~ ]# /usr/lib/vmware-vmafd/bin/dir-cli group modify --name CAAdmins --add waiter-a67cf497-3462-48bb-868d-866c983aa484

 

Adding the user to the CAAdmins group succeeded, and Auto Deploy started working immediately.
