VMware Cloud Community
bobdhenderson
Enthusiast
Enthusiast

Deploy the Management Domain Using ESXi Hosts with External Certificates: The Return of the Thing

Hi all,

I've recently stumbled through the custom JSON requirements because I wanted to test the impact of external CA certs on the Management Workload Domain ESXi hosts during bring-up.  Those steps were of interest because I'd tried to retrospectively change the vCenter's certmgmt.Mode from default vmca to custom after issuing it a replacement cert and establishing trust with the external CA that issued it and then updated the individual ESXI hosts in its cluster (like I would a non-VCF cluster).

There were subsequent errors in the cluster's VSAN health reports about SSL certificates for the hosts that had been updated - although everything seemed to function.  I cleared those errors by placing each host in turn in maintenance mode, decommissioning it, reimaging it, then uploading its external cert, then recommissioning into the management workload domain cluster.  All seemed good and no webclient errors.

However, when following the VCF 4.3 Operations guide to shut down the environment, at the point on the first ESXi host when the python.py -prepare script is run it then timed out... with the following errors.

After 60 attempts like this it errors out...
WARNING: root: Retry retrieving vsan vmodl version, 59
ERROR: root: Failed to test vsan vmodl version with error [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: unable to get local issuer certificate (_ssl.c:1108) on localhost
Exiting cluster preparation.
Encounter error when schedule VSAN reconfig task. Please check vSAN log for more detail and retry

Similar errors when python.py -recover was attempted at start up.

I figured that the recommissioning approach hadn't fixed something in the guts of the cluster config so wiped the Proof of concept and started again with the JSON set to use 'custom' mode for certMgmt.mode on a fresh deployment.

That JSON has worked, no SSL errors in the deployed cluster in the Mgmt WLD.  HOWEVER..... when I've shut down the VCF again the reboot_helper.py script does exactly the same error routine.

I can live with rebuilding the PoC environment - it's all useful experience - but I'm not clear on how important that reboot_helper.py script is and if it's indicating an underlying problem with the SSL interconnectivity for VSAN on the hosts.  I've only seen the error on the webclient because I manually tried to retrofit the external certs - seems like it does it from the outset.  (And surely I'm not the only one considering using external certs on my ESXI hosts?)

Can anyone put me straight please?

0 Kudos
0 Replies