VMware Cloud Community
aj800
Enthusiast

ESXi host will not reconnect to cluster after upgrade to version 7.0.3 (7.0U3d)

I successfully upgraded all hosts in one of our clusters to 7.0.3 U3d (using the HPE custom image). We have a separate cluster, managed by the same vCenter Server, that contains only one ESXi host. I've been managing patches and updates on it with VUM, just as I have for the hosts in the other cluster. Since we moved to vSphere 7, this was the last host left to upgrade and patch to the latest version. I used the same image and baseline created for it, attached the baseline to the host, removed a VIB that was flagged as incompatible and was preventing the upgrade (ssacli version 3.47.6.0-6.7.0.7535516; the same VIB was seen on all the other hosts and removed without issue), then rebooted. Once back up, the host displayed as Non-compliant, like the others did, which allowed me to run the upgrade. After the host came back up at 7.0 U3d (19482537), it did NOT connect back to the vSphere cluster. I tried to reconnect, but got an error:

A general system error occurred:  Host management agents not reachable on (our host name)

The "Remediate host" task was stuck at 47% for nearly an hour before I cancelled it. I then restarted the management services from the DCUI on the host and rebooted, but neither fixed the issue. I do not have the option to remove the host from inventory, only to Connect, which gives me this error each time. What happened? All our certs are the same and valid, if that helps. We've had a similar issue previously caused by certs or CAs that didn't match, but the certs on this host match those on the hosts that completed their upgrades successfully. Any assistance would be great. Thanks.

UPDATE: I was able to remove it from the inventory of the cluster, but when I try to add it back, I get the same general system error.  When I try to add the host, I get a warning that says, "The certificate on 1 host could not be verified", and I have to check a box to manually verify it and it continues to the next step to "Finish", but I get the error and it never gets added to the cluster.

UPDATE 2: This turned out to be a certificate mismatch issue, once again.  We changed the ESXi certificate per internal CA policy and we must have been missing a CA in the trust store for the old one on the ESXi host.  Not 100% sure that's it, but once we updated to the new cert on the host, I was able to reconnect it to the cluster.
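For anyone hitting the same error, one way to confirm a mismatch is to compare the fingerprint of the certificate the host is actually serving with the one vCenter expects. The following is a minimal sketch only, not the procedure used in this thread: it assumes the host's management interface answers HTTPS on port 443, the function names are made up for illustration, and the hostname in the comment is a placeholder.

```python
import hashlib
import ssl


def format_fingerprint(der_bytes: bytes, algorithm: str = "sha256") -> str:
    """Hash DER-encoded certificate bytes and return colon-separated upper-case hex."""
    digest = hashlib.new(algorithm, der_bytes).hexdigest().upper()
    return ":".join(digest[i:i + 2] for i in range(0, len(digest), 2))


def fingerprint_from_pem(pem_text: str, algorithm: str = "sha256") -> str:
    """Convert a PEM certificate to DER and fingerprint it."""
    return format_fingerprint(ssl.PEM_cert_to_DER_cert(pem_text), algorithm)


def fetch_host_fingerprint(host: str, port: int = 443, algorithm: str = "sha256") -> str:
    """Fetch the certificate a server presents (without validating it) and fingerprint it."""
    pem = ssl.get_server_certificate((host, port))
    return fingerprint_from_pem(pem, algorithm)


# Hypothetical usage (placeholder hostname, assumed reachable on port 443):
#   print(fetch_host_fingerprint("esxi-host.example.com"))
#   print(fetch_host_fingerprint("esxi-host.example.com", algorithm="sha1"))
```

If the value printed for the problem host differs from what the certificate warning in the Add Host wizard shows, or the issuer is not a CA that vCenter trusts, that points toward the kind of mismatch described above. The fetch deliberately skips validation so that it still works when the chain is not trusted.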


Accepted Solutions
aj800
Enthusiast

This turned out to be a certificate mismatch issue, once again. We changed the ESXi certificate per internal CA policy and we must have been missing a CA in the trust store for the old one on the ESXi host. Not 100% sure that's it, but once we updated to the new cert on the host, I was able to reconnect it to the cluster.
