VMware Cloud Community
rgcda
Enthusiast

can't add ESXi host to cluster

I recently upgraded my vCenter (Windows) from 6.0 to 6.5 U1. The vCenter manages several clusters of ESXi 6.0 U2 hosts. After the upgrade I noticed that a couple of ESXi servers in one of the clusters were having issues with HA. All the ESXi servers have the same configuration: Cisco UCS B200s that are on the hardware compatibility list.

I went through some steps in a VMware KB: I tried Reconfigure for vSphere HA on the ESXi host, disabled and re-enabled HA on the cluster, and rebooted the ESXi host. None of these got rid of the HA problem. There weren't any specific HA errors other than that HA wouldn't enable.

As a last step I removed the ESXi host from vCenter and attempted to add it back. First I tried adding the host directly back into the cluster. The task stayed at 0% and never completed. Browsing and tasks within that cluster became unresponsive in both the web client and the vSphere client. I could navigate in another cluster in the vCenter, but the cluster I was trying to add the host to was very sluggish or not working at all. I had to reboot vCenter to be able to manage the environment again.

I tried that several times, then rebuilt the host with a clean installation of ESXi 6.0 U2 and had the same problem, and then rebuilt it as ESXi 6.5 U1 and had the same problem. I rebuilt the ESXi host again and added it to the root of vCenter, and it added just fine. I then attempted to drag the host into the cluster where it was prior to the vCenter upgrade, and vCenter hung again with the task stuck at 0%. So I restarted vCenter again, added the host to the root of vCenter, and dragged it into one of the other clusters, where it added just fine.

So there is something wrong with the one cluster and I can't figure it out. Does anybody have any ideas?

The ESXi host was in the cluster without issue prior to the vCenter upgrade and was still in it post-upgrade, although it had the HA error that started this whole mess. Additionally, there are 20 other ESXi servers in this cluster with exactly the same hardware / firmware that are working fine. I haven't attempted to remove any of those hosts to test, though.
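
The attempts described above amount to a small isolation experiment, and it can help to write them down as data to see which variable actually tracks the hang. The following is only a rough Python sketch of that reasoning, not a VMware tool. The labels ("problem-cluster", "other-cluster", "vcenter-root", "add", "drag") and the function name are shorthand made up for this illustration, and the ESXi build is marked "unstated" wherever the post does not say which build was on the host.

```python
from collections import defaultdict

# Each attempt from the post, recorded as factor -> value plus the outcome.
# All labels are hypothetical shorthand, not real inventory names.
attempts = [
    {"target": "problem-cluster", "esxi": "6.0U2",    "method": "add",  "ok": False},
    {"target": "problem-cluster", "esxi": "6.0U2",    "method": "add",  "ok": False},  # clean reinstall
    {"target": "problem-cluster", "esxi": "6.5U1",    "method": "add",  "ok": False},
    {"target": "vcenter-root",    "esxi": "unstated", "method": "add",  "ok": True},
    {"target": "problem-cluster", "esxi": "unstated", "method": "drag", "ok": False},
    {"target": "other-cluster",   "esxi": "unstated", "method": "drag", "ok": True},
]


def explaining_factors(attempts, outcome_key="ok"):
    """Return the factors whose every value always led to the same outcome.

    A factor is kept only if no single value of it was seen with both a
    success and a failure, i.e. it cleanly separates good runs from bad ones.
    """
    outcomes = defaultdict(lambda: defaultdict(set))
    for attempt in attempts:
        for factor, value in attempt.items():
            if factor != outcome_key:
                outcomes[factor][value].add(attempt[outcome_key])
    return sorted(
        factor
        for factor, by_value in outcomes.items()
        if all(len(seen) == 1 for seen in by_value.values())
    )


print(explaining_factors(attempts))
```

With the attempts listed here, only the target separates the failures from the successes, which matches the conclusion that the problem follows the one cluster rather than the host, the ESXi build, or the way the host is added.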

5 Replies
rgcda
Enthusiast

I opened a case with BCS support a week ago and they haven't been able to address the issue either.

SureshKumarMuth
Commander

Can you try this?

Try adding the host to a cluster where HA is enabled and to a cluster where HA is not enabled. Since the issue started with an HA error, I suspect it may occur only on HA-enabled clusters. It's just something to try, as I have not faced this issue myself, but it would help to isolate the problem.

Regards,
Suresh
https://vconnectit.wordpress.com/
rgcda
Enthusiast

I have already successfully added the ESXi host to a different cluster that had HA enabled, and there were no issues. I'll try disabling HA in the cluster I'm having issues with and then adding the ESXi host in question to see if that makes a difference. I'm going to wait for some backups to complete in that cluster first.

SureshKumarMuth
Commander

I have seen a cluster-specific issue (not exactly the same as yours) in one of my customer's environments, where the problem appeared only on one particular cluster while other clusters with similar settings worked fine. In the end we could not trace the cause, and we ended up deleting that cluster and creating a new one. The issue seems to be related to vCenter / DB entries. I think VMware will eventually suggest the same.

This issue does not occur in all environments, nor is it reproducible.

Regards,
Suresh
https://vconnectit.wordpress.com/
rgcda
Enthusiast

Unfortunately, that is one of the options I've already considered. It really is not a very good solution, as it's a lot of work and poses some risks. I disabled HA on the cluster and tried to put the host in it, and it's the same problem. VMware needs to put a little effort into resolving the issue. What the heck is BCS support for?

Thank you for your replies.
