VMware Cloud Community
JLogan2016
Enthusiast

HA initialization error on a single host

We recently had a sudden hard power outage in our data center which brought down several hosts in one of my vSphere 5.5 clusters (5.5.0 2302651). Once the issue was resolved I brought all affected hosts back up and have been working on getting them back into service in the cluster. All 8 hosts came back up, but one gives me an error when going through the HA election: "vSphere HA cannot be installed or configured." I have tried reconfiguring HA, restarting the management services, rebooting the host, and removing it from the cluster and re-adding it, all with no luck. I even went so far as to take the host out of service completely and reload ESXi, but I get to the same point and hit the same issue. It reports that the HA agent installed fine, but during the election I get the same error.

In searching for the error message, I see references to disabling HA at the cluster level, waiting an hour, and then enabling it again. My concern is that taking HA down could cause this issue on additional hosts, breaking HA for the whole cluster. Just curious if anyone has suggestions on steps I may have overlooked before going that route.


Accepted Solutions
JLogan2016
Enthusiast

In the end, although I was trying to avoid it, I ended up disabling and re-enabling HA at the cluster level. We started seeing other hosts displaying the same behavior as the first. The issue is resolved now. Thanks for the suggestions.

5 Replies
Mattallford
Hot Shot

Hi there,

After you try reconfiguring or enabling HA on the host, check the /var/log/fdm.log and /var/log/hostd.log files, which will hopefully provide some more information.

There are a few reasons why the HA agent might be playing up, and hopefully there will be some data in the above logs to point us in the right direction.
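As a rough sketch of how to pull the interesting lines out of those two logs from an SSH session: the helper name `ha_log_highlights` and the search pattern are my own assumptions, not anything VMware ships, so adjust the pattern to whatever you are chasing.

```shell
# Hypothetical helper: show the most recent lines in a log that mention
# errors, warnings, failures, or the HA election.
# $1 = log file, $2 = how many matching lines to keep (default 40)
ha_log_highlights() {
    log_file="$1"
    max_lines="${2:-40}"
    if [ ! -r "$log_file" ]; then
        echo "cannot read $log_file" >&2
        return 1
    fi
    # -i: case-insensitive, -E: extended regex alternation
    grep -iE 'error|warning|fail|election' "$log_file" | tail -n "$max_lines"
}
```

Right after a failed HA reconfigure you could run `ha_log_highlights /var/log/fdm.log` and `ha_log_highlights /var/log/hostd.log 100` and compare the timestamps between the two.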

Cheers, Matt.

rajeshrr
Contributor

  • Place the ESXi host in Maintenance Mode.
  • Connect to the ESXi host over SSH.
  • Run this command to list the VIBs installed on the ESXi host:
    esxcli software vib list
  • In the /var/run/log/fdm-installer.log file, find the VIB that is causing the issue.
  • Run this command to remove that VIB after verifying its dependencies:
    esxcli software vib remove -n vibname
  • Run this command to remove the FDM agent from the ESXi host:
    esxcli software vib remove -n vmware-fdm
  • Exit Maintenance Mode.
  • Reconfigure HA at the cluster level.
  • Restart the management services.
  • Disconnect the host from vCenter Server and reconnect it.
  • Let me know if the issue still persists.
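The host-side steps above could be strung together roughly like this. It is only a sketch under a few assumptions: the `run` wrapper, the `DRY_RUN` variable and the `remove_fdm_agent` function are hypothetical names, the script only prints the commands unless `DRY_RUN=0` is set, and the step that removes a problem VIB is left out because the VIB name depends on what the installer log shows. Reconfiguring HA and disconnecting/reconnecting the host are done from vCenter, so they are not included either.

```shell
#!/bin/sh
# Dry run by default: commands are printed, not executed.
# Set DRY_RUN=0 on the ESXi host to actually run them.
DRY_RUN="${DRY_RUN:-1}"

# Hypothetical wrapper: echo the command in dry-run mode, otherwise run it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Hypothetical function covering only the steps done in the host shell.
remove_fdm_agent() {
    run esxcli system maintenanceMode set --enable true &&
    run esxcli software vib list &&
    run esxcli software vib remove -n vmware-fdm &&
    run esxcli system maintenanceMode set --enable false &&
    # Restart the management agents (hostd and vpxa)
    run /etc/init.d/hostd restart &&
    run /etc/init.d/vpxa restart
}

remove_fdm_agent
```

Running it as-is prints each command prefixed with `+`, which gives you a chance to review the sequence before running it for real.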
JLogan2016
Enthusiast

Thanks, Mattallford. I have attached both log files. Here are some of the highlights I am seeing in each; nothing is jumping out at me as the root cause:

Hostd:

  16:15.40 - Warning 'Locale' No message string to format vim.option.def

  16:15.47 - Ticket invalidated

  16:18.06 - Request cancelled

FDM:

  16:15.40 - Not running election

  16:15.40.138 - Cluster membership changed

  16:15.40.796 - Connected to hostd and listening on TCP socket

JLogan2016
Enthusiast

Thanks, rajeshrr. I tried this, but it tells me there is no such file or directory. Searching the host, I find only an fdm.log file, not fdm-installer.log.
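For anyone repeating the search, something along these lines should list whatever FDM logs exist. It is a minimal sketch: `find_fdm_logs` is a made-up helper name, and the directories passed to it are just the two log locations mentioned in this thread.

```shell
# Hypothetical helper: list files named fdm*.log under the given directories.
# The trailing slash makes find descend even if the directory is a symlink.
find_fdm_logs() {
    for dir in "$@"; do
        if [ -d "$dir" ]; then
            find "$dir/" -name 'fdm*.log' 2>/dev/null
        fi
    done
    return 0
}
```

On the host this would be called as `find_fdm_logs /var/log /var/run/log`.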
