Hi,
I have two HP ProLiant DL360 Gen10 hosts running ESXi 6.5 in a cluster (VCSA 6.7). Following a recent firmware update, the SD cards where ESXi resides started behaving strangely. Sometimes the SD cards are not detected at boot time; sometimes they are detected but, after booting, /bootbank is linked to /tmp; and sometimes everything works fine. I noticed that if I change the boot type from UEFI to Legacy BIOS, the SD cards behave normally. So after the last tests I migrated the VMs from one host to the other to avoid downtime, then rebooted the hosts and changed the boot type.
After the reboot, one of the hosts is fine, but the other reports HA issues. The host is up and can communicate with vCenter, but reconfiguring it for HA times out.
I checked vpxa.log and fdm.log in particular and looked through the rest of the logs, but I couldn't find anything obvious. I also set the boot type back to UEFI on that host, but it didn't help.
Could anyone point me in the right direction to troubleshoot this?
Thanks,
Daniel
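For anyone hitting the same /bootbank symptom described above, here is a minimal Python sketch that checks whether a path resolves into /tmp. The helper name resolves_into is made up for illustration, and running it from the ESXi shell assumes the host's bundled Python interpreter is available; /altbootbank is included only as a second path worth checking.

```python
import os


def resolves_into(link_path, suspect_dir="/tmp"):
    """Return True if link_path resolves to suspect_dir or to a path beneath it."""
    target = os.path.realpath(link_path)
    suspect = os.path.realpath(suspect_dir)
    # Compare with a trailing separator so "/tmpfoo" is not mistaken for "/tmp".
    return target == suspect or target.startswith(suspect.rstrip(os.sep) + os.sep)


if __name__ == "__main__":
    # Paths checked here are an assumption based on the symptom in the post.
    for path in ("/bootbank", "/altbootbank"):
        if not os.path.lexists(path):
            print(path, "does not exist on this system")
        elif resolves_into(path):
            print(path, "resolves into /tmp (boot device was probably not mounted)")
        else:
            print(path, "->", os.path.realpath(path))
```

If /bootbank resolves into /tmp, the host booted without its boot device mounted, which matches the intermittent SD card detection described in the post.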
The issue I had at a customer's site was this: VMware Knowledge Base (KB74966)
Is it possible for you to update the hosts to 6.7 or to the latest 6.5 (HP image)?
There are other issues with drivers described here: ESXi 6.5 U2 Hosts become unresponsive - VMKernel.log errors
Regards
Felix
If you have only two hosts in the cluster, an HA warning is a normal reaction: while one host reboots, the remaining host has no other cluster member available for failover or for migrating the VMs.
Maybe I wasn't clear enough... The HA issue exists with both hosts up & running.
Hey,
I think there was an issue with some versions of 6.5 that caused the host to become temporarily unresponsive.
Which 6.5 build do you have?
Best Regards
Felix
Just a thought. Unless already done, disable HA, and then enable it again to see whether this helps.
André
Felix,
The hosts are running an HPE customized image, 6.5.0 Update 2, based on ESXi 6.5.0 Update 2 VMkernel Release Build 9298722.
Thanks,
Daniel
Andre,
I just tried that and it didn't change anything. One host was OK after re-enabling HA; the other one timed out.
Daniel
Felix,
I'm trying to patch the host using VUM, but I ran into a network issue with outgoing traffic, so I have to fix that first.
Thanks,
Daniel
Please check the /var/log/fdm.log file to review the VMware HA agent operations; maybe you will find something related to your issue.
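As a rough aid for that log review, the Python sketch below lists lines in a log file that contain a few suspicious keywords. The keyword list and the helper name find_suspect_lines are assumptions for illustration only; the sketch does not rely on any particular fdm.log line format and is no substitute for reading the log around the time of the HA reconfigure attempt.

```python
def find_suspect_lines(lines, keywords=("error", "timeout", "timed out", "fail")):
    """Return (line_number, text) pairs for lines containing any keyword."""
    hits = []
    for number, line in enumerate(lines, start=1):
        lowered = line.lower()  # case-insensitive match
        if any(keyword in lowered for keyword in keywords):
            hits.append((number, line.rstrip("\n")))
    return hits


if __name__ == "__main__":
    # Path taken from the post above; adjust it if the log lives elsewhere.
    log_path = "/var/log/fdm.log"
    with open(log_path, errors="replace") as handle:
        for number, text in find_suspect_lines(handle):
            print(f"{number}: {text}")
```

The same function can be pointed at vpxa.log by changing log_path.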
Felix,
I pushed the latest updates for 6.5 and HA came back. Everything looks normal now.
Thank you everyone for the assistance.
Daniel