I keep seeing the "not responding" message in vCenter at random and I can't figure out what's going on. vCenter seems to lose its connection to the hosts, but the hosts are 100% healthy and work fine (the VMs are also online with no issues).
It happens randomly and returns to normal just as randomly, without me touching anything (sometimes after 10 seconds, sometimes after 15 minutes, and sometimes after hours).
I have searched through the logs on the hosts but I can't see anything related to this lost connection.
Is there a specific log file that I should check? I have checked hostd.log, vpxa.log and syslog.log on the hosts.
I followed this KB (VMware Knowledge Base), but it's not an SSL timeout issue.
I tried restarting hostd and vpxa on the hosts, with no luck.
In vCenter, the only error I see is this, but there is not much information to troubleshoot with:
Any idea what it could be, or where I can start looking to get more information on this?
I have VMware ESXi, 6.7.0, 14320388 deployed on a bare-metal server at a cloud hosting provider.
Any help or suggestions would be greatly appreciated!
This issue occurs when the UDP heartbeat message sent by the ESX/ESXi host is not received by vCenter Server. If vCenter Server does not receive the UDP heartbeat message, it treats the host as not responding. This behavior can be an indication of a congested network between the ESX/ESXi host and vCenter Server.
Follow this VMware KB: https://kb.vmware.com/s/article/1005757
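To make the heartbeat logic above concrete, here is a minimal sketch of how missed heartbeats could be spotted offline. It assumes you already have a list of heartbeat arrival times in seconds (for example, extracted from a packet capture taken on the vCenter side); the function name `find_heartbeat_gaps`, the sample timestamps, and the 60-second default are illustrative assumptions, not anything vCenter exposes.

```python
from typing import List, Tuple


def find_heartbeat_gaps(
    arrivals: List[float], timeout_s: float = 60.0
) -> List[Tuple[float, float]]:
    """Return (last_seen, next_seen) pairs where no heartbeat arrived
    for longer than timeout_s seconds.

    arrivals:  heartbeat arrival times in seconds (hypothetical input,
               e.g. taken from a packet capture).
    timeout_s: assumed "not responding" window; plays the same role as
               config.vpxd.heartbeat.notRespondingTimeout.
    """
    ordered = sorted(arrivals)
    gaps = []
    # Compare each arrival with the one before it.
    for prev, cur in zip(ordered, ordered[1:]):
        if cur - prev > timeout_s:
            gaps.append((prev, cur))
    return gaps


# Made-up sample: heartbeats every 10 s, then 95 s of silence.
sample = [0, 10, 20, 30, 125, 135]
print(find_heartbeat_gaps(sample))        # → [(30, 125)]
print(find_heartbeat_gaps(sample, 120))   # → []
```

If gaps show up in a capture on the vCenter side while the host side shows the heartbeats leaving on schedule, the datagrams are probably being dropped somewhere in between; a larger timeout only hides gaps shorter than the new value.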
Thank you very much for the response.
Unfortunately, this doesn't solve the issue. I had already tried that solution (I forgot to mention it in my initial post). I tried setting 120 and higher values for the config.vpxd.heartbeat.notRespondingTimeout key, but I still have the same issue.
Anything else you can think of as a root cause of the problem?
Have there been any changes on ESXi or vCenter? If so, please share them so we can look into it from that perspective. Also, are your vCenter and ESXi hosts on the same subnet?
No changes have been made on either vCenter or ESXi, and they are not on the same subnet (but there's a good connection between them).
Just as additional information, I ran a ping test for 24 hours between vCenter and the hosts and had 0 packet loss! I can run it again, but the network is quite stable, including between VMs that are on different networks.
Something similar happened to me, but on a single host. vCenter could no longer see one of the hosts, but the virtual machines were working fine. After trying various solutions, it turned out to be sufficient to remove the host from vCenter and register it again.
Thank you for your feedback. Indeed, the only workaround to quickly have the hosts available again (I do this when it's urgent) is to remove the hosts and re-add them, but after a while it happens again. In addition, with that solution I lose the folders that I created in the "VMs and Templates" section, so I have to re-create them from scratch.
I have 2 hosts (6.7 and 6.5) (I'm adding a third, an ESXi 6.7 host, this week) and both got disconnected at the same time. They are all on different subnets.
I checked DNS: both the A records and the reverse records are correctly configured, and both vCenter and the hosts are able to resolve each other.
I also checked the DNS configuration, and they are using the same DNS server.
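For anyone wanting to repeat the forward/reverse DNS check described above in a scripted way, here is a rough sketch using only Python's standard `socket` module. The function name `check_forward_reverse` and the hostname in the usage comment are made up for illustration; the resolver arguments are only there so the logic can be exercised without touching real DNS.

```python
import socket
from typing import Callable, Tuple


def _system_reverse(ip: str) -> str:
    # gethostbyaddr returns (hostname, aliases, addresses); keep the name.
    return socket.gethostbyaddr(ip)[0]


def check_forward_reverse(
    hostname: str,
    forward: Callable[[str], str] = socket.gethostbyname,
    reverse: Callable[[str], str] = _system_reverse,
) -> Tuple[str, str, bool]:
    """Resolve hostname to an IP, resolve that IP back to a name, and
    report whether the two names agree (case-insensitive, ignoring a
    trailing dot)."""
    ip = forward(hostname)
    ptr_name = reverse(ip)
    matches = ptr_name.rstrip(".").lower() == hostname.rstrip(".").lower()
    return ip, ptr_name, matches


# Usage (hypothetical hostname; run from vCenter and from each host):
#   ip, ptr, ok = check_forward_reverse("esxi01.example.local")
#   print(ip, ptr, "OK" if ok else "MISMATCH")
```

Running it from both the vCenter side and each host, against the names they use for each other, would show whether any of them disagree with the shared DNS server.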
I tried upgrading vCenter to 22.214.171.124200, hoping that this would solve the issue. I just upgraded and it's stable now... but we'll see in the next couple of days!
Thank you for your help, guys.