We have vCenter and ESXi as below:
VMware vCenter 6.5.0, build 4602587, running on Windows
VMware vSphere ESXi 6.5.0, build 4564106
(upgraded from vCenter and ESXi 6.0 GA)
The server is a Dell R730 and the storage is an MD3800f.
Recently we noticed some strange behavior.
vCenter keeps sending alerts that it cannot reach one of the ESXi hosts, but each outage only lasts around 1-2 minutes. Then the host shows as connected again, as in the log below.
I tried increasing the heartbeat timeout as described in the VMware Knowledge Base (the value is now 120), but the error still continues.
There were no physical changes (no cables disconnected, nobody working around the server).
Could someone advise on this issue?
Thanks a lot.
Just got confirmation from the network side: no blocking policy is applied.
vCenter and ESXi sit on the same subnet.
Just to add more: there are no issues on the hardware (the hardware log bundle has already been submitted to Dell) and ping is good (no packet loss).
Also keep a ping running, for example from vCenter to the ESXi host, to see if there is any intermittent connectivity issue.
Is your host added using its DNS name? Check connectivity to DNS and, if possible (it would be the faster way to rule DNS out), add the ESXi host to the hosts file and see what happens.
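To make the hosts file suggestion a bit more concrete, here is a small sketch that checks whether a name already has an entry in a hosts file. The function name `has_hosts_entry` and the names and address in the usage line are made up for illustration. It assumes a POSIX shell with awk, for example the ESXi Shell checking `/etc/hosts` for the vCenter name. On a Windows vCenter the hosts file is `C:\Windows\System32\drivers\etc\hosts` and uses the same "IP name [alias...]" line format, but this script itself will not run there.

```shell
#!/bin/sh
# has_hosts_entry NAME [FILE]
# Succeeds (exit 0) if NAME is listed as a hostname or alias in FILE.
# FILE defaults to /etc/hosts. Hypothetical helper, not a VMware tool.
has_hosts_entry() {
    name=$1
    file=${2:-/etc/hosts}
    awk -v n="$name" '
        { sub(/#.*/, "") }                      # drop comments
        {
            # field 1 is the IP address, the rest are names/aliases
            for (i = 2; i <= NF; i++)
                if (tolower($i) == tolower(n)) found = 1
        }
        END { exit !found }
    ' "$file"
}
```

Usage would look like `has_hosts_entry vcenter01.example.local || echo "no hosts entry"`. If there is no entry, a line such as `192.0.2.10 vcenter01.example.local vcenter01` (placeholder values) can be added, and then you can watch whether the disconnect alerts stop.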
Try to identify false positives by adjusting the alarm trigger and frequency.
Use ping to check the physical status of the network ports.
Check the cable and the network port on your physical switch.
Check that ping to your DNS server works as well.
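A plain ping is easy to miss when the drop only lasts a minute or two, so it can help to log only the failures with a timestamp and compare them with the alert times in vCenter. Below is a minimal sketch of that idea. The function name `monitor` and the address in the usage line are placeholders, and it assumes a ping that accepts `-c` (count) and `-W` (timeout in seconds), as Linux iputils and the ESXi Shell ping do.

```shell
#!/bin/sh
# monitor TARGET COUNT [INTERVAL]
# Sends COUNT single pings to TARGET, INTERVAL seconds apart (default 1),
# and prints a timestamped line for every probe that gets no reply.
# Hypothetical helper for illustration only.
monitor() {
    target=$1
    count=$2
    interval=${3:-1}
    i=0
    while [ "$i" -lt "$count" ]; do
        if ! ping -c 1 -W 2 "$target" >/dev/null 2>&1; then
            echo "$(date '+%Y-%m-%d %H:%M:%S') no reply from $target"
        fi
        i=$((i + 1))
        sleep "$interval"
    done
}
```

For example, `monitor 192.0.2.21 3600 >> ping_failures.log` would probe a (placeholder) ESXi management address for roughly an hour. An empty log during a vCenter disconnect alert would suggest the problem is not basic IP reachability.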
If the physical and logical network configuration is OK, you need to check the ESXi logs in more detail. Please investigate the following log files:
grep -i "error" /var/log/hostd.log
grep -i "error" /var/log/vpxa.log
grep -i "error" /var/log/vmkernel.log
grep -i "error" /var/log/vmksummary.log
"error" is just a sample string for filtering the results; you can search for any related keywords too.
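Since the disconnects are short, it may be easier to search for several keywords at once and limit the output to the minutes around an alert. This is only a sketch: `search_log` is a hypothetical helper, the keyword list is a suggestion rather than an official set of messages, and it assumes the log lines carry an ISO-style UTC timestamp (as ESXi logs normally do) so that a fixed string like a date and hour can narrow the window.

```shell
#!/bin/sh
# search_log FILE PATTERN [TIME_STRING]
# Prints lines of FILE that match the extended regex PATTERN (case-insensitive).
# If TIME_STRING is given, only lines containing that fixed string are searched.
# Hypothetical helper for illustration only.
search_log() {
    file=$1
    pattern=$2
    when=${3:-}
    if [ -n "$when" ]; then
        grep -F "$when" "$file" | grep -iE "$pattern"
    else
        grep -iE "$pattern" "$file"
    fi
}
```

For example, `search_log /var/log/vpxa.log 'error|heartbeat|timeout|disconnect' 2017-03-14T08:1` would show matches between 08:10 and 08:19 UTC on that (made-up) date. Remember to convert the alert time shown in vCenter to UTC before comparing.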
Also, check the compatibility of ESXi with your physical host in the VMware HCL. I had a similar problem because the installed ESXi version was not supported on the server platform.
Opened an SR 3 days ago with severity 3.
Already told them about our preliminary checks (e.g. ping, configuration changes, etc.) and also shared the error log from vCenter.
They have not gotten back to us yet.
Is only one host affected?
Does the alert come at a specific time, or do you get it continuously?
Can you attach the hostd log file, along with the host name and the time of the alert?