Hi Experts,
We have the following vCenter and ESXi versions:
VMware vCenter Version 6.5.0 Build 4602587 on Windows
VMware vSphere ESXi, 6.5.0, 4564106
(upgraded from vCenter and ESXi 6.0 GA)
The servers are Dell R730s and the storage is a Dell MD3800f.
Recently we noticed some strange behavior.
vCenter keeps sending alerts that it cannot reach one of the ESXi hosts. Each outage lasts around 1-2 minutes, after which the host shows as connected again, as shown in the log below.
I tried increasing the heartbeat timeout as described in the VMware Knowledge Base (set to 120), but the error still occurs.
There have been no physical changes (no cables disconnected, nobody working around the servers).
Could someone advise on this issue?
Thanks a lot.
Regards.
Please check for any infrastructure changes. A 1-2 minute disconnect reminds me of UDP port 902 being blocked, or of changes in the firewall.
Thanks,
MS
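To test the UDP 902 theory from the host side, something like the following could be run from the ESXi shell. It is a minimal sketch under two assumptions: that the host firewall ruleset covering the vCenter heartbeats is named `vpxHeartbeats`, and that management traffic uses `vmk0`. `tcpdump-uw` is the packet capture tool shipped with the ESXi shell.

```shell
#!/bin/sh
# Sketch only: run from the ESXi shell over SSH.
# Assumptions: the heartbeat ruleset is named "vpxHeartbeats" and the
# management VMkernel interface is vmk0; adjust both if yours differ.

check_heartbeat_ruleset() {
    # Is the heartbeat ruleset enabled on the host firewall?
    esxcli network firewall ruleset list | grep -i 'vpxHeartbeats'
    # Which port, protocol and direction does it allow?
    esxcli network firewall ruleset rule list --ruleset-id=vpxHeartbeats
}

capture_heartbeats() {
    # Capture up to 20 UDP 902 packets on the given interface (default vmk0)
    iface=${1:-vmk0}
    tcpdump-uw -i "$iface" -n -c 20 udp port 902
}
# Usage on the host:
#   check_heartbeat_ruleset
#   capture_heartbeats vmk0
```

If heartbeats keep leaving the host during an alert but vCenter still marks it as disconnected, that points toward the network path; if they stop, the problem is more likely on the host (for example vpxa).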
Hi,
I just got confirmation from the network team that no blocking policy is applied.
vCenter and the ESXi hosts sit on the same subnet.
Just to add: there are no hardware issues (the hardware log bundle has already been submitted to Dell), and ping is fine (no packet loss).
Check the logs on the ESXi host, vmkernel.log for instance.
Also keep a continuous ping running, for example from vCenter to the ESXi host, to see whether there is any intermittent connectivity issue.
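A plain ping summary can hide a single dropped reply. A minimal sketch of a monitor that records only failures, each with a UTC timestamp so it can be lined up with the vCenter alarm times and the (UTC) ESXi logs, is below. It assumes a Linux box or the ESXi shell, where `ping -W` takes seconds; the target name in the example is hypothetical.

```shell
#!/bin/sh
# Sketch: log one timestamped line per failed ping so that short outages
# can be matched against vCenter alarms. The target name is a placeholder.

ping_once() {
    # Succeeds if one echo reply arrives within 1 second
    ping -c 1 -W 1 "$1" >/dev/null 2>&1
}

format_result() {
    # $1 = timestamp, $2 = target, $3 = status text
    printf '%s %s %s\n' "$1" "$2" "$3"
}

monitor() {
    # $1 = target, $2 = seconds between pings (default 1); stop with Ctrl+C
    target=$1
    interval=${2:-1}
    while true; do
        if ! ping_once "$target"; then
            format_result "$(date -u '+%Y-%m-%dT%H:%M:%SZ')" "$target" FAIL
        fi
        sleep "$interval"
    done
}
# Example: monitor esxi01.example.local 1 >> ping_failures.log &
```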
Is your host added by DNS name? Check connectivity to DNS and, if possible (it is the fastest way to rule DNS out), add the ESXi host to the hosts file and see what happens.
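Before editing a hosts file, it can help to confirm what the name currently resolves to and whether a static entry already exists. This is a minimal sketch; `hosts_has_entry` and `check_dns` are hypothetical helper names, and it only reads the file (on ESXi that is `/etc/hosts`; the Windows vCenter keeps its own hosts file under `C:\Windows\System32\drivers\etc`).

```shell
#!/bin/sh
# Sketch: check whether a hostname already has a static entry in a
# hosts-format file, and show how DNS currently resolves it.

hosts_has_entry() {
    # $1 = hosts file, $2 = hostname; exact, case-insensitive match on
    # any alias column, ignoring commented-out lines
    awk -v h="$2" '
        /^[[:space:]]*#/ { next }
        { for (i = 2; i <= NF; i++) if (tolower($i) == tolower(h)) found = 1 }
        END { exit found ? 0 : 1 }
    ' "$1"
}

check_dns() {
    # $1 = hostname; prints the resolver answer or a failure message
    nslookup "$1" || echo "DNS lookup failed for $1"
}
# Example:
#   check_dns esxi01.example.local
#   hosts_has_entry /etc/hosts esxi01.example.local && echo "static entry present"
```

Matching whole columns instead of using `grep` avoids a false hit such as `esxi01` matching inside `esxi01.lab`.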
Hi!
Try to identify false positives by adjusting the alarm trigger and frequency.
Use ping to check the physical status of the network ports.
Check the cables and network ports on your physical switch.
Check that ping to your DNS server works as well.
Regards,
If the physical and logical network configuration is OK, you need to check the ESXi logs in more detail. Please investigate the following log files:
grep -i "error" /var/log/hostd.log
grep -i "error" /var/log/vpxa.log
grep -i "error" /var/log/vmkernel.log
grep -i "error" /var/log/vmksummary.log
"error" is just a sample string for filtering the results; you can search for any other related keywords too.
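To search for several keywords across all four logs in one pass, a sketch like the following could be used. The keyword list is only a suggestion, not a known set of messages for this issue; `/dev/null` is passed as a second file so that `grep` prefixes each match with the file name.

```shell
#!/bin/sh
# Sketch: case-insensitive search of several logs for several keywords.
# KEYWORDS is a suggestion; adjust it to the symptoms you see.
KEYWORDS='error|disconnect|heartbeat|timeout|lost'

search_logs() {
    # $@ = log files; prints "file:line:text" for every matching line
    for f in "$@"; do
        [ -r "$f" ] || { echo "skipping unreadable $f" >&2; continue; }
        grep -inE "$KEYWORDS" "$f" /dev/null
    done
}
# Example on the host:
#   search_logs /var/log/hostd.log /var/log/vpxa.log /var/log/vmkernel.log /var/log/vmksummary.log
```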
Also, check in the VMware HCL that your ESXi version is supported on your physical server. I had a similar problem to yours, caused by running an ESXi version that was not supported on the server platform.
Hi Nico,
Yes, I have already been pinging the DNS servers, the affected host and vCenter simultaneously for 48 hours.
There was no packet loss (100% success).
What did VMware Support say? (assuming you have filed a Support Request)
Hi Scott,
I filed an SR 3 days ago at severity 3.
I have already told them about our preliminary checks (e.g. ping, configuration changes, etc.) and also shared the error log from vCenter.
They have not gotten back to us yet.
An update:
Support asked us to check the network side.
I am planning to add more uplinks, since the current vSphere setup combines management and VM traffic.
Thanks.
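If the hosts use standard vSwitches, the uplink work could be done from the ESXi shell roughly as sketched below, either adding a NIC to the existing vSwitch for redundancy or moving VM traffic to a separate vSwitch. All names (`vmnic2`, `vmnic3`, `vSwitch0`, `vSwitch1`, `VM Network 2`) are hypothetical, and VMs would still need to be moved to the new port group afterwards.

```shell
#!/bin/sh
# Sketch for standard vSwitches on ESXi 6.5, run from the ESXi shell.
# vmnic2, vmnic3, vSwitch0, vSwitch1 and "VM Network 2" are hypothetical names.

show_network() {
    esxcli network nic list                  # physical NICs and link status
    esxcli network vswitch standard list     # vSwitches, port groups, uplinks
}

add_uplink() {
    # $1 = physical NIC, $2 = existing vSwitch
    esxcli network vswitch standard uplink add --uplink-name="$1" --vswitch-name="$2"
}

create_vm_vswitch() {
    # $1 = new vSwitch, $2 = uplink NIC, $3 = VM port group name
    esxcli network vswitch standard add --vswitch-name="$1"
    add_uplink "$2" "$1"
    esxcli network vswitch standard portgroup add --portgroup-name="$3" --vswitch-name="$1"
}
# Example: add_uplink vmnic2 vSwitch0
#          create_vm_vswitch vSwitch1 vmnic3 "VM Network 2"
```

Separating management from VM traffic keeps a busy VM network from starving the management VMkernel port that carries the vCenter heartbeats.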
Is only one host affected?
Does the alert occur at a specific time, or are you getting it continuously?
Can you attach the hostd log file, along with the host name and the time of the alert?
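To share only the relevant part of hostd.log, the lines around an alert can be cut out by timestamp. This sketch assumes log lines begin with an ISO 8601 UTC timestamp such as `2017-03-01T10:05:00.000Z`, and it keeps untimestamped continuation lines that follow an in-window line. `log_window` is a hypothetical helper name; convert the vCenter alarm time to UTC first.

```shell
#!/bin/sh
# Sketch: print log lines whose leading timestamp falls within [start, end].
# Assumes lines start with "YYYY-MM-DDTHH:MM:SS" (UTC); lines without a
# timestamp inherit the in/out-of-window state of the line before them.

log_window() {
    # $1 = log file, $2 = start, $3 = end, both "YYYY-MM-DDTHH:MM:SS"
    awk -v s="$2" -v e="$3" '
        /^[0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9]T/ {
            ts = substr($0, 1, 19)
            inwin = (ts >= s && ts <= e)
        }
        inwin { print }
    ' "$1"
}
# Example:
#   log_window /var/log/hostd.log 2017-03-01T10:00:00 2017-03-01T10:10:00 > hostd_excerpt.log
```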