b4ndit
Enthusiast

vCenter and ESXi host disconnect intermittently

Hi Experts,

We have vCenter and ESXi as below:

VMware vCenter Version 6.5.0 Build 4602587 using Windows

VMware vSphere ESXi, 6.5.0, 4564106

(upgraded from vCenter and ESXi 6.0 GA)

The servers are Dell R730 and the storage is an MD3800f.

Recently we found strange behavior.

vCenter keeps sending alerts that it cannot reach one of the ESXi hosts, but each disconnect lasts only around 1-2 minutes. Then the host shows as connected again, as in the log below:

pastedImage_1.png

pastedImage_2.png

I tried increasing the heartbeat timeout as described in the VMware Knowledge Base (the value is now 120), but the error still continues.

There were no physical changes (no cables disconnected, no one working around the server).

Could someone advise on this issue?

Thanks a lot.

Regards.

12 Replies
msripada
Virtuoso

Please check for any infrastructure changes. A 1-2 minute disconnect reminds me of UDP port 902 being blocked or a change in the firewall.

thanks,

MS

b4ndit
Enthusiast

Hi,

Just got confirmation from the network team: no blocking policy is applied.

vCenter and ESXi sit on the same subnet.

Just to add: there are no issues on the hardware (the hardware log bundle was already submitted to Dell) and ping is good (no packet loss).

depping
Leadership

Check the logs on the ESXi host, the vmkernel log for instance.

0 Kudos

Also keep a ping running, for example from vCenter to the ESXi host, to see if there is any intermittent connectivity issue.

Is your host added by DNS name? Check connectivity to DNS and, if possible (it would be the quickest way to rule this out), add the ESXi host to the hosts file and see what happens.
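
A plain ping window is easy to lose track of over many hours, so it can help to log one timestamped result per probe and later line any failures up against the disconnect alerts. Below is a minimal sketch of that idea, not a tested tool: `probe_log` is a made-up helper name, and it assumes a POSIX shell on some Linux machine in the management subnet. The probe command is passed in as arguments, so the same loop can wrap a ping by hostname (which also goes through name resolution each round) or a ping by IP.

```shell
#!/bin/sh
# probe_log LABEL ROUNDS INTERVAL COMMAND...
# Hypothetical helper: runs COMMAND once per round and prints one line
# per round in the form "<UTC timestamp> <LABEL> OK|FAIL".
probe_log() {
    label="$1"; rounds="$2"; interval="$3"
    shift 3
    i=0
    while [ "$i" -lt "$rounds" ]; do
        # The exit status of the probe command decides OK or FAIL.
        if "$@" >/dev/null 2>&1; then status=OK; else status=FAIL; fi
        printf '%s %s %s\n' "$(date -u +%Y-%m-%dT%H:%M:%SZ)" "$label" "$status"
        i=$((i + 1))
        # Wait between rounds, but not after the last one.
        [ "$i" -lt "$rounds" ] && sleep "$interval"
    done
    return 0
}
```

For example, `probe_log esxi-by-name 3600 1 ping -c 1 esxi01.example.local >> /tmp/probe.log` would record one probe per second for an hour; the hostname and log path here are placeholders, not values from this thread.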

Triple VCIX (CMA-NV-DCV) | vExpert | MCSE | CCNA
maxime9001
Contributor

Hi !

Try to identify false positives by adjusting the trigger and frequency.

Run a ping to check the physical status of the network ports.

Check the cable and the network port on your physical switch.

Check that ping to your DNS server works as well.

Regards,

NathanosBlightc
Commander

If the physical/logical network configuration is OK, then you need to check the ESXi logs in more detail. Please investigate the following log files:

grep -i "error" /var/log/hostd.log

grep -i "error" /var/log/vpxa.log

grep -i "error" /var/log/vmkernel.log

grep -i "error" /var/log/vmksummary.log

"error" is just a sample filter string; you can search for any other related keywords too.

Also, check the ESXi compatibility of your physical host in the VMware HCL. I had a similar problem, caused by a mismatch between the installed ESXi version and an unsupported server platform.
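
If you want to try several keywords, the four commands above can be folded into one loop. This is only a sketch: `scan_logs` is a hypothetical helper name, and the default directory `/var/log` is taken from the paths above. Each match is prefixed with its file name so you can see which log it came from.

```shell
#!/bin/sh
# scan_logs KEYWORD [LOG_DIR]
# Hypothetical helper: case-insensitive search for KEYWORD in the four
# logs named above. LOG_DIR defaults to /var/log.
scan_logs() {
    keyword="$1"
    log_dir="${2:-/var/log}"
    for name in hostd vpxa vmkernel vmksummary; do
        file="$log_dir/$name.log"
        # Skip a log that is absent instead of failing.
        [ -f "$file" ] || continue
        # -i: ignore case, -H: always print the file name.
        # grep exits non-zero on "no match"; that is not an error here.
        grep -iH -e "$keyword" "$file" || true
    done
    return 0
}
```

For example, `scan_logs "error"` covers the same four files as the commands above, and `scan_logs "heartbeat"` tries a different keyword.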

Please mark my comment as the Correct Answer if this solution resolved your problem
b4ndit
Enthusiast

Hi Nico,

Yes, I have already been pinging the DNS servers, the affected host and vCenter simultaneously for 48 hours.

There was no ping loss (100% success).

b4ndit
Enthusiast

Hi Amin,

Attached is my error log.

Only hostd.log contained the error string; the other logs showed nothing strange (no errors).

I can confirm the host we use is listed in the VMware HCL.

Anyway, the last 2-3 weeks seemed okay, but the error started to appear again yesterday.

pastedImage_0.png

scott28tt
VMware Employee

What did VMware Support say? (assuming you have filed a Support Request)

b4ndit
Enthusiast

Hi Scott,

I opened an SR 3 days ago with severity 3.

I already told them about our preliminary checks (e.g. ping, configuration changes, etc.) and also shared the error log from vCenter.

They have not gotten back to us yet.

b4ndit
Enthusiast

An update,

Support asked us to check the network side.

I am planning to add more uplinks, since the current vSphere setup combines management and VM traffic.

Thanks.

RajeevVCP4
Expert

Is only one host affected?

Does the alert occur at any specific time, or do you get it continuously?

Can you attach the hostd log file, with the host name and time?
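
To cut hostd.log down to the window around one disconnect before attaching it, something like the following could work. It is a rough sketch under one assumption: each log entry starts with an ISO 8601 timestamp such as `2019-03-01T10:01:00.000Z`, which is the usual ESXi format (times are typically UTC, so convert the alert time first). The name `log_window` and the sample times are made up for illustration.

```shell
#!/bin/sh
# log_window START END FILE
# Hypothetical helper: prints entries whose leading timestamp falls between
# START and END (inclusive). START and END must use the same precision,
# e.g. both "YYYY-MM-DDTHH:MM". Lines without a leading timestamp
# (continuation lines) are skipped.
log_window() {
    awk -v start="$1" -v end="$2" '
        # Only consider lines whose first field looks like a timestamp.
        $1 ~ /^[0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9]T/ {
            # Truncate to the precision of START, then compare as strings.
            ts = substr($1, 1, length(start))
            if (ts >= start && ts <= end) print
        }
    ' "$3"
}
```

For example, `log_window 2019-03-01T10:00 2019-03-01T10:05 /var/log/hostd.log > /tmp/hostd-window.log` keeps only the entries from those five minutes.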

Rajeev Chauhan
VCIX-DCV6.5/VSAN/VXRAIL
Please mark as helpful or correct if my answer is useful for you