VMware Cloud Community
andymenlo
Contributor

ESXi host disconnects intermittently from vCenter Server Appliance

In a VMware cluster with HA and DRS enabled, the hosts constantly disconnect from the vCenter Server Appliance.

Versions: vSphere 6.5, build 8294253

vCenter Server Appliance 6.5.0.13000

I added the parameter

config.vpxd.heartbeat.notRespondingTimeout = 120

and restarted the vCenter services, following this KB article:

https://kb.vmware.com/s/article/1005757

However, the hosts still keep disconnecting from vCenter.

The attached picture shows the disconnection of our server Calfaquen, but the event occurs on all servers.

I would appreciate your support.

Sincerely,

Paul

8 Replies
rcporto
Leadership

Hi Paul,

Is this a new deployment, or an existing one where the problem has only now started to occur? Are the ESXi hosts and the vCenter appliance on the same subnet, or is the vCSA on a different subnet and behind a firewall?

Check the following VMware KB article for additional settings that you need to investigate: VMware Knowledge Base

---

Richardson Porto
Senior Infrastructure Specialist
LinkedIn: http://linkedin.com/in/richardsonporto
SparkRezaRafiee
Enthusiast

Hi Paul,

I would suggest checking the VCSA events to see whether a backup, snapshot, or any other task was running on the VCSA at the time the host became disconnected.

In large, enterprise-scale environments this usually happens with wide management subnets, network congestion, low heartbeat timeout values, or when vCenter is too busy with many tasks in its queue.

I have also seen this issue when creating snapshots or backups of the vCenter Server. High storage latency can cause it as well.

Cheers,

Reza

andymenlo
Contributor

Hi Reza,

The vCenter appliance is on a different subnet than the ESXi hosts. I also modified the vpxa timeout on the ESXi hosts and on the vCenter appliance.

I also changed the teaming at the vSwitch level, because there was only one active adapter. All four adapters are now configured as active.

At the VMkernel port group level, I also changed the teaming so that there is one active adapter and three standby adapters.

However, the ESXi hosts continue to disconnect.

I would appreciate any further support you can offer.

Sincerely,

Paulo

SparkRezaRafiee
Enthusiast

Hi Paulo,

I would suggest checking the performance graphs of both the ESXi host and the vCenter Server at the time the issue occurred, and looking for high latency or high CPU/RAM usage.

Also check the timestamp of the alert to see whether a backup job was running at that time.

If not, the likely cause is layer 3 network latency, especially if you have a firewall doing the inter-VLAN routing.
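
A quick first look at that layer 3 path is to ping the vCenter appliance from the host's management VMkernel interface. This is only a minimal sketch for the ESXi shell: the function name check_mgmt_path, the example address 192.0.2.10, and the defaults (vmk0, 20 probes) are placeholders and not values from this thread.

```shell
# Ping the vCenter appliance through a specific VMkernel interface.
# $1 = vCenter address (placeholder), $2 = VMkernel interface (assumed vmk0),
# $3 = number of echo requests (assumed 20).
check_mgmt_path() {
    vcsa="$1"
    vmk="${2:-vmk0}"
    count="${3:-20}"
    if [ -z "$vcsa" ]; then
        echo "usage: check_mgmt_path <vcenter-address> [vmk] [count]" >&2
        return 1
    fi
    # vmkping sends ICMP echo requests out of the given VMkernel interface;
    # watch the summary for packet loss and round-trip times.
    vmkping -I "$vmk" -c "$count" "$vcsa"
}
```

For example: check_mgmt_path 192.0.2.10 vmk0. Keep in mind that the host-to-vCenter heartbeats travel over UDP port 902, so a clean ping does not prove that the firewall passes the heartbeats, and a blocked ping does not prove that it drops them.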

To capture the network traffic, you can run pktcap-uw --vmk vmk# -o file.pcap in the ESXi shell and then open the captured file in Wireshark, where the contents of the pcap are easier to read.

You can also check the hostd.log file and look for heartbeats.
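
One way to do the log check is to pull the heartbeat-related lines out of the host logs and compare their timestamps with the disconnect events. The sketch below assumes the usual ESXi log locations /var/log/hostd.log and /var/log/vpxa.log (the latter is the vCenter agent log, added here as an assumption). The helper name heartbeat_lines is made up, and because the exact message wording differs between builds, it simply does a case-insensitive search for "heartbeat".

```shell
# Print every line that mentions "heartbeat" from the given log files.
# With no arguments, fall back to the assumed default ESXi log paths.
heartbeat_lines() {
    if [ "$#" -eq 0 ]; then
        set -- /var/log/hostd.log /var/log/vpxa.log
    fi
    # -i: ignore case, -H: prefix each match with its file name
    grep -i -H "heartbeat" "$@"
}
```

Running heartbeat_lines | tail -n 50 right after a disconnect would show the most recent matches from both files.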

Regards,

Reza

andymenlo
Contributor

Hi Reza,

How can I run the pktcap-uw command in the background?

Sincerely,

Paulo Méndez L.

SparkRezaRafiee
Enthusiast

Hi Paulo,

You can run the below command and leave the SSH window open:

pktcap-uw --vmk vmk0 -o /tmp/test.pcap

(Replace "vmk0" with the vmk# of the ESXi host's management VMkernel adapter if it is not vmk0.)

The packet capture will continuously write the traffic of the VMkernel adapter to the test.pcap file. When you want to stop it, press Ctrl-C multiple times. (Do not stop it with Ctrl-Z, as that may leave the process running in the background without releasing the output file.)

Then open the file in Wireshark, which is quite user-friendly.
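
If keeping the SSH window open is not practical, the capture can also be started as a background job and stopped later. The following is a sketch of that approach as I understand it from VMware's pktcap-uw documentation. The function names are made up, and it is worth verifying on your build that the job keeps running after the SSH session is closed.

```shell
# Start pktcap-uw as a background job so the shell prompt returns immediately.
# $1 = VMkernel interface (assumed vmk0), $2 = output file (assumed /tmp/test.pcap).
start_capture() {
    vmk="${1:-vmk0}"
    out="${2:-/tmp/test.pcap}"
    pktcap-uw --vmk "$vmk" -o "$out" > /dev/null 2>&1 &
}

# List the IDs of all running pktcap-uw processes.
# Assumes ESXi's lsof, which prints the process ID in the first column.
capture_pids() {
    lsof | grep pktcap-uw | awk '{print $1}' | sort -u
}

# Stop every running capture so the output file is closed properly.
stop_capture() {
    pids="$(capture_pids)"
    if [ -n "$pids" ]; then
        kill $pids
    else
        echo "no pktcap-uw capture is running" >&2
    fi
}
```

Typical use would be start_capture vmk0 /tmp/test.pcap, waiting for the next disconnect, and then stop_capture. Two caveats: /tmp on ESXi is a small RAM disk, so a long capture may be better written to a datastore path; and pktcap-uw captures only one direction by default (inbound, as far as I know), so for the heartbeats that the host sends to vCenter you may need the --dir option. Check pktcap-uw -h on your host for the exact values.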

Cheers,

Reza

sk84
Expert

Does this only affect management communication between the hosts and vCenter, or do the VMs also become unreachable over the network? Does it affect all hosts at the same time or only one host?

Which physical network card is installed in the hosts? We recently had a case where individual hosts repeatedly lost their network connection completely. The cause was a faulty driver for the Intel X710 cards (driver i40en; the bug was fixed in version 1.7.1).
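
To see which driver and driver version a host's uplinks are actually running, the esxcli network namespace can be queried from the ESXi shell. A small sketch; the helper name nic_driver_info and the default vmnic0 are placeholders.

```shell
# Show the driver name and driver version reported for one physical uplink.
# $1 = uplink name (assumed vmnic0).
nic_driver_info() {
    nic="${1:-vmnic0}"
    # "esxcli network nic get" prints a "Driver Info" section; keep only the
    # "Driver:" and "Version:" lines (this skips "Firmware Version:").
    esxcli network nic get -n "$nic" | grep -E '^ *(Driver|Version):'
}
```

Running esxcli network nic list first shows all uplinks with their driver names; nic_driver_info vmnic0 then prints the driver and version for a single uplink, which can be compared against the fixed release mentioned above.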

---

Regards, Sebastian
VCP6.5-DCV // VCP7-CMA // vSAN 2017 Specialist
Please mark this answer as 'helpful' or 'correct' if you think your question has been answered correctly.