goranmw1
Contributor

ESXi host connectivity with vCenter Server

I have a health check warning in vCenter 6.7 saying:

This issue occurs when the UDP heartbeat message sent by ESX/ESXi hosts is not received by vCenter Server. If vCenter Server does not receive the UDP heartbeat message, it treats the host as not responding. ESX/ESXi hosts send heartbeats every 10 seconds and vCenter Server has a window of 60 seconds to receive the heartbeats. This behavior can be an indication of a congested network between the ESX/ESXi hosts and vCenter Server. Click the Ask VMware link above for more details and a resolution.
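The timing rule in that message (a heartbeat every 10 seconds, a 60-second window) can be sketched as a small toy model. This is only an illustration of the rule as worded above, not how vCenter Server is actually implemented; the class name, host names, and timestamps are all made up for the example.

```python
from dataclasses import dataclass, field

HEARTBEAT_INTERVAL_S = 10  # interval quoted in the health check text
DEFAULT_TIMEOUT_S = 60     # window quoted in the health check text


@dataclass
class HeartbeatTracker:
    """Toy receiver that remembers the last heartbeat time per host (hypothetical)."""
    timeout_s: float = DEFAULT_TIMEOUT_S
    last_seen: dict = field(default_factory=dict)

    def record(self, host: str, now: float) -> None:
        # Store the arrival time of the most recent heartbeat from this host.
        self.last_seen[host] = now

    def not_responding(self, now: float) -> list:
        # A host is flagged once its last heartbeat is older than the window.
        return sorted(h for h, t in self.last_seen.items() if now - t > self.timeout_s)


tracker = HeartbeatTracker()
for t in range(0, 100, HEARTBEAT_INTERVAL_S):
    tracker.record("esx-a", t)   # steady sender: 0, 10, ..., 90
tracker.record("esx-b", 0)
tracker.record("esx-b", 10)      # goes silent after t=10

print(tracker.not_responding(now=100))
```

In this model a host is only flagged after the whole window passes without a single heartbeat, so a few lost packets alone would not trigger it.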

I have checked everything, and there seem to be no network connectivity issues.

Which log can help me find exactly which host has this "missing UDP heartbeat message" issue?

Thanks.

17 Replies
MikeStoica
Expert

Check this KB

irvingpop_chef
Contributor

Which KB, MikeStoica?

asajm
Expert

goranmw1

VMware Knowledge Base

If you think your query has been answered, please mark this response as "Solution" or give it a "Kudo".
ASAJM
MikeStoica
Expert

Sorry, this one: VMware Knowledge Base

HappeeDays
Contributor

Did you manage to find a resolution? I have a similar issue where the online health monitor under "Network Health Checks" is reporting "ESXi host connectivity with vCenter Server". I do not get any disconnects from the hosts, and the network connectivity seems fine.

maslan81
Contributor

I have the same problem :( Any solution?

Alex_Romeo
Leadership

Hi,

VMware Knowledge Base

Alessandro Romeo

Blog: https://www.aleadmin.it/
HappeeDays
Contributor

As per the VMware Knowledge Base article, increasing the heartbeat timeout on the vCenter does resolve the issue, and the warning disappears. However, this only masks the problem and does not explain why a timeout of 120 instead of the default 60 is required. I've asked my network guy to see if he can spot anything.

bmstewart
Contributor

I have the same exact issue.  I've got 3 different hosts on 3 different networks, but they're all able to talk to each other.

This is a brand new vSphere 6.7 U3 setup with the latest patches.  All hosts are running ESXi 6.7 with the latest patches.

I don't see any issue or loss of connectivity in the vSphere client itself, but the Skyline Health thing always reports the ESXi host connectivity issue.

I've monitored network traffic at one of our firewall appliances, and I see the UDP packets are not being blocked in any way, but I see very few of them.  I certainly do not see one every 10 seconds.  I see one from a given host IP to the VCSA IP about once every 2 minutes.  No, the network is not congested.  They're all in the same rack in the same data center, and they're the only things connected to that rack's switch.
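A quick way to put numbers on an observation like this is to export the arrival times of one host's heartbeat packets from the capture and look at the gaps between them. The sketch below assumes you already have those timestamps as a list of seconds; the function names and the sample values are hypothetical and not taken from any real capture.

```python
from statistics import median


def heartbeat_gaps(timestamps):
    """Return the gaps (in seconds) between consecutive arrival times."""
    ordered = sorted(timestamps)
    return [b - a for a, b in zip(ordered, ordered[1:])]


def summarize(timestamps, expected_interval_s=10.0):
    """Summarize the gaps and compare them with the documented 10 s interval."""
    gaps = heartbeat_gaps(timestamps)
    if not gaps:
        return None  # fewer than two packets: nothing to measure
    return {
        "count": len(timestamps),
        "median_gap_s": median(gaps),
        "max_gap_s": max(gaps),
        "slower_than_expected": median(gaps) > expected_interval_s,
    }


# Hypothetical arrival times for one host, roughly two minutes apart.
sample = [0.0, 121.0, 240.0, 362.0]
print(summarize(sample))
```

Running this per host IP would show whether every host is slow or only some of them, which is hard to judge by eye in a firewall log.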

coopersmith77
Enthusiast

HappeeDays - I agree with your assessment. I'm having the exact same issue (see my thread "Phantom Skyline Health Warning?"). Please let me know if your network guy spots something that might help me with my environment.

coopersmith77
Enthusiast

bmstewart - I have almost the exact same environment (all 3 hosts are on the same network) and am having the exact same issue. See my discussion post: "Phantom Skyline Health Warning?" If you happen to find a solution, please share it with me.

coopersmith77
Enthusiast

goranmw1 - Did you figure out which log to look at? I'm having the exact same issue (see my thread "Phantom Skyline Health Warning?"). I'd really like to know how to resolve this warning.

HappeeDays
Contributor

coopersmith77 sorry, no cigar unfortunately. It is an intermittent issue, and at the moment it is not showing up in my environment. As it's not causing any actual problem, I'm ignoring it (don't think I've said that before!)

bmstewart
Contributor

coopersmith77

I ended up following the steps in the VMware Knowledge Base article to set the timeout to 2 minutes (a value of 120) instead.

My only guess is that the hosts aren't actually sending heartbeats as frequently or steadily as they should (once every 10 seconds), or that the default timer before throwing an alert isn't actually 1 minute as it should be.
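To see what those two guesses would look like in practice, here is a rough sketch that counts how many gaps between heartbeats exceed a given timeout. Both arrival series are invented for illustration: one steady sender at 10-second intervals, and one sparse sender at roughly two-minute intervals like the traffic seen at the firewall.

```python
def gaps_over_timeout(arrivals, timeout_s):
    """Count gaps between consecutive heartbeats that are longer than timeout_s."""
    ordered = sorted(arrivals)
    return sum(1 for a, b in zip(ordered, ordered[1:]) if b - a > timeout_s)


steady = list(range(0, 601, 10))    # hypothetical: one heartbeat every 10 s
sparse = [0, 118, 236, 355, 476]    # hypothetical: roughly every 2 minutes

for name, arrivals in (("steady", steady), ("sparse", sparse)):
    for timeout_s in (60, 120):
        print(name, timeout_s, gaps_over_timeout(arrivals, timeout_s))
```

With these made-up numbers the steady sender never exceeds either timeout, while the sparse sender exceeds 60 seconds on every gap and 120 seconds only occasionally. If the hosts really do send that rarely, a value of 120 would leave very little margin.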

After making this change I've received no more of these alerts.  All I have now are the following:

  • An occasional warning within VAMI about memory usage being high (we chose "tiny", have about 30 VMs and 3 hosts, and VAMI sometimes shows up to 83% usage of its 10 GB, even though the host shows the VCSA only taking up 2.5 GB). I could add more memory (from 10 to 12 GB, I guess) and maybe specifically allocate more for the vSphere UI service. (Re: vSphere UI Health Alarm)
  • An occasional stateless sensor alert for memory (gray/unknown, sensor type -1, etc.). Possibly related to the VMware Knowledge Base article? Though I've applied that patch (and did the post-patch work on one host) and it didn't help.
  • The host log has an entry complaining about the scratch partition size (VMware Knowledge Base). This one shows up immediately after installing 6.7. Did VMware just not update the default scratch partition size for whatever new requirements 6.7 has? Is this going to be something I have to live with unless I redo a host?

I mention all of these here simply because they're behaviors I see on a new, clean install of 6.7.  Maybe I should have gone with 6.5.

coopersmith77
Enthusiast

HappeeDays - Okay. Thanks for your reply.

coopersmith77
Enthusiast

bmstewart - Thank you for your reply. I'll keep your alerts in mind as I monitor my environment. Yeah, as far as trying to keep our environments current, I guess that's what we get for living on the edge. ;)

BB9193
Contributor

I am also having this issue. We're running a brand new vSAN cluster at 6.7 U3, and after I recently restarted the cluster we started seeing this heartbeat warning. I'm also seeing the warning about vCenter running low on memory, but we're using the "tiny" recommendation of 2 cores and 10 GB of RAM and we only have about 40 VMs. I'm not seeing any congestion, and we're not even using the cluster for production yet.

Is this just a reporting anomaly and is the ultimate fix just to increase the timeout?
