VMware Cloud Community
TARIGX
Enthusiast

Host connection failure

Dear all,

I am facing a strange issue. I have a vSAN cluster, and every day one of its ESXi hosts appears as "not responding" in vCenter. However, I can ping the host successfully, and I can access the host's DCUI normally through iLO. The host returns to the normal state after a restart.

Please let me know if you have any suggestions on how to stop this behavior.

Attached is the event log.

dgreebe
Contributor

Hi Tarigx,

Can you ping another vSAN host over its vSAN VMkernel interface with vmkping?

Is the host "not responding" only in vCenter, or are the other vSAN hosts also complaining that the host is gone?

Best regards,

Dave

TARIGX
Enthusiast

Hi dgreebe,

- I did not try vmkping, but I will if it happens again today.

- Yes, the other hosts are complaining, and HA is not working at the time of the "not responding" message.

SureshKumarMuth
Commander

Hi,

Are you able to log in to the host using the DCUI, or do you just see the logon screen?

If the host is not responding, it indicates that the hostd agent is not responding. Please check the hostd log on the host around that timestamp for any reported backtrace, and also check the /var/core location for any hostd dump file.

hostd log location --> /var/log/hostd.log or /scratch/log/. Look for the older hostd logs in case the logs have rolled over.
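As a rough aid for the hostd check above, here is a minimal Python sketch that filters log lines mentioning a backtrace within a few minutes of the "not responding" event, and lists hostd dump files in /var/core. It assumes each hostd.log line starts with an ISO-8601 timestamp such as `2019-05-14T02:12:30.500Z` and that dump file names contain "hostd"; both are assumptions, so adjust `TS_RE` and the name filter to what your build actually writes. The function names are hypothetical.

```python
import re
from datetime import datetime, timedelta
from pathlib import Path

# Assumption: lines begin with "YYYY-MM-DDTHH:MM:SS"; fractional seconds and zone are ignored.
TS_RE = re.compile(r"^(\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2})")


def lines_near(lines, event_time, window=timedelta(minutes=5), keywords=("backtrace",)):
    """Yield (timestamp, line) for lines within `window` of event_time containing a keyword."""
    for line in lines:
        m = TS_RE.match(line)
        if not m:
            continue  # skip continuation lines without a leading timestamp
        ts = datetime.strptime(m.group(1), "%Y-%m-%dT%H:%M:%S")
        if abs(ts - event_time) <= window and any(k in line.lower() for k in keywords):
            yield ts, line.rstrip("\n")


def hostd_dumps(core_dir="/var/core"):
    """Return sorted file names in core_dir that mention hostd (empty if the dir is missing)."""
    p = Path(core_dir)
    if not p.is_dir():
        return []
    return sorted(f.name for f in p.iterdir() if "hostd" in f.name)
```

Usage: `list(lines_near(open("/var/log/hostd.log"), datetime(2019, 5, 14, 2, 13)))`, with the datetime replaced by the time vCenter reported the host as not responding. Rolled-over logs can be fed in the same way.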

Post the hostd log here if you can't find any clue.

Regards,
Suresh
https://vconnectit.wordpress.com/
dgreebe
Contributor

Tarigx,

Please try vmkping while vCenter reports that host as "not responding".

Try the vSAN VMkernel interface and also the one where you have configured your management traffic.

The point is that vSAN HA traffic goes over the vSAN VMkernel port, but the connection with vCenter goes over your management VMkernel port.

If both are using the same NIC, then my advice is to check the drivers and firmware of that NIC and verify that they are on the VMware HCL.
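To test both networks the same way each time the host drops, the checks above could be scripted. This is a minimal sketch, assuming Python is available where you run vmkping (for example the ESXi shell) and that vmkping prints ping-style "N% packet loss" summaries. The vmk names and peer IPs in `TARGETS` are hypothetical placeholders. `-I` (source interface) and `-c` (count) are standard vmkping options.

```python
import re
import subprocess

# Hypothetical values: replace with your own vmknic names and a peer host's IPs.
TARGETS = {
    "vsan": ("vmk1", "192.168.50.12"),
    "management": ("vmk0", "10.0.0.12"),
}

# Assumption: output ends with a ping-style line like "3 packets received, 0% packet loss".
LOSS_RE = re.compile(r"(\d+(?:\.\d+)?)% packet loss")


def vmkping_cmd(vmk, ip, count=3):
    """Build a vmkping argv that forces the source VMkernel interface with -I."""
    return ["vmkping", "-I", vmk, "-c", str(count), ip]


def packet_loss(output):
    """Return the packet loss percentage parsed from the output, or None if absent."""
    m = LOSS_RE.search(output)
    return float(m.group(1)) if m else None


def check_all(targets=TARGETS):
    """Run vmkping once per network and map each network name to its packet loss."""
    results = {}
    for name, (vmk, ip) in targets.items():
        # PIPE/universal_newlines instead of capture_output, for older Python builds.
        proc = subprocess.run(vmkping_cmd(vmk, ip), stdout=subprocess.PIPE,
                              stderr=subprocess.PIPE, universal_newlines=True)
        results[name] = packet_loss(proc.stdout)
    return results
```

If `check_all()` shows the vSAN network clean but the management network losing packets (or the reverse), that points at the specific uplink or vmknic rather than at hostd.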

Hope to hear from you tomorrow, hopefully with some more information.

If HA does not fail over, I think there is still some kind of connection.

What is your HA setup? What is the FTT of your vSAN?

TARIGX
Enthusiast

Hi,

Yes, I am able to log in normally through the DCUI.

TARIGX
Enthusiast

Dear dgreebe,

I am facing the "not responding" issue right now. vmkping works properly from all the other ESXi hosts to the affected ESXi host. I am using fully automated DRS and an n+1 configuration in vSAN.

mark49808
Enthusiast

Can you share your vpxd.log from vCenter covering a time when the host became disconnected? I am facing a similar issue and want to compare your logs with mine.

kwg66
Hot Shot

Same issue here, and it is a disaster. We see these alerts generated constantly, every single night, with nothing in the logs to indicate what is causing the problem: no backups are running and the log entries are empty. All of a sudden the VMs show as disconnecting, then the host shows as not responding, then disconnected, and some time later (often within seconds, but sometimes up to 40 minutes later) I see "Established a connection".
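To put numbers on the "seconds to 40 minutes" pattern before and after a change such as a timeout increase, the events can be paired and the gaps measured. This is a minimal sketch, assuming the vCenter events have been exported as (ISO timestamp, message) pairs sorted oldest first; the marker substrings come from the messages quoted in this thread, and the exact event text may differ between vCenter versions. The host name in the usage example is hypothetical.

```python
from datetime import datetime

# Assumption: substrings taken from the event messages quoted in this thread.
DOWN_MARKER = "not responding"
UP_MARKER = "Established a connection"


def outage_durations(events):
    """Pair each first 'not responding' event with the next reconnect; return seconds down."""
    durations = []
    down_since = None
    for ts, msg in events:
        t = datetime.fromisoformat(ts)
        if DOWN_MARKER in msg and down_since is None:
            down_since = t  # repeated "not responding" events keep the earliest start
        elif UP_MARKER in msg and down_since is not None:
            durations.append((t - down_since).total_seconds())
            down_since = None
    return durations
```

For example, a 30-second blip followed by a 40-minute outage on a hypothetical host `esx01` yields `[30.0, 2400.0]`. Tracking these per night makes it easier to see whether a setting change actually shortens or removes the disconnects.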

I don't believe the VMs or the host are actually disconnecting from the network; otherwise, our monitoring system, which hooks directly into the guest OS, would trigger other alerts and page our support staff.

I see the KB about increasing the handshakeTimeoutMs value (VMware Knowledge Base).

And I agree that in many cases this could relieve the alerts. But then again, when these alerts appear in the logs they aren't actually being triggered, and they DO NOT even show up under triggered alerts in the web client. BOGUS!

We recently consolidated our vCenter installations from 4 to 2, so more hosts are now managed under a single vCenter, but nonetheless we only have 72 hosts. This is not a large inventory, and we deploy vCenter as if we had a maximum-size environment.
