VMware Cloud Community
spfma
Contributor

vSphere 7.0: ESXi host not responding

Hi,

I am setting up a brand-new cluster of four ESXi hosts with plenty of resources, a 10G network, and so on.

All the hypervisors have been configured with a script, so there are no differences or missing items between them.

But I have a problem with one of them: if I reboot it, it does not reconnect once the reboot completes. When I reconnect it manually, I still get a warning, "Cannot synchronize host". After some time, it goes into the "not responding" state, although the ESXi host and its VMs are actually running fine.

What does this error really mean?

NTP seems to be working: all hosts and the vCenter appliance have the same time.

It is a dedicated infrastructure under test, so there is no overload of network or compute resources.

I have disconnected and reconnected the host, and removed it and added it again, but it does not return to a reliable state.

Any ideas?

Regards 

4 Replies
bbalido9
Contributor

Hi,

Please check the hostd.log, vpxa.log and vmkernel.log files to understand why the host is going into the not responding state.

Otherwise, please upload the files along with the timestamp of the issue so they can be reviewed.
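
To narrow the files down, it can help to keep only the lines logged around the time of the disconnect. Below is a minimal sketch, assuming each log line starts with an ISO 8601 UTC timestamp (for example 2021-03-01T10:00:00.123Z) and that the logs have been copied off the host or extracted from a bundle. The names `parse_ts` and `lines_near` are hypothetical helpers written for this post, not part of any VMware tool.

```python
import re
from datetime import datetime, timedelta, timezone

# Assumption: lines begin with a timestamp like 2021-03-01T10:00:00.123Z.
# Lines that do not match (continuation lines, banners) are skipped.
TS_RE = re.compile(r"^(\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2})(?:\.\d+)?Z")


def parse_ts(line):
    """Return the leading UTC timestamp of a log line, or None if absent."""
    match = TS_RE.match(line)
    if match is None:
        return None
    parsed = datetime.strptime(match.group(1), "%Y-%m-%dT%H:%M:%S")
    return parsed.replace(tzinfo=timezone.utc)


def lines_near(lines, event_time, window_minutes=5):
    """Yield the lines logged within +/- window_minutes of event_time."""
    window = timedelta(minutes=window_minutes)
    for line in lines:
        ts = parse_ts(line)
        if ts is not None and abs(ts - event_time) <= window:
            yield line.rstrip("\n")


# Example use (the file name and the event time are placeholders):
#   event = datetime(2021, 3, 1, 10, 2, tzinfo=timezone.utc)
#   with open("vpxa.log", errors="replace") as handle:
#       for hit in lines_near(handle, event):
#           print(hit)
```

Running the same filter over all three files with the same event time makes it easier to line up what each service logged when the host dropped out.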

spfma
Contributor

Hi,

I still haven't found any clues, but it reminds me of problems I have read about here, although I can't find the posts again.

Everything works fine until I reboot the server. Of course it gets disconnected, but it doesn't reconnect automatically (should it?) when it is back online. The reboot takes a long time, as the server has around 700 GB of RAM.

If I connect it manually, it will disconnect soon after.

But if I remove it from the inventory and add it again, it connects and stays connected (I left it like that for a couple of days).

Of course, on the next reboot, it is the same mess again... and only on this one machine.

I might end up reinstalling it, but I would like to understand what is happening.

bbalido9
Contributor

Hi,

I would advise checking vpxa.log to see whether the host is losing its heartbeat to vCenter.
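
As a starting point, the log can be filtered for lines that mention the heartbeat. This is only a rough sketch: the keyword "heartbeat" is an assumption about how the relevant messages are worded, and `find_keyword_lines` is a hypothetical helper written for this post, so extend the keyword list with whatever terms actually show up in your vpxa.log.

```python
def find_keyword_lines(lines, keywords=("heartbeat",)):
    """Return (line_number, text) pairs for lines containing any keyword.

    Matching is case-insensitive. The default keyword is an assumption,
    not a documented message format.
    """
    lowered = [keyword.lower() for keyword in keywords]
    hits = []
    for number, line in enumerate(lines, start=1):
        text = line.lower()
        if any(keyword in text for keyword in lowered):
            hits.append((number, line.rstrip("\n")))
    return hits


# Example use (the file name is a placeholder for the extracted log):
#   with open("vpxa.log", errors="replace") as handle:
#       for number, text in find_keyword_lines(handle, ("heartbeat", "disconnect")):
#           print(number, text)
```

The line numbers make it easy to jump back into the full file and read what was logged just before and after each hit.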

Secondly, the fact that removing the host from the vCenter inventory and adding it back helps points to the vCenter database entry where the host is registered and given a unique ID. The issue might be a stale or duplicate ID pointing to that ESXi host, which would require validation or cleanup within the vCenter DB.

PS: A log bundle from both the ESXi host and vCenter will be required to understand exactly what happens when the host is pushed out or gets disconnected from vCenter.

I would ask you to collect the log bundle even if you are going to reinstall the host.

Thanks,

Balido

spfma
Contributor

Hi,

For the time being, I give up.

I don't see anything helpful in the logs, mostly because I don't really know what the error messages mean (some seem to be harmless even though they are marked as errors).

I have extracted the logs from the export bundle, and anonymized them. But the content is genuine.

Just for science's sake, I decided to restart the vCenter appliance, and guess what: everything was green when I was able to log in again. But after rebooting one ESXi host, back to reality!

Regards
