I am facing an issue with an ESXi host: it keeps disconnecting from vCenter. This has happened twice. The first time was three months ago (I fixed it by restarting the host management agent) and the second time was recently. Please let me know how I can resolve this issue permanently.
[This is a production issue]
ESXi 6.0 Update 2
Note: the VMs running on this host are not migrating to other hosts in the cluster.
This could be a number of things. To start triaging, I would recommend reviewing /var/log/vpxa.log on the host (the vCenter agent log) for clues as to why the failure occurred. In the past I have seen this kind of behaviour when a driver writes to a ramdisk partition and fills it up (you can check this by running esxcli system visorfs ramdisk list on the host), but there are other possible causes. If you could post your vpxa.log, it might give some further insight.
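If it helps while reviewing vpxa.log, here is a rough Python sketch for pulling out only the lines after a given time that contain a few suspicious keywords. It assumes each log line begins with a UTC ISO 8601 timestamp such as 2017-05-26T19:48:42.392Z; the keyword list and the function names are my own choices for illustration, not anything VMware ships, so adjust them to what you actually see in the log.

```python
import re
from datetime import datetime, timezone

# Assumption: every relevant line starts with a UTC timestamp like
# 2017-05-26T19:48:42.392Z (fractional seconds are optional here).
TS_RE = re.compile(r"^(\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2})(?:\.\d+)?Z")

# Hypothetical keyword list; tune it for your own logs.
DEFAULT_KEYWORDS = ("error", "exception", "disconnect", "timed out")


def parse_ts(line):
    """Return the leading timestamp of a log line as an aware datetime, or None."""
    match = TS_RE.match(line)
    if match is None:
        return None
    parsed = datetime.strptime(match.group(1), "%Y-%m-%dT%H:%M:%S")
    return parsed.replace(tzinfo=timezone.utc)


def suspicious_lines(lines, since, keywords=DEFAULT_KEYWORDS):
    """Yield lines stamped at or after `since` that contain any keyword.

    Lines without a recognisable timestamp are skipped.
    Keyword matching is case-insensitive.
    """
    for line in lines:
        ts = parse_ts(line)
        if ts is None or ts < since:
            continue
        lowered = line.lower()
        if any(keyword in lowered for keyword in keywords):
            yield line.rstrip("\n")


if __name__ == "__main__":
    # Example: everything suspicious from 16:00 UTC on 26 May 2017 onwards.
    start = datetime(2017, 5, 26, 16, 0, tzinfo=timezone.utc)
    with open("vpxa.log", encoding="utf-8", errors="replace") as handle:
        for hit in suspicious_lines(handle, start):
            print(hit)
```

Bear in mind that the log timestamps end in Z (UTC), so a local time such as 16:00 may need converting before you pick the start value.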
Hello V.Kumar, a host can disconnect from vCenter for several reasons; yours might be one of these:
1) Network
a) Check whether the VMkernel vSwitch is showing as disconnected. Log in to the DCUI and check whether the host can successfully perform the self test.
b) Try to reconnect the host to vCenter. If it connects, check that the VLANs observed on the vSwitch adapters are correct, and check the events for that period as well.
2) VMs may fail to migrate when vMotion is on a different vSwitch and the TCP/IP stack is not configured for the separate vMotion gateway. Uncheck vMotion on any other switch where it is enabled.
3) Check the FDM logs for the HA configuration.
These are preliminary steps to narrow down the issue; I hope they help.
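For the network checks above, a quick way to confirm basic reachability from the vCenter side (or any management workstation) is a small TCP probe. The following Python sketch is only an illustration: the host name in the example is a placeholder, and it tests TCP 443 and TCP 902 only. The host-to-vCenter heartbeat uses UDP 902, which a simple connect test like this cannot verify.

```python
import socket


def tcp_reachable(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name resolution failures.
        return False


def check_ports(host, ports=(443, 902), timeout=3.0):
    """Probe each TCP port on host and return a {port: reachable} mapping."""
    return {port: tcp_reachable(host, port, timeout) for port in ports}


if __name__ == "__main__":
    # "esxi01.example.local" is a hypothetical name; use your host's
    # management address instead.
    for port, ok in check_ports("esxi01.example.local").items():
        print(f"TCP {port}: {'reachable' if ok else 'NOT reachable'}")
```

If both ports answer while the host still shows as disconnected, the problem is more likely to be with the agents or the heartbeat traffic than with basic connectivity.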
Thanks for your reply. Please find the attached log files (please check the logs from 16:00 on 26th May).
The output of the command "esxcli system visorfs ramdisk list" is shown below.
Having had a quick look at the vpxa.log, I cannot see any exception being thrown by vpxa on the host, nor any disconnection events for the agent; it appears to have been running at the time you mentioned (16:00 26/5/2017). The disk space also appears to be OK (the output is a bit hard to read). If it happens again, I would recommend following this KB while the issue is occurring, to eliminate possible causes: Troubleshooting an ESXi/ESX host in non responding state (1003409) | VMware KB.
Sorry I can't provide any further insights,
I just found a log entry reporting All Paths Down (APD); please see the log line below. Any idea why this APD event was triggered?
2017-05-26T19:48:42.392Z: [APDCorrelator] 8481791675898us: [esx.problem.storage.apd.start] Device or filesystem with identifier [naa.514f0c5cfa600010] has entered the All Paths Down state.
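To see whether the same device shows up repeatedly, entries like the one above can be pulled out of the log with a small script. This Python sketch is based only on the format of the single line quoted here; the function names are made up for illustration, and the number before "us:" is treated as a microsecond counter purely because of its suffix.

```python
import re

# Pattern derived from the one APD line quoted in this thread. Other
# APD-related events may use a slightly different layout.
APD_RE = re.compile(
    r"\[APDCorrelator\]\s+(?P<counter_us>\d+)us:\s+"
    r"\[(?P<event>esx\.\w+\.storage\.apd\.\w+)\]\s+"
    r"Device or filesystem with identifier \[(?P<device>[^\]]+)\]"
)


def parse_apd_event(line):
    """Return event name, device ID and counter for an APD line, else None."""
    match = APD_RE.search(line)
    if match is None:
        return None
    return {
        "event": match.group("event"),
        "device": match.group("device"),
        # Assumed to be microseconds, going by the "us" suffix.
        "counter_us": int(match.group("counter_us")),
    }


def apd_events_by_device(lines):
    """Group the APD event names found in `lines` by device identifier."""
    grouped = {}
    for line in lines:
        parsed = parse_apd_event(line)
        if parsed is not None:
            grouped.setdefault(parsed["device"], []).append(parsed["event"])
    return grouped
```

Feeding the log file into apd_events_by_device gives one entry per device, which makes it easier to tell whether a single LUN or the whole array lost its paths.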
The log indicates that the hypervisor has no live paths to the storage device with ID naa.514f0c5cfa600010; this would typically be observed when either;
Hope this helps.
If the issue still persists, I would recommend right-clicking the host and selecting Reconfigure vSphere HA outside business hours.
Let me know how it goes.
I had the same issue. I found that the VPXA heartbeat service under the firewall rules was stopped; I restarted the service and was able to add the host back to vCenter. Please see the screenshot.