Sravan_k
Expert

Host not responding in vCenter

Hi,

I am facing an issue with an ESXi host: it keeps disconnecting from vCenter. This has happened twice. The first time was 3 months ago (I fixed it by restarting the host management agents), and the second time was recently. Please let me know how I can resolve this issue permanently.

[This is a production issue]

My environment:

ESXi 6.0 update 2

vCenter 6.0

Note: the VMs running on this host are not migrating to the other hosts in the cluster.

Thank you,

Vkmr.

13 Replies
AdrianTT
Enthusiast

Hi Vkmr,

This could be a number of things. To start triaging, I would recommend reviewing /var/log/vpxa.log on the host (the vCenter agent log) for clues as to why the failure occurred. In the past I have seen this kind of behaviour when a driver writes to a ramdisk partition until it fills up (you can check this by running esxcli system visorfs ramdisk list on the host). If you could post your vpxa.log, it might give some further insight.
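As a rough aid for the ramdisk check, here is a minimal Python sketch that flags ramdisks close to full. It assumes you have already copied the name, used and maximum sizes (in KiB) out of the esxcli output by hand; the function name, the tuple layout and the 90% threshold are my own illustrative choices, not anything VMware provides.

```python
def nearly_full_ramdisks(rows, threshold_pct=90.0):
    """Return (name, percent_used) for ramdisks at or above threshold_pct.

    rows is an iterable of (name, used_kib, maximum_kib) tuples that were
    transcribed manually from the ramdisk listing (hypothetical layout).
    """
    flagged = []
    for name, used_kib, maximum_kib in rows:
        if maximum_kib <= 0:
            # Skip entries without a usable maximum to avoid dividing by zero.
            continue
        percent_used = 100.0 * used_kib / maximum_kib
        if percent_used >= threshold_pct:
            flagged.append((name, round(percent_used, 1)))
    return flagged


if __name__ == "__main__":
    # Made-up values purely for illustration, not taken from this host.
    example_rows = [("root", 1488, 32768), ("tmp", 261000, 262144)]
    print(nearly_full_ramdisks(example_rows))
```

Any ramdisk this reports is worth a closer look, since a full ramdisk can stop the management agents from writing their files.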

Please consider marking this answer "correct" or "helpful" if you think your query have been answered correctly. Cheers,
HarjitSB
Enthusiast

Hello V.Kumar, a host can disconnect from vCenter for several reasons; yours might be one of these:

1) Network -

a) The VMkernel vSwitch may be showing as disconnected. Log in to the DCUI and check whether the host can successfully complete the network self-test (Test Management Network).

b) Try to reconnect the host to vCenter. If it reconnects, check that the VLANs observed on the vSwitch adapters are correct, and check the events for that period as well.

2) VMs may fail to migrate when vMotion uses a different vSwitch and the TCP/IP stack is not configured for a different vMotion gateway. Disable vMotion on any other switch where it is enabled.

3) Check the FDM logs for the HA configuration.

These are preliminary steps to narrow down the issue; I hope they help.

Rgds

-Harjit

vmWARS – A War of Virtualization

Sravan_k
Expert

Thanks for the reply, Harjit. I ran the network test and the network is good.

Sravan_k
Expert

Hi Adrian,

Thanks for your reply. Please find the attached log files (please check the logs from 16:00 on 26th May).

The output of the command "esxcli system visorfs ramdisk list" is shown below:

sys-visorfs-ramdisk-list1.PNG

AdrianTT
Enthusiast

Having had a quick look at the vpxa.log, I cannot see any exception being thrown by vpxa on the host, nor any disconnection events for the agent; it appears to have been running at the time you mentioned (16:00 26/5/2017). The disk space also appears to be OK (the output is a bit hard to read). If it happens again, I would recommend following the KB article "Troubleshooting an ESXi/ESX host in non responding state (1003409)" to eliminate possible causes while the issue is occurring.

Sorry I can't provide any further insights,

Kind regards,

Adrian

Sravan_k
Expert

Thanks for replying, Adrian. I am trying to find the root cause of this issue and will update here if I find it.

Sravan_k
Expert

Hi Adrian,

I just found a log entry reporting All Paths Down (APD); please see the log line below. Any idea why this APD event was triggered?

017-05-26T19:48:42.392Z: [APDCorrelator] 8481791675898us: [esx.problem.storage.apd.start] Device or filesystem with identifier [naa.514f0c5cfa600010] has entered the All Paths Down state.

AdrianTT
Enthusiast

Hi,

The log indicates that the hypervisor has no live paths to the storage device with ID naa.514f0c5cfa600010. This would typically be observed in one of the following cases:

  • An HBA has failed on the hypervisor
  • If using iSCSI, network connectivity to the storage target has been lost (which requires investigation into your network)
  • A driver issue with your storage adapters (network or FC HBA); check the VMware HCL for your devices and ensure that the installed firmware levels and driver versions match
  • If you are using Fibre Channel storage, there may be an issue with the pathing
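
To see which devices are affected and when, a log line like the one quoted in this thread can be parsed with a short script. This is only a sketch: the regular expression is modelled on that single quoted line, the function and field names are my own, and matching event names other than esx.problem.storage.apd.start is an assumption.

```python
import re

# Pattern modelled on the single APD line quoted in this thread.
APD_RE = re.compile(
    r"^(?P<timestamp>\S+): \[APDCorrelator\] (?P<uptime_us>\d+)us: "
    r"\[(?P<event>esx\.problem\.storage\.apd\.\w+)\] "
    r"Device or filesystem with identifier \[(?P<device>[^\]]+)\]"
)


def parse_apd_line(line):
    """Return a dict with timestamp, uptime_us, event and device, or None."""
    match = APD_RE.match(line.strip())
    if match is None:
        return None
    info = match.groupdict()
    info["uptime_us"] = int(info["uptime_us"])
    return info


# The line exactly as it was posted (its timestamp was pasted incomplete).
SAMPLE = (
    "017-05-26T19:48:42.392Z: [APDCorrelator] 8481791675898us: "
    "[esx.problem.storage.apd.start] Device or filesystem with identifier "
    "[naa.514f0c5cfa600010] has entered the All Paths Down state."
)

if __name__ == "__main__":
    print(parse_apd_line(SAMPLE))
```

Running it over every line of the log and collecting the device values gives a quick list of LUNs to check against your array and fabric.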

Hope this helps,

Adrian

SK21090
Enthusiast

Did you try to re-configure HA for this host?

Please consider awarding points for "Correct" or "Helpful" replies. Thanks....!!! VCAP-DCV | VCP-DCV | VCP-NV | MCSA
Sravan_k
Expert

No, I restarted the host, and it automatically reconnected to vCenter.

SK21090
Enthusiast

If the issue still persists, I recommend right-clicking the host and selecting Reconfigure for vSphere HA outside business hours.

Let me know how it goes.

Sravan_k
Expert

It was fixed by restarting the host.

Thank you,

Vkmr.

amit3bcrec
Contributor

Hi,

I had the same issue. I found that the VPXA heartbeat service under the firewall rules was stopped; I just restarted the service and was able to add the host back to vCenter. Please see the screenshot: Screen Shot 2018-04-17 at 5.53.12 PM.png