shankarsingh
Enthusiast

esxi host not responding issue


Hi 

We have ESXi 5.5 U3 hosts (Cisco B200 M3) that have started going into a "not responding" state. We bring them back by rebooting the ESXi host, although a few hosts reconnect on their own after some time.

While a host is in the not responding state, I noticed the following:

The host is reachable over the network via ping.

I can log in over SSH and read the logs.

However, we cannot get any output from df -h or esxcli storage vmfs extent list.

The VMs are reachable over the network via ping.

 

Can someone guide us or suggest how to fix this issue?

 

Below are a few logs (please see the attached logs as well).

 

vmkernel.log
2020-12-15T15:36:56.471Z cpu17:33770)Res3: 9019: 'semar_1_VMFS11': RC cNum 12525 unlock failure at offset 19182592;attempt 1: Timeout
2020-12-15T15:36:56.514Z cpu7:33167)<7>fnic : 1 :: Abort Cmd called FCID 0xc60240, LUN 0x33 TAG fe flags 3
2020-12-15T15:36:56.518Z cpu31:33157)<7>fnic : 1 :: abts cmpl recd. id 254 status FCPIO_SUCCESS
2020-12-15T15:36:56.518Z cpu7:33167)<7>fnic : 1 :: Returning from abort cmd type 2 SUCCESS
2020-12-15T15:36:56.518Z cpu7:33167)<7>fnic : 1 :: Abort Cmd called FCID 0xc60240, LUN 0x33 TAG 2b flags 3
2020-12-15T15:36:56.522Z cpu31:33157)<7>fnic : 1 :: abts cmpl recd. id 43 status FCPIO_SUCCESS
2020-12-15T15:36:56.522Z cpu7:33167)<7>fnic : 1 :: Returning from abort cmd type 2 SUCCESS
2020-12-15T15:36:56.522Z cpu7:33167)<7>fnic : 1 :: Abort Cmd called FCID 0xc60240, LUN 0x33 TAG 2c flags 3
2020-12-15T15:36:56.526Z cpu31:33157)<7>fnic : 1 :: abts cmpl recd. id 44 status FCPIO_SUCCESS
2020-12-15T15:36:56.526Z cpu7:33167)<7>fnic : 1 :: Returning from abort cmd type 2 SUCCESS
2020-12-15T15:36:56.526Z cpu7:33167)<7>fnic : 1 :: Abort Cmd called FCID 0xc60240, LUN 0x33 TAG 2d flags 3
2020-12-15T15:36:56.530Z cpu31:33157)<7>fnic : 1 :: abts cmpl recd. id 45 status FCPIO_SUCCESS
2020-12-15T15:36:56.530Z cpu7:33167)<7>fnic : 1 :: Returning from abort cmd type 2 SUCCESS
2020-12-15T15:36:56.765Z cpu7:33167)<7>fnic : 1 :: Abort Cmd called FCID 0xc60220, LUN 0x3c TAG f4 flags 3
2020-12-15T15:36:56.769Z cpu31:33117)<7>fnic : 1 :: abts cmpl recd. id 244 status FCPIO_SUCCESS
2020-12-15T15:36:56.769Z cpu7:33167)<7>fnic : 1 :: Returning from abort cmd type 2 SUCCESS
2020-12-15T15:36:56.769Z cpu26:112153959)NMP: nmp_ThrottleLogForDevice:2457: Cmd 0xf1 (0x413683b8d6c0, 32871) to dev "naa.6006016083b1490082c1d65be93b0c9c" on path "vmhba1:C0:T3:L60" Failed: H:0x8 D:0x0 P:0x0 Possible sense data: 0x0 0x0 0x0. Act:EVAL
2020-12-15T15:36:56.769Z cpu26:112153959)ScsiDeviceIO: 2331: Cmd(0x4136803cee40) 0xfe, CmdSN 0x20347e4 from world 32871 to dev "naa.6006016083b1490082c1d65be93b0c9c" failed H:0x8 D:0x0 P:0x0 Possible sense data: 0x0 0x0 0x0.
2020-12-15T15:36:56.769Z cpu6:33168)<7>fnic : 1 :: Abort Cmd called FCID 0xc60220, LUN 0x3c TAG 2f flags 3
2020-12-15T15:36:56.770Z cpu31:33117)<6>fnic : 1 :: icmnd_cmpl ABTS pending hdr status = FCPIO_SUCCESS sc 0x0x413689641b40 scsi_status 0 residual 0
2020-12-15T15:36:56.773Z cpu31:33117)<7>fnic : 1 :: abts cmpl recd. id 47 status FCPIO_SUCCESS
2020-12-15T15:36:56.773Z cpu6:33168)<7>fnic : 1 :: Returning from abort cmd type 2 SUCCESS
2020-12-15T15:36:56.773Z cpu26:32871)ScsiDeviceIO: 2331: Cmd(0x4136803cee40) 0x28, CmdSN 0x20347e5 from world 32871 to dev "naa.6006016083b1490082c1d65be93b0c9c" failed H:0x8 D:0x0 P:0x0 Possible sense data: 0x0 0x0 0x0.
2020-12-15T15:36:56.825Z cpu6:33168)<7>fnic : 1 :: Abort Cmd called FCID 0xc60220, LUN 0x3c TAG 30 flags 3
2020-12-15T15:36:56.825Z cpu31:33117)<6>fnic : 1 :: icmnd_cmpl ABTS pending hdr status = FCPIO_SUCCESS sc 0x0x413689641b40 scsi_status 0 residual 0
2020-12-15T15:36:56.829Z cpu31:33117)<7>fnic : 1 :: abts cmpl recd. id 48 status FCPIO_SUCCESS
2020-12-15T15:36:56.829Z cpu6:33168)<7>fnic : 1 :: Returning from abort cmd type 2 SUCCESS
2020-12-15T15:36:56.829Z cpu26:32871)ScsiDeviceIO: 2331: Cmd(0x4136803cee40) 0x28, CmdSN 0x20347e6 from world 32871 to dev "naa.6006016083b1490082c1d65be93b0c9c" failed H:0x8 D:0x0 P:0x0 Possible sense data: 0x0 0x0 0x0.
2020-12-15T15:36:57.271Z cpu7:33167)<7>fnic : 1 :: Abort Cmd called FCID 0xc60240, LUN 0x36 TAG ff flags 3
2020-12-15T15:36:57.275Z cpu31:33117)<7>fnic : 1 :: abts cmpl recd. id 255 status FCPIO_SUCCESS
2020-12-15T15:36:57.275Z cpu7:33167)<7>fnic : 1 :: Returning from abort cmd type 2 SUCCESS
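
A vmkernel.log excerpt like the one above is easier to read once the fnic aborts and failed SCSI commands are tallied. The following is a minimal Python sketch for doing that offline, on a copy of the log. The regular expressions are assumptions written against the lines shown in this post only, and the function name summarize_vmkernel is hypothetical; other driver or ESXi builds may log in a different format.

```python
import re
from collections import Counter

# Patterns written against the excerpt in this thread (an assumption,
# not an official log format specification).
ABORT_RE = re.compile(
    r"fnic : \d+ :: Abort Cmd called FCID (0x[0-9a-f]+), LUN (0x[0-9a-f]+)"
)
FAIL_RE = re.compile(
    r'to dev "(naa\.[0-9a-f]+)".*?[Ff]ailed:? H:(0x[0-9a-f]+) D:0x[0-9a-f]+ P:0x[0-9a-f]+'
)


def summarize_vmkernel(lines):
    """Count fnic aborts per (FCID, LUN) and failed commands per (device, host status)."""
    aborts = Counter()
    failures = Counter()
    for line in lines:
        m = ABORT_RE.search(line)
        if m:
            aborts[(m.group(1), m.group(2))] += 1
            continue
        m = FAIL_RE.search(line)
        if m:
            failures[(m.group(1), m.group(2))] += 1
    return aborts, failures


if __name__ == "__main__":
    # Two lines copied from the excerpt above, used as sample input.
    sample = [
        "2020-12-15T15:36:56.514Z cpu7:33167)<7>fnic : 1 :: Abort Cmd called "
        "FCID 0xc60240, LUN 0x33 TAG fe flags 3",
        "2020-12-15T15:36:56.773Z cpu26:32871)ScsiDeviceIO: 2331: Cmd(0x4136803cee40) "
        '0x28, CmdSN 0x20347e5 from world 32871 to dev "naa.6006016083b1490082c1d65be93b0c9c" '
        "failed H:0x8 D:0x0 P:0x0 Possible sense data: 0x0 0x0 0x0.",
    ]
    aborts, failures = summarize_vmkernel(sample)
    for key, count in aborts.most_common():
        print("aborts", key, count)
    for key, count in failures.most_common():
        print("failures", key, count)
```

If most aborts cluster on one FCID or LUN, that may help narrow the problem down to a particular storage target or path, but the tally alone does not identify a root cause.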

vpxa.log
2020-12-15T15:36:18.472Z [FFE24B70 verbose 'hostdevent' opID=WFU-975fec7e] [VpxaHalEventHostAgent::NormalizeDsArgument] Transated a DatastoreEventArgument MoRef '5bd6f0c0-ceec2968-3717-a0369f024b6d' to 'ds:///vmfs/volumes/5bd6f0c0-ceec2968-3717-a0369f024b6d/'
2020-12-15T15:36:18.472Z [FFE24B70 verbose 'hostdevent' opID=WFU-975fec7e] [VpxaHalEventHostAgent::NormalizeDsArgument] Transated a DatastoreEventArgument MoRef '5bd7075c-2535440f-f2bd-e839350048b0' to 'ds:///vmfs/volumes/5bd7075c-2535440f-f2bd-e839350048b0/'
2020-12-15T15:36:18.472Z [FFE24B70 verbose 'hostdevent' opID=WFU-975fec7e] [VpxaHalEventHostAgent::NormalizeDsArgument] Transated a DatastoreEventArgument MoRef '5bd7075c-2535440f-f2bd-e839350048b0' to 'ds:///vmfs/volumes/5bd7075c-2535440f-f2bd-e839350048b0/'
2020-12-15T15:36:18.472Z [FFE24B70 verbose 'hostdevent' opID=WFU-975fec7e] [VpxaHalEventHostAgent::NormalizeDsArgument] Transated a DatastoreEventArgument MoRef '5bd7075c-2535440f-f2bd-e839350048b0' to 'ds:///vmfs/volumes/5bd7075c-2535440f-f2bd-e839350048b0/'
2020-12-15T15:36:18.485Z [FFE24B70 verbose 'halservices' opID=WFU-975fec7e] [VpxaHalServices] EventsRecorded Event Fired
2020-12-15T15:36:18.485Z [FFE24B70 verbose 'VpxaHalCnxHostagent' opID=WFU-975fec7e] [WaitForUpdatesDone] Starting next WaitForUpdates() call to hostd
2020-12-15T15:36:18.485Z [FFE24B70 verbose 'VpxaHalCnxHostagent' opID=WFU-975fec7e] [WaitForUpdatesDone] Completed callback
2020-12-15T15:36:27.149Z [FFE66B70 verbose 'hostdstats'] [PollCurrentStats] Skipping stat update due to stale sample from hostd.
2020-12-15T15:36:34.255Z [FFE66B70 verbose 'SoapAdapter'] Responded to service state request
2020-12-15T15:36:34.895Z [FFEA8B70 verbose 'vpxavpxaInvtHost'] [VpxaInvtHost] Increment master gen. no to (233463): Event:VpxaHalEvent::CheckQueuedEvents
2020-12-15T15:36:47.150Z [FFE66B70 verbose 'hostdstats'] [PollCurrentStats] Skipping stat update due to stale sample from hostd.
2020-12-15T15:37:04.259Z [FFE24B70 verbose 'SoapAdapter'] Responded to service state request
2020-12-15T15:37:07.151Z [FFEA8B70 error 'hostdstats'] [VpxaHalStatsHostagent::QueryHost] Did not get any entity metrics from the host, hence dropping result
2020-12-15T15:37:07.151Z [FFEA8B70 verbose 'hostdstats'] [PollCurrentStats] Skipping stat update due to stale sample from hostd.
2020-12-15T15:37:18.594Z [FFEA8B70 verbose 'VpxProfiler'] [1+] CheckEnvBrowserChanges
2020-12-15T15:37:27.153Z [FFEEAB70 error 'hostdstats'] [VpxaHalStatsHostagent::QueryHost] Did not get any entity metrics from the host, hence dropping result
2020-12-15T15:37:27.153Z [FFEEAB70 verbose 'hostdstats'] [PollCurrentStats] Skipping stat update due to stale sample from hostd.
2020-12-15T15:37:34.260Z [FFE24B70 verbose 'SoapAdapter'] Responded to service state request
2020-12-15T15:37:34.905Z [FFE66B70 verbose 'vpxavpxaInvtHost'] [VpxaInvtHost] Increment master gen. no to (233464): Event:VpxaHalEvent::CheckQueuedEvents
2020-12-15T15:37:47.156Z [FFEEAB70 error 'hostdstats'] [VpxaHalStatsHostagent::QueryHost] Did not get any entity metrics from the host, hence dropping result
2020-12-15T15:37:47.156Z [FFEEAB70 verbose 'hostdstats'] [PollCurrentStats] Skipping stat update due to stale sample from hostd.

hostd.log


2020-12-15T15:36:18.292Z [82380B70 warning 'Locale' opID=hostd-10ce] No message string to format object vim.option.OptionDef.
-->
2020-12-15T15:36:18.294Z [82380B70 warning 'Locale' opID=hostd-10ce] No message string to format object vim.option.OptionDef.
-->
2020-12-15T15:36:18.357Z [82380B70 verbose 'Default' opID=hostd-10ce] OsfsClient::GetConfigOption: Retrieved receive timeout config option '1200000'
2020-12-15T15:36:18.358Z [82380B70 verbose 'MetadataManager' opID=hostd-10ce] MDMgr : GetConfigOption: Retrieved update timeout config option: '30000'
2020-12-15T15:36:18.371Z [82380B70 verbose 'Hostsvc.NetConfigProvider' opID=hostd-10ce] FetchFn: List of pnics opted out
2020-12-15T15:36:18.415Z [82380B70 verbose 'Default' opID=hostd-10ce] StorageSystemVmkImplProvider: advanced option get key = VMFS.UnresolvedVolumeLiveCheck
2020-12-15T15:36:18.429Z [82380B70 warning 'PropertyCollector' opID=hostd-10ce] ComputeGUReq took 3333061029 microSec
2020-12-15T15:36:18.461Z [82380B70 verbose 'Default' opID=SWI-aae535fe user=vpxuser] AdapterServer: target='vim.ResourcePool:ha-root-pool', method='GetConfig'
2020-12-15T15:36:18.463Z [82380B70 verbose 'Default' opID=SWI-aae535fe user=vpxuser] AdapterServer: target='vim.ResourcePool:ha-root-pool', method='GetName'
2020-12-15T15:36:30.322Z [81642B70 verbose 'Cimsvc'] Ticket issued for CIMOM version 1.0, user root
2020-12-15T15:36:30.362Z [81940B70 verbose 'SoapAdapter'] Responded to service state request
2020-12-15T15:36:35.049Z [FFD6E9A0 verbose 'SoapAdapter'] Responded to service state request
2020-12-15T15:37:00.364Z [81940B70 verbose 'SoapAdapter'] Responded to service state request
2020-12-15T15:37:05.052Z [81940B70 verbose 'SoapAdapter'] Responded to service state request
2020-12-15T15:37:12.321Z [FFD6E9A0 verbose 'Hostsvc.ResourcePool ha-root-pool'] Root pool capacity changed from 39644MHz/123856MB to 39644MHz/123858MB
2020-12-15T15:37:18.174Z [81940B70 verbose 'Hostsvc.DvsManager'] PersistAllDvsInfo called
2020-12-15T15:37:30.366Z [81BC3B70 verbose 'SoapAdapter'] Responded to service state request
2020-12-15T15:37:35.054Z [81683B70 verbose 'SoapAdapter'] Responded to service state request
2020-12-15T15:37:45.438Z [81BC3B70 info 'Solo.Vmomi' opID=hostd-7b86 user=root] Activation [N5Vmomi10ActivationE:0x814026a8] : Invoke done [waitForUpdatesEx] on [vmodl.query.PropertyCollector:ha-property-collector]
2020-12-15T15:37:45.438Z [81BC3B70 verbose 'Solo.Vmomi' opID=hostd-7b86 user=root] Arg version:
--> "5486"
2020-12-15T15:37:45.438Z [81BC3B70 verbose 'Solo.Vmomi' opID=hostd-7b86 user=root] Arg options:
--> (vmodl.query.PropertyCollector.WaitOptions) {
--> dynamicType = <unset>,
--> maxWaitSeconds = 600,
--> maxObjectUpdates = 100,
--> }
2020-12-15T15:37:45.438Z [81BC3B70 info 'Solo.Vmomi' opID=hostd-7b86 user=root] Throw vmodl.fault.RequestCanceled
2020-12-15T15:37:45.438Z [81BC3B70 info 'Solo.Vmomi' opID=hostd-7b86 user=root] Result:
--> (vmodl.fault.RequestCanceled) {
--> dynamicType = <unset>,
--> faultCause = (vmodl.MethodFault) null,
--> msg = "",
--> }
2020-12-15T15:38:00.369Z [81BC3B70 verbose 'SoapAdapter'] Responded to service state request
2020-12-15T15:38:01.720Z [81140B70 verbose 'Cimsvc'] Ticket issued for CIMOM version 1.0, user root
2020-12-15T15:38:05.057Z [82380B70 verbose 'SoapAdapter'] Responded to service state request
2020-12-15T15:38:12.324Z [81BC3B70 verbose 'Hostsvc.ResourcePool ha-root-pool'] Root pool capacity changed from 39644MHz/123858MB to 39644MHz/123857MB
2020-12-15T15:38:30.372Z [81140B70 verbose 'SoapAdapter'] Responded to service state request
2020-12-15T15:38:35.059Z [81140B70 verbose 'SoapAdapter'] Responded to service state request
2020-12-15T15:38:58.356Z [816C4B70 warning 'PropertyProvider' opID=hostd-685a] It took 159867141 microseconds to get property summary for vim.VirtualMachine:38
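
The hostd.log excerpt above contains two warnings that report how long an operation took, in microseconds. As a rough illustration, the Python sketch below pulls those durations out of a copy of the log and converts them to seconds. The pattern is an assumption based only on the two "took ... microSec" and "took ... microseconds" lines shown here, and slow_operations and the 60-second threshold are hypothetical choices, not VMware defaults.

```python
import re

# Matches both spellings seen in the excerpt: "microSec" and "microseconds".
TOOK_RE = re.compile(r"took (\d+) micro(?:Sec|seconds)\b")


def slow_operations(lines, threshold_s=60.0):
    """Return (timestamp, seconds) for hostd lines reporting a duration >= threshold_s."""
    slow = []
    for line in lines:
        m = TOOK_RE.search(line)
        if not m:
            continue
        seconds = int(m.group(1)) / 1_000_000
        if seconds >= threshold_s:
            # The timestamp is the first space-separated field of the line.
            timestamp = line.split(" ", 1)[0]
            slow.append((timestamp, seconds))
    return slow


if __name__ == "__main__":
    sample = [
        "2020-12-15T15:36:18.429Z [82380B70 warning 'PropertyCollector' opID=hostd-10ce] "
        "ComputeGUReq took 3333061029 microSec",
        "2020-12-15T15:38:58.356Z [816C4B70 warning 'PropertyProvider' opID=hostd-685a] "
        "It took 159867141 microseconds to get property summary for vim.VirtualMachine:38",
    ]
    for timestamp, seconds in slow_operations(sample):
        print(timestamp, round(seconds, 1), "s")
```

On the two lines quoted above this gives roughly 3333 seconds and 160 seconds, which suggests hostd was blocked for long periods around the time the host stopped responding.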


vobd.log

2020-12-15T14:54:43.199Z: No correlator for vob.vmfs.heartbeat.recovered
2020-12-15T14:59:43.330Z: [vmfsCorrelator] 40637642636762us: [esx.problem.vmfs.heartbeat.recovered] 5bd7075c-2535440f-f2bd-e839350048b0
2020-12-15T14:59:43.330Z: No correlator for vob.vmfs.heartbeat.recovered
2020-12-15T15:10:25.838Z: No correlator for vob.vmfs.heartbeat.timedout
2020-12-15T15:10:25.838Z: [vmfsCorrelator] 40638285145350us: [esx.problem.vmfs.heartbeat.timedout] 5bd6f0c0-ceec2968-3717-a0369f024b6d
2020-12-15T15:10:25.839Z: No correlator for vob.vmfs.heartbeat.recovered
2020-12-15T15:10:25.839Z: [vmfsCorrelator] 40638285146352us: [esx.problem.vmfs.heartbeat.recovered] 5bd6f0c0-ceec2968-3717-a0369f024b6d
2020-12-15T15:10:38.884Z: No correlator for vob.vmfs.heartbeat.timedout
2020-12-15T15:10:38.884Z: [vmfsCorrelator] 40638298190997us: [esx.problem.vmfs.heartbeat.timedout] 5bd6f0c0-ceec2968-3717-a0369f024b6d
2020-12-15T15:10:39.291Z: No correlator for vob.vmfs.heartbeat.recovered
2020-12-15T15:10:39.291Z: [vmfsCorrelator] 40638298598424us: [esx.problem.vmfs.heartbeat.recovered] 5bd6f0c0-ceec2968-3717-a0369f024b6d
2020-12-15T15:10:58.741Z: No correlator for vob.vmfs.heartbeat.timedout
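
The vobd.log lines above show VMFS heartbeat timeouts and recoveries for specific datastore UUIDs. A small Python sketch like the following can count those events per datastore from a copy of the log. The pattern is an assumption based on the "[esx.problem.vmfs.heartbeat...]" lines in this excerpt, and the function name heartbeat_events is hypothetical.

```python
import re
from collections import Counter

# Matches the correlated event lines only; the "No correlator for ..." lines
# carry no datastore UUID and are skipped.
HB_RE = re.compile(r"\[esx\.problem\.vmfs\.heartbeat\.(timedout|recovered)\]\s+(\S+)")


def heartbeat_events(lines):
    """Count heartbeat events per (datastore UUID, event type)."""
    counts = Counter()
    for line in lines:
        m = HB_RE.search(line)
        if m:
            event, uuid = m.group(1), m.group(2)
            counts[(uuid, event)] += 1
    return counts


if __name__ == "__main__":
    sample = [
        "2020-12-15T15:10:25.838Z: No correlator for vob.vmfs.heartbeat.timedout",
        "2020-12-15T15:10:25.838Z: [vmfsCorrelator] 40638285145350us: "
        "[esx.problem.vmfs.heartbeat.timedout] 5bd6f0c0-ceec2968-3717-a0369f024b6d",
        "2020-12-15T15:10:25.839Z: [vmfsCorrelator] 40638285146352us: "
        "[esx.problem.vmfs.heartbeat.recovered] 5bd6f0c0-ceec2968-3717-a0369f024b6d",
    ]
    for (uuid, event), count in sorted(heartbeat_events(sample).items()):
        print(uuid, event, count)
```

A datastore with more timedout than recovered events at the end of the log may still be in a timed-out state, which would be consistent with df -h returning no output.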

 


Accepted Solutions
NathanosBlightc
Commander

I looked at your attached file, which summarizes the important log files. There seems to be a synchronization problem between vpxd (vCenter) and vpxa (ESXi), so please check the date and time on your hosts and on the vCenter Server. Then restart vpxa and hostd on the ESXi host by running the following commands, and check whether the issue occurs again. After that, check the timeout setting as described in KB1017253. If the problem persists, attach the full log files covering the period when the problem occurs.

  • /etc/init.d/vpxa restart
  • /etc/init.d/hostd restart
Please mark my comment as the Correct Answer if this solution resolved your problem

5 Replies
scott28tt
VMware Employee

@shankarsingh 

Moderator: Moved to ESXi Discussions, issue not specific to vCenter.


NathanosBlightc
Commander

vSphere 5.5 is no longer supported, and I also think ESXi 5.5 is not supported on your server model.

https://www.vmware.com/resources/compatibility/search.php?deviceCategory=server&details=1&partner=14...

I think it would be better to install at least version 6.0.

 

 

shankarsingh
Enthusiast

Hi NathanosBlightc,

Yes, ESXi 5.5 is no longer supported, and we are still in the planning phase. For now, though, we have to fix the current issue: the hosts keep disconnecting for a while and then reconnecting.

When a host reconnected, I tried a vMotion, but the task took a long time and finally failed.

The current Cisco B200 M3 hardware is supported for ESXi 5.5 as per the Cisco HCL.

So, any idea or suggestion as to what could be causing the hosts to become unresponsive, and how to fix it so that they stay connected to vCenter?

Thanks

shankarsingh
Enthusiast

Hi NathanosBlightc,

Thanks for the reply and assistance.

We reconfigured the NTP settings on the hosts with the correct values (NTP was in a stopped state and the time wasn't correct) and then rebooted the hosts.

After the NTP change and the reboot, the hosts are connected to vCenter fine.

Thanks a lot

 

 
