Hi
We have ESXi 5.5 U3 hosts (Cisco B200 M3) that have started going into a "not responding" state. We bring them back by rebooting the ESXi host. (A few hosts reconnect automatically after some time.)
While a host is in the not responding state, I noticed the following:
The host is reachable over the network (ping works).
We can log in via SSH and read/view the logs.
But we can't get any output from df -h or esxcli storage vmfs extent list.
The VMs are reachable over the network (ping works).
Can someone guide us or suggest how to fix this issue?
Below are a few log excerpts (please see the attached logs as well).
vmkernel.log
2020-12-15T15:36:56.471Z cpu17:33770)Res3: 9019: 'semar_1_VMFS11': RC cNum 12525 unlock failure at offset 19182592;attempt 1: Timeout
2020-12-15T15:36:56.514Z cpu7:33167)<7>fnic : 1 :: Abort Cmd called FCID 0xc60240, LUN 0x33 TAG fe flags 3
2020-12-15T15:36:56.518Z cpu31:33157)<7>fnic : 1 :: abts cmpl recd. id 254 status FCPIO_SUCCESS
2020-12-15T15:36:56.518Z cpu7:33167)<7>fnic : 1 :: Returning from abort cmd type 2 SUCCESS
2020-12-15T15:36:56.518Z cpu7:33167)<7>fnic : 1 :: Abort Cmd called FCID 0xc60240, LUN 0x33 TAG 2b flags 3
2020-12-15T15:36:56.522Z cpu31:33157)<7>fnic : 1 :: abts cmpl recd. id 43 status FCPIO_SUCCESS
2020-12-15T15:36:56.522Z cpu7:33167)<7>fnic : 1 :: Returning from abort cmd type 2 SUCCESS
2020-12-15T15:36:56.522Z cpu7:33167)<7>fnic : 1 :: Abort Cmd called FCID 0xc60240, LUN 0x33 TAG 2c flags 3
2020-12-15T15:36:56.526Z cpu31:33157)<7>fnic : 1 :: abts cmpl recd. id 44 status FCPIO_SUCCESS
2020-12-15T15:36:56.526Z cpu7:33167)<7>fnic : 1 :: Returning from abort cmd type 2 SUCCESS
2020-12-15T15:36:56.526Z cpu7:33167)<7>fnic : 1 :: Abort Cmd called FCID 0xc60240, LUN 0x33 TAG 2d flags 3
2020-12-15T15:36:56.530Z cpu31:33157)<7>fnic : 1 :: abts cmpl recd. id 45 status FCPIO_SUCCESS
2020-12-15T15:36:56.530Z cpu7:33167)<7>fnic : 1 :: Returning from abort cmd type 2 SUCCESS
2020-12-15T15:36:56.765Z cpu7:33167)<7>fnic : 1 :: Abort Cmd called FCID 0xc60220, LUN 0x3c TAG f4 flags 3
2020-12-15T15:36:56.769Z cpu31:33117)<7>fnic : 1 :: abts cmpl recd. id 244 status FCPIO_SUCCESS
2020-12-15T15:36:56.769Z cpu7:33167)<7>fnic : 1 :: Returning from abort cmd type 2 SUCCESS
2020-12-15T15:36:56.769Z cpu26:112153959)NMP: nmp_ThrottleLogForDevice:2457: Cmd 0xf1 (0x413683b8d6c0, 32871) to dev "naa.6006016083b1490082c1d65be93b0c9c" on path "vmhba1:C0:T3:L60" Failed: H:0x8 D:0x0 P:0x0 Possible sense data: 0x0 0x0 0x0. Act:EVAL
2020-12-15T15:36:56.769Z cpu26:112153959)ScsiDeviceIO: 2331: Cmd(0x4136803cee40) 0xfe, CmdSN 0x20347e4 from world 32871 to dev "naa.6006016083b1490082c1d65be93b0c9c" failed H:0x8 D:0x0 P:0x0 Possible sense data: 0x0 0x0 0x0.
2020-12-15T15:36:56.769Z cpu6:33168)<7>fnic : 1 :: Abort Cmd called FCID 0xc60220, LUN 0x3c TAG 2f flags 3
2020-12-15T15:36:56.770Z cpu31:33117)<6>fnic : 1 :: icmnd_cmpl ABTS pending hdr status = FCPIO_SUCCESS sc 0x0x413689641b40 scsi_status 0 residual 0
2020-12-15T15:36:56.773Z cpu31:33117)<7>fnic : 1 :: abts cmpl recd. id 47 status FCPIO_SUCCESS
2020-12-15T15:36:56.773Z cpu6:33168)<7>fnic : 1 :: Returning from abort cmd type 2 SUCCESS
2020-12-15T15:36:56.773Z cpu26:32871)ScsiDeviceIO: 2331: Cmd(0x4136803cee40) 0x28, CmdSN 0x20347e5 from world 32871 to dev "naa.6006016083b1490082c1d65be93b0c9c" failed H:0x8 D:0x0 P:0x0 Possible sense data: 0x0 0x0 0x0.
2020-12-15T15:36:56.825Z cpu6:33168)<7>fnic : 1 :: Abort Cmd called FCID 0xc60220, LUN 0x3c TAG 30 flags 3
2020-12-15T15:36:56.825Z cpu31:33117)<6>fnic : 1 :: icmnd_cmpl ABTS pending hdr status = FCPIO_SUCCESS sc 0x0x413689641b40 scsi_status 0 residual 0
2020-12-15T15:36:56.829Z cpu31:33117)<7>fnic : 1 :: abts cmpl recd. id 48 status FCPIO_SUCCESS
2020-12-15T15:36:56.829Z cpu6:33168)<7>fnic : 1 :: Returning from abort cmd type 2 SUCCESS
2020-12-15T15:36:56.829Z cpu26:32871)ScsiDeviceIO: 2331: Cmd(0x4136803cee40) 0x28, CmdSN 0x20347e6 from world 32871 to dev "naa.6006016083b1490082c1d65be93b0c9c" failed H:0x8 D:0x0 P:0x0 Possible sense data: 0x0 0x0 0x0.
2020-12-15T15:36:57.271Z cpu7:33167)<7>fnic : 1 :: Abort Cmd called FCID 0xc60240, LUN 0x36 TAG ff flags 3
2020-12-15T15:36:57.275Z cpu31:33117)<7>fnic : 1 :: abts cmpl recd. id 255 status FCPIO_SUCCESS
2020-12-15T15:36:57.275Z cpu7:33167)<7>fnic : 1 :: Returning from abort cmd type 2 SUCCESS
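To see which targets and LUNs the aborts cluster on, the fnic lines above could be tallied with a small script. This is only a rough sketch: the function name count_aborts and the regular expression are my own assumptions based on the line format in this excerpt, not anything provided by VMware or Cisco.

```python
import re
from collections import Counter

# Assumed pattern, derived from the "Abort Cmd called" lines in the excerpt above.
ABORT_RE = re.compile(
    r"Abort Cmd called FCID (0x[0-9a-fA-F]+), LUN (0x[0-9a-fA-F]+)"
)


def count_aborts(lines):
    """Count fnic abort requests per (FCID, LUN) pair (hypothetical helper)."""
    counts = Counter()
    for line in lines:
        match = ABORT_RE.search(line)
        if match:
            counts[(match.group(1), match.group(2))] += 1
    return counts


# A few lines copied from the vmkernel.log excerpt above.
sample = [
    "2020-12-15T15:36:56.514Z cpu7:33167)<7>fnic : 1 :: Abort Cmd called FCID 0xc60240, LUN 0x33 TAG fe flags 3",
    "2020-12-15T15:36:56.518Z cpu31:33157)<7>fnic : 1 :: abts cmpl recd. id 254 status FCPIO_SUCCESS",
    "2020-12-15T15:36:56.518Z cpu7:33167)<7>fnic : 1 :: Abort Cmd called FCID 0xc60240, LUN 0x33 TAG 2b flags 3",
    "2020-12-15T15:36:56.765Z cpu7:33167)<7>fnic : 1 :: Abort Cmd called FCID 0xc60220, LUN 0x3c TAG f4 flags 3",
]

if __name__ == "__main__":
    # Print the pairs with the most aborts first.
    for (fcid, lun), n in count_aborts(sample).most_common():
        print(fcid, lun, n)
```

Run against a full copy of vmkernel.log, a tally like this might show whether the aborts are spread across all LUNs or concentrated on a few FCIDs, which could help narrow the problem down to a particular array port or path.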
vpxa.log
2020-12-15T15:36:18.472Z [FFE24B70 verbose 'hostdevent' opID=WFU-975fec7e] [VpxaHalEventHostAgent::NormalizeDsArgument] Transated a DatastoreEventArgument MoRef '5bd6f0c0-ceec2968-3717-a0369f024b6d' to 'ds:///vmfs/volumes/5bd6f0c0-ceec2968-3717-a0369f024b6d/'
2020-12-15T15:36:18.472Z [FFE24B70 verbose 'hostdevent' opID=WFU-975fec7e] [VpxaHalEventHostAgent::NormalizeDsArgument] Transated a DatastoreEventArgument MoRef '5bd7075c-2535440f-f2bd-e839350048b0' to 'ds:///vmfs/volumes/5bd7075c-2535440f-f2bd-e839350048b0/'
2020-12-15T15:36:18.472Z [FFE24B70 verbose 'hostdevent' opID=WFU-975fec7e] [VpxaHalEventHostAgent::NormalizeDsArgument] Transated a DatastoreEventArgument MoRef '5bd7075c-2535440f-f2bd-e839350048b0' to 'ds:///vmfs/volumes/5bd7075c-2535440f-f2bd-e839350048b0/'
2020-12-15T15:36:18.472Z [FFE24B70 verbose 'hostdevent' opID=WFU-975fec7e] [VpxaHalEventHostAgent::NormalizeDsArgument] Transated a DatastoreEventArgument MoRef '5bd7075c-2535440f-f2bd-e839350048b0' to 'ds:///vmfs/volumes/5bd7075c-2535440f-f2bd-e839350048b0/'
2020-12-15T15:36:18.485Z [FFE24B70 verbose 'halservices' opID=WFU-975fec7e] [VpxaHalServices] EventsRecorded Event Fired
2020-12-15T15:36:18.485Z [FFE24B70 verbose 'VpxaHalCnxHostagent' opID=WFU-975fec7e] [WaitForUpdatesDone] Starting next WaitForUpdates() call to hostd
2020-12-15T15:36:18.485Z [FFE24B70 verbose 'VpxaHalCnxHostagent' opID=WFU-975fec7e] [WaitForUpdatesDone] Completed callback
2020-12-15T15:36:27.149Z [FFE66B70 verbose 'hostdstats'] [PollCurrentStats] Skipping stat update due to stale sample from hostd.
2020-12-15T15:36:34.255Z [FFE66B70 verbose 'SoapAdapter'] Responded to service state request
2020-12-15T15:36:34.895Z [FFEA8B70 verbose 'vpxavpxaInvtHost'] [VpxaInvtHost] Increment master gen. no to (233463): Event:VpxaHalEvent::CheckQueuedEvents
2020-12-15T15:36:47.150Z [FFE66B70 verbose 'hostdstats'] [PollCurrentStats] Skipping stat update due to stale sample from hostd.
2020-12-15T15:37:04.259Z [FFE24B70 verbose 'SoapAdapter'] Responded to service state request
2020-12-15T15:37:07.151Z [FFEA8B70 error 'hostdstats'] [VpxaHalStatsHostagent::QueryHost] Did not get any entity metrics from the host, hence dropping result
2020-12-15T15:37:07.151Z [FFEA8B70 verbose 'hostdstats'] [PollCurrentStats] Skipping stat update due to stale sample from hostd.
2020-12-15T15:37:18.594Z [FFEA8B70 verbose 'VpxProfiler'] [1+] CheckEnvBrowserChanges
2020-12-15T15:37:27.153Z [FFEEAB70 error 'hostdstats'] [VpxaHalStatsHostagent::QueryHost] Did not get any entity metrics from the host, hence dropping result
2020-12-15T15:37:27.153Z [FFEEAB70 verbose 'hostdstats'] [PollCurrentStats] Skipping stat update due to stale sample from hostd.
2020-12-15T15:37:34.260Z [FFE24B70 verbose 'SoapAdapter'] Responded to service state request
2020-12-15T15:37:34.905Z [FFE66B70 verbose 'vpxavpxaInvtHost'] [VpxaInvtHost] Increment master gen. no to (233464): Event:VpxaHalEvent::CheckQueuedEvents
2020-12-15T15:37:47.156Z [FFEEAB70 error 'hostdstats'] [VpxaHalStatsHostagent::QueryHost] Did not get any entity metrics from the host, hence dropping result
2020-12-15T15:37:47.156Z [FFEEAB70 verbose 'hostdstats'] [PollCurrentStats] Skipping stat update due to stale sample from hostd.
hostd.log
2020-12-15T15:36:18.292Z [82380B70 warning 'Locale' opID=hostd-10ce] No message string to format object vim.option.OptionDef.
-->
2020-12-15T15:36:18.294Z [82380B70 warning 'Locale' opID=hostd-10ce] No message string to format object vim.option.OptionDef.
-->
2020-12-15T15:36:18.357Z [82380B70 verbose 'Default' opID=hostd-10ce] OsfsClient::GetConfigOption: Retrieved receive timeout config option '1200000'
2020-12-15T15:36:18.358Z [82380B70 verbose 'MetadataManager' opID=hostd-10ce] MDMgr : GetConfigOption: Retrieved update timeout config option: '30000'
2020-12-15T15:36:18.371Z [82380B70 verbose 'Hostsvc.NetConfigProvider' opID=hostd-10ce] FetchFn: List of pnics opted out
2020-12-15T15:36:18.415Z [82380B70 verbose 'Default' opID=hostd-10ce] StorageSystemVmkImplProvider: advanced option get key = VMFS.UnresolvedVolumeLiveCheck
2020-12-15T15:36:18.429Z [82380B70 warning 'PropertyCollector' opID=hostd-10ce] ComputeGUReq took 3333061029 microSec
2020-12-15T15:36:18.461Z [82380B70 verbose 'Default' opID=SWI-aae535fe user=vpxuser] AdapterServer: target='vim.ResourcePool:ha-root-pool', method='GetConfig'
2020-12-15T15:36:18.463Z [82380B70 verbose 'Default' opID=SWI-aae535fe user=vpxuser] AdapterServer: target='vim.ResourcePool:ha-root-pool', method='GetName'
2020-12-15T15:36:30.322Z [81642B70 verbose 'Cimsvc'] Ticket issued for CIMOM version 1.0, user root
2020-12-15T15:36:30.362Z [81940B70 verbose 'SoapAdapter'] Responded to service state request
2020-12-15T15:36:35.049Z [FFD6E9A0 verbose 'SoapAdapter'] Responded to service state request
2020-12-15T15:37:00.364Z [81940B70 verbose 'SoapAdapter'] Responded to service state request
2020-12-15T15:37:05.052Z [81940B70 verbose 'SoapAdapter'] Responded to service state request
2020-12-15T15:37:12.321Z [FFD6E9A0 verbose 'Hostsvc.ResourcePool ha-root-pool'] Root pool capacity changed from 39644MHz/123856MB to 39644MHz/123858MB
2020-12-15T15:37:18.174Z [81940B70 verbose 'Hostsvc.DvsManager'] PersistAllDvsInfo called
2020-12-15T15:37:30.366Z [81BC3B70 verbose 'SoapAdapter'] Responded to service state request
2020-12-15T15:37:35.054Z [81683B70 verbose 'SoapAdapter'] Responded to service state request
2020-12-15T15:37:45.438Z [81BC3B70 info 'Solo.Vmomi' opID=hostd-7b86 user=root] Activation [N5Vmomi10ActivationE:0x814026a8] : Invoke done [waitForUpdatesEx] on [vmodl.query.PropertyCollector:ha-property-collector]
2020-12-15T15:37:45.438Z [81BC3B70 verbose 'Solo.Vmomi' opID=hostd-7b86 user=root] Arg version:
--> "5486"
2020-12-15T15:37:45.438Z [81BC3B70 verbose 'Solo.Vmomi' opID=hostd-7b86 user=root] Arg options:
--> (vmodl.query.PropertyCollector.WaitOptions) {
--> dynamicType = <unset>,
--> maxWaitSeconds = 600,
--> maxObjectUpdates = 100,
--> }
2020-12-15T15:37:45.438Z [81BC3B70 info 'Solo.Vmomi' opID=hostd-7b86 user=root] Throw vmodl.fault.RequestCanceled
2020-12-15T15:37:45.438Z [81BC3B70 info 'Solo.Vmomi' opID=hostd-7b86 user=root] Result:
--> (vmodl.fault.RequestCanceled) {
--> dynamicType = <unset>,
--> faultCause = (vmodl.MethodFault) null,
--> msg = "",
--> }
2020-12-15T15:38:00.369Z [81BC3B70 verbose 'SoapAdapter'] Responded to service state request
2020-12-15T15:38:01.720Z [81140B70 verbose 'Cimsvc'] Ticket issued for CIMOM version 1.0, user root
2020-12-15T15:38:05.057Z [82380B70 verbose 'SoapAdapter'] Responded to service state request
2020-12-15T15:38:12.324Z [81BC3B70 verbose 'Hostsvc.ResourcePool ha-root-pool'] Root pool capacity changed from 39644MHz/123858MB to 39644MHz/123857MB
2020-12-15T15:38:30.372Z [81140B70 verbose 'SoapAdapter'] Responded to service state request
2020-12-15T15:38:35.059Z [81140B70 verbose 'SoapAdapter'] Responded to service state request
2020-12-15T15:38:58.356Z [816C4B70 warning 'PropertyProvider' opID=hostd-685a] It took 159867141 microseconds to get property summary for vim.VirtualMachine:38
vobd.log
2020-12-15T14:54:43.199Z: No correlator for vob.vmfs.heartbeat.recovered
2020-12-15T14:59:43.330Z: [vmfsCorrelator] 40637642636762us: [esx.problem.vmfs.heartbeat.recovered] 5bd7075c-2535440f-f2bd-e839350048b0
2020-12-15T14:59:43.330Z: No correlator for vob.vmfs.heartbeat.recovered
2020-12-15T15:10:25.838Z: No correlator for vob.vmfs.heartbeat.timedout
2020-12-15T15:10:25.838Z: [vmfsCorrelator] 40638285145350us: [esx.problem.vmfs.heartbeat.timedout] 5bd6f0c0-ceec2968-3717-a0369f024b6d
2020-12-15T15:10:25.839Z: No correlator for vob.vmfs.heartbeat.recovered
2020-12-15T15:10:25.839Z: [vmfsCorrelator] 40638285146352us: [esx.problem.vmfs.heartbeat.recovered] 5bd6f0c0-ceec2968-3717-a0369f024b6d
2020-12-15T15:10:38.884Z: No correlator for vob.vmfs.heartbeat.timedout
2020-12-15T15:10:38.884Z: [vmfsCorrelator] 40638298190997us: [esx.problem.vmfs.heartbeat.timedout] 5bd6f0c0-ceec2968-3717-a0369f024b6d
2020-12-15T15:10:39.291Z: No correlator for vob.vmfs.heartbeat.recovered
2020-12-15T15:10:39.291Z: [vmfsCorrelator] 40638298598424us: [esx.problem.vmfs.heartbeat.recovered] 5bd6f0c0-ceec2968-3717-a0369f024b6d
2020-12-15T15:10:58.741Z: No correlator for vob.vmfs.heartbeat.timedout
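The heartbeat events above can also be summarized per datastore UUID to see which volumes lose their heartbeat most often. Again, this is just a sketch under assumptions: heartbeat_summary is a hypothetical helper, and the pattern only matches the [esx.problem.vmfs.heartbeat.*] lines in the form shown in this excerpt.

```python
import re
from collections import defaultdict

# Assumed pattern, derived from the vmfsCorrelator lines in the excerpt above.
EVENT_RE = re.compile(
    r"\[esx\.problem\.vmfs\.heartbeat\.(timedout|recovered)\]\s+(\S+)"
)


def heartbeat_summary(lines):
    """Count timedout/recovered heartbeat events per VMFS volume UUID."""
    summary = defaultdict(lambda: {"timedout": 0, "recovered": 0})
    for line in lines:
        match = EVENT_RE.search(line)
        if match:
            event, uuid = match.group(1), match.group(2)
            summary[uuid][event] += 1
    return dict(summary)


# A few lines copied from the vobd.log excerpt above.
sample = [
    "2020-12-15T14:59:43.330Z: [vmfsCorrelator] 40637642636762us: [esx.problem.vmfs.heartbeat.recovered] 5bd7075c-2535440f-f2bd-e839350048b0",
    "2020-12-15T15:10:25.838Z: No correlator for vob.vmfs.heartbeat.timedout",
    "2020-12-15T15:10:25.838Z: [vmfsCorrelator] 40638285145350us: [esx.problem.vmfs.heartbeat.timedout] 5bd6f0c0-ceec2968-3717-a0369f024b6d",
    "2020-12-15T15:10:25.839Z: [vmfsCorrelator] 40638285146352us: [esx.problem.vmfs.heartbeat.recovered] 5bd6f0c0-ceec2968-3717-a0369f024b6d",
    "2020-12-15T15:10:38.884Z: [vmfsCorrelator] 40638298190997us: [esx.problem.vmfs.heartbeat.timedout] 5bd6f0c0-ceec2968-3717-a0369f024b6d",
    "2020-12-15T15:10:39.291Z: [vmfsCorrelator] 40638298598424us: [esx.problem.vmfs.heartbeat.recovered] 5bd6f0c0-ceec2968-3717-a0369f024b6d",
]

if __name__ == "__main__":
    for uuid, counts in heartbeat_summary(sample).items():
        print(uuid, counts)
```

The "No correlator" lines are deliberately skipped so each event is counted once. A volume whose timedout count keeps growing without matching recovered events would be the first one I would look at.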