ESXi

  • 1.  esxi host not responding issue

    Posted Dec 15, 2020 07:09 PM

    Hi 

    We have ESXi 5.5 U3 hosts (Cisco B200 M3) that have started going into a "not responding" state. We bring them back by rebooting the ESXi host. (A few hosts reconnect automatically after some time.)

    While a host is in the not responding state, I noticed that:

    The host is reachable over the network via ping.

    I can log in over SSH and read the logs.

    But df -h and esxcli storage vmfs extent list do not return any output.

    The VMs are reachable over the network via ping.

     

    Can someone suggest how to fix this issue?

     

    Below are a few log excerpts (please see the attached logs as well).

     

    vmkernel.log
    2020-12-15T15:36:56.471Z cpu17:33770)Res3: 9019: 'semar_1_VMFS11': RC cNum 12525 unlock failure at offset 19182592;attempt 1: Timeout
    2020-12-15T15:36:56.514Z cpu7:33167)<7>fnic : 1 :: Abort Cmd called FCID 0xc60240, LUN 0x33 TAG fe flags 3
    2020-12-15T15:36:56.518Z cpu31:33157)<7>fnic : 1 :: abts cmpl recd. id 254 status FCPIO_SUCCESS
    2020-12-15T15:36:56.518Z cpu7:33167)<7>fnic : 1 :: Returning from abort cmd type 2 SUCCESS
    2020-12-15T15:36:56.518Z cpu7:33167)<7>fnic : 1 :: Abort Cmd called FCID 0xc60240, LUN 0x33 TAG 2b flags 3
    2020-12-15T15:36:56.522Z cpu31:33157)<7>fnic : 1 :: abts cmpl recd. id 43 status FCPIO_SUCCESS
    2020-12-15T15:36:56.522Z cpu7:33167)<7>fnic : 1 :: Returning from abort cmd type 2 SUCCESS
    2020-12-15T15:36:56.522Z cpu7:33167)<7>fnic : 1 :: Abort Cmd called FCID 0xc60240, LUN 0x33 TAG 2c flags 3
    2020-12-15T15:36:56.526Z cpu31:33157)<7>fnic : 1 :: abts cmpl recd. id 44 status FCPIO_SUCCESS
    2020-12-15T15:36:56.526Z cpu7:33167)<7>fnic : 1 :: Returning from abort cmd type 2 SUCCESS
    2020-12-15T15:36:56.526Z cpu7:33167)<7>fnic : 1 :: Abort Cmd called FCID 0xc60240, LUN 0x33 TAG 2d flags 3
    2020-12-15T15:36:56.530Z cpu31:33157)<7>fnic : 1 :: abts cmpl recd. id 45 status FCPIO_SUCCESS
    2020-12-15T15:36:56.530Z cpu7:33167)<7>fnic : 1 :: Returning from abort cmd type 2 SUCCESS
    2020-12-15T15:36:56.765Z cpu7:33167)<7>fnic : 1 :: Abort Cmd called FCID 0xc60220, LUN 0x3c TAG f4 flags 3
    2020-12-15T15:36:56.769Z cpu31:33117)<7>fnic : 1 :: abts cmpl recd. id 244 status FCPIO_SUCCESS
    2020-12-15T15:36:56.769Z cpu7:33167)<7>fnic : 1 :: Returning from abort cmd type 2 SUCCESS
    2020-12-15T15:36:56.769Z cpu26:112153959)NMP: nmp_ThrottleLogForDevice:2457: Cmd 0xf1 (0x413683b8d6c0, 32871) to dev "naa.6006016083b1490082c1d65be93b0c9c" on path "vmhba1:C0:T3:L60" Failed: H:0x8 D:0x0 P:0x0 Possible sense data: 0x0 0x0 0x0. Act:EVAL
    2020-12-15T15:36:56.769Z cpu26:112153959)ScsiDeviceIO: 2331: Cmd(0x4136803cee40) 0xfe, CmdSN 0x20347e4 from world 32871 to dev "naa.6006016083b1490082c1d65be93b0c9c" failed H:0x8 D:0x0 P:0x0 Possible sense data: 0x0 0x0 0x0.
    2020-12-15T15:36:56.769Z cpu6:33168)<7>fnic : 1 :: Abort Cmd called FCID 0xc60220, LUN 0x3c TAG 2f flags 3
    2020-12-15T15:36:56.770Z cpu31:33117)<6>fnic : 1 :: icmnd_cmpl ABTS pending hdr status = FCPIO_SUCCESS sc 0x0x413689641b40 scsi_status 0 residual 0
    2020-12-15T15:36:56.773Z cpu31:33117)<7>fnic : 1 :: abts cmpl recd. id 47 status FCPIO_SUCCESS
    2020-12-15T15:36:56.773Z cpu6:33168)<7>fnic : 1 :: Returning from abort cmd type 2 SUCCESS
    2020-12-15T15:36:56.773Z cpu26:32871)ScsiDeviceIO: 2331: Cmd(0x4136803cee40) 0x28, CmdSN 0x20347e5 from world 32871 to dev "naa.6006016083b1490082c1d65be93b0c9c" failed H:0x8 D:0x0 P:0x0 Possible sense data: 0x0 0x0 0x0.
    2020-12-15T15:36:56.825Z cpu6:33168)<7>fnic : 1 :: Abort Cmd called FCID 0xc60220, LUN 0x3c TAG 30 flags 3
    2020-12-15T15:36:56.825Z cpu31:33117)<6>fnic : 1 :: icmnd_cmpl ABTS pending hdr status = FCPIO_SUCCESS sc 0x0x413689641b40 scsi_status 0 residual 0
    2020-12-15T15:36:56.829Z cpu31:33117)<7>fnic : 1 :: abts cmpl recd. id 48 status FCPIO_SUCCESS
    2020-12-15T15:36:56.829Z cpu6:33168)<7>fnic : 1 :: Returning from abort cmd type 2 SUCCESS
    2020-12-15T15:36:56.829Z cpu26:32871)ScsiDeviceIO: 2331: Cmd(0x4136803cee40) 0x28, CmdSN 0x20347e6 from world 32871 to dev "naa.6006016083b1490082c1d65be93b0c9c" failed H:0x8 D:0x0 P:0x0 Possible sense data: 0x0 0x0 0x0.
    2020-12-15T15:36:57.271Z cpu7:33167)<7>fnic : 1 :: Abort Cmd called FCID 0xc60240, LUN 0x36 TAG ff flags 3
    2020-12-15T15:36:57.275Z cpu31:33117)<7>fnic : 1 :: abts cmpl recd. id 255 status FCPIO_SUCCESS
    2020-12-15T15:36:57.275Z cpu7:33167)<7>fnic : 1 :: Returning from abort cmd type 2 SUCCESS
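
A vmkernel.log excerpt like the one above can be summarized by counting the fnic abort commands per FCID/LUN pair and the commands that failed per device and host status. This is only a rough Python sketch, not a VMware tool: the function name and both regexes are assumptions derived purely from the line formats quoted in this post.

```python
import re
from collections import Counter

# Regexes are assumptions based only on the line formats quoted above.
ABORT_RE = re.compile(r"fnic : \d+ :: Abort Cmd called FCID (0x[0-9a-f]+), LUN (0x[0-9a-f]+)")
FAIL_RE = re.compile(r'to dev "(naa\.[0-9a-f]+)".*?[Ff]ailed:? H:(0x[0-9a-f]+)')


def summarize_vmkernel(lines):
    """Return (aborts per (FCID, LUN), failed commands per (device, host status))."""
    aborts = Counter()
    failures = Counter()
    for line in lines:
        m = ABORT_RE.search(line)
        if m:
            aborts[(m.group(1), m.group(2))] += 1
            continue
        m = FAIL_RE.search(line)
        if m:
            failures[(m.group(1), m.group(2))] += 1
    return aborts, failures


# Four lines copied from the excerpt above.
SAMPLE = [
    '2020-12-15T15:36:56.514Z cpu7:33167)<7>fnic : 1 :: Abort Cmd called FCID 0xc60240, LUN 0x33 TAG fe flags 3',
    '2020-12-15T15:36:56.518Z cpu7:33167)<7>fnic : 1 :: Abort Cmd called FCID 0xc60240, LUN 0x33 TAG 2b flags 3',
    '2020-12-15T15:36:56.765Z cpu7:33167)<7>fnic : 1 :: Abort Cmd called FCID 0xc60220, LUN 0x3c TAG f4 flags 3',
    '2020-12-15T15:36:56.773Z cpu26:32871)ScsiDeviceIO: 2331: Cmd(0x4136803cee40) 0x28, CmdSN 0x20347e5 '
    'from world 32871 to dev "naa.6006016083b1490082c1d65be93b0c9c" failed H:0x8 D:0x0 P:0x0 '
    'Possible sense data: 0x0 0x0 0x0.',
]

aborts, failures = summarize_vmkernel(SAMPLE)
for key, count in aborts.most_common():
    print("aborts", key, count)
for key, count in failures.most_common():
    print("failed", key, count)
```

The function accepts any iterable of lines, so an open file handle on a copy of vmkernel.log works as well as the sample list. A tally concentrated on one or two FCID/LUN pairs would point at specific storage paths rather than at the host as a whole.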

    vpxa.log
    2020-12-15T15:36:18.472Z [FFE24B70 verbose 'hostdevent' opID=WFU-975fec7e] [VpxaHalEventHostAgent::NormalizeDsArgument] Transated a DatastoreEventArgument MoRef '5bd6f0c0-ceec2968-3717-a0369f024b6d' to 'ds:///vmfs/volumes/5bd6f0c0-ceec2968-3717-a0369f024b6d/'
    2020-12-15T15:36:18.472Z [FFE24B70 verbose 'hostdevent' opID=WFU-975fec7e] [VpxaHalEventHostAgent::NormalizeDsArgument] Transated a DatastoreEventArgument MoRef '5bd7075c-2535440f-f2bd-e839350048b0' to 'ds:///vmfs/volumes/5bd7075c-2535440f-f2bd-e839350048b0/'
    2020-12-15T15:36:18.472Z [FFE24B70 verbose 'hostdevent' opID=WFU-975fec7e] [VpxaHalEventHostAgent::NormalizeDsArgument] Transated a DatastoreEventArgument MoRef '5bd7075c-2535440f-f2bd-e839350048b0' to 'ds:///vmfs/volumes/5bd7075c-2535440f-f2bd-e839350048b0/'
    2020-12-15T15:36:18.472Z [FFE24B70 verbose 'hostdevent' opID=WFU-975fec7e] [VpxaHalEventHostAgent::NormalizeDsArgument] Transated a DatastoreEventArgument MoRef '5bd7075c-2535440f-f2bd-e839350048b0' to 'ds:///vmfs/volumes/5bd7075c-2535440f-f2bd-e839350048b0/'
    2020-12-15T15:36:18.485Z [FFE24B70 verbose 'halservices' opID=WFU-975fec7e] [VpxaHalServices] EventsRecorded Event Fired
    2020-12-15T15:36:18.485Z [FFE24B70 verbose 'VpxaHalCnxHostagent' opID=WFU-975fec7e] [WaitForUpdatesDone] Starting next WaitForUpdates() call to hostd
    2020-12-15T15:36:18.485Z [FFE24B70 verbose 'VpxaHalCnxHostagent' opID=WFU-975fec7e] [WaitForUpdatesDone] Completed callback
    2020-12-15T15:36:27.149Z [FFE66B70 verbose 'hostdstats'] [PollCurrentStats] Skipping stat update due to stale sample from hostd.
    2020-12-15T15:36:34.255Z [FFE66B70 verbose 'SoapAdapter'] Responded to service state request
    2020-12-15T15:36:34.895Z [FFEA8B70 verbose 'vpxavpxaInvtHost'] [VpxaInvtHost] Increment master gen. no to (233463): Event:VpxaHalEvent::CheckQueuedEvents
    2020-12-15T15:36:47.150Z [FFE66B70 verbose 'hostdstats'] [PollCurrentStats] Skipping stat update due to stale sample from hostd.
    2020-12-15T15:37:04.259Z [FFE24B70 verbose 'SoapAdapter'] Responded to service state request
    2020-12-15T15:37:07.151Z [FFEA8B70 error 'hostdstats'] [VpxaHalStatsHostagent::QueryHost] Did not get any entity metrics from the host, hence dropping result
    2020-12-15T15:37:07.151Z [FFEA8B70 verbose 'hostdstats'] [PollCurrentStats] Skipping stat update due to stale sample from hostd.
    2020-12-15T15:37:18.594Z [FFEA8B70 verbose 'VpxProfiler'] [1+] CheckEnvBrowserChanges
    2020-12-15T15:37:27.153Z [FFEEAB70 error 'hostdstats'] [VpxaHalStatsHostagent::QueryHost] Did not get any entity metrics from the host, hence dropping result
    2020-12-15T15:37:27.153Z [FFEEAB70 verbose 'hostdstats'] [PollCurrentStats] Skipping stat update due to stale sample from hostd.
    2020-12-15T15:37:34.260Z [FFE24B70 verbose 'SoapAdapter'] Responded to service state request
    2020-12-15T15:37:34.905Z [FFE66B70 verbose 'vpxavpxaInvtHost'] [VpxaInvtHost] Increment master gen. no to (233464): Event:VpxaHalEvent::CheckQueuedEvents
    2020-12-15T15:37:47.156Z [FFEEAB70 error 'hostdstats'] [VpxaHalStatsHostagent::QueryHost] Did not get any entity metrics from the host, hence dropping result
    2020-12-15T15:37:47.156Z [FFEEAB70 verbose 'hostdstats'] [PollCurrentStats] Skipping stat update due to stale sample from hostd.

    hostd.log


    2020-12-15T15:36:18.292Z [82380B70 warning 'Locale' opID=hostd-10ce] No message string to format object vim.option.OptionDef.
    -->
    2020-12-15T15:36:18.294Z [82380B70 warning 'Locale' opID=hostd-10ce] No message string to format object vim.option.OptionDef.
    -->
    2020-12-15T15:36:18.357Z [82380B70 verbose 'Default' opID=hostd-10ce] OsfsClient::GetConfigOption: Retrieved receive timeout config option '1200000'
    2020-12-15T15:36:18.358Z [82380B70 verbose 'MetadataManager' opID=hostd-10ce] MDMgr : GetConfigOption: Retrieved update timeout config option: '30000'
    2020-12-15T15:36:18.371Z [82380B70 verbose 'Hostsvc.NetConfigProvider' opID=hostd-10ce] FetchFn: List of pnics opted out
    2020-12-15T15:36:18.415Z [82380B70 verbose 'Default' opID=hostd-10ce] StorageSystemVmkImplProvider: advanced option get key = VMFS.UnresolvedVolumeLiveCheck
    2020-12-15T15:36:18.429Z [82380B70 warning 'PropertyCollector' opID=hostd-10ce] ComputeGUReq took 3333061029 microSec
    2020-12-15T15:36:18.461Z [82380B70 verbose 'Default' opID=SWI-aae535fe user=vpxuser] AdapterServer: target='vim.ResourcePool:ha-root-pool', method='GetConfig'
    2020-12-15T15:36:18.463Z [82380B70 verbose 'Default' opID=SWI-aae535fe user=vpxuser] AdapterServer: target='vim.ResourcePool:ha-root-pool', method='GetName'
    2020-12-15T15:36:30.322Z [81642B70 verbose 'Cimsvc'] Ticket issued for CIMOM version 1.0, user root
    2020-12-15T15:36:30.362Z [81940B70 verbose 'SoapAdapter'] Responded to service state request
    2020-12-15T15:36:35.049Z [FFD6E9A0 verbose 'SoapAdapter'] Responded to service state request
    2020-12-15T15:37:00.364Z [81940B70 verbose 'SoapAdapter'] Responded to service state request
    2020-12-15T15:37:05.052Z [81940B70 verbose 'SoapAdapter'] Responded to service state request
    2020-12-15T15:37:12.321Z [FFD6E9A0 verbose 'Hostsvc.ResourcePool ha-root-pool'] Root pool capacity changed from 39644MHz/123856MB to 39644MHz/123858MB
    2020-12-15T15:37:18.174Z [81940B70 verbose 'Hostsvc.DvsManager'] PersistAllDvsInfo called
    2020-12-15T15:37:30.366Z [81BC3B70 verbose 'SoapAdapter'] Responded to service state request
    2020-12-15T15:37:35.054Z [81683B70 verbose 'SoapAdapter'] Responded to service state request
    2020-12-15T15:37:45.438Z [81BC3B70 info 'Solo.Vmomi' opID=hostd-7b86 user=root] Activation [N5Vmomi10ActivationE:0x814026a8] : Invoke done [waitForUpdatesEx] on [vmodl.query.PropertyCollector:ha-property-collector]
    2020-12-15T15:37:45.438Z [81BC3B70 verbose 'Solo.Vmomi' opID=hostd-7b86 user=root] Arg version:
    --> "5486"
    2020-12-15T15:37:45.438Z [81BC3B70 verbose 'Solo.Vmomi' opID=hostd-7b86 user=root] Arg options:
    --> (vmodl.query.PropertyCollector.WaitOptions) {
    --> dynamicType = <unset>,
    --> maxWaitSeconds = 600,
    --> maxObjectUpdates = 100,
    --> }
    2020-12-15T15:37:45.438Z [81BC3B70 info 'Solo.Vmomi' opID=hostd-7b86 user=root] Throw vmodl.fault.RequestCanceled
    2020-12-15T15:37:45.438Z [81BC3B70 info 'Solo.Vmomi' opID=hostd-7b86 user=root] Result:
    --> (vmodl.fault.RequestCanceled) {
    --> dynamicType = <unset>,
    --> faultCause = (vmodl.MethodFault) null,
    --> msg = "",
    --> }
    2020-12-15T15:38:00.369Z [81BC3B70 verbose 'SoapAdapter'] Responded to service state request
    2020-12-15T15:38:01.720Z [81140B70 verbose 'Cimsvc'] Ticket issued for CIMOM version 1.0, user root
    2020-12-15T15:38:05.057Z [82380B70 verbose 'SoapAdapter'] Responded to service state request
    2020-12-15T15:38:12.324Z [81BC3B70 verbose 'Hostsvc.ResourcePool ha-root-pool'] Root pool capacity changed from 39644MHz/123858MB to 39644MHz/123857MB
    2020-12-15T15:38:30.372Z [81140B70 verbose 'SoapAdapter'] Responded to service state request
    2020-12-15T15:38:35.059Z [81140B70 verbose 'SoapAdapter'] Responded to service state request
    2020-12-15T15:38:58.356Z [816C4B70 warning 'PropertyProvider' opID=hostd-685a] It took 159867141 microseconds to get property summary for vim.VirtualMachine:38


    vobd.log

    2020-12-15T14:54:43.199Z: No correlator for vob.vmfs.heartbeat.recovered
    2020-12-15T14:59:43.330Z: [vmfsCorrelator] 40637642636762us: [esx.problem.vmfs.heartbeat.recovered] 5bd7075c-2535440f-f2bd-e839350048b0
    2020-12-15T14:59:43.330Z: No correlator for vob.vmfs.heartbeat.recovered
    2020-12-15T15:10:25.838Z: No correlator for vob.vmfs.heartbeat.timedout
    2020-12-15T15:10:25.838Z: [vmfsCorrelator] 40638285145350us: [esx.problem.vmfs.heartbeat.timedout] 5bd6f0c0-ceec2968-3717-a0369f024b6d
    2020-12-15T15:10:25.839Z: No correlator for vob.vmfs.heartbeat.recovered
    2020-12-15T15:10:25.839Z: [vmfsCorrelator] 40638285146352us: [esx.problem.vmfs.heartbeat.recovered] 5bd6f0c0-ceec2968-3717-a0369f024b6d
    2020-12-15T15:10:38.884Z: No correlator for vob.vmfs.heartbeat.timedout
    2020-12-15T15:10:38.884Z: [vmfsCorrelator] 40638298190997us: [esx.problem.vmfs.heartbeat.timedout] 5bd6f0c0-ceec2968-3717-a0369f024b6d
    2020-12-15T15:10:39.291Z: No correlator for vob.vmfs.heartbeat.recovered
    2020-12-15T15:10:39.291Z: [vmfsCorrelator] 40638298598424us: [esx.problem.vmfs.heartbeat.recovered] 5bd6f0c0-ceec2968-3717-a0369f024b6d
    2020-12-15T15:10:58.741Z: No correlator for vob.vmfs.heartbeat.timedout
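
To see which datastores are flapping, the vobd.log heartbeat events above can be tallied per VMFS volume UUID. The following is a minimal Python sketch under the assumption that every relevant line contains the `[esx.problem.vmfs.heartbeat.timedout]` or `[esx.problem.vmfs.heartbeat.recovered]` tag followed by the UUID, as in the excerpt; the function name is hypothetical.

```python
import re
from collections import defaultdict

# Pattern is an assumption based on the vobd.log lines quoted above.
EVENT_RE = re.compile(r"\[esx\.problem\.vmfs\.heartbeat\.(timedout|recovered)\]\s+(\S+)")


def heartbeat_counts(lines):
    """Count timedout/recovered heartbeat events per VMFS volume UUID."""
    counts = defaultdict(lambda: {"timedout": 0, "recovered": 0})
    for line in lines:
        m = EVENT_RE.search(line)
        if m:
            counts[m.group(2)][m.group(1)] += 1
    return dict(counts)


# Lines copied from the excerpt above.
SAMPLE = [
    "2020-12-15T14:59:43.330Z: [vmfsCorrelator] 40637642636762us: [esx.problem.vmfs.heartbeat.recovered] 5bd7075c-2535440f-f2bd-e839350048b0",
    "2020-12-15T15:10:25.838Z: [vmfsCorrelator] 40638285145350us: [esx.problem.vmfs.heartbeat.timedout] 5bd6f0c0-ceec2968-3717-a0369f024b6d",
    "2020-12-15T15:10:25.839Z: [vmfsCorrelator] 40638285146352us: [esx.problem.vmfs.heartbeat.recovered] 5bd6f0c0-ceec2968-3717-a0369f024b6d",
    "2020-12-15T15:10:38.884Z: [vmfsCorrelator] 40638298190997us: [esx.problem.vmfs.heartbeat.timedout] 5bd6f0c0-ceec2968-3717-a0369f024b6d",
    "2020-12-15T15:10:39.291Z: [vmfsCorrelator] 40638298598424us: [esx.problem.vmfs.heartbeat.recovered] 5bd6f0c0-ceec2968-3717-a0369f024b6d",
]

result = heartbeat_counts(SAMPLE)
for uuid, events in result.items():
    print(uuid, events)
```

The "No correlator for ..." lines carry no UUID and are skipped on purpose. A volume with repeated timedout/recovered pairs in a short window is one the host keeps losing and regaining access to.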

     

    Attachment(s)

    zip
    Logs.zip   2 KB 1 version


  • 2.  RE: esxi host not responding issue

    Broadcom Employee
    Posted Dec 15, 2020 11:32 PM

     

    Moderator: Moved to ESXi Discussions, issue not specific to vCenter.



  • 3.  RE: esxi host not responding issue

    Posted Dec 16, 2020 09:05 AM

    vSphere 5.5 is no longer supported, and I also think ESXi 5.5 is not supported on your server model.

    https://www.vmware.com/resources/compatibility/search.php?deviceCategory=server&details=1&partner=146&keyword=B200&page=1&display_interval=10&sortColumn=Partner&sortOrder=Asc

    I think it's better to install at least version 6.0.

     

     



  • 4.  RE: esxi host not responding issue

    Posted Dec 16, 2020 01:16 PM

    Hi NathanosBlightc

    Yes, ESXi 5.5 is no longer supported, and an upgrade is in the planning phase. But for now we have to fix the current issue, as the hosts keep disconnecting for a while and then reconnecting.

    When a host reconnects, I try to vMotion, but the tasks take a long time and finally fail.

    The current Cisco B200 M3 hardware is supported for ESXi 5.5 as per Cisco HCI.

    So, any ideas or suggestions on what could be causing the hosts to become unresponsive, and how to fix it so that the hosts stay connected to vCenter?

    Thanks 



  • 5.  RE: esxi host not responding issue
    Best Answer

    Posted Dec 16, 2020 02:48 PM

    I looked at your attached file, which is a summary of the important log files. There seems to be a synchronization problem between vpxd (vCenter) and vpxa (ESXi), so please check the date/time on your hosts and on the vCenter Server. Then restart vpxa and hostd on the ESXi host by running the following commands and check whether the issue recurs. After that, check the timeout setting in accordance with KB1017253. If the problem persists, attach the full log files covering the time when the problem occurred.

    • /etc/init.d/vpxa restart
    • /etc/init.d/hostd restart
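
The date/time check can be made concrete by comparing one UTC timestamp read on the host with one read on vCenter at the same moment. This is a small Python sketch under stated assumptions: the timestamps use the same ISO format as the log lines in this thread, the two sample readings are hypothetical, and the 5-second tolerance is an arbitrary illustrative value, not a VMware recommendation.

```python
from datetime import datetime, timezone


def parse_utc(ts):
    """Parse a UTC timestamp such as 2020-12-15T15:36:56.471Z or 2020-12-15T15:36:56Z."""
    fmt = "%Y-%m-%dT%H:%M:%S.%fZ" if "." in ts else "%Y-%m-%dT%H:%M:%SZ"
    return datetime.strptime(ts, fmt).replace(tzinfo=timezone.utc)


def skew_seconds(host_ts, vcenter_ts):
    """Return host time minus vCenter time, in seconds."""
    return (parse_utc(host_ts) - parse_utc(vcenter_ts)).total_seconds()


# Hypothetical readings taken at the same moment on a host and on vCenter.
skew = skew_seconds("2020-12-15T15:36:56.471Z", "2020-12-15T15:41:10Z")
print(f"host is {skew:+.1f}s relative to vCenter")
if abs(skew) > 5:  # arbitrary illustrative tolerance
    print("clocks differ noticeably; check the NTP service and settings on the host")
```

A negative value means the host clock is behind vCenter. If the skew is large, fixing NTP on the host is worth doing before restarting the management agents.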


  • 6.  RE: esxi host not responding issue

    Posted Jan 08, 2021 06:41 PM

    Hi  NathanosBlightc

    Thanks for the reply and assistance.

    We reconfigured the NTP settings on the hosts with the correct values (NTP was in a stopped state and the time wasn't correct) and then rebooted the hosts.

    After the NTP change and reboot, the hosts are connected to vCenter without problems.

    Thanks a lot