41 Replies Latest reply on Oct 9, 2019 8:21 AM by msripada

    Strange Host Responsiveness Issues

    isaacwd Lurker

      - Recently upgraded hosts to 6.7 and vCenter to 6.7a

      - Hosts are 'not responding' in vCenter Server

      - Can ping

      - Cannot access the web interface or log in via SSH

      - Can access it via console, but after you enter login information and press enter it freezes (cursor is still blinking)

      - If you remove the host from the inventory and shut down a virtual machine on the host it brings everything back online and the host can be re-added to vCenter

      - Four identical hosts, has happened on three of the four (twice on one)

      - The host that had this issue twice now will not come back after trying the above method and is completely unresponsive at the console

        • 1. Re: Strange Host Responsiveness Issues
          Finikiez Master
          vExpert

          And what do you see in vmkernel.log and hostd.log on the affected hosts?

           

          How did you perform the upgrade?

          • 2. Re: Strange Host Responsiveness Issues
            isaacwd Lurker

            I updated the hosts via Update Manager.

             

            I pulled the logs from one of the hosts I was able to get back online. Here's what was in hostd:

             

            --> [context]zKq7AVICAgAAAMKpfAAVaG9zdGQAAHyZNWxpYnZtYWNvcmUuc28AAADAGwBgsBcBWbxkaG9zdGQAAS5JzIKK4QABbGlidmltLXR5cGVzLnNvAANnIA9saWJ2bW9taS5zbwADTCwPA4pKHAMdmRwDAaIcAxlRHAPwZA0DbNoPA3SgHwH148EAJTAoAAM0KAA7DzYEa4AAbGlicHRocmVhZC5zby4wAAXtmg5saWJjLnNvLjYA[/context]

            count_events: starting communication with bmc over ipmi driver

            count_events: GET_SEL_REPO_INFO returned {version: 0x51, count 41, free 15728,add_stamp 1380738318, erase_stamp 1358956536 op_support 2}

            IPMI SEL sync took 0 seconds 0 sel records, last 41

            2018-05-29T09:29:30.273Z error hostd[2099052] [Originator@6876 sub=Cimsvc] IPMI SEL unavailable

            2018-05-29T09:29:30.274Z warning hostd[2099762] [Originator@6876 sub=Hostsvc.VFlashManager opID=e3ea772f] GetVFlashResourceRuntimeInfo: vFlash is not licensed, not supported

            2018-05-29T09:29:59.882Z warning hostd[2099938] [Originator@6876 sub=Hostsvc.Tpm20Provider opID=e3ea776e user=root] Unable to retrieve TPM/TXT status. TPM functionality will be unavailable. Failure reason: Unable to get node: Sysinfo error: Not foundSee VMkernel log for details..

            2018-05-29T09:29:59.918Z error hostd[2099938] [Originator@6876 sub=Hostsvc.VFlashManager opID=e3ea776e user=root] CheckLicense: vFlash is not licensed. error = [N5Vmomi9DataArrayINS_18LocalizableMessageEEE:0x000000b0b88b7180]

            2018-05-29T09:29:59.923Z warning hostd[2099938] [Originator@6876 sub=Hostsvc.Tpm20Provider opID=e3ea776e user=root] Unable to retrieve TPM/TXT status. TPM functionality will be unavailable. Failure reason: Unable to get node: Sysinfo error: Not foundSee VMkernel log for details..

            2018-05-29T09:29:59.964Z warning hostd[2099938] [Originator@6876 sub=Hostsvc.VFlashManager opID=e3ea776e user=root] GetVFlashResourceRuntimeInfo: vFlash is not licensed, not supported

            2018-05-29T09:29:59.968Z warning hostd[2099938] [Originator@6876 sub=Hostsvc.VFlashManager opID=e3ea776e user=root] GetVFlashResourceRuntimeInfo: vFlash is not licensed, not supported

            2018-05-29T09:30:00.032Z warning hostd[2099885] [Originator@6876 sub=Statssvc] Calculated write I/O size 589477 for scsi0:0 is out of range -- 589477,prevBytes = 27990022656 curBytes = 28010064896 prevCommands = 1280828curCommands = 1280862

            2018-05-29T09:30:00.565Z error hostd[2099053] [Originator@6876 sub=PropertyProvider opID=e3ea7773 user=root] Unexpected fault reading property: 000000b0622e1da0, IsSourceAvailable: N5Vmomi5Fault12NotSupported9ExceptionE(Fault cause: vmodl.fault.NotSupported

            --> )
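            The Statssvc warning above reports a write I/O size of 589477 for scsi0:0. A quick check, assuming the figure is the byte delta divided by the command delta between two counter samples (an inference from the logged values, not documented hostd behavior), reproduces that number:

```python
# Values copied from the Statssvc warning in the hostd log above.
prev_bytes, cur_bytes = 27990022656, 28010064896
prev_commands, cur_commands = 1280828, 1280862


def average_io_size(prev_bytes, cur_bytes, prev_commands, cur_commands):
    """Average bytes per command between two counter samples (assumed formula)."""
    commands = cur_commands - prev_commands
    if commands <= 0:
        raise ValueError("no commands completed between samples")
    # Integer division, since the log reports a whole number of bytes.
    return (cur_bytes - prev_bytes) // commands


size = average_io_size(prev_bytes, cur_bytes, prev_commands, cur_commands)
print(size)  # 589477, matching the logged value
```

            If that reading is right, the warning reflects a statistics range check on the average write size rather than a fault by itself.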

             

            And here's what was in vmkernel:

             

            2018-06-01T21:51:50.858Z cpu4:2386360)MemSchedAdmit: 477: uw.2386360 (827751) extraMin/extraFromParent: 33/33, sioc (809) childEmin/eMinLimit: 14066/14080

            2018-06-01T21:51:50.858Z cpu4:2386360)MemSchedAdmit: 470: Admission failure in path: sioc/storageRM.2386360/uw.2386360

            2018-06-01T21:51:50.858Z cpu4:2386360)MemSchedAdmit: 477: uw.2386360 (827751) extraMin/extraFromParent: 256/256, sioc (809) childEmin/eMinLimit: 14066/14080

            2018-06-01T21:51:50.940Z cpu1:2387625)ScsiVsi: 2899: Can't set the maxPathQueueDepth value to more than device advertised maxPathQueueDepth 128
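            One possible way to read the MemSchedAdmit 477 lines, assuming `extraMin` is the additional reservation being requested and `childEmin/eMinLimit` is the parent group's current reservation against its limit (this interpretation is a guess from the field names, not taken from VMware documentation): admission fails when the request does not fit in the remaining headroom. A minimal Python sketch, where `parse_admit_line` is a hypothetical helper:

```python
import re

# Matches the "MemSchedAdmit: 477" detail line as it appears in vmkernel.log.
ADMIT_RE = re.compile(
    r"MemSchedAdmit: 477: (?P<client>\S+) \(\d+\) "
    r"extraMin/extraFromParent: (?P<extra_min>\d+)/(?P<extra_parent>\d+), "
    r"(?P<group>\S+) \(\d+\) "
    r"childEmin/eMinLimit: (?P<child_emin>\d+)/(?P<emin_limit>\d+)"
)


def parse_admit_line(line):
    """Return parsed fields plus the assumed headroom, or None if no match."""
    m = ADMIT_RE.search(line)
    if m is None:
        return None
    extra_min = int(m["extra_min"])
    # Assumed meaning: room left in the group before it hits its limit.
    headroom = int(m["emin_limit"]) - int(m["child_emin"])
    return {
        "client": m["client"],
        "group": m["group"],
        "extra_min": extra_min,
        "headroom": headroom,
        "fits": extra_min <= headroom,
    }


line = (
    "2018-06-01T21:51:50.858Z cpu4:2386360)MemSchedAdmit: 477: uw.2386360 (827751) "
    "extraMin/extraFromParent: 33/33, sioc (809) childEmin/eMinLimit: 14066/14080"
)
result = parse_admit_line(line)
print(result)
```

            For the sample line the headroom is 14 against a request of 33, which is consistent with the admission failure under this reading.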

            • 3. Re: Strange Host Responsiveness Issues
              isaacwd Lurker

              Bump. Two of the hosts have gone into an unresponsive state again.

              • 4. Re: Strange Host Responsiveness Issues
                daphnissov Guru
                Community Warriors vExpert

                At this point you should be opening an SR to have them investigate.

                • 5. Re: Strange Host Responsiveness Issues
                  pbaideme Lurker

                  Did you get a resolution to this problem?

                   

                  We opened a case with VMware last year and they were unable to find the root cause.

                   

                  We have been battling this for the past year, ever since our 6.5 upgrade. It is quite intermittent and has hit 6-7 hosts in total.

                   

                  Here is our current environment to compare.

                   

                  ESXi  6.5.0, 8935087

                  Cisco UCS B200 M4 latest drivers and UCS blade package 3.2(3d)

                  nenic - 1.0.16.0

                  fnic - 1.6.0.37

                  Backup software Veeam 9.5.0.1922

                   

                  Thank you,

                  Phil

                  • 6. Re: Strange Host Responsiveness Issues
                    GalNeb Enthusiast

                    I have what sounds like the same issue.  Hosts are non-responsive, VMs seem ok.  1 host is locked up after entering the root password, still on the password screen.  Alt-F# keys work but nothing else.  On another host, I got logged on, but once I got to the troubleshooting screen it locked up.  If I can get there, restarting the management agents works, but getting there is the problem.  I have tried connecting with PowerCLI, but Connect-VIServer times out. 

                     

                    Sometimes the lockup on the console will suddenly unfreeze on its own and I will then be able to get to the management agent restart and get the host back up.  No clue as to what triggers either the problem or the console lockup.

                     

                    In my case, I just upgraded to the latest patches of 6.5u2 with the Hyperthreading Mitigation features.  I have set the flag and so far, problems have only happened on hosts that have had the flag set but have not yet rebooted.  It is still too early to tell if this is a coincidence.  I am pushing thru the reboots as fast as I can so as to eliminate this as a factor, I still have 16 hosts to go.  I set the flag via script 3 days ago and am still doing reboots (a weekend intervened).

                    • 7. Re: Strange Host Responsiveness Issues
                      MightyGorilla Novice

                      I'm seeing similar errors (thousands & thousands of them; 8 lines every 30 seconds) and I also have a 6.7 host upgraded from 6.5U2.

                      The host works fine though (mostly). I do have some strange intermittent connectivity issues with a web application running on one of the VMs.

                       

                      This is an HP DL380 Gen9, and the similar errors I'm seeing are:

                      "2018-09-06T15:41:52.976Z cpu10:2099148)MemSchedAdmit: 470: Admission failure in path: nicmgmtd/nicmgmtd.2099148/uw.2099148"

                      "2018-09-06T15:41:52.976Z cpu10:2099148)MemSchedAdmit: 477: uw.2099148 (9114) extraMin/extraFromParent: 117/117, nicmgmtd (806) childEmin/eMinLimit: 2479/2560"

                       

                      Your post is the only thing I hit when searching.

                       

                      I disconnected one of the NICs that I was hoping was associated with the errors, and the errors stopped for several hours, but then started back up...
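                      To gauge how often these messages occur and which resource group they point at, the 470 lines can be tallied per group and per minute. A rough Python sketch, assuming the log format shown in the lines quoted in this thread; `summarize_admission_failures` is a hypothetical helper:

```python
import re
from collections import Counter

# Timestamp truncated to the minute, then the path from the 470 message.
FAILURE_RE = re.compile(
    r"^(?P<minute>\d{4}-\d{2}-\d{2}T\d{2}:\d{2}):\d{2}\.\d+Z .*"
    r"MemSchedAdmit: 470: Admission failure in path: (?P<path>\S+)"
)


def summarize_admission_failures(lines):
    """Count admission failures per top-level group and per minute."""
    per_group, per_minute = Counter(), Counter()
    for line in lines:
        # Tolerate surrounding whitespace and quotes from copy-pasted lines.
        m = FAILURE_RE.search(line.strip().strip('"'))
        if m:
            per_group[m["path"].split("/")[0]] += 1
            per_minute[m["minute"]] += 1
    return per_group, per_minute


sample = [
    "2018-09-06T15:41:52.976Z cpu10:2099148)MemSchedAdmit: 470: "
    "Admission failure in path: nicmgmtd/nicmgmtd.2099148/uw.2099148",
    "2018-06-01T21:51:50.858Z cpu4:2386360)MemSchedAdmit: 470: "
    "Admission failure in path: sioc/storageRM.2386360/uw.2386360",
]
per_group, per_minute = summarize_admission_failures(sample)
print(per_group.most_common())
print(per_minute.most_common())
```

                      The same function could be fed the lines of a vmkernel.log copied off the host, which makes it easier to compare hosts or to see whether a change such as disabling a NIC really lowered the rate.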

                      • 8. Re: Strange Host Responsiveness Issues
                        dbray925 Novice

                        You are not alone. We have the same issue on newly installed Dell PowerEdge R640 VSAN Ready Nodes, with a clean 6.7 installed from scratch. Some of our CentOS 7 images, latest patches and open-vm-tools, suddenly just start dropping off. The guests and the hosts seem fine, but we have 0 connectivity on certain interfaces. For example, on some, management interfaces will work fine, but services/Internet interfaces drop off and have no connectivity.

                         

                        I've opened an SR, and hope VMware comes back with something soon.

                        • 9. Re: Strange Host Responsiveness Issues
                          vKopp Novice

                          Hello guys,

                           

                          We have the same issue on 10 of our ESXi 6.7 hosts on DL380 Gen10 with vSAN.

                           

                           

                          2018-09-18T13:23:34.015Z cpu25:2100568)MemSchedAdmit: 470: Admission failure in path: nicmgmtd/nicmgmtd.2100568/uw.2100568

                          2018-09-18T13:23:34.015Z cpu25:2100568)MemSchedAdmit: 477: uw.2100568 (12331) extraMin/extraFromParent: 186/186, nicmgmtd (796) childEmin/eMinLimit: 2443/2560

                           

                          About 1-2 lines each second in /var/log/vmkernel.log

                           

                          Any progress on the SR, or any statement from VMware?

                           

                          Please share your info.

                           

                          Thx!

                           

                          Regards,

                          JK

                          • 10. Re: Strange Host Responsiveness Issues
                            RajeevVCP4 Hot Shot
                            vExpert

                            Which vendor's HBA is installed?

                            Try changing the queue depth to 64 and rebooting the host.

                             

                            you can follow this KB

                             

                             

                            VMware Knowledge Base

                            • 11. Re: Strange Host Responsiveness Issues
                              MightyGorilla Novice

                              We were using the built-in Broadcom x4 GigT NICs, but switched the traffic to an HP FLR 10GigT Intel-based card (simply on a guess, considering Broadcom's driver track record).

                              I haven't disabled the Broadcom cards entirely, just moved all the traffic to the other nics, but the errors have continued to fill the logs, and we still continue to have intermittent connectivity/responsiveness issues with one of the hosts...

                              • 12. Re: Strange Host Responsiveness Issues
                                MightyGorilla Novice

                                Thanks for the idea Rajeev,

                                     We're not currently using the NIC types mentioned in that KB article.

                                • 13. Re: Strange Host Responsiveness Issues
                                  MightyGorilla Novice

                                  After disabling the embedded Broadcom quad Nic card last Saturday, the "admission failure" messages all stopped that day and have not returned, for what that's worth.

                                  I haven't collected any new feedback from users about the intermittent connectivity issues yet, so I don't know if that helped anything beyond getting rid of log bloat...

                                  • 14. Re: Strange Host Responsiveness Issues
                                    jameswalkervmw Enthusiast
                                    GS Skyline Support vExpert VMware Employees

                                    Hello,

                                     

                                    I found a similar case with "admission failure" messages reported. Can you try disabling NetQueue on the card?

                                     

                                    esxcli network nic queue loadbalancer set --rsslb=off -n vmnicX

                                     

                                    Thanks,

                                    James

                                     

                                     
