26 Replies. Latest reply on Sep 30, 2015 8:04 AM by sagarmaru

    unresponsive host

    shaheed_a_m Lurker

      Hi

      Since I upgraded to VMware ESXi 6 and vCenter 6, I have had the following issue:

      The host shows as greyed out and is not responding.

      All the VMs on that host are also greyed out and show as disconnected.

      I can connect to the host directly, but it is unresponsive.

       

      If I reboot the host, it reconnects fine and everything works. This has happened on two hosts so far: one was an HP and the other an Intel.

       

      Any help or input will really be appreciated.

       

      Thanks

        • 1. Re: unresponsive host
          brunofernandez1 Expert

          I would recommend restarting the management agents:

          VMware KB: Restarting the Management agents on an ESXi or ESX host

           

          What happens if you right-click the disconnected server in vCenter and choose Reconnect? Does it ask you for credentials?

          Maybe the certificates were lost during the upgrade to vSphere 6.0, and the host is now treated as a different server.

          If so, you have to reconnect the hosts manually.

          • 2. Re: unresponsive host
            shaheed_a_m Lurker

            Hi,

            I haven't tried to disconnect and reconnect the host, but I did right-click it and select Connect. This did nothing.

            When the problem happens again, I'll try restarting the management agents.

            • 3. Re: unresponsive host
              schulzman Lurker

              Hello all!

              I had the same issue a few days ago in two different environments: one standalone free ESXi 6.0 hypervisor and one host in a two-node cluster managed by vCenter Server Appliance 6.0.

              I tried to reconnect the host, but it didn't work for me.

              At the DCUI I tried to enter my password, but the host did not respond. Only a reboot solved the problem; after that everything was fine.

              I'm running ESXi 6.0 on a Fujitsu RX200 S6 and an RX200 S7.

               

              Please let me know if there is a fix for this issue.

              Regards,

              schulzman

              • 4. Re: unresponsive host
                RichardBush Hot Shot

                Hi,

                 

                I have had something similar on an upgraded test host: the server would randomly disconnect, and a reboot resolved it. Eventually the host wouldn't reconnect to vCenter at all.

                 

                The fix for me was to uninstall the vpxa agent, restart the host, then reconnect it to vCenter (as though connecting a new host).

                 

                R

                • 5. Re: unresponsive host
                  vijayalka Novice

                  Could you please explain how you uninstalled the vpxa agent?

                  • 8. Re: unresponsive host
                    sdnbtech Lurker

                    If you're seeing the following in your vmkernel.log at the time of the disconnect, it could be related to an issue that will eventually be described at the link below (the KB article is not live at this time). We see this after a random amount of time, and nothing VMware technical support tried helped except rebooting the host.

                     

                    http://kb.vmware.com/kb/2124669

                     

                     

                    vmkernel.log:

                    2015-07-19T08:22:35.552Z cpu0:33257)WARNING: LinNet: netdev_watchdog:3678: NETDEV WATCHDOG: vmnic4: transmit timed out

                    2015-07-19T08:22:35.552Z cpu0:33257)WARNING: at vmkdrivers/src_92/vmklinux_92/vmware/linux_net.c:3707/netdev_watchdog()(inside vmklinux)

                    2015-07-19T08:22:35.552Z cpu0:33257)Backtrace for current CPU #0,worldID=33257, rbp=0x430609af4380

                    2015-07-19T08:22:35.552Z cpu0:33257)0x4390cf49be10:[0x418029896b4e]vmk_LogBacktraceMessage@vmkernel#nover+0x22 stack: 0x430609af4380, 0

                    2015-07-19T08:22:35.552Z cpu0:33257)0x4390cf49be30:[0x418029f1e7b7]watchdog_work_cb@com.vmware.driverAPI#9.2+0x27f stack: 0x430609ac3ce

                    2015-07-19T08:22:35.552Z cpu0:33257)0x4390cf49bea0:[0x418029f44a5f]vmklnx_workqueue_callout@com.vmware.driverAPI#9.2+0xd7 stack: 0x4306

                    2015-07-19T08:22:35.552Z cpu0:33257)0x4390cf49bf30:[0x41802984f872]helpFunc@vmkernel#nover+0x4e6 stack: 0x0, 0x430609ac3ce0, 0x27, 0x0,

                    2015-07-19T08:22:35.552Z cpu0:33257)0x4390cf49bfd0:[0x418029a1231e]CpuSched_StartWorld@vmkernel#nover+0xa2 stack: 0x0, 0x0, 0x0, 0x0,
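To check whether a host has hit the same symptom, the watchdog warning can be searched for in vmkernel.log. The following is a minimal Python sketch, not a VMware tool: the regular expression is modeled only on the log excerpts quoted in this thread and assumes the warning sits on a single line, and the names `find_tx_timeouts` and `scan_file` are made up for illustration.

```python
import re
from datetime import datetime

# Pattern modeled on the warning quoted in this thread (assumption: other
# ESXi builds or drivers may word or number this message differently).
WATCHDOG_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}\.\d{3})Z\s+"
    r"cpu\d+:\d+\)WARNING: LinNet: netdev_watchdog:\d+:\s*"
    r"NETDEV WATCHDOG: (?P<nic>vmnic\d+): transmit timed out"
)


def find_tx_timeouts(lines):
    """Return a list of (timestamp, nic) tuples, one per watchdog timeout line."""
    events = []
    for line in lines:
        match = WATCHDOG_RE.match(line.strip())
        if match:
            ts = datetime.strptime(match.group("ts"), "%Y-%m-%dT%H:%M:%S.%f")
            events.append((ts, match.group("nic")))
    return events


def scan_file(path):
    """Scan a log file copied from the host and return its timeout events."""
    with open(path, errors="replace") as handle:
        return find_tx_timeouts(handle)
```

Calling `scan_file` on a copy of the host's vmkernel.log returns an empty list when the warning never occurred, which makes it easy to compare several hosts.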

                    • 9. Re: unresponsive host
                      DSchef Lurker

                      sdnbtech, have you heard or seen any updates on the issue you described? It has been a few weeks since VMware confirmed that engineering is working on a solution, and I haven't been able to get a status update on a fix since. A host downgrade to 5.5 was the only recommendation aside from rebooting the 6.0 hosts each time networking drops.

                      • 10. Re: unresponsive host
                        cesprov Enthusiast

                        I seem to be having very similar issues:

                         

                        2015-08-11T11:14:53.340Z cpu23:33256)WARNING: LinNet: netdev_watchdog:3678: NETDEV WATCHDOG: vmnic4: transmit timed out

                        2015-08-11T11:14:53.340Z cpu23:33256)<6>ixgbe 0000:41:00.0: vmnic4: Fake Tx hang detected with timeout of 160 seconds

                         

                        When this happens, both ports on a dual-port NIC die at the same time and only a reboot fixes it. I opened an SR with VMware support, referencing this thread and the not-yet-existing KB posted above, and will follow up if/when I hear something back.
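One way to see whether hangs cluster on a single physical adapter is to group the driver's "Fake Tx hang" messages by PCI device. The sketch below is only an illustration: the pattern is modeled on the one ixgbe line quoted above, `hangs_by_adapter` is a hypothetical helper name, and it assumes that the ports of a multi-port NIC share the same PCI domain:bus:slot and differ only in the function number.

```python
import re
from collections import defaultdict

# Pattern modeled on the single ixgbe line quoted in this thread (assumption:
# other drivers or ESXi builds may format the message differently).
HANG_RE = re.compile(
    r"^\S+\s+cpu\d+:\d+\)<\d>(?P<driver>\w+) "
    r"(?P<pci>[0-9a-fA-F]{4}:[0-9a-fA-F]{2}:[0-9a-fA-F]{2})\.(?P<func>[0-7]): "
    r"(?P<nic>vmnic\d+): Fake Tx hang detected with timeout of \d+ seconds"
)


def hangs_by_adapter(lines):
    """Map each PCI device (domain:bus:slot) to the set of vmnics that reported a hang."""
    adapters = defaultdict(set)
    for line in lines:
        match = HANG_RE.match(line.strip())
        if match:
            # Drop the PCI function number so ports on one card group together.
            adapters[match.group("pci")].add(match.group("nic"))
    return dict(adapters)
```

If one PCI device maps to more than one vmnic, several ports of the same card reported hangs, which would match the behaviour described above.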

                        • 11. Re: unresponsive host
                          aragab Novice

                          Troubleshooting a non-responsive host without looking at the logs is not really effective. You can open a service request with VMware.

                          • 12. Re: unresponsive host
                            Jimmy15 Enthusiast

                            Please share the log details; without logs it is hard to find the root cause. Storage might also be the reason: the APD recovery issue is still unresolved in 6.0.

                            What about the VMs on the host? Are they still running when the host goes unresponsive? Even a time sync problem can make a host disconnect.

                            • 13. Re: unresponsive host
                              cesprov Enthusiast

                              I can confirm what sdnbtech stated above: the "transmit timed out" is a known issue. There is no ETA for a fix yet, and support was not very forthcoming with details. Basically I was told to downgrade if this issue is affecting me, as there is no workaround. The engineer I spoke to says he sees this at least once a week.

                              • 14. Re: unresponsive host
                                sdnbtech Lurker

                                I checked this morning and there are a few options: 1) apply a debug build of ESXi that will still be affected by the problem but gathers more information for the development team; 2) run a script at each boot of each ESXi server, which they believe fixes the issue entirely but can cause performance degradation; 3) downgrade to 5.5 or below.

                                 

                                My case regarding this issue has now been open for 60 days. It's very disappointing.
