18 Replies. Latest reply on May 1, 2019 10:52 PM by Sureshkumar M

    esxi host is not responding

    vmisagh Novice

      Hi all

       

      We have a strange problem.

      In VCSA 6, many of our ESXi 6 hosts, which have valid IP addresses, become "not responding", and after a disconnect and reconnect we get the error "Request timed out".

      I have tried most of the usual solutions: restarting the ESXi host, restarting the VCSA, regenerating the SSL certificate on the host, and checking ports 902 and 443 on both sides (host and vCenter). The ports are open and both sides can ping each other. I even disabled the firewall completely on the host, but none of this resolved the problem. There are only two observations. First, when we change the IP address of the ESXi host to another IP in the same subnet, it is added to vCenter successfully, but after a few days it becomes "not responding" again, just as with the previous IP. Second, when we try to add the problematic host to a secondary VCSA, it is added without a problem. My suspicion is that the main problem is on our primary VCSA. If some logs are causing the problem, which ones are safe to delete on the VCSA? Can anyone kindly help us with this issue? Thanks in advance.

        • 1. Re: esxi host is not responding
          pragg12 Enthusiast

          Hi,

           

          Welcome to VMTN. :-)

           

          I don't think logs on primary VCSA could be causing this. Still, make sure all VCSA partitions have sufficient free space by running this command: df -h
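As a rough sketch of that free-space check, the output of df can also be filtered by a small script instead of being read by eye. The function name `parse_df` and the 90% threshold are my own assumptions, not anything VMware prescribes, and the script uses `df -P` (POSIX format) rather than `df -h` so that every filesystem stays on one line.

```python
import subprocess

def parse_df(output, threshold=90):
    """Return (mount_point, use_percent) for filesystems at or above threshold."""
    full = []
    for line in output.strip().splitlines()[1:]:  # skip the header row
        fields = line.split()
        if len(fields) < 6 or not fields[4].endswith("%"):
            continue  # ignore lines that do not look like a df -P data row
        use_percent = int(fields[4].rstrip("%"))
        if use_percent >= threshold:
            full.append((fields[5], use_percent))
    return full

def check_local_partitions(threshold=90):
    """Run df -P on the local machine and report nearly full partitions."""
    out = subprocess.check_output(["df", "-P"]).decode()
    return parse_df(out, threshold)
```

Calling `check_local_partitions()` on the appliance shell returns an empty list when every partition is below the threshold.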

           

          I need the following information before I can suggest anything further.

           

          Q1: Are both VCSAs in the same subnet/VLAN or in different VLANs?

          • 2. Re: esxi host is not responding
            vmisagh Novice

            Hi pragg12, thank you

             

            This is a screenshot of df -h from the primary VCSA.

            The answer to your Q1 is no: the two VCSAs are on completely different subnets (in different countries).

             

            1.jpg

            • 3. Re: esxi host is not responding
              Sureshkumar M Expert
              vExpert

              Can you give us the IP segments of the primary VCSA, the secondary VCSA and the ESXi hosts?

              What are the versions of ESXi and VCSA? Please provide the build numbers.

               

              When the host is in the not responding state in VC, have you tried accessing the host client at https://<ESXi IP or FQDN> from a browser? How does the host respond at that time?

               

              Did you check the vpxd log and the hostd log to see why the host goes into the unresponsive state?

               

              The vpxd log is located on the vCenter appliance and the hostd log is located on the ESXi host. These logs can give us some hint as to why the host is not responding to VC.

              • 4. Re: esxi host is not responding
                vmisagh Novice

                IP Segments are:

                primary vcsa: 188.40.xx.xx

                secondary vcsa: 46.225.xx.xx

                esxi host: 185.81.xx.xx

                -------------------------------------------

                version and build numbers:

                primary vcsa: 6.00  build 5112529

                secondary vcsa: 6.00 but not sure about exact build number, it is something lower than primary vcsa (255xxxxx)

                esxi host: 6.00 build 3620759

                By the way, we also tried updating the VCSA and the ESXi host to other build numbers, but nothing changed.

                -------------------------------------------------

                In this current state I can access the ESXi host directly with the vSphere Client software without any problem.

                ---------------------------------------------------

                I have also attached the last lines of the logs you asked for (vpxd and hostd).

                There is also another, maybe useful, screenshot of an error I get when I try to reconnect the host in the primary VCSA.

                 

                2.jpg

                • 5. Re: esxi host is not responding
                  Sureshkumar M Expert
                  vExpert

                  From the screenshot and the log messages in vpxd.log, it is clear that the host cannot send its heartbeat to VC. This could be due to the network, as the host and VC are in different segments. High latency on the network path could be causing the issue.

                   

                   

                  2019-04-11T07:14:07.458+04:30 error vpxd[7F48EE448700] [Originator@6876 sub=Default] Reading additional bytes from the stream timed out : Read timeout after approximately 305000ms. Closing stream <SSL(<io_obj p:0x00007f48cc4b4700, h:171, <TCP '188.40.xx.xx:53482'>, <TCP '185.81.xx.xxx:443'>>)>

                   

                   

                  2019-04-11T08:03:55.918+04:30 error vpxd[7F490D08E700] [Originator@6876 sub=MoHost opID=0AA8F7F3-000079DA-fc][HostMo::Reconnect] Got unexpected exception: Server closed connection after 0 response bytes read; <SSL(<io_obj p:0x00007f48d876edf0, h:97, <TCP '188.40.xx.xx:34320'>, <TCP '185.81.xx.xxx:443'>>)> while reconnecting to host 185.81.xx.xxx-->    reason = "Server closed connection after 0 response bytes read; <SSL(<io_obj p:0x00007f48d876edf0, h:97, <TCP '188.40.xx.xx:34320'>, <TCP '185.81.xx.xxx:443'>>)>",

                   

                  Is there any specific reason why your vCenter and hosts are in different network segments? Try to bring ESXi and VC onto the same network if possible. The following article gives some hints on this issue and its resolution steps:

                   

                  https://sflanders.net/2013/02/01/host-is-not-responding/
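To pull only the relevant entries out of a large vpxd.log, a small parser along the following lines may help. This is only a sketch: the regular expression is inferred from the two log lines quoted above rather than from any documented log format, and the names `parse_vpxd_line` and `connection_errors` are made up for illustration.

```python
import re

# Pattern inferred from the vpxd.log lines quoted in this thread (an assumption).
LINE_RE = re.compile(
    r"^(?P<timestamp>\S+)\s+(?P<level>\w+)\s+vpxd\[(?P<thread>[0-9A-Fa-f]+)\]\s+"
    r"\[Originator@\d+ sub=(?P<sub>[^\]\s]+)[^\]]*\]\s*(?P<message>.*)$"
)
ENDPOINT_RE = re.compile(r"<TCP '([^']+)'>")

def parse_vpxd_line(line):
    """Split one vpxd.log line into fields, or return None if it does not match."""
    match = LINE_RE.match(line.strip())
    if not match:
        return None
    entry = match.groupdict()
    # Collect the local and remote TCP endpoints mentioned in the message.
    entry["endpoints"] = ENDPOINT_RE.findall(entry["message"])
    return entry

def connection_errors(lines, host_fragment):
    """Yield error-level entries whose message mentions host_fragment."""
    for line in lines:
        entry = parse_vpxd_line(line)
        if entry and entry["level"] == "error" and host_fragment in entry["message"]:
            yield entry
```

Feeding it the open log file and the host's IP (for example `connection_errors(open("vpxd.log"), "185.81.")`) yields one dictionary per matching error, which makes it easier to see how often and at what times the timeouts occur.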

                  • 6. Re: esxi host is not responding
                    vmisagh Novice

                    Thank you for your assessment. Actually, our primary vCenter and hosts are not only in different subnets but also physically in different countries. We have many hosts across Europe, the USA and the Middle East, and we didn't want to dedicate a vCenter to every country. Our primary vCenter is in Germany and the problematic hosts are in Iran (Middle East), where internet and network quality is lower than in Europe. Your suggestion that the most likely cause is network latency makes sense to me, but my boss asks: if that is the problem, why don't our other hosts in Iran, which are in the same subnet and the same datacenter and are connected to the same vCenter in Europe, have this issue? We have 8 hosts in Iran. Only one of them has this issue, another one sometimes becomes not responding but comes back by itself, and the other 6 hosts have never had this problem. Do you think there must be another underlying cause, or do we have to set up a dedicated vCenter in Iran for those hosts?

                    • 7. Re: esxi host is not responding
                      pragg12 Enthusiast

                      Is there a firewall between the ESXi and vCenter VLAN networks/IPs?

                      Can you check with your monitoring tools whether there is network packet loss between the affected host and vCenter?

                      If the affected ESXi host works with one vCenter but not with another, then one of the most probable causes would be conflicting or missing firewall rules for the ports required for communication.
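If no monitoring tool is available, a repeated connection test gives a rough picture of intermittent loss and slow connects. The sketch below is only an illustration: `tcp_probe` and `probe_many` are hypothetical helper names, and the attempt count and timeout are arbitrary. Note that, as far as I know, the host-to-vCenter heartbeat uses UDP port 902, which a TCP test like this one (or telnet) does not exercise.

```python
import socket
import time

def tcp_probe(host, port, timeout=5.0):
    """Try one TCP connection; return (succeeded, seconds_elapsed)."""
    start = time.time()
    try:
        sock = socket.create_connection((host, port), timeout=timeout)
        sock.close()
        return True, time.time() - start
    except (socket.error, socket.timeout):
        return False, time.time() - start

def probe_many(host, port, attempts=20, delay=1.0, timeout=5.0):
    """Repeat the probe and summarise failures and the slowest successful connect."""
    failures = 0
    slowest = 0.0
    for _ in range(attempts):
        ok, elapsed = tcp_probe(host, port, timeout)
        if ok:
            slowest = max(slowest, elapsed)
        else:
            failures += 1
        time.sleep(delay)
    return {"attempts": attempts, "failures": failures, "slowest_connect_s": slowest}
```

Running `probe_many("<ESXi IP>", 443)` from the vCenter side against both a healthy host and the affected host in the same datacenter should show whether the affected host really behaves differently on the network.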

                      • 8. Re: esxi host is not responding
                        vmisagh Novice

                        Actually, I tested ports 902 and 443 via telnet from both sides and they are open; there is no firewall in between. There is also a new, weird update to my problem. I installed a new vCenter appliance in the same subnet as my primary VCSA (both are in Germany, within the same subnet), while the problematic ESXi host is, as before, in another subnet (in Iran). When I try to add that ESXi host to the primary VCSA I get the error "Request timed out", but when I try to add it to the new VCSA, which is in exactly the same subnet as the primary VCSA, it is added without any problem. So I strongly suspect that some logs or something similar on the primary VCSA are preventing that ESXi host from being added to the inventory, or maybe there is a log problem on the host itself that prevents it from being added to the primary VCSA. After searching through the logs and deleting some of them, the issue is still not resolved. Does anyone have an idea?

                        • 9. Re: esxi host is not responding
                          Sureshkumar M Expert
                          vExpert

                          Then we have to check the logs again to isolate the issue. Try to reproduce the problem with these steps:

                           

                          1. Connect the ESXi host to the primary VCSA and capture the vpxd, vpxa and hostd logs.

                          2. Connect the same host to the new VCSA and capture the same logs.

                           

                          We can then compare the two sets and see what the difference is.
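Comparing two log captures by eye is tedious because timestamps, thread IDs and pointers differ on every line. One possible approach, sketched below, is to strip those fields first and then list the messages that appear only in the failing capture. The normalisation rules are guesses based on the vpxd lines quoted in this thread, and `normalise` and `only_in_first` are hypothetical names.

```python
import re

def normalise(line):
    """Strip fields that differ between runs so that similar messages compare equal."""
    line = re.sub(r"^\S+\s+", "", line.strip())              # leading timestamp
    line = re.sub(r"(\w+)\[[0-9A-Fa-f]+\]", r"\1[*]", line)   # thread id, e.g. vpxd[7F48...]
    line = re.sub(r"0x[0-9a-fA-F]+", "0x*", line)             # pointers
    line = re.sub(r"opID=\S+?(?=[\]\s])", "opID=*", line)     # operation id
    line = re.sub(r":\d{4,5}'", ":*'", line)                  # ephemeral source ports
    return line

def only_in_first(first_lines, second_lines):
    """Return normalised messages that occur in first_lines but not in second_lines."""
    seen = set(normalise(l) for l in second_lines)
    result = []
    for l in first_lines:
        key = normalise(l)
        if key and key not in seen:
            seen.add(key)  # report each distinct message only once
            result.append(key)
    return result
```

With the vpxd.log from the primary VCSA as `first_lines` and the one from the new VCSA as `second_lines`, the result is the set of distinct messages seen only during the failed attempt.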

                          • 10. Re: esxi host is not responding
                            vmisagh Novice

                            Thanks for spending your time on this problem.

                            I did what you said and collected those logs just after getting the "Request timed out" error on the primary VCSA and after adding the host without a problem to the secondary VCSA.

                            As a reminder, these are my IPs right now:

                            primary vcsa: 188.40.xx.50

                            secondary vcsa  : 188.40.xx.45  (same subnet as primary)

                            host : 185.81.xx.125

                            • 11. Re: esxi host is not responding
                              vmisagh Novice

                              The maximum number of file attachments allowed is 5, so here are the remaining logs.

                              • 12. Re: esxi host is not responding
                                Sureshkumar M Expert
                                vExpert

                                This time the communication failed due to an SSL issue.

                                 

                                Primary VCSA logs:

                                 

                                vcenter:/var/log/vmware/vpxd # grep "185.81.xx.125"  vpxd.log

                                2019-04-25T07:50:27.028+04:30 error vpxd[7F9B2D072700] [Originator@6876 sub=HttpConnectionPool-000001] [ConnectComplete] Connect failed to <cs p:00007f9b3c2dbdd0, TCP:185.81.xx.125:443>; cnx: (null), error: N7Vmacore3Ssl18SSLVerifyExceptionE(SSL Exception: Verification parameters:

                                --> ExpectedPeerName: 185.81.xx.125

                                --> "185.81.xx.125"

                                2019-04-25T07:53:42.501+04:30 warning vpxd[7F9B2ECAC700] [Originator@6876 sub=Default] Failed to connect socket; <io_obj p:0x00007f9b20e933e0, h:-1, <TCP '0.0.0.0:0'>, <TCP '185.81.xx.125:443'>>, e: system:125(Operation canceled)

                                 

                                Matching KB - VMware Knowledge Base
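Since the log points at certificate verification, it may be worth checking which certificate the host actually presents to each vCenter. The sketch below fetches the certificate on port 443 and prints its SHA-1 thumbprint in the colon-separated form that, to my knowledge, vSphere 6.0 displays when adding a host. The function names are hypothetical and the certificate is deliberately not verified.

```python
import hashlib
import ssl

def thumbprint(der_bytes):
    """Format the SHA-1 digest of a DER-encoded certificate as colon-separated hex."""
    digest = hashlib.sha1(der_bytes).hexdigest().upper()
    return ":".join(digest[i:i + 2] for i in range(0, len(digest), 2))

def host_thumbprint(host, port=443):
    """Fetch the certificate presented by host:port (without verifying it)."""
    pem = ssl.get_server_certificate((host, port))
    return thumbprint(ssl.PEM_cert_to_DER_cert(pem))
```

If `host_thumbprint("185.81.xx.125")` gives a different value from the primary VCSA's network than from the secondary's, something in the path is altering the connection. If both match, the stale data is more likely on the vCenter side.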

                                • 13. Re: esxi host is not responding
                                  vmisagh Novice

                                  I had seen this log and that KB before and regenerated the SSL certificate on the host, but it resolved nothing. Also, if the problem is with the host's SSL certificate, why can the host be added to the secondary VCSA right now without any SSL problem? I think the issue is on the primary VCSA. Maybe there is an SSL certificate cache or something similar that must be removed before the host can be added to the primary VCSA. What do you think?

                                  • 14. Re: esxi host is not responding
                                    Sureshkumar M Expert
                                    vExpert

                                    Did you remove the host from the primary VCSA before adding it to the secondary VCSA, or did you add it to the secondary VCSA directly without removing it from the primary?

                                     

                                    Try removing the host from the primary VCSA completely and adding it again. This may resolve the SSL issue, as an SSL refresh will occur.
