3 Replies Latest reply on Mar 14, 2018 8:24 AM by AGDave

    Control Plane Agent to Controller Down

    AGDave Enthusiast
    vExpert

      Hi all,

       

      I have an interesting issue with a fresh NSX install.

       

      Environment details:

       

      • vCenter 6.5
      • ESXi 6.5
      • NSX 6.4

       

      Two clusters

       

      • Management and Edge (4 hosts)
      • Compute (4 hosts)

       

      I have installed NSX Manager and deployed the controller nodes, and initially everything looked good. However, the dashboard reports 4 hosts with communication issues.

       

      These 4 hosts are all in the Management cluster and, according to NSX Manager, have been successfully prepped and configured. Digging a bit deeper, it's reporting SSL handshake failures.

       

       

      The netcpa logs confirm this as well:

      2018-03-14T07:15:06.782Z error netcpa[C86E817700] [Originator@6876 sub=Default] SSL handshake failed on x.x.x.x:0 : error = SSL Exception: error:140000DB:SSL routines:SSL routines:short read

       

      Similar entries exist for every controller.
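To see at a glance which controllers the handshake failures point at, the netcpa log can be summarized with a short script. This is a minimal sketch: the pattern is modelled only on the single log line quoted above, so other netcpa versions may need a different expression, and the function names are made up for illustration.

```python
import re
from collections import Counter

# Pattern modelled on the one netcpa line quoted in this post (assumption:
# other builds may word or space the message differently).
HANDSHAKE_RE = re.compile(
    r"^(?P<timestamp>\S+)\s+error\s+netcpa\[[^\]]+\].*?"
    r"SSL handshake failed on (?P<peer>\S+?):(?P<port>\d+)\s*:\s*"
    r"error = (?P<reason>.+)$"
)


def parse_handshake_failure(line):
    """Return timestamp, peer, port and reason for a failure line, else None."""
    match = HANDSHAKE_RE.match(line.strip())
    return match.groupdict() if match else None


def failures_per_peer(lines):
    """Count handshake failures per peer address across an iterable of lines."""
    counts = Counter()
    for line in lines:
        parsed = parse_handshake_failure(line)
        if parsed:
            counts[parsed["peer"]] += 1
    return counts
```

Feeding it the lines of a netcpa log file (for example via `open(path)`) gives a per-controller count, which makes it easy to confirm whether every controller is affected or only some.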

       

      There is IP connectivity between host and controller, and Port 1234 is not being blocked.
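For anyone wanting to repeat the port check, a plain TCP connect test can be sketched as below. It assumes a machine with Python 3 on the same network as the hosts. The controller addresses are placeholders and the helper name is hypothetical. Port 1234 is the one mentioned above.

```python
import socket


def tcp_port_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port completes within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable networks.
        return False


def check_controllers(addresses, port=1234):
    """Map each controller address (placeholders here) to its reachability."""
    return {address: tcp_port_open(address, port) for address in addresses}
```

A successful connect only shows that the port is reachable. It says nothing about whether the SSL handshake on top of it succeeds, which matches what I am seeing: the port is open but the handshake still fails.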

       

      VTEP and management traffic use separate VLANs, but these are consistent across both clusters.

       

      There is a VMware KB article that states updating the controller state is a short term fix, but that doesn't work in my environment.

       

      Any ideas?

       

      Thanks,

        • 1. Re: Control Plane Agent to Controller Down
          canero Hot Shot

          If the forward and reverse DNS entries are OK and time is synchronized, the problem may be related to the KB below. (It describes a controller upgrade, but the errors on a fresh installation are similar.)

           

          If you find logs similar to those in the KB, Update Controller State may force the certificate to be renewed:

          https://kb.vmware.com/s/article/2151089

           

            navigate to Network & Security > Installation > Management > NSX Manager > Actions > Update Controller State to pick up the new certificate.
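The DNS precondition mentioned at the start of this reply can be checked with a small forward and reverse lookup round trip. This is only a sketch: the function name and the result layout are made up for illustration, and the resolver functions are passed in as parameters so the check can also be exercised without a real DNS server.

```python
import socket


def check_forward_reverse(hostname,
                          forward=socket.gethostbyname,
                          reverse=socket.gethostbyaddr):
    """Resolve hostname to an IP, resolve that IP back, and compare the names."""
    result = {"hostname": hostname, "ip": None, "ptr": None, "consistent": False}
    try:
        result["ip"] = forward(hostname)
        # gethostbyaddr returns (primary name, aliases, addresses).
        result["ptr"] = reverse(result["ip"])[0]
    except OSError as exc:
        # socket.gaierror and socket.herror are both OSError subclasses.
        result["error"] = str(exc)
        return result

    def normalize(name):
        return name.rstrip(".").lower()

    result["consistent"] = normalize(result["ptr"]) == normalize(hostname)
    return result
```

Run it against the fully qualified names of the hosts and controllers. A result with `consistent` set to False means the PTR record does not point back to the name that was looked up.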

          Does vsm.log contain entries similar to the following?

          2017-06-06 17:10:50.785 GMT+00:00 ERROR NVPStatusCheck NvpRestClientManagerImpl:794 - nvp controller node (172.16.0.10) return error org.springframework.web.client.ResourceAccessException: I/O error on GET request for "https://172.16.0.10:443/ws.v1/control-cluster/node?fields=cluster_mgmt_listen_addr,uuid,tags": Read timed out; nested exception is java.net.SocketTimeoutException: Read timed out

           

          And the controller logs:

           

          • 2017-06-06 18:32:50,347 19123181348 [listener] INFO com.vmware.controller.server.Listener - Accept Connection [ip=172.24.2.26:46115, cnnId=21264] from /172.24.2.26:46115
            2017-06-06 18:32:50,357 19123181358 [reader 3] ERROR com.vmware.controller.server.ssl.SelfSignedX509TrustManager - Unknow chassis certificate: [
            [
            Version: V3
            Subject: CN="VMWare VXLAN Host Certificate host-11573 OU=Nectworking O=VMWare ST=CA C=US"
            Signature Algorithm: SHA256withRSA, OID = 1.2.840.113549.1.1.11
            Key: Sun RSA public key, 2048 bits

            modulus: 22911650522799465929163707326918080254704523027188317203645647153931638466371122064197258058841116911989320009855294745617721779386019557021249605122136935010401
            36836560115024772432023329796195620130983113379731661924922830333592692791543147876405959524921451570805385813377696469386291738246946920048747704248124484079384552745316
            66112531666589757995492441394796111464829401754007815754348273682553447185738440211794264079252464938057216938803523707224061663150480722911564461043934851115967587589348
            39992978266706878205075684179188691037974878624050280597452927405166323249390673946856460750742686036206044340415301
            public exponent: 65537

            Validity: [From: Fri Apr 28 10:14:16 UTC 2017,
            To: Tue Sep 13 10:14:16 UTC 2044]
            Issuer: CN="VMWare VXLAN Host Certificate host-11573 OU=Nectworking O=VMWare ST=CA C=US"
            SerialNumber: [ 015bb40d d45c]

            >2017-06-07T14:28:04.785693+00:00 2017-06-07 14: 28:04,785 19194224947 [reader 1] ERROR com.vmware.controller.server.ssl.SelfSignedX509TrustManager - Unknow chassis certificate: [#012[#012 Version: V3#012 Subject: CN="VMWare VXLAN Host Certificate host-11573 OU=Nectworking O=VMWare ST=CA C=US"#012 Signature Algorithm: SHA256withRSA, OID = 1.2.840.113549.1.1.11#012#012
            Key: Sun RSA public key, 2048 bits#012 modulus: 229116505227994659291637073269180802547045230271883172036456471539316384663711220641972580588411169
            119893200098552947456177217793860195570212496051221369350104013683656011502477243202332979619562013098311337973166192492283033359269279154314787640595..

          Note: The preceding log excerpts are only examples. Date, time, and environmental variables may vary depending on your environment.
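To pull the useful fields out of a certificate dump like the one above, something along these lines may help. It is a rough sketch based only on the excerpt shown here: the field layout, the `#012` newline escape in the syslog variant, and the `host-NNN` identifier are taken from that excerpt, while the function name and result keys are hypothetical.

```python
import re


def summarize_certificate_dump(text):
    """Extract subject, issuer and host ID from a controller certificate dump."""
    # The syslog variant of the entry encodes newlines as "#012".
    text = text.replace("#012", "\n")
    subject = re.search(r'Subject:\s*CN="([^"]*)"', text)
    issuer = re.search(r'Issuer:\s*CN="([^"]*)"', text)
    subject_cn = subject.group(1) if subject else None
    issuer_cn = issuer.group(1) if issuer else None
    host = re.search(r"\bhost-\d+\b", subject_cn or "")
    return {
        "subject": subject_cn,
        "issuer": issuer_cn,
        "host_id": host.group(0) if host else None,
        # Matching subject and issuer suggests a self-issued certificate.
        # The signature itself is not verified here.
        "self_issued": subject_cn is not None and subject_cn == issuer_cn,
    }
```

For the excerpt above this reports `host-11573` with identical subject and issuer, which shows which host's certificate the controller is rejecting.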

           


          Cause


          This issue occurs when the Controller fails to authenticate the certificate of the host causing the handshake to fail.

           

           


          Resolution


          This issue is resolved in VMware NSX for vSphere 6.3.5, available at VMware Downloads.

           

          To work around this issue if you do not want to upgrade, navigate to Network & Security > Installation > Management > NSX Manager > Actions > Update Controller State to pick up the new certificate.

           

          The following error codes are supported:

          1255602: Incomplete Controller Certificate
          1255603: SSL Handshake Failure
          1255604: Connection Refused
          1255605: Keep-alive Timeout
          1255606: SSL Exception
          1255607: Bad Message
          1255620: Unknown Error
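If these codes show up in logs or alerts, a small lookup table saves looking them up each time. The sketch below simply encodes the list above. The thread does not say where exactly the codes surface, and the function name is made up for illustration.

```python
# Error codes and descriptions exactly as listed in the KB excerpt above.
CHANNEL_ERROR_CODES = {
    1255602: "Incomplete Controller Certificate",
    1255603: "SSL Handshake Failure",
    1255604: "Connection Refused",
    1255605: "Keep-alive Timeout",
    1255606: "SSL Exception",
    1255607: "Bad Message",
    1255620: "Unknown Error",
}


def describe_error_code(code):
    """Return the description for a code, or a marker for unlisted codes."""
    try:
        return CHANNEL_ERROR_CODES.get(int(code), "Not in the listed codes")
    except (TypeError, ValueError):
        return "Not a numeric code"
```

With this table, `describe_error_code("1255603")` maps to the SSL handshake failure reported in the original post.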
          • 2. Re: Control Plane Agent to Controller Down
            canero Hot Shot

            I think I missed your last sentence about the KB. If this is the same KB, then the certificate handshake issue could be related to the trust store. Are the certificates in use self-signed?

             

            Secure Configuration of NSX 2017

             

            NSX-v 6.3.x - Security Configuration Guide (Published version 2.1)

             

            The NSX Manager uses a Java Keystore to store the certificates it has provisioned. Other NSX components, such as the NSX controllers, leverage encrypted and password-protected PEM files to store their certificates.

            • 3. Re: Control Plane Agent to Controller Down
              AGDave Enthusiast
              vExpert

              As previously stated - "There is a VMware KB article that states updating the controller state is a short term fix, but that doesn't work in my environment."

               

              However, I decided to completely redeploy my controller cluster, and there are no more SSL handshake issues for now...