VMware NSX

  • 1.  NSX Application Platform Deployment Failed - 'Registration Failed'

    Posted Jul 12, 2022 10:33 AM

    I’m struggling to deploy the NSX Application Platform in our environment. The deployment prechecks all pass, but the deployment consistently fails at the ‘Registering Platform’ step. Admittedly I am a newbie when it comes to Tanzu and Kubernetes (k8s), but I’m hoping someone can point me in the right direction.

    I have deployed a Tanzu CE cluster with 3 control plane nodes and 3 worker nodes, all meeting the spec required to deploy NSX Intelligence (16 CPUs, 64 GB RAM, 1 TB disk). kube-vip and Antrea are used for networking.

    MetalLB has been configured to provide an entry point for the service name / FQDN. It has been given a pool of 15 addresses, and I have configured two A records pointing to the first two addresses in this range:

    Service name - nsx-application-platform.domain.com

    Messaging Service Name - nsx-application-platform-msn.domain.com

    (To be honest, I’m not exactly clear what the ‘Messaging Service Name’ is; it seems to be new in NSX 3.2.x. I’m also just taking it on faith that the deployment will somehow assign the correct IPs from the MetalLB pool to match the A records I have created.)
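    One way to avoid taking the IP assignment on faith is to compare what DNS returns for each FQDN with the EXTERNAL-IP values MetalLB has actually handed to LoadBalancer services (the EXTERNAL-IP column of `kubectl get svc -A`). It is not clear from the post whether the deployment pins specific addresses, so checking is cheaper than guessing. The Python sketch below assumes a hypothetical pool of 10.50.16.200 to 10.50.16.214 (15 addresses); the real range is not given in the post, and the function names are illustrative only.

```python
import ipaddress

# Hypothetical MetalLB range for illustration; substitute the real pool.
POOL_START = ipaddress.ip_address("10.50.16.200")
POOL_END = ipaddress.ip_address("10.50.16.214")  # 15 addresses, inclusive


def in_pool(ip: str, start=POOL_START, end=POOL_END) -> bool:
    """Return True if ip falls inside the inclusive MetalLB range."""
    addr = ipaddress.ip_address(ip)
    return start <= addr <= end


def check_records(dns_records: dict, service_ips: set) -> list:
    """Compare FQDN -> IP mappings with the EXTERNAL-IPs MetalLB assigned.

    dns_records: FQDN -> IPv4 string, e.g. gathered with nslookup or dig.
    service_ips: EXTERNAL-IP values copied from `kubectl get svc -A`.
    Returns a list of problems; an empty list means no mismatch was found.
    """
    problems = []
    for fqdn, ip in dns_records.items():
        if not in_pool(ip):
            problems.append(f"{fqdn} -> {ip} is outside the MetalLB pool")
        elif ip not in service_ips:
            problems.append(
                f"{fqdn} -> {ip} is not assigned to any LoadBalancer service"
            )
    return problems
```

    For example, with both A records pointing at 10.50.16.200 and 10.50.16.201 but only 10.50.16.200 showing up as an EXTERNAL-IP, `check_records` returns a single entry: `["nsx-application-platform-msn.domain.com -> 10.50.16.201 is not assigned to any LoadBalancer service"]`.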

    For context, I’ve been using this chap’s guide, and found it very helpful - https://lumberjackwizard.com/2022/03/09/deploying-nsx-application-platform-part-six-metallb/

    Aside from the obvious symptom, the ‘NSX Application Platform Registration failed’ error during deployment, the only other errors I can see are these, which occur on the MetalLB speaker pods:

    Events:
      Type     Reason     Age   From     Message
      ----     ------     ----  ----     -------
      Warning  Unhealthy  45m   kubelet  Liveness probe failed: Get "http://10.50.16.169:7472/metrics": context deadline exceeded (Client.Timeout exceeded while awaiting headers)
      Warning  Unhealthy  45m   kubelet  Readiness probe failed: Get "http://10.50.16.169:7472/metrics": context deadline exceeded (Client.Timeout exceeded while awaiting headers)
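    These two warnings mean the kubelet's HTTP GET against the speaker's metrics endpoint on port 7472 did not get response headers back before the probe timeout. For an httpGet probe, Kubernetes counts any status from 200 up to (but not including) 400 as success, and the timeout defaults to 1 second unless the manifest sets timeoutSeconds. A rough way to repeat the same check by hand is sketched below in Python; the 1-second default mirrors the Kubernetes default, not necessarily what this MetalLB manifest configures, and `http_probe` is a hypothetical helper, not part of any MetalLB or kubectl tooling.

```python
import http.client
import urllib.request

# Empty ProxyHandler: ignore http_proxy/https_proxy environment variables,
# since the kubelet talks to the pod directly rather than via a proxy.
_OPENER = urllib.request.build_opener(urllib.request.ProxyHandler({}))


def http_probe(url: str, timeout: float = 1.0) -> bool:
    """Approximate a kubelet httpGet probe.

    Success means an HTTP status in the 200-399 range arrives within
    `timeout` seconds. Simplification: urllib follows redirects itself,
    and any connection error, timeout, malformed response or 4xx/5xx
    status is treated as a failure.
    """
    try:
        with _OPENER.open(url, timeout=timeout) as resp:
            return 200 <= resp.status < 400
    except (OSError, http.client.HTTPException):
        # URLError, HTTPError, socket timeouts and refused connections
        # are all OSError subclasses; HTTPException covers bad responses.
        return False
```

    Running `http_probe("http://10.50.16.169:7472/metrics")` from one of the cluster nodes, and again with a larger timeout such as 5 seconds, would help tell a slow endpoint apart from one that is unreachable.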

    That said, all the pods in the metallb-system namespace look like they are running OK:

    >kubectl get pods -n metallb-system

    NAME                          READY   STATUS    RESTARTS   AGE
    controller-66445f859d-589zw   1/1     Running   0          20h
    speaker-c6gqt                 1/1     Running   0          20h
    speaker-dnrbh                 1/1     Running   0          20h
    speaker-ncpcl                 1/1     Running   0          20h
    speaker-qg6zz                 1/1     Running   0          20h
    speaker-qt7mw                 1/1     Running   0          20h
    speaker-r6kgs                 1/1     Running   0          20h
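    `kubectl get pods` only shows the current state, so the probe failures from 45 minutes earlier do not appear here. A RESTARTS count of 0 most likely means the liveness probe never failed enough consecutive times (failureThreshold, 3 by default) to make the kubelet restart the container. When scanning longer listings, a small parser that flags anything not fully ready, not Running, or with restarts can help. The Python sketch below assumes the default five-column layout shown above; newer kubectl versions may append "(Xm ago)" to RESTARTS, which it tolerates by reading only the first token. The function name is hypothetical, and pods in a Completed state (e.g. jobs) will also be flagged.

```python
def flag_unhealthy_pods(table: str) -> list:
    """Return names of pods that are not fully ready, not Running,
    or have restarted, given default `kubectl get pods` output."""
    flagged = []
    # Ignore blank lines, then skip the NAME/READY/... header row.
    lines = [ln for ln in table.strip().splitlines() if ln.strip()]
    for line in lines[1:]:
        parts = line.split()
        name, ready, status, restarts = parts[0], parts[1], parts[2], parts[3]
        ready_now, wanted = (int(x) for x in ready.split("/"))
        if ready_now != wanted or status != "Running" or int(restarts) > 0:
            flagged.append(name)
    return flagged
```

    Fed the listing above, `flag_unhealthy_pods` returns an empty list, which matches the observation that every MetalLB pod currently looks healthy.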

    I appreciate these scenarios are very difficult to diagnose and troubleshoot, but I’d really appreciate any pointers you could throw my way!

    Thanks in advance



  • 2.  RE: NSX Application Platform Deployment Failed - 'Registration Failed'

    Posted Aug 22, 2022 01:55 AM
    I have the same problem as you. After checking the logs, I found that the AuthServer pod reported an NSX Manager certificate error. After updating the cert-manager certificate to my self-signed certificate, that pod's error went away, but registration still fails. The trust-manager pod log now shows the following error:

    2022-08-22T01:20:05,297 ERROR [pool-15-thread-1] C.V.N.K.T.C.M.CertificateEntity$Builder: INTELLIGENCE [nsx@6876 comp="trust-manager" errorCode="XXX940010" level="ERROR" subcomp="trust-manager-common"] Invalid Certificate Chain java.security.cert.CertificateParsingException: signed overrun, bytes = 918

    Looking at the certificate chain printed in the log, the last (root) certificate was truncated. I can't solve this problem, and I wonder if it is a bug.
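    An "overrun" while parsing a certificate usually suggests the DER data is shorter than its own length header claims, which would fit a truncated root certificate. One way to confirm truncation before handing a chain to NSX is to split the PEM bundle, base64-decode each block, and compare the length declared by the outer DER SEQUENCE with the bytes actually present. The Python sketch below does only that structural check with the standard library; it does not validate signatures or the chain of trust, and `der_declared_length` and `check_pem_chain` are hypothetical helper names.

```python
import base64
import re

PEM_RE = re.compile(
    r"-----BEGIN CERTIFICATE-----(.*?)-----END CERTIFICATE-----", re.S
)


def der_declared_length(der: bytes) -> int:
    """Return the total size (header + body) the outer DER SEQUENCE claims.

    Raises ValueError if the header itself is malformed.
    """
    if len(der) < 2 or der[0] != 0x30:
        raise ValueError("not a DER SEQUENCE")
    first = der[1]
    if first < 0x80:  # short form: the length fits in 7 bits
        return 2 + first
    n = first & 0x7F  # long form: the next n bytes hold the length
    if n == 0 or len(der) < 2 + n:
        raise ValueError("bad length header")
    return 2 + n + int.from_bytes(der[2:2 + n], "big")


def check_pem_chain(pem_text: str) -> list:
    """For each certificate in a PEM bundle, return (index, ok, detail)."""
    results = []
    for i, body in enumerate(PEM_RE.findall(pem_text)):
        try:
            # binascii.Error (bad base64) is a subclass of ValueError.
            der = base64.b64decode("".join(body.split()), validate=True)
            declared = der_declared_length(der)
        except ValueError as exc:
            results.append((i, False, str(exc)))
            continue
        ok = declared == len(der)
        results.append((i, ok, f"declared {declared} bytes, found {len(der)}"))
    return results
```

    Saving the chain printed in the trust-manager log to a file and running `check_pem_chain` on its contents should show the root entry with a declared length larger than the bytes found, if it really was cut short.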