7 Replies Latest reply on Sep 3, 2019 11:31 PM by rob_blokland

    Add ESXi host to cluster results in Out of HTTP sessions

    dialsc Lurker

      Hello

       

      I'm in the middle of setting up a vSphere test cluster and am struggling with some odd behaviour that I cannot get rid of, so I need help. Here's what I did:

       

      1. Set up a new vSphere Appliance
      2. Added a custom intermediate CA certificate by
        1. /usr/lib/vmware-vmca/bin/certificate-manager -> Option 1 -> create CSR
        2. Signed the request on a MS CA
        3. /usr/lib/vmware-vmca/bin/certificate-manager -> Option 2
        4. /usr/lib/vmware-vmca/bin/certificate-manager -> Option 6
      3. Changed the advanced option "vpxd.certmgmt.certs.minutesBefore" to 10
      4. Added a datacenter
      5. Added a cluster to that datacenter
      6. Added an ESXi host and here the problem starts...

       

      Whenever I add the ESXi host, by either IP or FQDN, it takes about 2 minutes and then the host is marked red with "(Not responding)". In the meantime I have figured out that this is related to the VC Appliance fetching vSAN data from the host.

       

      Whenever the problem arises, I can see the following log entries:

       

      ESXi host -> /var/log/vpxa.log (dumped into logs repeatedly)

      2019-08-12T09:05:59.797Z info vpxa[2141558] [Originator@6876 sub=Default opID=48ae0fe4-bce0-11e9-fa-e9] [VpxLRO] -- ERROR lro-580 -- vsanSystem -- vim.host.VsanSystem.fetchVsanSharedSecret: vim.fault.NotAuthenticated:

      --> Result:

      --> (vim.fault.NotAuthenticated) {

      -->    faultCause = (vmodl.MethodFault) null,

      -->    faultMessage = <unset>,

      -->    object = 'vim.host.VsanSystem:vsanSystem',

      -->    privilegeId = "none"

      -->    msg = "Received SOAP response fault from [<cs p:00000056a3f4e600, TCP:localhost:8307>]: fetchVsanSharedSecret

      --> The session is not authenticated."

      --> }

      --> Args:

      -->

       

      VC Appliance -> /var/log/vmware/vpxd/vpxd.log (dumped into logs repeatedly)

      2019-08-12T11:05:54.868+02:00 info vpxd[05983] [Originator@6876 sub=Default opID=48ae0fe4-bce0-11e9-d8] [VpxLRO] -- ERROR lro-3416 -- vsanSystem-123 -- vim.host.VsanSystem.fetchVsanSharedSecret: vim.fault.NotAuthenticated:

      --> Result:

      --> (vim.fault.NotAuthenticated) {

      -->    faultCause = (vmodl.MethodFault) null,

      -->    faultMessage = <unset>,

      -->    object = 'vim.host.VsanSystem:vsanSystem',

      -->    privilegeId = "none"

      -->    msg = "Received SOAP response fault from [<cs p:00007f38946815c0, TCP:10.1.1.60:443>]: fetchVsanSharedSecret

      --> Received SOAP response fault from [<cs p:00000056a3f4e600, TCP:localhost:8307>]: fetchVsanSharedSecret

      --> The session is not authenticated."

      --> }

      --> Args:

      -->
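
The two excerpts use different time notations (UTC with a trailing "Z" on the ESXi host, local time with a "+02:00" offset on the VC Appliance), which makes them awkward to compare by eye. A minimal Python sketch that puts both on a common UTC base, using only the two timestamps quoted above. The helper name `to_utc` is made up for this example, and the resulting gap is not proof of clock skew, since the two entries are not necessarily the same request and the gap may include processing and network delay:

```python
from datetime import datetime, timezone

# %z accepts both a literal "Z" and offsets such as "+02:00" (Python 3.7+).
LOG_TIME_FORMAT = "%Y-%m-%dT%H:%M:%S.%f%z"


def to_utc(stamp):
    """Parse a vpxa/vpxd log timestamp and return it as an aware UTC datetime."""
    return datetime.strptime(stamp, LOG_TIME_FORMAT).astimezone(timezone.utc)


# Timestamps copied from the vpxa.log and vpxd.log excerpts above.
esxi_entry = to_utc("2019-08-12T09:05:59.797Z")
vc_entry = to_utc("2019-08-12T11:05:54.868+02:00")

# Positive value: the ESXi entry was written after the VC entry.
gap_seconds = (esxi_entry - vc_entry).total_seconds()
print(f"ESXi entry is {gap_seconds:.3f} s after the VC entry")
```
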

       

      This "process" seems to loop until the ESXi host starts to log the following entries in /var/log/vpxa.log (dumped into the log repeatedly):

      2019-08-12T09:18:00.912Z error vpxa[2141558] [Originator@6876 sub=HTTP session map] Out of HTTP sessions: Limited to 500

       

      Once these entries occur the host is marked as not responding in vSphere Web Client. I can bring it back online by restarting the vpxa service on the ESXi host itself.
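
To put numbers on the loop, the log can be scanned for the two messages quoted above. A rough Python sketch, assuming the vpxa.log line layout shown in the excerpts (a UTC timestamp at the start of each entry, continuation lines starting with "-->"). The function name `summarize` and the keys of the returned dictionary are made up for this example:

```python
import re
from datetime import datetime

# Leading timestamp of a vpxa.log entry, e.g. 2019-08-12T09:05:59.797Z
TIMESTAMP = re.compile(r"^\s*(\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}\.\d{3})Z")

AUTH_FAULT = "fetchVsanSharedSecret: vim.fault.NotAuthenticated"
OUT_OF_SESSIONS = "Out of HTTP sessions"


def summarize(lines):
    """Count NotAuthenticated faults and time the run-up to session exhaustion."""
    fault_count = 0
    first_fault = None
    exhausted_at = None
    for line in lines:
        match = TIMESTAMP.match(line)
        if match is None:
            continue  # "-->" continuation lines and blank lines have no timestamp
        stamp = datetime.strptime(match.group(1), "%Y-%m-%dT%H:%M:%S.%f")
        if AUTH_FAULT in line:
            fault_count += 1
            if first_fault is None:
                first_fault = stamp
        elif OUT_OF_SESSIONS in line and exhausted_at is None:
            exhausted_at = stamp
    seconds_to_exhaustion = None
    if first_fault is not None and exhausted_at is not None:
        seconds_to_exhaustion = (exhausted_at - first_fault).total_seconds()
    return {
        "fault_count": fault_count,
        "first_fault": first_fault,
        "exhausted_at": exhausted_at,
        "seconds_to_exhaustion": seconds_to_exhaustion,
    }
```

Calling `summarize(open("/var/log/vpxa.log"))` on the host (or on a copy of the log) returns the number of faults seen and the seconds between the first fault and the first "Out of HTTP sessions" entry, which shows how quickly the 500 sessions are used up.
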

       

      So far I have figured out that the problem can be triggered by any one of the following steps, each of which works on its own:

       

      • After adding the host to the cluster the Web UI switches to "vSphere Appliance" -> Datacenter -> Cluster -> Configure -> Configuration -> Quickstart
        BTW: In the box "2. Add hosts" the first "check" -> Time is synchronized across hosts and VC -> The icon remains a circle and does not become a green check.

      • Navigate the Web UI to "vSphere Appliance" -> Datacenter -> Cluster -> Updates

      • Navigate the Web UI to "vSphere Appliance" -> Datacenter -> Cluster -> Configure -> vSAN -> Services

       

      I'm able to work around this problem by stopping the vSAN Health Service in the Appliance Management Web UI, but I guess that's not the final solution...

       

      The environment:

       

      Started with:

      • ESXi 6.7.0 13006603
      • vCenter Appliance 6.7.0.30000 (13010631)

       

      I patched the servers to the latest versions which are by now:

      • ESXi 6.7.0 13981272
      • vCenter Appliance 6.7.0.32000 (14070457)

       

      Both servers are in the same VLAN and use IPs from the same subnet, and the network is working. DNS is also set up properly; name resolution works both ways. Time settings also point to the same NTP servers.
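
For anyone who wants to repeat the "works both ways" DNS check, here is a minimal Python sketch of a forward plus reverse lookup. The function name `check_name_resolution` is made up for this example and the FQDN in the usage note is a placeholder, not the real host name. The two resolver arguments default to the standard `socket` calls and exist only so the check can also be run against stand-in resolvers:

```python
import socket


def check_name_resolution(fqdn, expected_ip,
                          forward=socket.gethostbyname,
                          reverse=socket.gethostbyaddr):
    """Check that fqdn resolves to expected_ip and that the IP maps back to fqdn."""
    resolved_ip = forward(fqdn)
    # gethostbyaddr returns (hostname, aliaslist, ipaddrlist); keep the primary name.
    reverse_name = reverse(expected_ip)[0]
    return {
        "forward_ok": resolved_ip == expected_ip,
        # Compare names case-insensitively and ignore a trailing root dot.
        "reverse_ok": reverse_name.rstrip(".").lower() == fqdn.rstrip(".").lower(),
        "resolved_ip": resolved_ip,
        "reverse_name": reverse_name,
    }
```

A call such as `check_name_resolution("esxi01.example.test", "10.1.1.60")` (placeholder name, IP taken from the vpxd.log excerpt) should report both `forward_ok` and `reverse_ok` as true when run from the VC Appliance; the same check with the appliance's own name and IP can be run from any machine that uses the same DNS servers.
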

       

      Well, here we go: I'm lost and cannot get rid of this problem. Does anybody have an idea what the problem could be?

       

      Thanks and regards