VMware Cloud Community
dialsc
Contributor

Add ESXi host to cluster results in Out of HTTP sessions

Hello

I'm in the middle of setting up a vSphere test cluster and struggling with some weird behaviour I can't get rid of, so I need help. Here's what I did:

  1. Set up a new vSphere Appliance
  2. Added a custom intermediate CA certificate by
    1. /usr/lib/vmware-vmca/bin/certificate-manager -> Option 1 -> create CSR
    2. Signed the request on a MS CA
    3. /usr/lib/vmware-vmca/bin/certificate-manager -> Option 2
    4. Running /usr/lib/vmware-vmca/bin/certificate-manager -> Option 6
  3. Changed the advanced option "vpxd.certmgmt.certs.minutesBefore" to 10
  4. Added a datacenter
  5. Added a cluster to that datacenter
  6. Added an ESXi host and here the problem starts...

Whenever I add the ESXi host, by either IP or FQDN, it takes about 2 minutes and then the host is marked red with "(Not responding)". In the meantime I was able to figure out that this is related to the VC Appliance fetching vSAN data from the host.

Whenever the problem arises I can see the following log entries:

ESXi host -> /var/log/vpxa.log (dumped into logs repeatedly)

2019-08-12T09:05:59.797Z info vpxa[2141558] [Originator@6876 sub=Default opID=48ae0fe4-bce0-11e9-fa-e9] [VpxLRO] -- ERROR lro-580 -- vsanSystem -- vim.host.VsanSystem.fetchVsanSharedSecret: vim.fault.NotAuthenticated:
--> Result:
--> (vim.fault.NotAuthenticated) {
-->    faultCause = (vmodl.MethodFault) null,
-->    faultMessage = <unset>,
-->    object = 'vim.host.VsanSystem:vsanSystem',
-->    privilegeId = "none"
-->    msg = "Received SOAP response fault from [<cs p:00000056a3f4e600, TCP:localhost:8307>]: fetchVsanSharedSecret
--> The session is not authenticated."
--> }
--> Args:
-->

VC Appliance -> /var/log/vmware/vpxd/vpxd.log (dumped into logs repeatedly)

2019-08-12T11:05:54.868+02:00 info vpxd[05983] [Originator@6876 sub=Default opID=48ae0fe4-bce0-11e9-d8] [VpxLRO] -- ERROR lro-3416 -- vsanSystem-123 -- vim.host.VsanSystem.fetchVsanSharedSecret: vim.fault.NotAuthenticated:
--> Result:
--> (vim.fault.NotAuthenticated) {
-->    faultCause = (vmodl.MethodFault) null,
-->    faultMessage = <unset>,
-->    object = 'vim.host.VsanSystem:vsanSystem',
-->    privilegeId = "none"
-->    msg = "Received SOAP response fault from [<cs p:00007f38946815c0, TCP:10.1.1.60:443>]: fetchVsanSharedSecret
--> Received SOAP response fault from [<cs p:00000056a3f4e600, TCP:localhost:8307>]: fetchVsanSharedSecret
--> The session is not authenticated."
--> }
--> Args:
-->
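
The two excerpts look like the same call seen from both sides: the opID prefix (48ae0fe4-bce0-11e9) matches, and the timestamps line up once the +02:00 offset of the vCenter entry is taken into account. A minimal Python sketch of that comparison, using only the two timestamps quoted above (the helper name `parse_ts` is made up for illustration):

```python
from datetime import datetime, timezone

def parse_ts(stamp: str) -> datetime:
    """Parse an ISO 8601 log timestamp and normalise it to UTC."""
    # Older Python versions do not accept a trailing "Z", so spell it out.
    if stamp.endswith("Z"):
        stamp = stamp[:-1] + "+00:00"
    return datetime.fromisoformat(stamp).astimezone(timezone.utc)

host_ts = parse_ts("2019-08-12T09:05:59.797Z")     # vpxa.log on the ESXi host
vc_ts = parse_ts("2019-08-12T11:05:54.868+02:00")  # vpxd.log on the appliance

# Difference in seconds between the two entries once both are in UTC.
delta_seconds = (host_ts - vc_ts).total_seconds()
print(f"host entry is {delta_seconds:.3f} s after the vCenter entry")
```

Both entries fall within a few seconds of each other in UTC, which fits the picture of one repeating request rather than two unrelated errors.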

This process seems to loop until the ESXi host starts to log the following entries in /var/log/vpxa.log (dumped into the log repeatedly):

2019-08-12T09:18:00.912Z error vpxa[2141558] [Originator@6876 sub=HTTP session map] Out of HTTP sessions: Limited to 500

Once these entries occur the host is marked as not responding in vSphere Web Client. I can bring it back online by restarting the vpxa service on the ESXi host itself.
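
To see how many failed fetchVsanSharedSecret calls pile up before the session limit is hit, a copy of the log could be tallied with a short script. This is only a sketch: it assumes the log lines look like the excerpts above, and the function name `summarize_vpxa_log` is invented for illustration.

```python
import re
from typing import Iterable

# Patterns taken from the log excerpts quoted in this post.
AUTH_FAIL = re.compile(r"fetchVsanSharedSecret: vim\.fault\.NotAuthenticated")
OUT_OF_SESSIONS = re.compile(r"Out of HTTP sessions: Limited to (\d+)")
TIMESTAMP = re.compile(r"^(\S+)\s")

def summarize_vpxa_log(lines: Iterable[str]) -> dict:
    """Count failed vSAN secret fetches and note when sessions first ran out."""
    summary = {"auth_failures": 0, "first_out_of_sessions": None, "session_limit": None}
    for line in lines:
        if AUTH_FAIL.search(line):
            summary["auth_failures"] += 1
        match = OUT_OF_SESSIONS.search(line)
        if match and summary["first_out_of_sessions"] is None:
            # Remember the timestamp of the first "out of sessions" entry only.
            stamp = TIMESTAMP.match(line)
            summary["first_out_of_sessions"] = stamp.group(1) if stamp else None
            summary["session_limit"] = int(match.group(1))
    return summary

# Two lines copied from the excerpts above, used as sample input.
sample = [
    "2019-08-12T09:05:59.797Z info vpxa[2141558] [Originator@6876 sub=Default "
    "opID=48ae0fe4-bce0-11e9-fa-e9] [VpxLRO] -- ERROR lro-580 -- vsanSystem -- "
    "vim.host.VsanSystem.fetchVsanSharedSecret: vim.fault.NotAuthenticated:",
    "2019-08-12T09:18:00.912Z error vpxa[2141558] [Originator@6876 "
    "sub=HTTP session map] Out of HTTP sessions: Limited to 500",
]
print(summarize_vpxa_log(sample))
```

Against a real log, the same function can be fed an open file handle, for example `summarize_vpxa_log(open("vpxa.log"))` on a copy pulled from the host.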

So far I have figured out that the problem can be triggered by any one of the following steps on its own:

  • After adding the host to the cluster the Web UI switches to "vSphere Appliance" -> Datacenter -> Cluster -> Configure -> Configuration -> Quickstart
    BTW: In the box "2. Add hosts", the icon for the first check ("Time is synchronized across hosts and VC") remains a circle and does not become a green check.

  • Navigate the Web UI to "vSphere Appliance" -> Datacenter -> Cluster -> Updates

  • Navigate the Web UI to "vSphere Appliance" -> Datacenter -> Cluster -> Configure -> vSAN -> Services

I'm able to work around this problem by stopping the vSAN Health Service in the Appliance Management Web UI, but I guess that's not the final solution... ;)

The environment:

Started with:

  • ESXi 6.7.0 13006603
  • vCenter Appliance 6.7.0.30000 (13010631)

I patched the servers to the latest versions which are by now:

  • ESXi 6.7.0 13981272
  • vCenter Appliance 6.7.0.32000 (14070457)

Both servers are in the same VLAN and use IPs from the same subnet, and the network is working. DNS is also set up properly; name resolution works both ways. Time settings on both point to the same NTP servers.

Well, here we are: I'm lost and cannot get rid of this problem. Does anybody have an idea what the problem could be?
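
For anyone who wants to repeat the "works both ways" DNS check, a forward and reverse lookup can be scripted roughly like this. It is a sketch only: `dns_round_trip` is a made-up helper, the hostname `esx01.example.test` is a placeholder rather than a name from this environment, and the demonstration uses stub resolvers so that it runs without a network.

```python
import socket
from typing import Callable

def dns_round_trip(
    fqdn: str,
    forward: Callable[[str], str] = socket.gethostbyname,
    reverse: Callable[[str], tuple] = socket.gethostbyaddr,
) -> bool:
    """Return True if fqdn resolves to an IP that resolves back to fqdn."""
    ip = forward(fqdn)
    name, aliases, _addresses = reverse(ip)
    # Compare case-insensitively and ignore a trailing root dot.
    candidates = {n.lower().rstrip(".") for n in (name, *aliases)}
    return fqdn.lower().rstrip(".") in candidates

# Offline demonstration with stub resolvers (placeholder hostname).
def fake_forward(name: str) -> str:
    return "10.1.1.60"

def fake_reverse(ip: str) -> tuple:
    return ("esx01.example.test", [], [ip])

print(dns_round_trip("esx01.example.test", fake_forward, fake_reverse))
```

Calling `dns_round_trip("<real FQDN>")` without the stub arguments uses the system resolver, so it should be run from both the appliance and a machine that can resolve the host.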

Thanks and regards

7 Replies
msripada
Virtuoso

Is something using up all 500 vpxa sessions? I strongly suspect a monitoring tool or backup appliance in the environment.

Can you verify whether anything is hitting the ESXi host?

Thanks,

MS

dialsc
Contributor

There is nothing else hitting the two servers besides themselves. It looks like, once the problem starts, the vCenter Appliance runs an endless loop trying to collect vSAN information from the ESXi host until the ESXi host starts reporting that it is out of HTTP sessions.

Both the ESXi host and the vCenter Appliance are fresh installations and have not been added to any kind of monitoring, backup, and so forth.

Thanks,

dialsc

msripada
Virtuoso

You can try stopping the vSAN health service on vCenter if you are not using it, and see if that helps in adding the host back.

markbr81
Contributor

Hi, I have more or less the same issue when trying vSAN.

All hosts function well in vCenter. As soon as I click on vSAN in the cluster configuration, one of my hosts will become unavailable and I have to reboot it.

In the vpxa.log on the specific host I get the exact same error loop:

-->    msg = "Received SOAP response fault from [<cs p:0000000e0a79c300, TCP:localhost:8307>]: fetchVsanSharedSecret
--> The session is not authenticated."

It happens on only one host. I have already redeployed it, removed it from vCenter, patched it completely, etc.

vCenter is on the latest level.

markbr81
Contributor

A dear colleague pointed me to this one. Seems like an awful workaround but it does do the job.

VMware Knowledge Base

rob_blokland
Contributor

I have the exact same issue for a few months now. It started when I upgraded my production environment to ESXi 6.7 U2. Hosts seemed to disconnect at random with the following entry flooding the VPXA.log: Out of HTTP sessions: Limited to 500.

Restarting VPXA fixes the issue, but after a while the host disconnects again, for example, when you scan your host for updates or do any other action on that host.

The first host that had the issue was a Horizon host in a vSAN cluster after I updated the host. I migrated the host out of the vSAN cluster and logged a case with VMware. Unfortunately the case still isn't solved. VMware GSS asked me to update the NIC drivers and firmware of the NIC, but without any result.

I also see disconnect issues with regular vSphere hosts without vSAN. They occur after updating to 6.7 Update 2. The strangest thing is that the disconnects do not occur in my test environment.

Best regards,

Rob

rob_blokland
Contributor

So I tried a fresh install of 6.7 U2 to see if that would fix the issue, but immediately after configuring management and adding it to vCenter as a stand-alone host, it disconnected with the same error.

I updated this fresh install to 6.7 Update 3 and that looks stable for now, so we decided to start an update cycle to 6.7 U3 within a few weeks.

Regards, Rob
