Hello
I'm in the middle of setting up a vSphere test cluster and am struggling with some weird behaviour that I cannot get rid of, so I need help. Here's what I did:
Whenever I add the ESXi host by either IP or FQDN, it takes about 2 minutes and then the host is marked red with "(Not responding)". In the meantime I was able to figure out that it is related to the VC Appliance fetching vSAN data from the host.
Whenever the problem arises I can see the following log entries:
ESXi host -> /var/log/vpxa.log (dumped into logs repeatedly)
2019-08-12T09:05:59.797Z info vpxa[2141558] [Originator@6876 sub=Default opID=48ae0fe4-bce0-11e9-fa-e9] [VpxLRO] -- ERROR lro-580 -- vsanSystem -- vim.host.VsanSystem.fetchVsanSharedSecret: vim.fault.NotAuthenticated:
--> Result:
--> (vim.fault.NotAuthenticated) {
--> faultCause = (vmodl.MethodFault) null,
--> faultMessage = <unset>,
--> object = 'vim.host.VsanSystem:vsanSystem',
--> privilegeId = "none"
--> msg = "Received SOAP response fault from [<cs p:00000056a3f4e600, TCP:localhost:8307>]: fetchVsanSharedSecret
--> The session is not authenticated."
--> }
--> Args:
-->
VC Appliance -> /var/log/vmware/vpxd/vpxd.log (dumped into logs repeatedly)
2019-08-12T11:05:54.868+02:00 info vpxd[05983] [Originator@6876 sub=Default opID=48ae0fe4-bce0-11e9-d8] [VpxLRO] -- ERROR lro-3416 -- vsanSystem-123 -- vim.host.VsanSystem.fetchVsanSharedSecret: vim.fault.NotAuthenticated:
--> Result:
--> (vim.fault.NotAuthenticated) {
--> faultCause = (vmodl.MethodFault) null,
--> faultMessage = <unset>,
--> object = 'vim.host.VsanSystem:vsanSystem',
--> privilegeId = "none"
--> msg = "Received SOAP response fault from [<cs p:00007f38946815c0, TCP:10.1.1.60:443>]: fetchVsanSharedSecret
--> Received SOAP response fault from [<cs p:00000056a3f4e600, TCP:localhost:8307>]: fetchVsanSharedSecret
--> The session is not authenticated."
--> }
--> Args:
-->
This process seems to run in a loop until the ESXi host starts to log the following entries in -> /var/log/vpxa.log (dumped into logs repeatedly)
2019-08-12T09:18:00.912Z error vpxa[2141558] [Originator@6876 sub=HTTP session map] Out of HTTP sessions: Limited to 500
Once these entries occur, the host is marked as not responding in the vSphere Web Client. I can bring it back online by restarting the vpxa service on the ESXi host itself.
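To get a feeling for how long the loop runs before the sessions are used up, a copy of vpxa.log could be summarized with something like the following. This is only a rough sketch: the function name `summarize_vpxa_log` is made up for illustration, and the strings it matches are simply taken from the log excerpts above, so other builds may log differently.

```python
import re
from typing import Iterable, Optional, Tuple

# Timestamp at the start of a vpxa.log entry, e.g. 2019-08-12T09:05:59.797Z
TS_RE = re.compile(r"^(\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}\.\d+Z)\s")


def summarize_vpxa_log(lines: Iterable[str]) -> Tuple[int, Optional[str]]:
    """Hypothetical helper: count failed fetchVsanSharedSecret calls and
    return the timestamp of the first 'Out of HTTP sessions' entry (or None)."""
    failures = 0
    first_exhausted = None
    for line in lines:
        match = TS_RE.match(line)
        if not match:
            # Continuation lines ("--> ...") carry no timestamp; skip them
            # so that each failed call is counted only once.
            continue
        if "fetchVsanSharedSecret" in line and "NotAuthenticated" in line:
            failures += 1
        elif "Out of HTTP sessions" in line and first_exhausted is None:
            first_exhausted = match.group(1)
    return failures, first_exhausted
```

It can be fed an open file object, for example `summarize_vpxa_log(open("vpxa.log", errors="replace"))`, where the path is whatever copy of the log you pulled from the host.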
So far I was able to figure out that this problem can be triggered by the following steps, each of which triggers it on its own:
I'm able to work around this problem by stopping the vSAN health service in the Appliance Management Web UI, but I guess that's not the final solution...
The environment:
Started with:
I patched the servers to the latest versions which are by now:
Both servers are in the same VLAN and use IPs from the same subnet, and the network is working. DNS is also set up properly; name resolution works both ways. Time settings on both point to the same NTP servers.
Well, here we go: I'm lost and cannot get rid of this problem. Does anybody have an idea what the problem could be?
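For anyone who wants to double-check the "works both ways" part on their own setup, a forward and reverse lookup can be compared with a few lines of Python. This is just a sketch under assumptions: `check_dns_both_ways` is a made-up helper name, and the FQDN in the usage note is a placeholder, not a name from my environment.

```python
import socket
from typing import Callable, Tuple


def check_dns_both_ways(
    fqdn: str,
    expected_ip: str,
    forward: Callable[[str], str] = socket.gethostbyname,
    reverse: Callable[[str], Tuple[str, list, list]] = socket.gethostbyaddr,
) -> bool:
    """Hypothetical helper: True if fqdn resolves to expected_ip and
    expected_ip resolves back to fqdn (case-insensitive, trailing dot ignored)."""
    try:
        ip = forward(fqdn)                # forward (A record) lookup
        name = reverse(expected_ip)[0]    # reverse (PTR) lookup, primary name
    except OSError:
        # socket.gaierror and socket.herror are both subclasses of OSError
        return False
    same_name = name.rstrip(".").lower() == fqdn.rstrip(".").lower()
    return ip == expected_ip and same_name
```

Run on the vCenter Appliance and on a workstation, a call such as `check_dns_both_ways("esx01.example.test", "10.1.1.60")` should return `True` if both directions agree. The resolver functions are passed in as parameters only so the check can be tried out without a live DNS server.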
Thanks and regards
Something is using up all 500 vpxa sessions? I strongly suspect a monitoring tool or backup appliance in the environment.
Can you verify whether anything else is hitting the ESXi host?
Thanks,
MS
There is nothing else hitting the two servers besides each other. It looks like, once the problem starts, the vCenter Appliance runs an endless loop of trying to collect vSAN information from the ESXi host until the ESXi host starts reporting that it is out of HTTP sessions.
Both the ESXi host and the vCenter Appliance are fresh installations and have not been added to any kind of monitoring, backup, etc.
Thanks,
dialsc
You can try stopping the vSAN health service on vCenter if you are not using it and see whether that helps with adding the host back.
Hi, I have more or less the same issue when trying vSAN.
All hosts function well in vCenter. As soon as I click on vSAN in the cluster configuration, one of my hosts will become unavailable and I have to reboot it.
In vpxa.log on that specific host I get the exact same error loop:
--> msg = "Received SOAP response fault from [<cs p:0000000e0a79c300, TCP:localhost:8307>]: fetchVsanSharedSecret
--> The session is not authenticated."
It happens on only one host. I have already redeployed it, removed it from vCenter, patched it completely, etc.
vCenter is on the latest level.
A dear colleague pointed me to this one. Seems like an awful workaround but it does do the job.
I have had the exact same issue for a few months now. It started when I upgraded my production environment to ESXi 6.7 U2. Hosts seemed to disconnect at random, with the following entry flooding vpxa.log: Out of HTTP sessions: Limited to 500.
Restarting VPXA fixes the issue, but after a while the host disconnects again, for example, when you scan your host for updates or do any other action on that host.
The first host that had the issue was a Horizon host in a vSAN cluster after I updated the host. I migrated the host out of the vSAN cluster and logged a case with VMware. Unfortunately the case still isn't solved. VMware GSS asked me to update the NIC drivers and firmware of the NIC, but without any result.
I also see disconnect issues with regular vSphere hosts without vSAN. They occur after updating to 6.7 Update 2. The strangest thing is that the disconnects do not occur in my test environment.
Best regards,
Rob
So I tried a fresh install of 6.7 U2 to see if that would fix the issue, but immediately after configuring management and adding it to vCenter as a stand-alone host, it disconnected with the same error.
I updated this fresh install to 6.7 Update 3 and that looks stable for now, so we decided to start an update cycle to 6.7 U3 within a few weeks.
Regards, Rob