Hi everyone,
I have two vSphere clusters (Foundation in my test lab and Enterprise Plus in production). I have been running daily backups on the vCenter servers since upgrading to 6.5 months ago. I use the bash script in the VMware documentation. Everything has been going great and backups have been successful. The other day I upgraded the vCSA from 6.5u1c to 6.5u1e (6.5.0.14000). The upgrade itself went OK except for the ability to run remote backups.
For my vCSA running in my test lab cluster, nothing has changed. I can still run local backups via the GUI and I can still run remote backups via the script. All is good.
For my vCSA running in my production cluster, I can run backups from the GUI. However when I try to run the bash script, the backup starts and then immediately stops. The backup log contains the start of the backup run and then errors indicating authentication errors: I'm no longer getting scheduled backups from my production cluster.
Starting backup 2018-01-29-08-50-17
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
100 482 0 180 100 302 958 1608 --:--:-- --:--:-- --:--:-- 1606
{"type":"com.vmware.vapi.std.errors.unauthenticated","value":{"messages":[{"args":[],"default_message":"Unable to authenticate user","id":"vapi.security.authentication.invalid"}]}}
{"type":"com.vmware.vapi.std.errors.unauthenticated","value":{"messages":[{"args":[],"default_message":"Unable to authenticate user","id":"vapi.security.authentication.invalid"}]}}
Backup job completion status:
To eliminate any potential problems with firewalls, etc, I tried running this locally on the vCSA itself. I got the exact same result. I looked at the /var/log/vmware/vapi/endpoint/endpoint.log and noticed that every time I tried to run the script I got java errors and the message "Failed to create child session with session manager com.vmware.cis.session for session Administrator@VSPHERE.LOCAL". I've attached a copy of the log messages showing the session messages and java errors that I'm getting.
Has anyone had similar problems? Does anyone have any suggestions on how to solve this?
THANKS for any help.
Pat
Is it embedded or external vcsa?
can you type hostname on the VCSA and check the output of the name ..
From the logs, it shows administrator@vsphere.local session failing to authenticate which is suspicious.. Can you check if the vapi service is started?
2018-01-29T08:41:11.251-08:00 | WARN | vAPI-I/O dispatcher-0 | SessionApiSecurityUtil | Failed to create child session with session manager com.vmware.cis.session for session Administrator@VSPHERE.LOCAL (internal id 1f2a0b7e-5f14-4f63-8286-96422e0b809d, token 1f0b0...).
com.vmware.vapi.std.errors.unauthenticated => {data=<unset>, messages=[com.vmware.vapi.std.localizable_message => {args=[], default_message=Unable to authenticate user, id=vapi.security.authentication.invalid}]}
at com.vmware.vapi.endpoint.auth.impl.SessionApiSecurityUtil.onLoginResult(SessionApiSecurityUtil.java:248)
since you are doing this via scripts, can you check the below thread which has similar issue with scripts if that helps
vCenter6 vapi : failed to create child session with session manager
Thanks,
MS
Thanks, MS...
It's a vCSA with embedded Platform Services.
root@vcprod [ ~ ]# hostname
vcprod.vm.mbari.org
DNS is working OK: If I do a "nslookup vcprod.vm.mbari.org", it resolves correctly. I can login to the vSphere web interface as administrator@vsphere.local and backups have been working until the upgrade with this account. I had tried earlier running the script with my Active Directory credentials which have admin rights on the vCSA and had the same results.
According to the Admin GUI, the vAPI Endpoint service is running:
And everything is nice and green on my appliance manager screen.
Thanks for pointing out that other link. Unfortunately that wouldn't apply here. I have the problem even if there are no previous sessions; after a fresh reboot.
Pat
Honestly, I am not good with working on the Bash Shell but if you are okay, you can use the PowerCli based way and see if that works..
PowerCLI-Example-Scripts/Backup-VCSA.psm1 at master · vmware/PowerCLI-Example-Scripts · GitHub
VMware PowerCLI Forum - VMware {code}
Thanks,
MS
Good suggestion but I'm getting forgetful in my old age... 🙂
I tried that yesterday afternoon hoping that it would have different results.
C:\local\scripts> .\TestScript3.ps1
Name User Port
---- ---- ----
vcprod.vm.mbari.org Administrator@VSPHERE.LOCAL 443
Backup-VCSAToFile : A server error occurred: 'com.vmware.vapi.std.errors.unauthenticated': Unable to authenticate user
(Server error id: 'vapi.security.authentication.invalid'). Check $Error[0].Exception.ServerError for more details.
At C:\local\scripts\TestScript3.ps1:115 char:2
+ Backup-VCSAToFile -BackupPassword $BackupPassword -LocationType $ ...
+ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+ CategoryInfo : NotSpecified: (:) [Write-Error], WriteErrorException
+ FullyQualifiedErrorId : Microsoft.PowerShell.Commands.WriteErrorException,Backup-VCSAToFile
And I still get the same Java errors in the /var/log/vmware/vapi/endpoint/endpoint.log.
Interesting... can you try restarting vapi endpoint service on vcenter? and below forum has suggested something similar
vcsa 6.5 trying to automate the backup - authentication failure
Hi Pat,
I'm in your same exact situation, I was using successfully a PowerCLI script until the VCSA was upgraded to Version 6.5.0.13000 Build 8024368, I'm seeing this behaviour on three VCSAs which are all on the same version, I have had a ticket open with VMware for 2 months but no luck so far however I've been told that Engineering are investigating the issue.
This is the exact error I get:
A server error occurred: 'com.vmware.vapi.std.errors.unauthenticated': Unable to authenticate user
(Server error id: 'vapi.security.authentication.invalid'). Check $Error[0].Exception.ServerError for more details.
I realize this thread is old, but I'm now facing the exact same issue. I checked your VAPI logs against mine and they're identical. Did you come up with any resolution? Unfortunately restarting VAPI services or modifying that authentication_sso.py script doesn't help.
I'll add that we are experiencing this same issue. I've already validated that the previous fix of modifying authentication_sso.py doesn't work (the script is already "correct"). This is a snippet from the vAPI endpoint logs immediately after running connect-cisserver and then a (get-cisservice).get(). Note that not running the method on get-cisservice _does_ work
2019-07-30T15:44:17.177Z | INFO | sso1 | BaseSessionImpl | Created session is valid until 7/31/19 3:44 PM
2019-07-30T15:44:17.177Z | INFO | sso1 | MasterSessionManagerImpl | Created session <INTERNAL USER@DOMAIN> (internal id 4175a85a-db78-47f0-a6c0-6a009befd991, token 484d2...)
2019-07-30T15:45:01.195Z | WARN | vAPI-I/O dispatcher-0 | SessionApiSecurityUtil | Failed to create child session with session manager com.vmware.cis.session for session <INTERNAL USER@DOMAIN> (internal id 4175a85a-db78-47f0-a6c0-6a009befd991, token 484d2...).
com.vmware.vapi.std.errors.unauthenticated => {data=<unset>, messages=[com.vmware.vapi.std.localizable_message => {args=[], default_message=Unable to authenticate user, id=vapi.security.authentication.invalid}]}
at com.vmware.vapi.endpoint.auth.impl.SessionApiSecurityUtil.onLoginResult(SessionApiSecurityUtil.java:248)
at com.vmware.vapi.endpoint.auth.impl.SessionApiSecurityUtil.access$500(SessionApiSecurityUtil.java:43)
at com.vmware.vapi.endpoint.auth.impl.SessionApiSecurityUtil$2.setResult(SessionApiSecurityUtil.java:230)
at com.vmware.vapi.endpoint.auth.impl.SessionApiSecurityUtil$2.setResult(SessionApiSecurityUtil.java:220)
at com.vmware.vapi.endpoint.session.SessionFacade$1.setResult(SessionFacade.java:108)
at com.vmware.vapi.endpoint.session.SessionFacade$1.setResult(SessionFacade.java:90)
at com.vmware.vapi.internal.protocol.client.msg.json.JsonApiProvider$ResponseCallbackImpl.received(JsonApiProvider.java:221)
at com.vmware.vapi.internal.protocol.client.msg.json.JsonApiProvider$1.received(JsonApiProvider.java:249)
at com.vmware.vapi.internal.protocol.client.rpc.http.ApacheHttpAsyncClientTransport$SingleResponseConsumer.responseCompleted(ApacheHttpAsyncClientTransport.java:469)
at com.vmware.vapi.internal.protocol.client.rpc.http.ApacheHttpAsyncClientTransport$DecoratorConsumer.responseCompleted(ApacheHttpAsyncClientTransport.java:333)
at org.apache.http.impl.nio.client.MainClientExec.responseCompleted(MainClientExec.java:418)
at org.apache.http.impl.nio.client.DefaultClientExchangeHandlerImpl.responseCompleted(DefaultClientExchangeHandlerImpl.java:164)
at org.apache.http.nio.protocol.HttpAsyncRequestExecutor.processResponse(HttpAsyncRequestExecutor.java:355)
at org.apache.http.nio.protocol.HttpAsyncRequestExecutor.inputReady(HttpAsyncRequestExecutor.java:242)
at org.apache.http.impl.nio.client.LoggingAsyncRequestExecutor.inputReady(LoggingAsyncRequestExecutor.java:87)
at org.apache.http.impl.nio.DefaultNHttpClientConnection.consumeInput(DefaultNHttpClientConnection.java:264)
at org.apache.http.impl.nio.client.InternalIODispatch.onInputReady(InternalIODispatch.java:73)
at org.apache.http.impl.nio.client.InternalIODispatch.onInputReady(InternalIODispatch.java:37)
at org.apache.http.impl.nio.reactor.AbstractIODispatch.inputReady(AbstractIODispatch.java:113)
at org.apache.http.impl.nio.reactor.BaseIOReactor.readable(BaseIOReactor.java:159)
at org.apache.http.impl.nio.reactor.AbstractIOReactor.processEvent(AbstractIOReactor.java:338)
at org.apache.http.impl.nio.reactor.AbstractIOReactor.processEvents(AbstractIOReactor.java:316)
at org.apache.http.impl.nio.reactor.AbstractIOReactor.execute(AbstractIOReactor.java:277)
at org.apache.http.impl.nio.reactor.BaseIOReactor.execute(BaseIOReactor.java:105)
at org.apache.http.impl.nio.reactor.AbstractMultiworkerIOReactor$Worker.run(AbstractMultiworkerIOReactor.java:584)
at java.lang.Thread.run(Thread.java:748)
Please check this.. Btw, what version the vcenter/psc is upgraded to?
We're on 6.5u2g. I had already compared a working VCSA's authentication_sso.py with a broken VCSA's file and they're identical. I ran through the steps in that KB article anyway; no luck. To me it seems like the token is being truncated/not returned for some reason. We use a slightly modified version of the local backup script provided by VMware.
Backup job id: <THIS IS BLANK>
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
100 180 0 180 0 0 12000 0 --:--:-- --:--:-- --:--:-- 12000
This is the output of a broken VCSA. The Job ID is not returned so the script doesn't know the progress of the backup job (it doesn't kick off at all). On a working VCSA, there is a GUID for the job ID.
This is in /var/log/messages
2019-07-30T16:02:38.865402+00:00 a001jp024vcs001 cli: twisted Site starting on 8201
2019-07-30T16:02:38.865903+00:00 a001jp024vcs001 cli: twisted Starting factory <twisted.web.server.Site instance at 0x7f3d43568f38>
2019-07-30T16:02:38.866196+00:00 a001jp024vcs001 cli: vmware.vherd.transport.vapi Listening on 127.0.0.1:8201
2019-07-30T16:02:38.869037+00:00 a001jp024vcs001 cli: vmware.vherd.transport.vapi Cannot bind to ::1:8201
Traceback (most recent call last):
File "/usr/lib/applmgmt/transport/py/vmware/vherd/transport/vapi.py", line 97, in main
interface=options.interface6)
File "/usr/lib/python2.7/site-packages/twisted/internet/posixbase.py", line 478, in listenTCP
p.startListening()
File "/usr/lib/python2.7/site-packages/twisted/internet/tcp.py", line 984, in startListening
raise CannotListenError(self.interface, self.port, le)
CannotListenError: Couldn't listen on ::1:8201: [Errno 99] Cannot assign requested address.
2019-07-30T16:02:39.225994+00:00 a001jp024vcs001 cli: twisted "127.0.0.1" - - [30/Jul/2019:16:02:38 +0000] "POST /rest/com/vmware/appliance/health/applmgmt?~action=get HTTP/1.1" 200 17 "-" "-"
can you restart applmgmt service and check if the backup works?
If yes, i will attempt to check a different issue which i had faced
Thanks,
MS
Our broken VCSAs all broke around the same time and the only change we can possibly dig up is that the appliances were originally deployed as X-Large which is our standard size for all vCenters (24 vCPU / 64GBs). The broken appliances were reduced to 16vCPU / 64GB. All of the broken VCSAs are very small environments so average utilization is a fraction of the capacity but with hyper-threading disabled on some of the management hosts, we needed to turn down the vCPUs
No luck unfortunately. I cycled both the applmgmt service and vmware-vapi-endpoint. The script immediately returns a blank job ID
I've opened an MCS ticket SR 19018031807 but getting absolutely nowhere with support
So I'm almost positive this has something to do with setting a proxy server. If you set a proxy server, it permanently breaks this functionality (even if it's subsequently removed). We suffered through the issue and was never resolved by VMware. We asked them to close the case because we were planning our 6.5 -> 6.7u3a upgrade. Well, post upgrade the broken vCenters started working as normal. So, we re-set a proxy configuration on one of the upgraded vCenters and the exact problem is back even after clearing out the proxy config.
New case number 20093948501.
Ever get any progress on this?