VMware Cloud Community
pat_allen
Contributor
Contributor

Remote backup of vCSA failing after upgrade

Hi everyone,

I have two vSphere clusters (Foundation in my test lab and Enterprise Plus in production). I have been running daily backups on the vCenter servers since upgrading to 6.5 months ago. I use the bash script in the VMware documentation. Everything has been going great and backups have been successful. The other day I upgraded the vCSA from 6.5u1c to 6.5u1e (6.5.0.14000). The upgrade itself went OK except for the ability to run remote backups.

For my vCSA running in my test lab cluster, nothing has changed. I can still run local backups via the GUI and I can still run remote backups via the script. All is good.

For my vCSA running in my production cluster, I can run backups from the GUI. However when I try to run the bash script, the backup starts and then immediately stops. The backup log contains the start of the backup run and then errors indicating authentication errors: I'm no longer getting scheduled backups from my production cluster.

Starting backup 2018-01-29-08-50-17

  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current

                                 Dload  Upload   Total   Spent    Left  Speed

100   482    0   180  100   302    958   1608 --:--:-- --:--:-- --:--:--  1606

{"type":"com.vmware.vapi.std.errors.unauthenticated","value":{"messages":[{"args":[],"default_message":"Unable to authenticate user","id":"vapi.security.authentication.invalid"}]}}

{"type":"com.vmware.vapi.std.errors.unauthenticated","value":{"messages":[{"args":[],"default_message":"Unable to authenticate user","id":"vapi.security.authentication.invalid"}]}}

Backup job completion status:

To eliminate any potential problems with firewalls, etc, I tried running this locally on the vCSA itself. I got the exact same result. I looked at the /var/log/vmware/vapi/endpoint/endpoint.log and noticed that every time I tried to run the script I got java errors and the message "Failed to create child session with session manager com.vmware.cis.session for session Administrator@VSPHERE.LOCAL".  I've attached a copy of the log messages showing the session messages and java errors that I'm getting.

Has anyone had similar problems? Does anyone have any suggestions on how to solve this?

THANKS for any help.

Pat

Reply
0 Kudos
16 Replies
msripada
Virtuoso
Virtuoso

Is it embedded or external vcsa?

can you type hostname on the VCSA and check the output of the name ..

From the logs, it shows administrator@vsphere.local session failing to authenticate which is suspicious.. Can you check if the vapi service is started?

2018-01-29T08:41:11.251-08:00 | WARN  | vAPI-I/O dispatcher-0     | SessionApiSecurityUtil         | Failed to create child session with session manager com.vmware.cis.session for session Administrator@VSPHERE.LOCAL (internal id 1f2a0b7e-5f14-4f63-8286-96422e0b809d, token 1f0b0...).

com.vmware.vapi.std.errors.unauthenticated => {data=<unset>, messages=[com.vmware.vapi.std.localizable_message => {args=[], default_message=Unable to authenticate user, id=vapi.security.authentication.invalid}]}

        at com.vmware.vapi.endpoint.auth.impl.SessionApiSecurityUtil.onLoginResult(SessionApiSecurityUtil.java:248)

since you are doing this via scripts, can you check the below thread which has similar issue with scripts if that helps

vCenter6 vapi : failed to create child session with session manager

Thanks,

MS

Reply
0 Kudos
pat_allen
Contributor
Contributor

Thanks, MS...

It's a vCSA with embedded Platform Services.

root@vcprod [ ~ ]# hostname

vcprod.vm.mbari.org

DNS is working OK: If I do a "nslookup vcprod.vm.mbari.org", it resolves correctly.  I can login to the vSphere web interface as administrator@vsphere.local and backups have been working until the upgrade with this account. I had tried earlier running the script with my Active Directory credentials which have admin rights on the vCSA and had the same results.

According to the Admin GUI, the vAPI Endpoint service is running:

pastedImage_1.png

And everything is nice and green on my appliance manager screen.

Thanks for pointing out that other link. Unfortunately that wouldn't apply here. I have the problem even if there are no previous sessions; after a fresh reboot.

Pat

Reply
0 Kudos
msripada
Virtuoso
Virtuoso

Honestly, I am not good with working on the Bash Shell but if you are okay, you can use the PowerCli based way and see if that works..

PowerCLI-Example-Scripts/Backup-VCSA.psm1 at master · vmware/PowerCLI-Example-Scripts · GitHub

VMware PowerCLI Forum - VMware {code}

Thanks,

MS

Reply
0 Kudos
pat_allen
Contributor
Contributor

Good suggestion but I'm getting forgetful in my old age... 🙂

I tried that yesterday afternoon hoping that it would have different results.

C:\local\scripts> .\TestScript3.ps1

Name                           User                           Port

----                           ----                           ----

vcprod.vm.mbari.org            Administrator@VSPHERE.LOCAL    443

Backup-VCSAToFile : A server error occurred: 'com.vmware.vapi.std.errors.unauthenticated': Unable to authenticate user

(Server error id: 'vapi.security.authentication.invalid'). Check $Error[0].Exception.ServerError for more details.

At C:\local\scripts\TestScript3.ps1:115 char:2

+     Backup-VCSAToFile -BackupPassword $BackupPassword -LocationType $ ...

+     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

    + CategoryInfo          : NotSpecified: (:) [Write-Error], WriteErrorException

    + FullyQualifiedErrorId : Microsoft.PowerShell.Commands.WriteErrorException,Backup-VCSAToFile

And I still get the same Java errors in the /var/log/vmware/vapi/endpoint/endpoint.log.

Reply
0 Kudos
msripada
Virtuoso
Virtuoso

Interesting... can you try restarting vapi endpoint service on vcenter? and below forum has suggested something similar

vcsa 6.5 trying to automate the backup - authentication failure

Reply
0 Kudos
MeImNot76
Enthusiast
Enthusiast

Hi Pat,

I'm in your same exact situation, I was using successfully a PowerCLI script until the VCSA was upgraded to Version 6.5.0.13000 Build 8024368, I'm seeing this behaviour on three VCSAs which are all on the same version, I have had a ticket open with VMware for 2 months but no luck so far however I've been told that Engineering are investigating the issue.

This is the exact error I get:

A server error occurred: 'com.vmware.vapi.std.errors.unauthenticated': Unable to authenticate user

(Server error id: 'vapi.security.authentication.invalid'). Check $Error[0].Exception.ServerError for more details.

Reply
0 Kudos
powercleee
Contributor
Contributor

I realize this thread is old, but I'm now facing the exact same issue.  I checked your VAPI logs against mine and they're identical.  Did you come up with any resolution?  Unfortunately restarting VAPI services or modifying that authentication_sso.py script doesn't help.

Reply
0 Kudos
thomsonacvs
Contributor
Contributor

I'll add that we are experiencing this same issue.  I've already validated that the previous fix of modifying authentication_sso.py doesn't work (the script is already "correct").  This is a snippet from the vAPI endpoint logs immediately after running connect-cisserver and then a (get-cisservice).get().  Note that not running the method on get-cisservice _does_ work

2019-07-30T15:44:17.177Z | INFO  | sso1                      | BaseSessionImpl                | Created session is valid until 7/31/19 3:44 PM

2019-07-30T15:44:17.177Z | INFO  | sso1                      | MasterSessionManagerImpl       | Created session <INTERNAL USER@DOMAIN> (internal id 4175a85a-db78-47f0-a6c0-6a009befd991, token 484d2...)

2019-07-30T15:45:01.195Z | WARN  | vAPI-I/O dispatcher-0     | SessionApiSecurityUtil         | Failed to create child session with session manager com.vmware.cis.session for session <INTERNAL USER@DOMAIN> (internal id 4175a85a-db78-47f0-a6c0-6a009befd991, token 484d2...).

com.vmware.vapi.std.errors.unauthenticated => {data=<unset>, messages=[com.vmware.vapi.std.localizable_message => {args=[], default_message=Unable to authenticate user, id=vapi.security.authentication.invalid}]}

        at com.vmware.vapi.endpoint.auth.impl.SessionApiSecurityUtil.onLoginResult(SessionApiSecurityUtil.java:248)

        at com.vmware.vapi.endpoint.auth.impl.SessionApiSecurityUtil.access$500(SessionApiSecurityUtil.java:43)

        at com.vmware.vapi.endpoint.auth.impl.SessionApiSecurityUtil$2.setResult(SessionApiSecurityUtil.java:230)

        at com.vmware.vapi.endpoint.auth.impl.SessionApiSecurityUtil$2.setResult(SessionApiSecurityUtil.java:220)

        at com.vmware.vapi.endpoint.session.SessionFacade$1.setResult(SessionFacade.java:108)

        at com.vmware.vapi.endpoint.session.SessionFacade$1.setResult(SessionFacade.java:90)

        at com.vmware.vapi.internal.protocol.client.msg.json.JsonApiProvider$ResponseCallbackImpl.received(JsonApiProvider.java:221)

        at com.vmware.vapi.internal.protocol.client.msg.json.JsonApiProvider$1.received(JsonApiProvider.java:249)

        at com.vmware.vapi.internal.protocol.client.rpc.http.ApacheHttpAsyncClientTransport$SingleResponseConsumer.responseCompleted(ApacheHttpAsyncClientTransport.java:469)

        at com.vmware.vapi.internal.protocol.client.rpc.http.ApacheHttpAsyncClientTransport$DecoratorConsumer.responseCompleted(ApacheHttpAsyncClientTransport.java:333)

        at org.apache.http.impl.nio.client.MainClientExec.responseCompleted(MainClientExec.java:418)

        at org.apache.http.impl.nio.client.DefaultClientExchangeHandlerImpl.responseCompleted(DefaultClientExchangeHandlerImpl.java:164)

        at org.apache.http.nio.protocol.HttpAsyncRequestExecutor.processResponse(HttpAsyncRequestExecutor.java:355)

        at org.apache.http.nio.protocol.HttpAsyncRequestExecutor.inputReady(HttpAsyncRequestExecutor.java:242)

        at org.apache.http.impl.nio.client.LoggingAsyncRequestExecutor.inputReady(LoggingAsyncRequestExecutor.java:87)

        at org.apache.http.impl.nio.DefaultNHttpClientConnection.consumeInput(DefaultNHttpClientConnection.java:264)

        at org.apache.http.impl.nio.client.InternalIODispatch.onInputReady(InternalIODispatch.java:73)

        at org.apache.http.impl.nio.client.InternalIODispatch.onInputReady(InternalIODispatch.java:37)

        at org.apache.http.impl.nio.reactor.AbstractIODispatch.inputReady(AbstractIODispatch.java:113)

        at org.apache.http.impl.nio.reactor.BaseIOReactor.readable(BaseIOReactor.java:159)

        at org.apache.http.impl.nio.reactor.AbstractIOReactor.processEvent(AbstractIOReactor.java:338)

        at org.apache.http.impl.nio.reactor.AbstractIOReactor.processEvents(AbstractIOReactor.java:316)

        at org.apache.http.impl.nio.reactor.AbstractIOReactor.execute(AbstractIOReactor.java:277)

        at org.apache.http.impl.nio.reactor.BaseIOReactor.execute(BaseIOReactor.java:105)

        at org.apache.http.impl.nio.reactor.AbstractMultiworkerIOReactor$Worker.run(AbstractMultiworkerIOReactor.java:584)

        at java.lang.Thread.run(Thread.java:748)

Reply
0 Kudos
msripada
Virtuoso
Virtuoso

Please check this.. Btw, what version the vcenter/psc is upgraded to?

VMware Knowledge Base

Reply
0 Kudos
thomsonacvs
Contributor
Contributor

We're on 6.5u2g.  I had already compared a working VCSA's authentication_sso.py with a broken VCSA's file and they're identical.  I ran through the steps in that KB article anyway; no luck.  To me it seems like the token is being truncated/not returned for some reason.  We use a slightly modified version of the local backup script provided by VMware.

Backup job id: <THIS IS BLANK>

  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current

                                 Dload  Upload   Total   Spent    Left  Speed

100   180    0   180    0     0  12000      0 --:--:-- --:--:-- --:--:-- 12000

This is the output of a broken VCSA.  The Job ID is not returned so the script doesn't know the progress of the backup job (it doesn't kick off at all).  On a working VCSA, there is a GUID for the job ID.

This is in /var/log/messages

2019-07-30T16:02:38.865402+00:00 a001jp024vcs001 cli: twisted Site starting on 8201

2019-07-30T16:02:38.865903+00:00 a001jp024vcs001 cli: twisted Starting factory <twisted.web.server.Site instance at 0x7f3d43568f38>

2019-07-30T16:02:38.866196+00:00 a001jp024vcs001 cli: vmware.vherd.transport.vapi Listening on 127.0.0.1:8201

2019-07-30T16:02:38.869037+00:00 a001jp024vcs001 cli: vmware.vherd.transport.vapi Cannot bind to ::1:8201

Traceback (most recent call last):

  File "/usr/lib/applmgmt/transport/py/vmware/vherd/transport/vapi.py", line 97, in main

    interface=options.interface6)

  File "/usr/lib/python2.7/site-packages/twisted/internet/posixbase.py", line 478, in listenTCP

    p.startListening()

  File "/usr/lib/python2.7/site-packages/twisted/internet/tcp.py", line 984, in startListening

    raise CannotListenError(self.interface, self.port, le)

CannotListenError: Couldn't listen on ::1:8201: [Errno 99] Cannot assign requested address.

2019-07-30T16:02:39.225994+00:00 a001jp024vcs001 cli: twisted "127.0.0.1" - - [30/Jul/2019:16:02:38 +0000] "POST /rest/com/vmware/appliance/health/applmgmt?~action=get HTTP/1.1" 200 17 "-" "-"

Reply
0 Kudos
msripada
Virtuoso
Virtuoso

can you restart applmgmt service and check if the backup works?

If yes, i will attempt to check a different issue which i had faced

Thanks,

MS

Reply
0 Kudos
thomsonacvs
Contributor
Contributor

Our broken VCSAs all broke around the same time and the only change we can possibly dig up is that the appliances were originally deployed as X-Large which is our standard size for all vCenters (24 vCPU / 64GBs).  The broken appliances were reduced to 16vCPU / 64GB.  All of the broken VCSAs are very small environments so average utilization is a fraction of the capacity but with hyper-threading disabled on some of the management hosts, we needed to turn down the vCPUs

Reply
0 Kudos
thomsonacvs
Contributor
Contributor

No luck unfortunately.  I cycled both the applmgmt service and  vmware-vapi-endpoint.  The script immediately returns a blank job ID

Reply
0 Kudos
thomsonacvs
Contributor
Contributor

I've opened an MCS ticket SR 19018031807 but getting absolutely nowhere with support

Reply
0 Kudos
thomsonacvs
Contributor
Contributor

So I'm almost positive this has something to do with setting a proxy server.  If you set a proxy server, it permanently breaks this functionality (even if it's subsequently removed).  We suffered through the issue and was never resolved by VMware.  We asked them to close the case because we were planning our 6.5 -> 6.7u3a upgrade.  Well, post upgrade the broken vCenters started working as normal.  So, we re-set a proxy configuration on one of the upgraded vCenters and the exact problem is back even after clearing out the proxy config.

New case number 20093948501.

Reply
0 Kudos
timrgsure
Contributor
Contributor

Ever get any progress on this?

Reply
0 Kudos