VMware Cloud Community
David_at_vmware
Contributor

Restore VCSA from backup in a mixed SSO environment (6.7 + 7.0)

Hi,

It looks like it is not possible to restore a VCSA through the built-in backup/restore functionality in a mixed SSO environment.

The SSO domain consists of 6 VCSAs, of which 2 are running 7.0.

One of the 6.7 VCSAs crashed, and restoring it from backup fails late in the process, when it tries to join the existing SSO domain.

scott28tt
VMware Employee

Moderator: Thread moved to the area specific to vCenter Server.


Although I am a VMware employee I contribute to VMware Communities voluntarily (ie. not in any official capacity)

Lalegre
Virtuoso

Let me see if I got it. You upgraded two VCSAs from 6.7 to 7.0. One of them failed and you want to restore it, but the restore fails when it tries to join the SSO domain.

Is this correct? Could you please provide some screenshots of your errors?

David_at_vmware
Contributor

We had 5 VCSAs on 6.7. Then last week we deployed a new VCSA 7.0 and connected it to the SSO domain.

We upgraded 1 VCSA to version 7.0.

Seemingly unrelated, one of the 6.7 VCSAs crashed the next day (showing error 503 after reboot). To not lose too much time, we decided to restore a backup from the previous day.

Lalegre
Virtuoso

Mixed versions in the same SSO domain are not supported; they should only be used as a transient configuration while you are upgrading. What you did was deploy a new VCSA 7.0 and join it to the same SSO domain, so that action is not supported.

This could be the issue you are facing. If you have not done any further configuration on that VCSA, I recommend you remove it from the SSO domain and delete it. If that is not possible, could you please provide some logs from the restore process?

David_at_vmware
Contributor

Hi,

Thanks for the reply. We are in this transition phase, but as usual, stuff always breaks at the most inconvenient time.

The join of the VCSA 7.0 was done as part of a Dell VxRail deployment by a certified consultant. I will check with him as well, but during the procedure neither he nor the Dell Support technician (who assisted us because of some other issues) raised any concern about this constellation.

As the log package is 670 MB compressed (and probably contains sensitive information), which files would be most helpful to you?

Lalegre
Virtuoso

I tried to find out which logs you should look at, but I could not get that information from the official documentation. However, I know where the backup logs are located, so maybe you can take a look there: /var/log/vmware/applmgmt.

If there are no restore logs there, you can look in /var/log/vmware for the most recently updated logs. Even if you only finished Stage 1, you can log in to the VM using SSH or VMRC to see the latest updates.
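
Finding the most recently updated logs can be scripted. The following is a minimal sketch, assuming Python 3 is available wherever the logs are inspected (on the appliance or on a copy of the log bundle). The function name `newest_logs` and the default limit are hypothetical and not part of any VMware tooling.

```python
import os
from pathlib import Path


def newest_logs(root, limit=10):
    """Return up to `limit` files under `root`, most recently modified first."""
    files = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = Path(dirpath) / name
            try:
                files.append((path.stat().st_mtime, path))
            except OSError:
                # The file vanished or is unreadable (e.g. a broken symlink); skip it.
                continue
    # Sort by modification time only, newest first.
    files.sort(key=lambda item: item[0], reverse=True)
    return [path for _mtime, path in files[:limit]]


if __name__ == "__main__":
    # /var/log/vmware is the directory named in this thread.
    for path in newest_logs("/var/log/vmware"):
        print(path)
```

Pointing `root` at an extracted copy of the log package instead of the live directory should work the same way, provided the extraction preserved the file modification times.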

David_at_vmware
Contributor

I found a lot of these in applmgmt.log:

2020-09-03T09:30:43.810 [15175]ERROR:vmware.appliance.update.update_functions:Can't read JSON file /etc/applmgmt/appliance/software_update_state.conf [Errno 2] No such file or directory: '/etc/applmgmt/appliance/software_update_state.conf'

2020-09-03T09:30:43.810 [15175]DEBUG:vmware.appliance.update.update_state:No update state file. Will create a new one

2020-09-03T09:30:43.810 [15175]ERROR:vmware.appliance.update.update_functions:Can't read JSON file /etc/applmgmt/appliance/software_update_state.conf [Errno 2] No such file or directory: '/etc/applmgmt/appliance/software_update_state.conf'

I have attached the sanitized logs from that folder to this post.
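
When the same message repeats many times, collapsing the ERROR lines into counts makes the distinct problems easier to spot. This is a minimal sketch, assuming Python 3 and only the two line layouts quoted in this thread (`[pid]ERROR:...` and `[name] ERROR: ...`). The name `count_errors` and the regular expression are illustrative assumptions, not a VMware-provided parser.

```python
import re
import sys
from collections import Counter

# Matches lines such as (layouts taken from the excerpts in this thread):
#   2020-09-03T09:30:43.810 [15175]ERROR:vmware.appliance...:message
#   2020-09-03T09:15:14.263 [MainProcess:PID-6711] ERROR: message
ERROR_LINE = re.compile(r"^\S+\s+\[[^\]]*\]\s*ERROR:\s*(?P<message>.*)$")


def count_errors(lines):
    """Count ERROR messages, ignoring the timestamp and process prefix."""
    counts = Counter()
    for line in lines:
        match = ERROR_LINE.match(line.strip())
        if match:
            counts[match.group("message")] += 1
    return counts


if __name__ == "__main__":
    # Usage: python count_errors.py /path/to/applmgmt.log
    if len(sys.argv) > 1:
        with open(sys.argv[1], errors="replace") as handle:
            for message, count in count_errors(handle).most_common():
                print(f"{count:6d}  {message}")
```

Lines in other layouts are silently skipped, so a count of zero does not prove a log is free of errors.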

Lalegre
Virtuoso

Hey,

I can see that the restore process was able to finish, but the log contains these lines:

2020-09-03T09:15:14.263 [MainProcess:PID-6711] ERROR: Reconciliation job failed to start. Error: error code: 1

2020-09-03T09:15:14.264 [MainProcess:PID-6711] ERROR: RestoreManager encountered an exception: error code: 1

If you look at reconciliation.log you can see the following:

ERROR: Error executing start on service vmware-stsd

Could you log in to the failed VM, look in /var/log/vmware, and check the logs of the vmware-stsd service?

David_at_vmware
Contributor

There you go.

(found in /var/log/vmware/sso/utils)

vmware-stsd.log:

...

2020-09-03 09:28:33 16281: [INFO] BEGIN vmware-stsd start ; caller pid=1 (executable: /usr/lib/systemd/systemd)

2020-09-03 09:28:33 16281: [INFO] Starting service.

2020-09-03 09:28:33 16281: [INFO] [check_status] vmware-stsd

2020-09-03 09:28:33 16281: [INFO] [check_status] vmware-stsd not running (PID file /var/log/vmware/sso/tcserver.pid does not exist).

2020-09-03 09:28:33 16281: [INFO] [start] Performing pre-startup actions.

2020-09-03 09:28:33 16281: [INFO] [start] Pre-startup complete.

2020-09-03 09:30:36 16281: [INFO] END vmware-stsd start

2020-09-03 09:34:30 17617: [INFO] BEGIN vmware-stsd status ; caller pid=17613 (executable: /usr/bin/bash)

2020-09-03 09:34:30 17617: [INFO] [check_status] vmware-stsd

2020-09-03 09:34:30 17617: [INFO] [checkpid] checking process 16308

2020-09-03 09:34:30 17617: [INFO] [checkpid] sending -0 signal returned success will also checkproc...

2020-09-03 09:34:30 17617: [INFO] [checkpid] checkproc returned success...

2020-09-03 09:34:30 17617: [INFO] Checking if 16308 is the STS process

2020-09-03 09:34:30 17617: [INFO] [checkpid] returns 0

2020-09-03 09:34:30 17617: [INFO] [check_status] vmware-stsd is running

2020-09-03 09:34:30 17617: [INFO] END vmware-stsd status

vmware-stsd.err:

...

Exception in thread "commons-pool-EvictionTimer" java.lang.NoClassDefFoundError: org/apache/commons/pool2/impl/EvictionConfig

    at org.apache.commons.pool2.impl.GenericKeyedObjectPool.evict(GenericKeyedObjectPool.java:886)

    at org.apache.commons.pool2.impl.BaseGenericObjectPool$Evictor.run(BaseGenericObjectPool.java:1036)

    at java.util.TimerThread.mainLoop(Timer.java:555)

    at java.util.TimerThread.run(Timer.java:505)

Caused by: java.lang.ClassNotFoundException: Illegal access: this web application instance has been stopped already. Could not load [org.apache.commons.pool2.impl.EvictionConfig]. The following stack trace is thrown for debugging purposes as well as to attempt to terminate the thread which caused the illegal access.

    at org.apache.catalina.loader.WebappClassLoaderBase.checkStateForClassLoading(WebappClassLoaderBase.java:1368)

    at org.apache.catalina.loader.WebappClassLoaderBase.loadClass(WebappClassLoaderBase.java:1218)

    at org.apache.catalina.loader.WebappClassLoaderBase.loadClass(WebappClassLoaderBase.java:1180)

    ... 4 more

Caused by: java.lang.IllegalStateException: Illegal access: this web application instance has been stopped already. Could not load [org.apache.commons.pool2.impl.EvictionConfig]. The following stack trace is thrown for debugging purposes as well as to attempt to terminate the thread which caused the illegal access.

    at org.apache.catalina.loader.WebappClassLoaderBase.checkStateForResourceLoading(WebappClassLoaderBase.java:1378)

    at org.apache.catalina.loader.WebappClassLoaderBase.checkStateForClassLoading(WebappClassLoaderBase.java:1366)

    ... 6 more

Java HotSpot(TM) 64-Bit Server VM warning: ignoring option MaxPermSize=160m; support was removed in 8.0

Sep 03, 2020 9:28:34 AM org.apache.catalina.startup.Catalina load

INFO: Initialization processed in 680 ms

SLF4J: Class path contains multiple SLF4J bindings.

SLF4J: Found binding in [jar:file:/usr/lib/vmware-sso/vmware-sts/webapps/ROOT/WEB-INF/lib/slf4j-log4j12-1.7.26.jar!/org/slf4j/impl/StaticLoggerBinder.class]

SLF4J: Found binding in [jar:file:/usr/lib/vmware-sso/vmware-sts/webapps/ROOT/WEB-INF/lib/log4j-slf4j-impl-2.11.2.jar!/org/slf4j/impl/StaticLoggerBinder.class]

SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.

SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]

Sep 03, 2020 9:28:40 AM org.apache.catalina.startup.Catalina start

INFO: Server startup in 6724 ms

Exception in thread "commons-pool-EvictionTimer" java.lang.NoClassDefFoundError: org/apache/commons/pool2/impl/EvictionConfig

    at org.apache.commons.pool2.impl.GenericKeyedObjectPool.evict(GenericKeyedObjectPool.java:886)

    at org.apache.commons.pool2.impl.BaseGenericObjectPool$Evictor.run(BaseGenericObjectPool.java:1036)

    at java.util.TimerThread.mainLoop(Timer.java:555)

    at java.util.TimerThread.run(Timer.java:505)

Caused by: java.lang.ClassNotFoundException: Illegal access: this web application instance has been stopped already. Could not load [org.apache.commons.pool2.impl.EvictionConfig]. The following stack trace is thrown for debugging purposes as well as to attempt to terminate the thread which caused the illegal access.

    at org.apache.catalina.loader.WebappClassLoaderBase.checkStateForClassLoading(WebappClassLoaderBase.java:1368)

    at org.apache.catalina.loader.WebappClassLoaderBase.loadClass(WebappClassLoaderBase.java:1218)

    at org.apache.catalina.loader.WebappClassLoaderBase.loadClass(WebappClassLoaderBase.java:1180)

    ... 4 more

Caused by: java.lang.IllegalStateException: Illegal access: this web application instance has been stopped already. Could not load [org.apache.commons.pool2.impl.EvictionConfig]. The following stack trace is thrown for debugging purposes as well as to attempt to terminate the thread which caused the illegal access.

    at org.apache.catalina.loader.WebappClassLoaderBase.checkStateForResourceLoading(WebappClassLoaderBase.java:1378)

    at org.apache.catalina.loader.WebappClassLoaderBase.checkStateForClassLoading(WebappClassLoaderBase.java:1366)

    ... 6 more
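
The nested causes in a trace like this are easier to compare once they are reduced to their exception class names. This is a minimal sketch, assuming Python 3 and the standard Java layout seen above (an `Exception in thread "..."` header followed by `Caused by:` lines). The function name `exception_chains` is hypothetical, and the sketch only summarizes the trace; it does not diagnose the underlying problem.

```python
import re

# Header line of an uncaught exception, as printed by the JVM.
HEAD = re.compile(r'^Exception in thread "[^"]*" (?P<exc>[\w.$]+)(?:: .*)?$')
# Each nested cause in the same trace.
CAUSE = re.compile(r"^Caused by: (?P<exc>[\w.$]+)(?:: .*)?$")


def exception_chains(lines):
    """Return one list of exception class names per trace, outermost first."""
    chains = []
    for raw in lines:
        line = raw.strip()
        head = HEAD.match(line)
        if head:
            # A new trace starts here.
            chains.append([head.group("exc")])
            continue
        cause = CAUSE.match(line)
        if cause and chains:
            # Attach the cause to the most recent trace.
            chains[-1].append(cause.group("exc"))
    return chains
```

Run over the vmware-stsd.err excerpt above, this would yield two identical chains, each ending in `java.lang.IllegalStateException`, whose message states that the web application instance had already been stopped.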

Lalegre
Virtuoso

I looked for issues in those logs but could not find anything. I suggest opening a support ticket or waiting for another user in the community. Attach all of this evidence so you do not lose more time.

Sorry that I could not help you, buddy. I hope you find the solution; if you do, please share it here so it can help others with the same issue.

David_at_vmware
Contributor

Thanks for your help.

Because I had lost all hope of getting the restore to work, I gave the "broken" VCSA another go and actually got it to work.

We will resort to VCSA snapshots instead of backups until we have set up a lab environment to verify that the backup/restore is actually useful.

I will mark your last reply as the answer even though I didn't get the restore to work.