Hi,
It looks like it is not possible to restore a VCSA through the built-in backup/restore functionality in a mixed-version SSO environment.
The SSO domain consists of 6 VCSAs, 2 of which have been upgraded to 7.0.
One of the 6.7 VCSAs crashed, and restoring it from the backup fails late in the process, when it tries to join the existing SSO domain.
Moderator: Thread moved to the area specific to vCenter Server.
Let me see if I got this right. You upgraded two VCSAs from 6.7 to 7.0. One of them failed and you want to restore it, but the restore fails when it tries to join the SSO domain.
Is this correct? Could you please post some screenshots of the errors?
We had 5 VCSAs on 6.7. Then last week we deployed a new VCSA 7.0 and connected it to the SSO domain.
We upgraded 1 VCSA to version 7.0.
Seemingly unrelated, 1 of the 6.7 VCSAs crashed the next day (showing error 503 after a reboot). To avoid losing too much time, we decided to restore a backup from the day before.
Mixed versions in the same SSO domain are not supported, except as a transient configuration while you are upgrading. What you did was deploy a new VCSA 7.0 and join it to the same SSO domain, and that action is not supported.
This could be the cause of the issue you are facing. I recommend deleting that VCSA and removing it from the SSO domain, provided you have not done any further configuration on it. If that is not possible, could you please provide some logs from the restore process?
Hi,
Thanks for the reply. We are in this transition phase, but as usual, stuff always breaks at the most inconvenient time.
The join of the VCSA 7.0 was done as part of a Dell VxRail deployment by a certified consultant. I will check with him as well, but during the procedure neither he nor the Dell Support technician (who assisted us because of some other issues) expressed any concern about this configuration.
As the log package is 670 MB compressed (and probably contains sensitive information), which files would be most helpful for you?
I tried to find out which logs you should look at, but the official documentation did not say. However, I know where the backup logs are located, so you could start there: /var/log/vmware/applmgmt.
If there are no restore logs there, look through /var/log/vmware for the most recently updated logs. Even if you only got through Stage 1, you can log in to the VM over SSH or VMRC to see the latest updates.
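A minimal sketch of the "look for the last updated logs" step, assuming a Python 3 interpreter is available on the appliance (or that the log directory has been copied to another machine). The function name `recently_modified` and the limit of 10 are illustrative choices, not part of any VMware tooling.

```python
import os
import time
from pathlib import Path


def recently_modified(root, limit=10):
    """Return up to `limit` (mtime, path) pairs under `root`, newest first."""
    entries = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = Path(dirpath) / name
            try:
                entries.append((path.stat().st_mtime, path))
            except OSError:
                # File vanished or is unreadable; skip it.
                continue
    entries.sort(key=lambda entry: entry[0], reverse=True)
    return entries[:limit]


if __name__ == "__main__":
    # /var/log/vmware is the directory mentioned in this thread.
    for mtime, path in recently_modified("/var/log/vmware"):
        stamp = time.strftime("%Y-%m-%d %H:%M:%S", time.localtime(mtime))
        print(stamp, path)
```

The files at the top of the output are the ones written to most recently, which is usually where a failing restore leaves its traces.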
I found a lot of these in applmgmt.log:
2020-09-03T09:30:43.810 [15175]ERROR:vmware.appliance.update.update_functions:Can't read JSON file /etc/applmgmt/appliance/software_update_state.conf [Errno 2] No such file or directory: '/etc/applmgmt/appliance/software_update_state.conf'
2020-09-03T09:30:43.810 [15175]DEBUG:vmware.appliance.update.update_state:No update state file. Will create a new one
2020-09-03T09:30:43.810 [15175]ERROR:vmware.appliance.update.update_functions:Can't read JSON file /etc/applmgmt/appliance/software_update_state.conf [Errno 2] No such file or directory: '/etc/applmgmt/appliance/software_update_state.conf'
I have attached the sanitized logs from that folder to this post.
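To see how many distinct errors hide behind "a lot of these", the repeated lines can be grouped and counted. This is only a sketch: the regular expression is inferred from the three sample lines above and may not match every line format in applmgmt.log, and `count_messages` is a made-up helper name.

```python
import re
import sys
from collections import Counter

# Pattern inferred from the sample lines: "<timestamp> [<pid>]<LEVEL>:<logger>:<message>"
LINE_RE = re.compile(
    r"^(?P<ts>\S+) \[(?P<pid>\d+)\](?P<level>[A-Z]+):(?P<logger>[^:]+):(?P<msg>.*)$"
)


def count_messages(lines, level="ERROR"):
    """Count identical (logger, message) pairs at the given log level."""
    counts = Counter()
    for line in lines:
        match = LINE_RE.match(line.rstrip("\n"))
        if match and match.group("level") == level:
            counts[(match.group("logger"), match.group("msg"))] += 1
    return counts


if __name__ == "__main__":
    # Usage: python count_errors.py /path/to/applmgmt.log
    with open(sys.argv[1], errors="replace") as handle:
        for (logger, msg), n in count_messages(handle).most_common():
            print(f"{n:6d}  {logger}: {msg}")
```

A message that appears hundreds of times with identical text is often background noise, so the rarer entries are worth reading first.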
Hey,
I can see that the restore process was able to finish, but the log contains these lines:
2020-09-03T09:15:14.263 [MainProcess:PID-6711] ERROR: Reconciliation job failed to start. Error: error code: 1
2020-09-03T09:15:14.264 [MainProcess:PID-6711] ERROR: RestoreManager encountered an exception: error code: 1
If you look at reconciliation.log, you can see the following:
ERROR: Error executing start on service vmware-stsd
Could you log in to the failed VM and check /var/log/vmware for the logs of the vmware-stsd service?
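If it is not obvious which subdirectory holds a service's logs, a recursive search by file name narrows it down. A small sketch, assuming Python 3 on the VM; `find_service_logs` is a hypothetical helper, and matching on the file name alone is an assumption about how the logs are named.

```python
from pathlib import Path


def find_service_logs(root, service):
    """Return files under `root` whose name contains `service`, sorted by path."""
    return sorted(
        path
        for path in Path(root).rglob("*")
        if path.is_file() and service in path.name
    )


if __name__ == "__main__":
    for path in find_service_logs("/var/log/vmware", "vmware-stsd"):
        print(path)
```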
There you go (found in /var/log/vmware/sso/utils).
vmware-stsd.log:
...
2020-09-03 09:28:33 16281: [INFO] BEGIN vmware-stsd start ; caller pid=1 (executable: /usr/lib/systemd/systemd)
2020-09-03 09:28:33 16281: [INFO] Starting service.
2020-09-03 09:28:33 16281: [INFO] [check_status] vmware-stsd
2020-09-03 09:28:33 16281: [INFO] [check_status] vmware-stsd not running (PID file /var/log/vmware/sso/tcserver.pid does not exist).
2020-09-03 09:28:33 16281: [INFO] [start] Performing pre-startup actions.
2020-09-03 09:28:33 16281: [INFO] [start] Pre-startup complete.
2020-09-03 09:30:36 16281: [INFO] END vmware-stsd start
2020-09-03 09:34:30 17617: [INFO] BEGIN vmware-stsd status ; caller pid=17613 (executable: /usr/bin/bash)
2020-09-03 09:34:30 17617: [INFO] [check_status] vmware-stsd
2020-09-03 09:34:30 17617: [INFO] [checkpid] checking process 16308
2020-09-03 09:34:30 17617: [INFO] [checkpid] sending -0 signal returned success will also checkproc...
2020-09-03 09:34:30 17617: [INFO] [checkpid] checkproc returned success...
2020-09-03 09:34:30 17617: [INFO] Checking if 16308 is the STS process
2020-09-03 09:34:30 17617: [INFO] [checkpid] returns 0
2020-09-03 09:34:30 17617: [INFO] [check_status] vmware-stsd is running
2020-09-03 09:34:30 17617: [INFO] END vmware-stsd status
vmware-stsd.err:
...
Exception in thread "commons-pool-EvictionTimer" java.lang.NoClassDefFoundError: org/apache/commons/pool2/impl/EvictionConfig
at org.apache.commons.pool2.impl.GenericKeyedObjectPool.evict(GenericKeyedObjectPool.java:886)
at org.apache.commons.pool2.impl.BaseGenericObjectPool$Evictor.run(BaseGenericObjectPool.java:1036)
at java.util.TimerThread.mainLoop(Timer.java:555)
at java.util.TimerThread.run(Timer.java:505)
Caused by: java.lang.ClassNotFoundException: Illegal access: this web application instance has been stopped already. Could not load [org.apache.commons.pool2.impl.EvictionConfig]. The following stack trace is thrown for debugging purposes as well as to attempt to terminate the thread which caused the illegal access.
at org.apache.catalina.loader.WebappClassLoaderBase.checkStateForClassLoading(WebappClassLoaderBase.java:1368)
at org.apache.catalina.loader.WebappClassLoaderBase.loadClass(WebappClassLoaderBase.java:1218)
at org.apache.catalina.loader.WebappClassLoaderBase.loadClass(WebappClassLoaderBase.java:1180)
... 4 more
Caused by: java.lang.IllegalStateException: Illegal access: this web application instance has been stopped already. Could not load [org.apache.commons.pool2.impl.EvictionConfig]. The following stack trace is thrown for debugging purposes as well as to attempt to terminate the thread which caused the illegal access.
at org.apache.catalina.loader.WebappClassLoaderBase.checkStateForResourceLoading(WebappClassLoaderBase.java:1378)
at org.apache.catalina.loader.WebappClassLoaderBase.checkStateForClassLoading(WebappClassLoaderBase.java:1366)
... 6 more
Java HotSpot(TM) 64-Bit Server VM warning: ignoring option MaxPermSize=160m; support was removed in 8.0
Sep 03, 2020 9:28:34 AM org.apache.catalina.startup.Catalina load
INFO: Initialization processed in 680 ms
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/usr/lib/vmware-sso/vmware-sts/webapps/ROOT/WEB-INF/lib/slf4j-log4j12-1.7.26.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/lib/vmware-sso/vmware-sts/webapps/ROOT/WEB-INF/lib/log4j-slf4j-impl-2.11.2.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
Sep 03, 2020 9:28:40 AM org.apache.catalina.startup.Catalina start
INFO: Server startup in 6724 ms
Exception in thread "commons-pool-EvictionTimer" java.lang.NoClassDefFoundError: org/apache/commons/pool2/impl/EvictionConfig
at org.apache.commons.pool2.impl.GenericKeyedObjectPool.evict(GenericKeyedObjectPool.java:886)
at org.apache.commons.pool2.impl.BaseGenericObjectPool$Evictor.run(BaseGenericObjectPool.java:1036)
at java.util.TimerThread.mainLoop(Timer.java:555)
at java.util.TimerThread.run(Timer.java:505)
Caused by: java.lang.ClassNotFoundException: Illegal access: this web application instance has been stopped already. Could not load [org.apache.commons.pool2.impl.EvictionConfig]. The following stack trace is thrown for debugging purposes as well as to attempt to terminate the thread which caused the illegal access.
at org.apache.catalina.loader.WebappClassLoaderBase.checkStateForClassLoading(WebappClassLoaderBase.java:1368)
at org.apache.catalina.loader.WebappClassLoaderBase.loadClass(WebappClassLoaderBase.java:1218)
at org.apache.catalina.loader.WebappClassLoaderBase.loadClass(WebappClassLoaderBase.java:1180)
... 4 more
Caused by: java.lang.IllegalStateException: Illegal access: this web application instance has been stopped already. Could not load [org.apache.commons.pool2.impl.EvictionConfig]. The following stack trace is thrown for debugging purposes as well as to attempt to terminate the thread which caused the illegal access.
at org.apache.catalina.loader.WebappClassLoaderBase.checkStateForResourceLoading(WebappClassLoaderBase.java:1378)
at org.apache.catalina.loader.WebappClassLoaderBase.checkStateForClassLoading(WebappClassLoaderBase.java:1366)
... 6 more
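The same stack trace appears twice in the excerpt above, so a quick tally of exception types makes a long .err file easier to skim. This is a rough sketch that only looks at the "Exception in thread" and "Caused by:" header lines of standard Java stack traces; `summarize_exceptions` is an illustrative name.

```python
import re
import sys
from collections import Counter

# Matches the header lines of a Java stack trace and captures the exception class.
HEADER_RE = re.compile(
    r'^(?:Exception in thread "[^"]+" |Caused by: )(?P<exc>[\w.$]+)'
)


def summarize_exceptions(lines):
    """Count exception class names found in stack-trace header lines."""
    counts = Counter()
    for line in lines:
        match = HEADER_RE.match(line.strip())
        if match:
            counts[match.group("exc")] += 1
    return counts


if __name__ == "__main__":
    # Usage: python summarize_err.py /path/to/vmware-stsd.err
    with open(sys.argv[1], errors="replace") as handle:
        for exc, n in summarize_exceptions(handle).most_common():
            print(f"{n:4d}  {exc}")
```

The tally only shows which exceptions occur and how often; it does not say which one, if any, is responsible for the failed restore.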
I looked through those logs for problems but could not find anything. I suggest opening a support ticket or waiting for another community member to weigh in. Use all of this evidence so you do not waste more time.
Sorry that I could not help you, buddy. I hope you find the solution, and if you do, please share it here so it can help others with the same issue.
Thanks for your help.
Because I had lost all hope of getting the restore to work, I gave the "broken" VCSA another go and actually got it working.
We will resort to VCSA snapshots instead of backups until we have set up a lab environment to verify that the backup is actually useful.
I will mark your last reply as the answer even though I didn't get the restore to work.