After setting up a completely new cluster with ample resources (6x HP proliant/ AMD Ryzen), vSan and vSphere 7.0 (with ESXi 7.0 nodes) we tried to run the 'simple installer' to deploy vRealize Automation 8.1, but it keeps failing in stage 5, step 'VraVAInitialize':
After some digging, we found a few logging details that can help to understand what is causing this problem:
On the VRA machine, the rabbitmq deployments can't start: (CrashLoopBackOff)
detail of the failing pod log:
Steps we already tried to find the root cause of this problem:
- retried with slightly different installer parameters (simpler passwords, ...)
- check DNS settings (forward and reverse zones)
- apply KB76870 (VMware Knowledge Base)
- deploy to an older cluster (6.5) with Intel CPU's: exactly the same problem
Any suggestions would be much appreciated...
Hi,
Can you resolve vIDM server?
Do you have errors logging into vIDM?
Regards,
Leandro.
Hi,
thank you for the quick response.
Resolving vIDM from vRA and from lcm works.
logging in (ssh, webinterface) on vIDM is not a problem either, error logs (/var/log/messages) do not provide errors we can link to this problem..
Try to re-initialize the services and if you already have some data and configuration you can go with the below steps.
First You need to clean all the services. from the VRA appliance connect SSH.
after login enter "/opt/scripts/deploy.sh --onlyClean"
This will clear all the services one by one and will create offline DB backup file. the data will not be lost but the services will not be available.
Second You can redeploy the services using the same script deploy.sh
"/opt/scripts/deploy.sh"
This step take some time. After this step finished you can check all the service up and running using the below
"kubecli get pods -n prelude"
I have the exact same problem! Any ideas?
Hey roelvs,
What does the identity service log says inside the pods directory "/var/log/pods". Inside that pod run an "ls | grep identity" to identify the folders for the Identity Service.
Also you can check on "/service-logs/identity" to see if you identify some issues
My suggestion would be to open a ticket with vmware support, and they'll take it from there
I'm afraid contacting vmware support is not an option for us: we receive our licenses through the VMWARE academic channel and we are not entitled for support..
The identity logs contain a lot of errors, all similar to this one:
-25 09:17:46.770+0000 INFO WRITE_SQL 44 --- [ main] liquibase.executor.jvm.JdbcExecutor : ALTER TABLE public.identity_service_definition DROP COLUMN organization_id
2020-08-25 09:17:46.771+0000 INFO LOG 44 --- [ main] liquibase.changelog.ChangeSet : Column identity_service_definition.organization_id dropped
2020-08-25 09:17:46.772+0000 INFO LOG 44 --- [ main] liquibase.changelog.ChangeSet : ChangeSet db/changelog/18-remove-service-org-relation.xml::1582281611441-7::zshindova (generated) ran successfully in 2ms
2020-08-25 09:17:46.772+0000 INFO WRITE_SQL 44 --- [ main] liquibase.executor.jvm.JdbcExecutor : INSERT INTO public.databasechangelog (ID, AUTHOR, FILENAME, DATEEXECUTED, ORDEREXECUTED, MD5SUM, DESCRIPTION, COMMENTS, EXECTYPE, CONTEXTS, LABELS, LIQUIBASE, DEPLOYMENT_ID) VALUES ('1582281611441-7', 'zshindova (generated)', 'db/changelog/18-remove-service-org-relation.xml', NOW(), 54, '8:927af9187f5edec6dc0ec37339d9edb1', 'dropColumn columnName=organization_id, tableName=identity_service_definition', '', 'EXECUTED', NULL, NULL, '3.8.1', '8347066340')
2020-08-25 09:17:46.776+0000 INFO LOG 44 --- [ main] l.lockservice.StandardLockService : Successfully released change log lock
2020-08-25 09:17:46.899+0000 INFO 44 --- [ main] o.hibernate.jpa.internal.util.LogHelper : HHH000204: Processing PersistenceUnitInfo [name: default]
2020-08-25 09:17:46.998+0000 INFO 44 --- [ main] org.hibernate.Version : HHH000412: Hibernate Core {5.4.10.Final}
2020-08-25 09:17:47.139+0000 INFO 44 --- [ main] o.hibernate.annotations.common.Version : HCANN000001: Hibernate Commons Annotations {5.1.0.Final}
2020-08-25 09:17:47.269+0000 INFO 44 --- [ main] org.hibernate.dialect.Dialect : HHH000400: Using dialect: org.hibernate.dialect.PostgreSQL94Dialect
2020-08-25 09:17:48.565+0000 INFO 44 --- [ main] o.h.e.t.j.p.i.JtaPlatformInitiator : HHH000490: Using JtaPlatform implementation: [org.hibernate.engine.transaction.jta.platform.internal.NoJtaPlatform]
2020-08-25 09:17:48.572+0000 INFO 44 --- [ main] j.LocalContainerEntityManagerFactoryBean : Initialized JPA EntityManagerFactory for persistence unit 'default'
2020-08-25 09:17:49.081+0000 INFO 44 --- [ main] c.v.i.s.i.PropertyConfigurationService : Loaded classpath:config_service.properties
2020-08-25 09:17:51.119+0000 INFO 44 --- [ main] c.v.i.service.impl.CryptoServiceImpl : Keystore persisted
2020-08-25 09:17:51.119+0000 INFO 44 --- [ main] c.v.i.service.impl.CryptoServiceImpl : Loaded keystore
2020-08-25 09:17:51.785+0000 INFO 44 --- [ main] com.vmware.identity.rest.RestClient : POST https://idm.somedomain.be/SAAS/API/1.0/oauth2/token?grant_type=client_credentials
2020-08-25 09:17:52.035+0000 INFO 44 --- [ main] com.vmware.identity.rest.RestClient : POST https://idm.somedomain.be/SAAS/API/1.0/oauth2/token?grant_type=client_credentials
2020-08-25 09:17:52.094+0000 WARN 44 --- [or-http-epoll-1] i.n.c.AbstractChannelHandlerContext : An exception 'reactor.core.Exceptions$ErrorCallbackNotImplemented: javax.net.ssl.SSLHandshakeException: General SSLEngine problem' [enable DEBUG level for full stacktrace] was thrown by a user handler's exceptionCaught() method while handling the following exception:
javax.net.ssl.SSLHandshakeException: General SSLEngine problem
at sun.security.ssl.Handshaker.checkThrown(Handshaker.java:1537) ~[na:1.8.0_241]
Suppressed: reactor.core.publisher.FluxOnAssembly$OnAssemblyException:
Error has been observed at the following site(s):
|_ checkpoint ⇢ Request to POST https://idm.somedomain.be/SAAS/API/1.0/oauth2/token?grant_type=client_credentials [DefaultWebClient]
Stack trace:
at sun.security.ssl.Handshaker.checkThrown(Handshaker.java:1537) ~[na:1.8.0_241]
at sun.security.ssl.SSLEngineImpl.checkTaskThrown(SSLEngineImpl.java:528) ~[na:1.8.0_241]
at sun.security.ssl.SSLEngineImpl.readNetRecord(SSLEngineImpl.java:802) ~[na:1.8.0_241]
at sun.security.ssl.SSLEngineImpl.unwrap(SSLEngineImpl.java:766) ~[na:1.8.0_241]
at javax.net.ssl.SSLEngine.unwrap(SSLEngine.java:624) ~[na:1.8.0_241]
at io.netty.handler.ssl.SslHandler$SslEngineType$3.unwrap(SslHandler.java:281) ~[netty-handler-4.1.45.Final.jar!/:4.1.45.Final]
at io.netty.handler.ssl.SslHandler.unwrap(SslHandler.java:1324) [netty-handler-4.1.45.Final.jar!/:4.1.45.Final]
at io.netty.handler.ssl.SslHandler.decodeJdkCompatible(SslHandler.java:1219) [netty-handler-4.1.45.Final.jar!/:4.1.45.Final]
at io.netty.handler.ssl.SslHandler.decode(SslHandler.java:1266) [netty-handler-4.1.45.Final.jar!/:4.1.45.Final]
at io.netty.handler.codec.ByteToMessageDecoder.decodeRemovalReentryProtection(ByteToMessageDecoder.java:498) [netty-codec-4.1.45.Final.jar!/:4.1.45.Final]
at io.netty.handler.codec.ByteToMessageDecoder.callDecode(ByteToMessageDecoder.java:437) [netty-codec-4.1.45.Final.jar!/:4.1.45.Final]
at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:276) [netty-codec-4.1.45.Final.jar!/:4.1.45.Final]
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:377) [netty-transport-4.1.45.Final.jar!/:4.1.45.Final]
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:363) [netty-transport-4.1.45.Final.jar!/:4.1.45.Final]
at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:355) [netty-transport-4.1.45.Final.jar!/:4.1.45.Final]
at io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1410) [netty-transport-4.1.45.Final.jar!/:4.1.45.Final]
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:377) [netty-transport-4.1.45.Final.jar!/:4.1.45.Final]
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:363) [netty-transport-4.1.45.Final.jar!/:4.1.45.Final]
at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:919) [netty-transport-4.1.45.Final.jar!/:4.1.45.Final]
at io.netty.channel.epoll.AbstractEpollStreamChannel$EpollStreamUnsafe.epollInReady(AbstractEpollStreamChannel.java:792) [netty-transport-native-epoll-4.1.45.Final-linux-x86_64.jar!/:4.1.45.Final]
at io.netty.channel.epoll.AbstractEpollChannel$AbstractEpollUnsafe$1.run(AbstractEpollChannel.java:387) [netty-transport-native-epoll-4.1.45.Final-linux-x86_64.jar!/:4.1.45.Final]
at io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:164) [netty-common-4.1.45.Final.jar!/:4.1.45.Final]
at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:472) [netty-common-4.1.45.Final.jar!/:4.1.45.Final]
at io.netty.channel.epoll.EpollEventLoop.run(EpollEventLoop.java:384) [netty-transport-native-epoll-4.1.45.Final-linux-x86_64.jar!/:4.1.45.Final]
at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:989) [netty-common-4.1.45.Final.jar!/:4.1.45.Final]
at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74) [netty-common-4.1.45.Final.jar!/:4.1.45.Final]
at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30) [netty-common-4.1.45.Final.jar!/:4.1.45.Final]
at java.lang.Thread.run(Thread.java:748) [na:1.8.0_241]
Caused by: javax.net.ssl.SSLHandshakeException: General SSLEngine problem
at sun.security.ssl.Alerts.getSSLException(Alerts.java:192) ~[na:1.8.0_241]
at sun.security.ssl.SSLEngineImpl.fatal(SSLEngineImpl.java:1709) ~[na:1.8.0_241]
at sun.security.ssl.Handshaker.fatalSE(Handshaker.java:318) ~[na:1.8.0_241]
at sun.security.ssl.Handshaker.fatalSE(Handshaker.java:310) ~[na:1.8.0_241]
at sun.security.ssl.ClientHandshaker.serverCertificate(ClientHandshaker.java:1640) ~[na:1.8.0_241]
at sun.security.ssl.ClientHandshaker.processMessage(ClientHandshaker.java:223) ~[na:1.8.0_241]
at sun.security.ssl.Handshaker.processLoop(Handshaker.java:1037) ~[na:1.8.0_241]
at sun.security.ssl.Handshaker$1.run(Handshaker.java:970) ~[na:1.8.0_241]
at sun.security.ssl.Handshaker$1.run(Handshaker.java:967) ~[na:1.8.0_241]
at java.security.AccessController.doPrivileged(Native Method) ~[na:1.8.0_241]
at sun.security.ssl.Handshaker$DelegatedTask.run(Handshaker.java:1475) ~[na:1.8.0_241]
at io.netty.handler.ssl.SslHandler.runAllDelegatedTasks(SslHandler.java:1494) [netty-handler-4.1.45.Final.jar!/:4.1.45.Final]
at io.netty.handler.ssl.SslHandler.runDelegatedTasks(SslHandler.java:1508) [netty-handler-4.1.45.Final.jar!/:4.1.45.Final]
at io.netty.handler.ssl.SslHandler.unwrap(SslHandler.java:1392) [netty-handler-4.1.45.Final.jar!/:4.1.45.Final]
at io.netty.handler.ssl.SslHandler.decodeJdkCompatible(SslHandler.java:1219) [netty-handler-4.1.45.Final.jar!/:4.1.45.Final]
at io.netty.handler.ssl.SslHandler.decode(SslHandler.java:1266) [netty-handler-4.1.45.Final.jar!/:4.1.45.Final]
at io.netty.handler.codec.ByteToMessageDecoder.decodeRemovalReentryProtection(ByteToMessageDecoder.java:498) [netty-codec-4.1.45.Final.jar!/:4.1.45.Final]
at io.netty.handler.codec.ByteToMessageDecoder.callDecode(ByteToMessageDecoder.java:437) [netty-codec-4.1.45.Final.jar!/:4.1.45.Final]
at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:276) [netty-codec-4.1.45.Final.jar!/:4.1.45.Final]
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:377) [netty-transport-4.1.45.Final.jar!/:4.1.45.Final]
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:363) [netty-transport-4.1.45.Final.jar!/:4.1.45.Final]
at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:355) [netty-transport-4.1.45.Final.jar!/:4.1.45.Final]
at io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1410) [netty-transport-4.1.45.Final.jar!/:4.1.45.Final]
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:377) [netty-transport-4.1.45.Final.jar!/:4.1.45.Final]
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:363) [netty-transport-4.1.45.Final.jar!/:4.1.45.Final]
at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:919) [netty-transport-4.1.45.Final.jar!/:4.1.45.Final]
at io.netty.channel.epoll.AbstractEpollStreamChannel$EpollStreamUnsafe.epollInReady(AbstractEpollStreamChannel.java:792) [netty-transport-native-epoll-4.1.45.Final-linux-x86_64.jar!/:4.1.45.Final]
at io.netty.channel.epoll.AbstractEpollChannel$AbstractEpollUnsafe$1.run(AbstractEpollChannel.java:387) [netty-transport-native-epoll-4.1.45.Final-linux-x86_64.jar!/:4.1.45.Final]
at io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:164) [netty-common-4.1.45.Final.jar!/:4.1.45.Final]
at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:472) [netty-common-4.1.45.Final.jar!/:4.1.45.Final]
at io.netty.channel.epoll.EpollEventLoop.run(EpollEventLoop.java:384) [netty-transport-native-epoll-4.1.45.Final-linux-x86_64.jar!/:4.1.45.Final]
at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:989) [netty-common-4.1.45.Final.jar!/:4.1.45.Final]
at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74) [netty-common-4.1.45.Final.jar!/:4.1.45.Final]
at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30) [netty-common-4.1.45.Final.jar!/:4.1.45.Final]
at java.lang.Thread.run(Thread.java:748) [na:1.8.0_241]
Caused by: sun.security.validator.ValidatorException: PKIX path building failed: sun.security.provider.certpath.SunCertPathBuilderException: unable to find valid certification path to requested target
at sun.security.validator.PKIXValidator.doBuild(PKIXValidator.java:450) ~[na:1.8.0_241]
at sun.security.validator.PKIXValidator.engineValidate(PKIXValidator.java:259) ~[na:1.8.0_241]
at sun.security.validator.Validator.validate(Validator.java:262) ~[na:1.8.0_241]
at sun.security.ssl.X509TrustManagerImpl.validate(X509TrustManagerImpl.java:330) ~[na:1.8.0_241]
at sun.security.ssl.X509TrustManagerImpl.checkTrusted(X509TrustManagerImpl.java:279) ~[na:1.8.0_241]
at sun.security.ssl.X509TrustManagerImpl.checkServerTrusted(X509TrustManagerImpl.java:144) ~[na:1.8.0_241]
at sun.security.ssl.ClientHandshaker.serverCertificate(ClientHandshaker.java:1627) ~[na:1.8.0_241]
... 30 common frames omitted
Caused by: sun.security.provider.certpath.SunCertPathBuilderException: unable to find valid certification path to requested target
at sun.security.provider.certpath.SunCertPathBuilder.build(SunCertPathBuilder.java:141) ~[na:1.8.0_241]
at sun.security.provider.certpath.SunCertPathBuilder.engineBuild(SunCertPathBuilder.java:126) ~[na:1.8.0_241]
at java.security.cert.CertPathBuilder.build(CertPathBuilder.java:280) ~[na:1.8.0_241]
at sun.security.validator.PKIXValidator.doBuild(PKIXValidator.java:445) ~[na:1.8.0_241]
... 36 common frames omitted
2020-08-25 09:17:52.103+0000 ERROR 44 --- [or-http-epoll-2] .v.i.c.AbstractOrganizationConfiguration : User 'configadmin' was not found.
Quick question, regarding the DNS. You mention that the A records are all okay, regarding the PTR do you have any duplicated one?
We finally found out what was wrong...
in the internal DNS zone we use, there is also a * record for another project. The other DNS entries were just fine (A-records, PTR-records.)
It turns out that the installer fails on this scenario..
Our solution was to use another (sub)domain, without a * record...
Thank you all for the suggestions!