roelvs
Enthusiast
Enthusiast

vrealize 8.1 'simple install' fails at VraVaInitialize

After setting up a completely new cluster with ample resources (6x HP proliant/ AMD Ryzen), vSan and vSphere 7.0 (with ESXi 7.0 nodes) we tried to run the 'simple installer' to deploy vRealize Automation 8.1, but it keeps failing in stage 5, step 'VraVAInitialize':

After some digging, we found a few logging details that can help to understand what is causing this problem:

pastedImage_1.jpg

On the VRA machine, the rabbitmq deployments can't start: (CrashLoopBackOff)

pastedImage_2.jpg

detail of the failing pod log:

MicrosoftTeams-image.png

Steps we already tried to find the root cause of this problem:

- retried with slightly different installer parameters (simpler passwords, ...)

- check DNS settings (forward and reverse zones)

- apply KB76870 (VMware Knowledge Base)

- deploy to an older cluster (6.5) with Intel CPU's: exactly the same problem

Any suggestions would be much appreciated...

10 Replies
lnairn
VMware Employee
VMware Employee

Hi,

Can you resolve vIDM server?

Do you have errors logging into vIDM?

Regards,

Leandro.

0 Kudos
roelvs
Enthusiast
Enthusiast

Hi,

thank you for the quick response.

Resolving vIDM from vRA and from lcm works.

logging in (ssh, webinterface) on vIDM is not a problem either, error logs (/var/log/messages) do not provide errors we can link to this problem..

0 Kudos
MostafaElSayedF
Enthusiast
Enthusiast

Try to re-initialize the services and if you already have some data and configuration you can go with the below steps.

First You need to clean all the services. from the VRA appliance connect SSH.

after login enter  "/opt/scripts/deploy.sh --onlyClean"

This will clear all the services one by one and will create offline DB backup file. the data will not be lost but the services will not be available.

Second You can redeploy the services using the same script deploy.sh

"/opt/scripts/deploy.sh"

This step take some time. After this step finished you can check all the service up  and running using the below

"kubecli get pods -n prelude"

I hope this answer your question and i hope that this become answer or helpful comment for you. also, for More details and more information just follow my blog https://www.syncgates.com Mostafa Fahmy
0 Kudos
roelvs
Enthusiast
Enthusiast

Mostafa,

thank you for the useful tip! ... but the problem remains..

I attached the output log below (edited to hide hostnames, serial keys, certificates), hope it helps to understand the problem we are facing..

0 Kudos
sven_knockaert
Contributor
Contributor

I have the exact same problem! Any ideas?

0 Kudos
Lalegre
Virtuoso
Virtuoso

Hey roelvs​,

What does the identity service log says inside the pods directory "/var/log/pods". Inside that pod run an "ls | grep identity" to identify the folders for the Identity Service.

Also you can check on "/service-logs/identity" to see if you identify some issues

0 Kudos
gradinka
VMware Employee
VMware Employee

My suggestion would be to open a ticket with vmware support, and they'll take it from there

0 Kudos
roelvs
Enthusiast
Enthusiast

I'm afraid contacting vmware support is not an option for us: we receive our licenses through the VMWARE academic channel and we are not entitled for support..

The identity logs contain a lot of errors, all similar to this one:

-25 09:17:46.770+0000  INFO WRITE_SQL 44 --- [           main]      liquibase.executor.jvm.JdbcExecutor : ALTER TABLE public.identity_service_definition DROP COLUMN organization_id

2020-08-25 09:17:46.771+0000  INFO LOG 44 --- [           main]            liquibase.changelog.ChangeSet : Column identity_service_definition.organization_id dropped

2020-08-25 09:17:46.772+0000  INFO LOG 44 --- [           main]            liquibase.changelog.ChangeSet : ChangeSet db/changelog/18-remove-service-org-relation.xml::1582281611441-7::zshindova (generated) ran successfully in 2ms

2020-08-25 09:17:46.772+0000  INFO WRITE_SQL 44 --- [           main]      liquibase.executor.jvm.JdbcExecutor : INSERT INTO public.databasechangelog (ID, AUTHOR, FILENAME, DATEEXECUTED, ORDEREXECUTED, MD5SUM, DESCRIPTION, COMMENTS, EXECTYPE, CONTEXTS, LABELS, LIQUIBASE, DEPLOYMENT_ID) VALUES ('1582281611441-7', 'zshindova (generated)', 'db/changelog/18-remove-service-org-relation.xml', NOW(), 54, '8:927af9187f5edec6dc0ec37339d9edb1', 'dropColumn columnName=organization_id, tableName=identity_service_definition', '', 'EXECUTED', NULL, NULL, '3.8.1', '8347066340')

2020-08-25 09:17:46.776+0000  INFO LOG 44 --- [           main]        l.lockservice.StandardLockService : Successfully released change log lock

2020-08-25 09:17:46.899+0000  INFO  44 --- [           main]  o.hibernate.jpa.internal.util.LogHelper : HHH000204: Processing PersistenceUnitInfo [name: default]

2020-08-25 09:17:46.998+0000  INFO  44 --- [           main]                    org.hibernate.Version : HHH000412: Hibernate Core {5.4.10.Final}

2020-08-25 09:17:47.139+0000  INFO  44 --- [           main]   o.hibernate.annotations.common.Version : HCANN000001: Hibernate Commons Annotations {5.1.0.Final}

2020-08-25 09:17:47.269+0000  INFO  44 --- [           main]            org.hibernate.dialect.Dialect : HHH000400: Using dialect: org.hibernate.dialect.PostgreSQL94Dialect

2020-08-25 09:17:48.565+0000  INFO  44 --- [           main]       o.h.e.t.j.p.i.JtaPlatformInitiator : HHH000490: Using JtaPlatform implementation: [org.hibernate.engine.transaction.jta.platform.internal.NoJtaPlatform]

2020-08-25 09:17:48.572+0000  INFO  44 --- [           main] j.LocalContainerEntityManagerFactoryBean : Initialized JPA EntityManagerFactory for persistence unit 'default'

2020-08-25 09:17:49.081+0000  INFO  44 --- [           main]   c.v.i.s.i.PropertyConfigurationService : Loaded classpath:config_service.properties

2020-08-25 09:17:51.119+0000  INFO  44 --- [           main]     c.v.i.service.impl.CryptoServiceImpl : Keystore persisted

2020-08-25 09:17:51.119+0000  INFO  44 --- [           main]     c.v.i.service.impl.CryptoServiceImpl : Loaded keystore

2020-08-25 09:17:51.785+0000  INFO  44 --- [           main]      com.vmware.identity.rest.RestClient : POST https://idm.somedomain.be/SAAS/API/1.0/oauth2/token?grant_type=client_credentials

2020-08-25 09:17:52.035+0000  INFO  44 --- [           main]      com.vmware.identity.rest.RestClient : POST https://idm.somedomain.be/SAAS/API/1.0/oauth2/token?grant_type=client_credentials

2020-08-25 09:17:52.094+0000  WARN  44 --- [or-http-epoll-1]      i.n.c.AbstractChannelHandlerContext : An exception 'reactor.core.Exceptions$ErrorCallbackNotImplemented: javax.net.ssl.SSLHandshakeException: General SSLEngine problem' [enable DEBUG level for full stacktrace] was thrown by a user handler's exceptionCaught() method while handling the following exception:

javax.net.ssl.SSLHandshakeException: General SSLEngine problem

        at sun.security.ssl.Handshaker.checkThrown(Handshaker.java:1537) ~[na:1.8.0_241]

        Suppressed: reactor.core.publisher.FluxOnAssembly$OnAssemblyException:

Error has been observed at the following site(s):

        |_ checkpoint ⇢ Request to POST https://idm.somedomain.be/SAAS/API/1.0/oauth2/token?grant_type=client_credentials [DefaultWebClient]

Stack trace:

                at sun.security.ssl.Handshaker.checkThrown(Handshaker.java:1537) ~[na:1.8.0_241]

                at sun.security.ssl.SSLEngineImpl.checkTaskThrown(SSLEngineImpl.java:528) ~[na:1.8.0_241]

                at sun.security.ssl.SSLEngineImpl.readNetRecord(SSLEngineImpl.java:802) ~[na:1.8.0_241]

                at sun.security.ssl.SSLEngineImpl.unwrap(SSLEngineImpl.java:766) ~[na:1.8.0_241]

                at javax.net.ssl.SSLEngine.unwrap(SSLEngine.java:624) ~[na:1.8.0_241]

                at io.netty.handler.ssl.SslHandler$SslEngineType$3.unwrap(SslHandler.java:281) ~[netty-handler-4.1.45.Final.jar!/:4.1.45.Final]

                at io.netty.handler.ssl.SslHandler.unwrap(SslHandler.java:1324) [netty-handler-4.1.45.Final.jar!/:4.1.45.Final]

                at io.netty.handler.ssl.SslHandler.decodeJdkCompatible(SslHandler.java:1219) [netty-handler-4.1.45.Final.jar!/:4.1.45.Final]

                at io.netty.handler.ssl.SslHandler.decode(SslHandler.java:1266) [netty-handler-4.1.45.Final.jar!/:4.1.45.Final]

                at io.netty.handler.codec.ByteToMessageDecoder.decodeRemovalReentryProtection(ByteToMessageDecoder.java:498) [netty-codec-4.1.45.Final.jar!/:4.1.45.Final]

                at io.netty.handler.codec.ByteToMessageDecoder.callDecode(ByteToMessageDecoder.java:437) [netty-codec-4.1.45.Final.jar!/:4.1.45.Final]

                at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:276) [netty-codec-4.1.45.Final.jar!/:4.1.45.Final]

                at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:377) [netty-transport-4.1.45.Final.jar!/:4.1.45.Final]

                at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:363) [netty-transport-4.1.45.Final.jar!/:4.1.45.Final]

                at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:355) [netty-transport-4.1.45.Final.jar!/:4.1.45.Final]

                at io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1410) [netty-transport-4.1.45.Final.jar!/:4.1.45.Final]

                at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:377) [netty-transport-4.1.45.Final.jar!/:4.1.45.Final]

                at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:363) [netty-transport-4.1.45.Final.jar!/:4.1.45.Final]

                at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:919) [netty-transport-4.1.45.Final.jar!/:4.1.45.Final]

                at io.netty.channel.epoll.AbstractEpollStreamChannel$EpollStreamUnsafe.epollInReady(AbstractEpollStreamChannel.java:792) [netty-transport-native-epoll-4.1.45.Final-linux-x86_64.jar!/:4.1.45.Final]

                at io.netty.channel.epoll.AbstractEpollChannel$AbstractEpollUnsafe$1.run(AbstractEpollChannel.java:387) [netty-transport-native-epoll-4.1.45.Final-linux-x86_64.jar!/:4.1.45.Final]

                at io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:164) [netty-common-4.1.45.Final.jar!/:4.1.45.Final]

                at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:472) [netty-common-4.1.45.Final.jar!/:4.1.45.Final]

                at io.netty.channel.epoll.EpollEventLoop.run(EpollEventLoop.java:384) [netty-transport-native-epoll-4.1.45.Final-linux-x86_64.jar!/:4.1.45.Final]

                at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:989) [netty-common-4.1.45.Final.jar!/:4.1.45.Final]

                at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74) [netty-common-4.1.45.Final.jar!/:4.1.45.Final]

                at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30) [netty-common-4.1.45.Final.jar!/:4.1.45.Final]

                at java.lang.Thread.run(Thread.java:748) [na:1.8.0_241]

Caused by: javax.net.ssl.SSLHandshakeException: General SSLEngine problem

        at sun.security.ssl.Alerts.getSSLException(Alerts.java:192) ~[na:1.8.0_241]

        at sun.security.ssl.SSLEngineImpl.fatal(SSLEngineImpl.java:1709) ~[na:1.8.0_241]

        at sun.security.ssl.Handshaker.fatalSE(Handshaker.java:318) ~[na:1.8.0_241]

        at sun.security.ssl.Handshaker.fatalSE(Handshaker.java:310) ~[na:1.8.0_241]

        at sun.security.ssl.ClientHandshaker.serverCertificate(ClientHandshaker.java:1640) ~[na:1.8.0_241]

        at sun.security.ssl.ClientHandshaker.processMessage(ClientHandshaker.java:223) ~[na:1.8.0_241]

        at sun.security.ssl.Handshaker.processLoop(Handshaker.java:1037) ~[na:1.8.0_241]

        at sun.security.ssl.Handshaker$1.run(Handshaker.java:970) ~[na:1.8.0_241]

        at sun.security.ssl.Handshaker$1.run(Handshaker.java:967) ~[na:1.8.0_241]

        at java.security.AccessController.doPrivileged(Native Method) ~[na:1.8.0_241]

        at sun.security.ssl.Handshaker$DelegatedTask.run(Handshaker.java:1475) ~[na:1.8.0_241]

        at io.netty.handler.ssl.SslHandler.runAllDelegatedTasks(SslHandler.java:1494) [netty-handler-4.1.45.Final.jar!/:4.1.45.Final]

        at io.netty.handler.ssl.SslHandler.runDelegatedTasks(SslHandler.java:1508) [netty-handler-4.1.45.Final.jar!/:4.1.45.Final]

        at io.netty.handler.ssl.SslHandler.unwrap(SslHandler.java:1392) [netty-handler-4.1.45.Final.jar!/:4.1.45.Final]

        at io.netty.handler.ssl.SslHandler.decodeJdkCompatible(SslHandler.java:1219) [netty-handler-4.1.45.Final.jar!/:4.1.45.Final]

        at io.netty.handler.ssl.SslHandler.decode(SslHandler.java:1266) [netty-handler-4.1.45.Final.jar!/:4.1.45.Final]

        at io.netty.handler.codec.ByteToMessageDecoder.decodeRemovalReentryProtection(ByteToMessageDecoder.java:498) [netty-codec-4.1.45.Final.jar!/:4.1.45.Final]

        at io.netty.handler.codec.ByteToMessageDecoder.callDecode(ByteToMessageDecoder.java:437) [netty-codec-4.1.45.Final.jar!/:4.1.45.Final]

        at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:276) [netty-codec-4.1.45.Final.jar!/:4.1.45.Final]

        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:377) [netty-transport-4.1.45.Final.jar!/:4.1.45.Final]

        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:363) [netty-transport-4.1.45.Final.jar!/:4.1.45.Final]

        at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:355) [netty-transport-4.1.45.Final.jar!/:4.1.45.Final]

        at io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1410) [netty-transport-4.1.45.Final.jar!/:4.1.45.Final]

        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:377) [netty-transport-4.1.45.Final.jar!/:4.1.45.Final]

        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:363) [netty-transport-4.1.45.Final.jar!/:4.1.45.Final]

        at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:919) [netty-transport-4.1.45.Final.jar!/:4.1.45.Final]

        at io.netty.channel.epoll.AbstractEpollStreamChannel$EpollStreamUnsafe.epollInReady(AbstractEpollStreamChannel.java:792) [netty-transport-native-epoll-4.1.45.Final-linux-x86_64.jar!/:4.1.45.Final]

        at io.netty.channel.epoll.AbstractEpollChannel$AbstractEpollUnsafe$1.run(AbstractEpollChannel.java:387) [netty-transport-native-epoll-4.1.45.Final-linux-x86_64.jar!/:4.1.45.Final]

        at io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:164) [netty-common-4.1.45.Final.jar!/:4.1.45.Final]

        at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:472) [netty-common-4.1.45.Final.jar!/:4.1.45.Final]

        at io.netty.channel.epoll.EpollEventLoop.run(EpollEventLoop.java:384) [netty-transport-native-epoll-4.1.45.Final-linux-x86_64.jar!/:4.1.45.Final]

        at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:989) [netty-common-4.1.45.Final.jar!/:4.1.45.Final]

        at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74) [netty-common-4.1.45.Final.jar!/:4.1.45.Final]

        at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30) [netty-common-4.1.45.Final.jar!/:4.1.45.Final]

        at java.lang.Thread.run(Thread.java:748) [na:1.8.0_241]

Caused by: sun.security.validator.ValidatorException: PKIX path building failed: sun.security.provider.certpath.SunCertPathBuilderException: unable to find valid certification path to requested target

        at sun.security.validator.PKIXValidator.doBuild(PKIXValidator.java:450) ~[na:1.8.0_241]

        at sun.security.validator.PKIXValidator.engineValidate(PKIXValidator.java:259) ~[na:1.8.0_241]

        at sun.security.validator.Validator.validate(Validator.java:262) ~[na:1.8.0_241]

        at sun.security.ssl.X509TrustManagerImpl.validate(X509TrustManagerImpl.java:330) ~[na:1.8.0_241]

        at sun.security.ssl.X509TrustManagerImpl.checkTrusted(X509TrustManagerImpl.java:279) ~[na:1.8.0_241]

        at sun.security.ssl.X509TrustManagerImpl.checkServerTrusted(X509TrustManagerImpl.java:144) ~[na:1.8.0_241]

        at sun.security.ssl.ClientHandshaker.serverCertificate(ClientHandshaker.java:1627) ~[na:1.8.0_241]

        ... 30 common frames omitted

Caused by: sun.security.provider.certpath.SunCertPathBuilderException: unable to find valid certification path to requested target

        at sun.security.provider.certpath.SunCertPathBuilder.build(SunCertPathBuilder.java:141) ~[na:1.8.0_241]

        at sun.security.provider.certpath.SunCertPathBuilder.engineBuild(SunCertPathBuilder.java:126) ~[na:1.8.0_241]

        at java.security.cert.CertPathBuilder.build(CertPathBuilder.java:280) ~[na:1.8.0_241]

        at sun.security.validator.PKIXValidator.doBuild(PKIXValidator.java:445) ~[na:1.8.0_241]

        ... 36 common frames omitted

2020-08-25 09:17:52.103+0000 ERROR  44 --- [or-http-epoll-2] .v.i.c.AbstractOrganizationConfiguration : User 'configadmin' was not found.

0 Kudos
Lalegre
Virtuoso
Virtuoso

Quick question, regarding the DNS. You mention that the A records are all okay, regarding the PTR do you have any duplicated one?

0 Kudos
roelvs
Enthusiast
Enthusiast

We finally found out what was wrong...

in the internal DNS zone we use, there is also a * record for another project. The other DNS entries were just fine (A-records, PTR-records.)

It turns out that the installer fails on this scenario..

Our solution was to use another (sub)domain, without a * record...

Thank you all for the suggestions!