VMware Cloud Community
FredGSanford
Enthusiast

Chronic issues with LCM powering on a vIDM 3.3.7 Linux cluster

Is anyone else having issues where OpenSearch will not start on some or all nodes when powering on the globalenvironment from LCM?  Out of 5 attempts, it works about 2 times and fails the other 3.  LCM spins and spins at "Vmware identity manager restart elasticsearch/opensearch service."  LCM is 8.16 PSPACK 1, but earlier versions of LCM and vIDM make no difference.  Following the KB article linked below hasn't worked.  An SR has been opened with VMware, but it isn't getting much traction.  Is anyone else running vIDM clusters?  Thanks.

 

https://kb.vmware.com/s/article/74709?lang=en_US&queryTerm=VMware%20Identity%20Manager

 


Accepted Solutions
FredGSanford
Enthusiast

I found a workaround, and it is supported by VMware.  Tested with vIDM 3.3.7 with the latest patches and LCM 8.16.0.4 with the latest PSPACK.  It brings the cluster online within 15 minutes, compared with anywhere from 45 to 90 minutes (or a timeout while trying to start elasticsearch/opensearch) when powering on from LCM.

Run this on any node to confirm that all 3 OpenSearch nodes are reporting: curl 'http://localhost:9200/_cat/nodes?v&s=cpu'
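
If you would rather have a script tell you whether all 3 nodes have joined, something like this sketch could work. It assumes OpenSearch answers on localhost:9200 as in the curl command above, and it drops the "v" flag so there is no header row to skip. The names EXPECTED_NODES, count_nodes and check_opensearch are made up for illustration.

```shell
#!/usr/bin/env bash
# Sketch only: count the OpenSearch nodes that have joined the cluster.
# EXPECTED_NODES, count_nodes and check_opensearch are hypothetical names.

EXPECTED_NODES=3

# Count the non-empty lines on stdin (one line per node when no header is printed).
count_nodes() {
  grep -c '[^[:space:]]' || true
}

# Ask _cat/nodes for the ip column only, without "v", so every line is a node.
check_opensearch() {
  local found
  found=$(curl -s 'http://localhost:9200/_cat/nodes?h=ip' | count_nodes)
  if [ "$found" -eq "$EXPECTED_NODES" ]; then
    echo "OK: $found OpenSearch nodes reporting"
  else
    echo "WARN: $found of $EXPECTED_NODES OpenSearch nodes reporting" >&2
    return 1
  fi
}

# Only query the cluster when the script is called with the "check" argument.
if [ "${1:-}" = "check" ]; then
  check_opensearch
fi
```

Save it on a node and run it with the "check" argument; it exits non-zero when fewer than 3 nodes answer.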

Find the pgpool password in /usr/local/etc/pgpool.pwd and put it in place of 'password' in the command below.

Run this on any node to confirm that all 3 postgres nodes are up and to find the postgres master (primary):  su root -c "echo -e 'password'|/opt/vmware/vpostgres/current/bin/psql -h localhost -p 9999 -U pgpool postgres -c \"show pool_nodes\""
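
To pull just the primary's hostname out of that output, a small awk filter along these lines might help. This is a sketch under assumptions: it expects the usual psql table layout with "|" separators and header columns named hostname and role, and it treats a role of either primary or master as the node to power on first. Check it against the real output on your appliance; find_primary is a made-up name.

```shell
#!/usr/bin/env bash
# Sketch only: print the hostname of the primary from "show pool_nodes" output.
# Assumes a psql-style table with "hostname" and "role" header columns.
# find_primary is a hypothetical helper name.

find_primary() {
  awk -F'|' '
    # Until the header row is found, look for the hostname and role columns.
    !seen {
      for (i = 1; i <= NF; i++) {
        name = $i
        gsub(/[ \t]+/, "", name)
        if (name == "hostname") h = i
        if (name == "role") r = i
      }
      if (h && r) seen = 1
      next
    }
    # Data rows: print the host whose role is primary (or master).
    {
      role = $r
      host = $h
      gsub(/[ \t]+/, "", role)
      gsub(/[ \t]+/, "", host)
      if (role == "primary" || role == "master") print host
    }
  '
}
```

Pipe the output of the psql command above into find_primary. If it prints nothing, read the table by eye, since the column names may differ on your version.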

Power off the VMs from LCM

Power on the postgres primary node from vCenter

SSH to the postgres primary and verify that the horizon-workspace service has started

Run (replace delegateIP with your delegate IP and x.x.x.x with the netmask of eth0): ifconfig eth0:0 inet delegateIP netmask x.x.x.x

Run: /etc/init.d/horizon-workspace restart

Wait about 1 minute for this page to load on the primary: https://host/SAAS/auth/login

Power on the other 2 nodes in vCenter

Wait a couple of minutes for the SAAS login page to load on the other 2 nodes

Log into the vIDM VIP. Cluster health turns green after a couple of minutes

Run an inventory sync in LCM

Enable cluster auto-recovery in LCM
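
For the "wait for the login page" steps, polling from a shell saves refreshing a browser. The sketch below is one way to do it, not part of the supported procedure. It assumes curl is available, uses -k because the appliance certificate may not be trusted from where you run it, and the names wait_until and login_page_up plus the timeout values are made up.

```shell
#!/usr/bin/env bash
# Sketch only: retry a check until it passes or a timeout expires.
# wait_until and login_page_up are hypothetical helper names.

# wait_until TIMEOUT INTERVAL CMD...: rerun CMD every INTERVAL seconds,
# giving up with exit code 1 once TIMEOUT seconds have passed.
wait_until() {
  local timeout=$1 interval=$2
  shift 2
  local deadline=$(( $(date +%s) + timeout ))
  until "$@"; do
    if [ "$(date +%s)" -ge "$deadline" ]; then
      return 1
    fi
    sleep "$interval"
  done
}

# login_page_up HOST: succeeds when the SAAS login page answers without an HTTP error.
login_page_up() {
  curl -k -s -f -o /dev/null --max-time 10 "https://$1/SAAS/auth/login"
}

# Example (replace "host" with the node name): wait up to 10 minutes, checking every 15 seconds.
# wait_until 600 15 login_page_up host && echo "login page is up"
```

Run it once against the primary before powering on the other 2 nodes, then once per remaining node.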
