VMware Cloud Community
Magicmanashu
Contributor

vco-app-XXXX 8.1 pods - CrashLoopBackOff

When running deploy.sh on vRA 8.1, I get the following:

vco-app-66dc7fdc98-72q8s                       2/3     CrashLoopBackOff   3          8m3s

vco-app-66dc7fdc98-7nbjz                       3/3     Running            0          8m3s

vco-app-66dc7fdc98-qdb2p                       2/3     PostStartHookError: command 'sh -c sh /opt/preload-images.sh || true' exited with 137: + cd /opt/base-images

+ ls photon-3.0.tar.gz shared.tar.gz vco-polyglot-node-12.12.0.tar.gz vco-polyglot-powercli-11.5.0-powershell-6.2.3.tar.gz vco-polyglot-python-3.7.3.tar.gz

+ xargs -n 1 -I '{}' sh -c 'basename {} .tar.gz | xargs mkdir'

+ + lsxargs photon-3.0.tar.gz -n shared.tar.gz 1 vco-polyglot-node-12.12.0.tar.gz -I vco-polyglot-powercli-11.5.0-powershell-6.2.3.tar.gz '{}' vco-polyglot-python-3.7.3.tar.gz sh

-c 'basename {} .tar.gz | xargs tar zxvf {} -C'

Please help with this.

5 Replies
maverix7
VMware Employee

This has been fixed in 8.1 P1; please consider upgrading. As an immediate workaround, you can SSH to the node on which the pod is crashing and run:

rm -rf /data/vco/var/run/vco-polyglot-runner-sock/docker*

rm -rf /data/vco/var/run/vco-polyglot-runner-sock/xtables.lock

On the next pod start, everything should be OK.
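For anyone who wants to see what gets removed before it is deleted, the two rm commands above can be wrapped roughly as follows. This is only a sketch: the function name clean_stale_runner_files is made up for illustration, the default path is the one quoted in the reply, and it assumes you run it as root on the affected node.

```shell
#!/bin/sh
# Default directory is the one quoted in the reply above.
SOCK_DIR_DEFAULT=/data/vco/var/run/vco-polyglot-runner-sock

# Hypothetical helper: removes docker* and xtables.lock from the given
# directory (or the default one), printing each entry before deleting it.
clean_stale_runner_files() {
    dir="${1:-$SOCK_DIR_DEFAULT}"
    if [ ! -d "$dir" ]; then
        echo "nothing to clean: $dir does not exist" >&2
        return 0
    fi
    for f in "$dir"/docker* "$dir"/xtables.lock; do
        # Skip unmatched glob patterns; still catch dangling symlinks.
        [ -e "$f" ] || [ -L "$f" ] || continue
        echo "removing $f"
        rm -rf -- "$f"
    done
}
```

Calling clean_stale_runner_files with no argument targets the default directory; passing a directory as the first argument lets you try it on a scratch copy first.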

Magicmanashu
Contributor

I tried the steps you mentioned (without upgrading), but I still get the error below:

vco-app-5f55f796-6drzw                         1/3    PostStartHookError: command 'sh -c sh /opt/preload-images.sh || true' exited with 137: + cd /opt/base-images

+ ls photon-3.0.tar.gz shared.tar.gz vco-polyglot-node-12.12.0.tar.gz vco-polyglot-powercli-11.5.0-powershell-6.2.3.tar.gz vco-polyglot-python-3.7.3.tar.gz

+ xargs -n 1 -I '{}' sh -c 'basename {} .tar.gz | xargs mkdir'

+ xargs -n 1+  -I '{}' sh -c 'basename {} .tar.gz | xargs tar zxvf {} -C'

ls photon-3.0.tar.gz shared.tar.gz vco-polyglot-node-12.12.0.tar.gz vco-polyglot-powercli-11.5.0-powershell-6.2.3.tar.gz vco-polyglot-python-3.7.3.tar.gz

+ rm photon-3.0.tar.gz shared.tar.gz vco-polyglot-node-12.12.0.tar.gz vco-polyglot-powercli-11.5.0-powershell-6.2.3.tar.gz vco-polyglot-python-3.7.3.tar.gz

+ i=0

+ '[' 0 -lt 30 ]

+ docker ps

Cannot connect to the Docker daemon at unix:///var/run/docker.sock. Is the docker daemon running?

+ echo 'Waiting for Docker to be ready'

+ sleep 2

+ expr 0 + 1

+ i=1

+ '[' 1 -lt 30 ]

+ docker ps
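For readers following the trace: it looks like the post-start hook retries "docker ps" up to 30 times with a 2-second sleep in between, and exit code 137 normally means the process was killed with SIGKILL (128 + 9). The sketch below is a reconstruction of that kind of wait loop inferred from the trace, not the actual contents of /opt/preload-images.sh; the function name wait_for and its parameters are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: retry a command until it succeeds or the attempt
# limit is reached. Usage: wait_for MAX_ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
wait_for() {
    max_attempts="$1"
    delay="$2"
    shift 2
    i=0
    while [ "$i" -lt "$max_attempts" ]; do
        # Success: the command (e.g. "docker ps") returned 0.
        if "$@" >/dev/null 2>&1; then
            return 0
        fi
        echo 'Waiting for Docker to be ready'
        sleep "$delay"
        i=$((i + 1))
    done
    # Gave up after max_attempts failed tries.
    return 1
}
```

Running "wait_for 30 2 docker ps" on the node would mirror the values seen in the trace and shows whether the Docker daemon ever becomes reachable within about a minute.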

maverix7
VMware Employee

Can you double-check that you did it on the correct nodes? Based on what I see, I believe you should do it on 2 of the 3. After that, you can delete the problematic vco pods so that they can be redeployed. If the issue persists, I would suggest opening an SR.
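To pick out which vco pods need deleting, something like the filter below can be applied to the pod listing. It is a sketch only: the function name failing_vco_pods is made up, and the "prelude" namespace in the usage comments is an assumption about where the vRA services run, so adjust it to your environment.

```shell
#!/bin/sh
# Hypothetical helper: reads "kubectl get pods" style output on stdin and
# prints the names of vco-app pods whose STATUS column is not "Running".
failing_vco_pods() {
    awk '$1 ~ /^vco-app-/ && $3 != "Running" { print $1 }'
}

# Usage (namespace "prelude" is an assumption):
#   kubectl -n prelude get pods | failing_vco_pods
#   kubectl -n prelude delete pod <pod-name>
```

The filter only prints names, so you can review the list before deleting anything.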

Magicmanashu
Contributor

Yes, I did it on the correct nodes, and I also deleted the pods, but they keep failing. I'm not sure why this is happening on the second and third nodes; the primary node always succeeds. I will check with GSS.

gradinka
VMware Employee

Have you opened the SR yet?

By any chance, are you using a 172.x.x.x network for the deployment?
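A quick way to answer the 172.x.x.x question on each node is to list the IPv4 addresses that start with 172. The reply does not say why this matters; one possible reason (an assumption on my part) is an overlap with Docker's default bridge range, 172.17.0.0/16. The function name addrs_in_172 below is hypothetical, and it expects the one-line-per-address output of "ip -4 -o addr show".

```shell
#!/bin/sh
# Hypothetical helper: reads "ip -4 -o addr show" output on stdin and prints
# "<interface> <address/prefix>" for every IPv4 address starting with 172.
addrs_in_172() {
    awk '$3 == "inet" && $4 ~ /^172\./ { print $2, $4 }'
}

# Usage:
#   ip -4 -o addr show | addrs_in_172
```

An entry for docker0 is expected on a node running Docker; what matters for the question above is whether the node's own deployment interface also shows up in the list.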
