We are currently experiencing an issue with our vRA 8.1 appliance. A network issue over the weekend caused the vSAN node that the appliance was running on to lose connectivity, which appears to have corrupted the appliance's file system. We were able to run fsck and get the appliance back on the network by following this KB article: https://kb.vmware.com/s/article/2149838.
However, we are still unable to get the web interface to respond when we access it in a browser. Unfortunately, we are also having difficulty finding any documentation or KB articles that might help us restore our services. (And no, we don't have a backup of the appliance, because it is a POC environment.) ☹
I'm not sure exactly what I should be looking for or at.
These are the results I'm getting from various kubectl and docker commands:
root@vraa [ ~ ]# kubectl -n prelude get pods
The connection to the server localhost:8080 was refused - did you specify the right host or port?
root@vraa [ ~ ]#
root@vraa [ ~ ]# ps -ef |grep kube
root 9608 1 2 13:46 ? 00:01:53 /usr/bin/kubelet --bootstrap-kubeconfig=/etc/kubernetes/bootstrap-kubelet.conf --kubeconfig=/etc/kubernetes/kubelet.conf --config=/var/lib/kubelet/config.yaml --allowed-unsafe-sysctls=net.* --cgroup-driver=cgroupfs --network-plugin=cni --pod-infra-container-image=vmware/pause:3.1 --resolv-conf=/run/systemd/resolve/resolv.conf --tls-cipher-suites=TLS_ECDHE_ECDSA_WITH_AES_256_GCM_SHA384,TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384,TLS_ECDHE_ECDSA_WITH_AES_128_GCM_SHA256,TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256,TLS_ECDHE_ECDSA_WITH_CHACHA20_POLY1305,TLS_ECDHE_RSA_WITH_CHACHA20_POLY1305 --allowed-unsafe-sysctls=net.ipv4.tcp_keepalive_time,net.ipv4.tcp_keepalive_intvl,net.ipv4.tcp_keepalive_probes
root 10007 9956 0 13:46 ? 00:00:03 kube-controller-manager --allocate-node-cidrs=true --authentication-kubeconfig=/etc/kubernetes/controller-manager.conf --authorization-kubeconfig=/etc/kubernetes/controller-manager.conf --bind-address=127.0.0.1 --client-ca-file=/etc/kubernetes/pki/ca.crt --cluster-cidr=10.244.0.0/22 --cluster-signing-cert-file=/etc/kubernetes/pki/ca.crt --cluster-signing-key-file=/etc/kubernetes/pki/ca.key --controllers=*,bootstrapsigner,tokencleaner --kubeconfig=/etc/kubernetes/controller-manager.conf --leader-elect=true --node-cidr-mask-size=24 --node-monitor-grace-period=20s --node-monitor-period=5s --pod-eviction-timeout=30s --requestheader-client-ca-file=/etc/kubernetes/pki/front-proxy-ca.crt --root-ca-file=/etc/kubernetes/pki/ca.crt --service-account-private-key-file=/etc/kubernetes/pki/sa.key --terminated-pod-gc-threshold=5 --tls-cipher-suites=TLS_ECDHE_ECDSA_WITH_AES_256_GCM_SHA384,TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384,TLS_ECDHE_ECDSA_WITH_AES_128_GCM_SHA256,TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256,TLS_ECDHE_ECDSA_WITH_CHACHA20_POLY1305,TLS_ECDHE_RSA_WITH_CHACHA20_POLY1305 --use-service-account-credentials=true
root 10015 9957 0 13:46 ? 00:00:42 kube-scheduler --bind-address=127.0.0.1 --kubeconfig=/etc/kubernetes/scheduler.conf --leader-elect=true
root 28893 28682 0 15:14 pts/0 00:00:00 grep --color=auto kube
root@vraa [ ~ ]# docker ps | grep kube-apiserver
54624cbca2e3 vmware/pause:3.1 "/pause" 2 hours ago Up 2 hours k8s_POD_kube-apiserver-vraa.mathworks.com_kube-system_28839991b8fc3b74a6425edef23a542e_0
root@vraa [ ~ ]# docker ps -a
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
4029351efb47 b102b1c1814a "etcd --advertise-cl…" 4 minutes ago Exited (1) 4 minutes ago k8s_etcd_etcd-vraa.mathworks.com_kube-system_20595ace0b8d3581907f0866be350161_23
6913147d1efc 30e0ba87cac4 "kube-apiserver --ad…" 4 minutes ago Exited (1) 4 minutes ago k8s_kube-apiserver_kube-apiserver-vraa.mathworks.com_kube-system_28839991b8fc3b74a6425edef23a542e_23
d97bb9f54541 dfd6cf8c8a40 "kube-controller-man…" 2 hours ago Up 2 hours k8s_kube-controller-manager_kube-controller-manager-vraa.mathworks.com_kube-system_f1fcf01628f38c636bd150fddffff503_0
eb64fdbde524 1c8152ca81c7 "kube-scheduler --bi…" 2 hours ago Up 2 hours k8s_kube-scheduler_kube-scheduler-vraa.mathworks.com_kube-system_7c41b41a1094ea7470703cdc2c891adc_0
df4607cd2814 vmware/pause:3.1 "/pause" 2 hours ago Up 2 hours k8s_POD_kube-scheduler-vraa.mathworks.com_kube-system_7c41b41a1094ea7470703cdc2c891adc_0
b29c38f2663f vmware/pause:3.1 "/pause" 2 hours ago Up 2 hours k8s_POD_kube-controller-manager-vraa.mathworks.com_kube-system_f1fcf01628f38c636bd150fddffff503_0
54624cbca2e3 vmware/pause:3.1 "/pause" 2 hours ago Up 2 hours k8s_POD_kube-apiserver-vraa.mathworks.com_kube-system_28839991b8fc3b74a6425edef23a542e_0
f77f7f82cef5 vmware/pause:3.1 "/pause" 2 hours ago Up 2 hours k8s_POD_etcd-vraa.mathworks.com_kube-system_20595ace0b8d3581907f0866be350161_0
root@vraa [ ~ ]#
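From the `docker ps -a` output, the etcd and kube-apiserver containers keep exiting while the scheduler and controller-manager stay up, which explains the `localhost:8080` connection refusal. A first diagnostic step (a sketch; the container IDs are the ones from the listing above and will change on each restart attempt, and `/var/lib/etcd` is the standard kubeadm-style data directory, not confirmed here for vRA) would be to read the logs of the exiting containers, since file-system corruption often damages the etcd data on disk:

```shell
# Inspect why the etcd and kube-apiserver containers exit.
# IDs are from the "docker ps -a" listing above; look them up again
# after each restart attempt, e.g.: docker ps -a | grep -E 'etcd|apiserver'
docker logs --tail 50 4029351efb47   # etcd (Exited (1))
docker logs --tail 50 6913147d1efc   # kube-apiserver (Exited (1))

# If etcd complains about its data, check the data directory on disk
# (assumed default location for a kubeadm-style control plane):
ls -l /var/lib/etcd/member/snap /var/lib/etcd/member/wal
```

If etcd reports a corrupt WAL or snapshot, the apiserver failure is likely just a downstream symptom, and restoring etcd's data is the real fix.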
Any thoughts and advice would be much appreciated.
Thanks
Dan
Hey!
I would advise doing a clean reboot of the vRA platform (take a snapshot first). Follow this procedure: Starting and stopping vRealize Automation.
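For reference, on the appliance console the linked procedure comes down to roughly the following (a sketch; the script paths under `/opt/scripts` are assumptions based on vRA 8.x documentation — confirm them against the linked procedure for your exact build before running anything):

```shell
# Gracefully stop the vRA services (shuts down the Kubernetes deployment)
/opt/scripts/svc-stop.sh
sleep 120

# Clean up leftover deployment state, then redeploy/start the services
/opt/scripts/deploy.sh --onlyClean
/opt/scripts/deploy.sh
```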
Thanks for the reply, Lalegre. Unfortunately, that process assumes the Kubernetes services are running, which is not the case here.
I've got a support case open with VMware, and so far they have not had any luck getting things restarted either. We are still seeing "connection refused"-type messages, depending on which command is executed.
Are some containers starting up at least? Which ones are failing?
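A quick way to answer this on the appliance (standard Docker CLI options, nothing vRA-specific):

```shell
# List only containers that have exited, with their names and exit status
docker ps -a --filter status=exited --format 'table {{.Names}}\t{{.Status}}'

# With the apiserver down, kubectl won't show restart counts, but docker
# events can show a crash loop in real time (Ctrl-C to stop watching):
docker events --filter event=die
```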
This may not be relevant, but are you using multi-tenancy? If so, it could prevent the services from starting; there appears to be a bug in the current version of vIDM. My appliance did not crash, but after a simple reboot the services would not come online. The cause was a flag that has to be set in the vIDM database for multi-tenancy to work correctly.
How was this resolved?