Contributor

vRA 8.1 web interface not loading after networking issue

We are currently experiencing an issue with our vRA 8.1 appliance. A network issue over the weekend caused the vSAN node our appliance was running on to lose connectivity, which seems to have corrupted the appliance's file system. We were able to run fsck and get the appliance back on the network by following this KB article: https://kb.vmware.com/s/article/2149838.

However, we are still unable to get the web interface to respond when we try to access it in a browser. Unfortunately, we are also having difficulty finding any documentation, KB articles, etc. that might help us restore service. (And no, we don't have a backup of the appliance, because it is a PoC environment.)

I'm not sure exactly what I should be looking for/at. 

I am getting the following results from various Kubernetes commands.

root@vraa [ ~ ]# kubectl -n prelude get pods

The connection to the server localhost:8080 was refused - did you specify the right host or port?

root@vraa [ ~ ]#
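From what I understand, kubectl falls back to localhost:8080 when it can't find a kubeconfig, so the refusal above may just mean kubectl isn't pointed at the API server. My next try is to pass a kubeconfig explicitly; the /etc/kubernetes/admin.conf path is my assumption based on the standard kubeadm-style layout visible in the kubelet flags below:

# Retry with an explicit kubeconfig instead of the localhost:8080 fallback
# (admin.conf path assumed from the kubeadm layout under /etc/kubernetes)
kubectl --kubeconfig /etc/kubernetes/admin.conf -n prelude get pods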

root@vraa [ ~ ]# ps -ef |grep kube

root      9608     1  2 13:46 ?        00:01:53 /usr/bin/kubelet --bootstrap-kubeconfig=/etc/kubernetes/bootstrap-kubelet.conf --kubeconfig=/etc/kubernetes/kubelet.conf --config=/var/lib/kubelet/config.yaml --allowed-unsafe-sysctls=net.* --cgroup-driver=cgroupfs --network-plugin=cni --pod-infra-container-image=vmware/pause:3.1 --resolv-conf=/run/systemd/resolve/resolv.conf --tls-cipher-suites=TLS_ECDHE_ECDSA_WITH_AES_256_GCM_SHA384,TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384,TLS_ECDHE_ECDSA_WITH_AES_128_GCM_SHA256,TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256,TLS_ECDHE_ECDSA_WITH_CHACHA20_POLY1305,TLS_ECDHE_RSA_WITH_CHACHA20_POLY1305 --allowed-unsafe-sysctls=net.ipv4.tcp_keepalive_time,net.ipv4.tcp_keepalive_intvl,net.ipv4.tcp_keepalive_probes

root     10007  9956  0 13:46 ?        00:00:03 kube-controller-manager --allocate-node-cidrs=true --authentication-kubeconfig=/etc/kubernetes/controller-manager.conf --authorization-kubeconfig=/etc/kubernetes/controller-manager.conf --bind-address=127.0.0.1 --client-ca-file=/etc/kubernetes/pki/ca.crt --cluster-cidr=10.244.0.0/22 --cluster-signing-cert-file=/etc/kubernetes/pki/ca.crt --cluster-signing-key-file=/etc/kubernetes/pki/ca.key --controllers=*,bootstrapsigner,tokencleaner --kubeconfig=/etc/kubernetes/controller-manager.conf --leader-elect=true --node-cidr-mask-size=24 --node-monitor-grace-period=20s --node-monitor-period=5s --pod-eviction-timeout=30s --requestheader-client-ca-file=/etc/kubernetes/pki/front-proxy-ca.crt --root-ca-file=/etc/kubernetes/pki/ca.crt --service-account-private-key-file=/etc/kubernetes/pki/sa.key --terminated-pod-gc-threshold=5 --tls-cipher-suites=TLS_ECDHE_ECDSA_WITH_AES_256_GCM_SHA384,TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384,TLS_ECDHE_ECDSA_WITH_AES_128_GCM_SHA256,TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256,TLS_ECDHE_ECDSA_WITH_CHACHA20_POLY1305,TLS_ECDHE_RSA_WITH_CHACHA20_POLY1305 --use-service-account-credentials=true

root     10015  9957  0 13:46 ?        00:00:42 kube-scheduler --bind-address=127.0.0.1 --kubeconfig=/etc/kubernetes/scheduler.conf --leader-elect=true

root     28893 28682  0 15:14 pts/0    00:00:00 grep --color=auto kube

root@vraa [ ~ ]# docker ps | grep kube-apiserver

54624cbca2e3        vmware/pause:3.1    "/pause"                 2 hours ago         Up 2 hours                              k8s_POD_kube-apiserver-vraa.mathworks.com_kube-system_28839991b8fc3b74a6425edef23a542e_0

root@vraa [ ~ ]# docker ps -a

CONTAINER ID        IMAGE               COMMAND                  CREATED             STATUS                     PORTS               NAMES

4029351efb47        b102b1c1814a        "etcd --advertise-cl…"   4 minutes ago       Exited (1) 4 minutes ago                       k8s_etcd_etcd-vraa.mathworks.com_kube-system_20595ace0b8d3581907f0866be350161_23

6913147d1efc        30e0ba87cac4        "kube-apiserver --ad…"   4 minutes ago       Exited (1) 4 minutes ago                       k8s_kube-apiserver_kube-apiserver-vraa.mathworks.com_kube-system_28839991b8fc3b74a6425edef23a542e_23

d97bb9f54541        dfd6cf8c8a40        "kube-controller-man…"   2 hours ago         Up 2 hours                                     k8s_kube-controller-manager_kube-controller-manager-vraa.mathworks.com_kube-system_f1fcf01628f38c636bd150fddffff503_0

eb64fdbde524        1c8152ca81c7        "kube-scheduler --bi…"   2 hours ago         Up 2 hours                                     k8s_kube-scheduler_kube-scheduler-vraa.mathworks.com_kube-system_7c41b41a1094ea7470703cdc2c891adc_0

df4607cd2814        vmware/pause:3.1    "/pause"                 2 hours ago         Up 2 hours                                     k8s_POD_kube-scheduler-vraa.mathworks.com_kube-system_7c41b41a1094ea7470703cdc2c891adc_0

b29c38f2663f        vmware/pause:3.1    "/pause"                 2 hours ago         Up 2 hours                                     k8s_POD_kube-controller-manager-vraa.mathworks.com_kube-system_f1fcf01628f38c636bd150fddffff503_0

54624cbca2e3        vmware/pause:3.1    "/pause"                 2 hours ago         Up 2 hours                                     k8s_POD_kube-apiserver-vraa.mathworks.com_kube-system_28839991b8fc3b74a6425edef23a542e_0

f77f7f82cef5        vmware/pause:3.1    "/pause"                 2 hours ago         Up 2 hours                                     k8s_POD_etcd-vraa.mathworks.com_kube-system_20595ace0b8d3581907f0866be350161_0

root@vraa [ ~ ]#
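Since etcd and kube-apiserver keep exiting with status 1, my next step is to pull the logs from those containers (IDs taken from the docker ps -a output above), though I'm not sure what to look for in them:

# Check why the etcd and kube-apiserver containers exited (1);
# container IDs are from the docker ps -a listing above
docker logs --tail 100 4029351efb47   # etcd
docker logs --tail 100 6913147d1efc   # kube-apiserver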

Any thoughts and advice would be much appreciated.

Thanks

Dan

Commander

Hey!

I would advise doing a clean reboot of the vRA platform (take a snapshot first). Follow this procedure: Starting and stopping vRealize Automation
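From memory, the documented procedure boils down to roughly the following; please double-check the linked doc against your exact build before running it:

# Shut vRA down cleanly (take a snapshot first)
/opt/scripts/svc-stop.sh
sleep 120
/opt/scripts/deploy.sh --onlyClean

# After rebooting the appliance, bring the services back up
/opt/scripts/deploy.sh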

Contributor

Thanks for the reply, Lalegre. Unfortunately, that process assumes the Kubernetes services are running, which is not the case here.

I've got a support case open with VMware, and so far they have not had any luck getting things restarted either. We are still seeing "connection refused" types of messages, depending on the command being executed.
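Since kubelet itself is running (per the ps output in my first post), the commands we keep coming back to while support digs in are along these lines:

# kubelet is up, so watch its journal for why the static pods
# (etcd, kube-apiserver) keep dying
systemctl status kubelet
journalctl -u kubelet --since "1 hour ago"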

Dan Scotti
Expert

Are some containers starting up at least? Which ones are failing?

This may not be relevant, but are you using multi-tenancy? If so, that could be preventing the services from starting; there seems to be a bug in the current version of vIDM. My appliance did not crash, but after a simple reboot the services would not come online. The issue was a flag that has to be set in the vIDM database for multi-tenancy to work correctly.
