dalo
Hot Shot

cannot stop deployment

I tried to stop our VIO deployment in vCenter because we had to do hardware maintenance. The stop task took forever, so I decided to shut down the VMs manually.

After that I had to restart the deployment with these commands:

sudo viocli services stop

sudo viocli services start

The deployment works now, but in vCenter its status is still "error".

If I check on the console, the deployment seems OK:

# viocli deployment status

Collector Name                      Overall Status

----------------------------------  ----------------

VerifyTimeSynchronization           SUCCESS

VerifyConnection                    SUCCESS

VerifyMariaDatabaseClusterSize      SUCCESS

VerifyDatabaseConnectionPerProcess  SUCCESS

VerifyRunningProcess                SUCCESS

But if I try to stop it, I get:

# viocli deployment stop

Deployment with name: VIO has a task in progress, waiting for it to finish...

Waiting doesn't seem to help. How can I find this running task and fix it?

Accepted Solutions
VirtualFox
VMware Employee

Run these two commands to reset the status.

viocli show

# Get the VM name for controller02

viocli recover -n <VMname-Controller-1>

When a node is rebuilt, the cluster status is reset as part of the process. Because controllers are stateless, they are easier to recover: you do not need a backup to import, as you would for a database node.

9 Replies
xgao3
VMware Employee

You can look in the /var/log/jarvis directory. Find out which file is still being updated and take a look at the last 100 or so lines.
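
The check described above could be scripted roughly as follows. This is only a sketch: the function names newest_log and tail_newest_log are made up for illustration, and it assumes GNU find (for -printf), which may not match every setup.

```shell
#!/bin/sh
# Print the path of the most recently modified regular file in a directory.
# Assumes GNU find (for -printf); newest_log is a hypothetical helper name.
newest_log() {
    find "$1" -maxdepth 1 -type f -printf '%T@ %p\n' \
        | sort -nr | head -n 1 | cut -d' ' -f2-
}

# Show the last 100 lines of that file (hypothetical helper name).
tail_newest_log() {
    log=$(newest_log "$1")
    if [ -n "$log" ]; then
        tail -n 100 "$log"
    fi
}

# Usage: tail_newest_log /var/log/jarvis
```

If the newest file has a timestamp well before the stuck task started, nothing is writing to the logs anymore, which is useful to know in itself.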

dalo
Hot Shot

Thank you for your answer.

The files in /var/log/jarvis don't seem to be updated anymore:

-rw-r--r-- 1 jarvis adm 3975869 May 10 14:50 ansible.log

-rw-r--r-- 1 jarvis adm 5613701 May 15 10:23 jarvis.log

-rw-r--r-- 1 jarvis adm 2283852 May 10 14:50 pecan.log

The error in jarvis.log doesn't help me:

2017-05-11 09:10:44,368 INFO  [jarvis.ans.util][MainThread] Using Customization file /opt/vmware/vio/custom/custom.yml

2017-05-11 09:10:44,368 INFO  [jarvis.ans.util][MainThread] No customization file params were specified.

2017-05-11 09:10:44,508 ERROR [wsme.api][MainThread] Server-side error: "need more than 1 value to unpack". Detail:

Traceback (most recent call last):

  File "/usr/local/lib/python2.7/dist-packages/wsmeext/pecan.py", line 82, in callfunction

    result = f(self, *args, **kwargs)

  File "/usr/local/lib/python2.7/dist-packages/jarvis/api/controllers/v1.py", line 170, in status

    period=period)

  File "/usr/local/lib/python2.7/dist-packages/jarvis/ans/manager.py", line 967, in report_viomon_status

    period=period)

  File "/usr/lib/python2.7/dist-packages/viomon/util/log.py", line 190, in report_viomon_status

    collector_map[collector_name].processor.send(message)

  File "/usr/lib/python2.7/dist-packages/viomon/module/base.py", line 86, in process_message

    self._further_process_message(message)

  File "/usr/lib/python2.7/dist-packages/viomon/module/database.py", line 124, in _further_process_message

    process_name, count = message.value.split(':')

ValueError: need more than 1 value to unpack

2017-05-11 09:10:44,513 ERROR [pecan.commands.serve][MainThread] "GET /deployment/VIO/status?period=300 HTTP/1.1" 500 1029

2017-05-11 09:12:14,171 INFO  [jarvis.api.controllers.v1][MainThread] Retrieve status of deployment VIO with period: 300 second(s).

I have now also tried "viocli deployment configure", and it runs without an error.

This procedure also didn't update the logs in /var/log/jarvis.

Do you have other suggestions?

Thank you.

xgao3
VMware Employee

Can you confirm that postgres and OMS are running on your setup:

service vpostgres status

service oms status

dalo
Hot Shot

Yes, I checked this on the management server:

# service vpostgres status

vpostgres start/running, process 1870

# service oms status

oms start/running, process 12844
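
For repeated checks, the two status commands could be wrapped in a small script. This is a rough sketch that assumes the upstart-style "start/running" output shown above; the function names is_running_line and check_services are hypothetical.

```shell
#!/bin/sh
# Return success if an upstart-style status line reports "start/running".
# Assumes output of the form "oms start/running, process 12844".
is_running_line() {
    case "$1" in
        *" start/running"*) return 0 ;;
        *) return 1 ;;
    esac
}

# Check each named service and print a one-line summary per service.
check_services() {
    for svc in "$@"; do
        if is_running_line "$(service "$svc" status 2>&1)"; then
            echo "$svc: running"
        else
            echo "$svc: NOT running"
        fi
    done
}

# Usage: check_services vpostgres oms
```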

dalo
Hot Shot

I'm still stuck on this.

Is it possible to see the running process that is blocking my deployment?

~# viocli deployment stop

Deployment with name: VIO has a task in progress, waiting for it to finish...

Or do I have to reinstall the deployment and restore from a backup?

dalo
Hot Shot

I have now tried setting the uncompleted task in the management DB to 'COMPLETED'. Now I get a different message:

root@ids-ost-1:/home/viouser# viocli deployment start

Deployment: VIO is not in STOPPED state.

Cannot start the deployment.

root@ids-ost-1:/home/viouser# viocli deployment stop

Deployment: VIO is not in RUNNING state.

Cannot stop the deployment.

root@ids-ost-1:/home/viouser# viocli deployment status

Collector Name                      Overall Status

----------------------------------  ----------------

VerifyTimeSynchronization           SUCCESS

VerifyConnection                    SUCCESS

VerifyMariaDatabaseClusterSize      SUCCESS

VerifyDatabaseConnectionPerProcess  SUCCESS

VerifyRunningProcess                SUCCESS

I couldn't find the location where the state is saved. Can anybody tell me where it is, so I can set it to 'RUNNING'?

VirtualFox
VMware Employee

Run these two commands to reset the status.

viocli show

# Get the VM name for controller02

viocli recover -n <VMname-Controller-1>

When a node is rebuilt, the cluster status is reset as part of the process. Because controllers are stateless, they are easier to recover: you do not need a backup to import, as you would for a database node.

dalo
Hot Shot

Thank you, that solved the issue.

VirtualFox
VMware Employee

You're welcome.