I am getting the error:
ERROR http-8443-5| com.vmware.bdd.manager.JobManager: mark task as failed: Queue full
I suspect the cause is a 'cluster delete' task that has hung. Does anyone know:
a. Why the queue would be full?
b. How to kill a stalled task without restarting the service?
Could you upload the log files /opt/serengeti/logs/*.log from the BDE server so we can debug this?
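For reference, one way to bundle those logs into a single archive is sketched below. The function name collect_bde_logs and the default archive name are made up for illustration; only the log directory comes from the path above.

```shell
# Sketch: pack the BDE server logs into one archive that can be attached to a post.
# Usage: collect_bde_logs [log_dir] [output_archive]
collect_bde_logs() {
    log_dir="${1:-/opt/serengeti/logs}"   # path taken from the request above
    out="${2:-bde-logs.tar.gz}"           # hypothetical archive name
    # cd first so the archive stores bare file names rather than absolute paths
    ( cd "$log_dir" && tar -czf - ./*.log ) > "$out"
}
```

Calling 'collect_bde_logs' with no arguments on the BDE server writes bde-logs.tar.gz into the current directory; you may need sudo if the log files are not readable by your user.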
If you want to cancel a 'cluster create' or 'cluster resize', you need to restart the Tomcat service. VM creation that has already started in vCenter will not be cancelled; you have to run 'cluster delete' to remove those VMs.
If you want to cancel another cluster operation (e.g. 'cluster config', which does not manage the VMs themselves in vCenter), you can run 'sudo service thrift-service restart'. The 'cluster config' command will then fail, and you can run another cluster operation to fix it.
That's useful info about the thrift service.
I'm running BDE 2.1.
As a bit of background, I was completing an operation within Ambari that hung, which apparently had the knock-on effect of causing the delete cluster operation to fail. Interestingly, restarting Tomcat did not resolve the issue, as the task was picked up again after the restart (presumably some sort of journaling that keeps state persistent across restarts?).
The fix was to stop the Ambari service, which caused a connection timeout and allowed the cluster removal to proceed.
Strange, as I thought BDE would place a timeout on requests to the Ambari API.
I have attached the logs anyway.
Thanks for the input
I read the file 'serengeti.log'. You created a cluster with the HDP 1.x distro before creating a cluster with the HDP 2.x distro. In that case you need to reset the Ambari server before creating the HDP 2.x cluster. There are some issues with deleting a cluster through the Ambari REST API, and Ambari does not support creating different versions of HDP on the same Ambari server. To reset the Ambari server, log in to it as root and run the command "ambari-server reset". If deleting a cluster that was created through the Ambari server fails, reset the Ambari server first, and then run the cluster delete on the BDE server again.
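The reset steps above could be sketched roughly as follows, run as root on the Ambari server. The run and reset_ambari helpers and the DRY_RUN switch are illustrative names, not part of Ambari or BDE, and stopping the server before the reset is an assumption worth checking against your Ambari version.

```shell
# Sketch: reset the Ambari server before retrying the cluster delete on BDE.
# Run as root on the Ambari server. With DRY_RUN=1 the commands are only printed.

run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

reset_ambari() {
    run ambari-server stop    # assumption: reset expects the server to be stopped
    run ambari-server reset   # clears Ambari's stored cluster data; may ask for confirmation
    run ambari-server start
}
```

Running 'DRY_RUN=1 reset_ambari' first prints the three commands without touching the server. After a real reset, retry the delete from the Serengeti CLI on the BDE server, with something like 'cluster delete --name' followed by your cluster name.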