stevehoward2020
Enthusiast

Force delete cluster?

Hi All,

We are still in the testing phase.  While we were deleting a cluster, an EMC storage issue caused the cluster delete to fail.  The storage issue has since been resolved, but the cluster delete still fails, so it looks like we have some corrupt metadata.

See below...

<pre>
serengeti>cluster delete --name CustomerHub
STARTED 10%
node group: worker2,  instance number: 0
roles:[basic]
node group: worker3,  instance number: 0
roles:[basic]
node group: worker4,  instance number: 0
roles:[basic]
node group: worker5,  instance number: 0
roles:[basic]
node group: worker1,  instance number: 1
roles:[basic]
  NAME                   IP  STATUS       TASK
  --------------------------------------------
  CustomerHub-worker1-5      Powered Off
node group: worker6,  instance number: 1
roles:[basic]
  NAME                   IP  STATUS       TASK
  --------------------------------------------
  CustomerHub-worker6-0      Powered Off
node group: worker7,  instance number: 0
roles:[basic]
node group: master,  instance number: 0
roles:[basic]
node group: worker8,  instance number: 0
roles:[basic]
FAILED 60%
node group: worker2,  instance number: 0
roles:[basic]
node group: worker3,  instance number: 0
roles:[basic]
node group: worker4,  instance number: 0
roles:[basic]
node group: worker5,  instance number: 0
roles:[basic]
node group: worker1,  instance number: 1
roles:[basic]
  NAME                   IP  STATUS       TASK
  --------------------------------------------
  CustomerHub-worker1-5      Powered Off
node group: worker6,  instance number: 1
roles:[basic]
  NAME                   IP  STATUS       TASK
  --------------------------------------------
  CustomerHub-worker6-0      Powered Off
node group: worker7,  instance number: 0
roles:[basic]
node group: master,  instance number: 0
roles:[basic]
node group: worker8,  instance number: 0
roles:[basic]
The failed nodes: 2
  ----------------------------------------------------------------------------
[NAME] CustomerHub-worker1-5
[STATUS] Powered Off
[Error Message] [2014-12-15T13:50:50.886+0000] Failed to delete VM CustomerHub-worker1-5 for: null
  ----------------------------------------------------------------------------
[NAME] CustomerHub-worker6-0
[STATUS] Powered Off
[Error Message] [2014-12-15T13:50:50.889+0000] Failed to delete VM CustomerHub-worker6-0 for: null
  ----------------------------------------------------------------------------
cluster CustomerHub delete failed: Failed to delete virtual machine for cluster CustomerHub.
serengeti>
</pre>

Is there a way to force a cluster to be deleted?  I know we can delete the VMs, but the metadata would still be in the vSphere database (or at least we think it would).

Thanks,

Steve

Accepted Solutions
stevehoward2020
Enthusiast

We ended up searching every column of every table in the vCenter database and found that the only references were in VPX_EVENT and VPX_TEXT_ARRAY.  These looked like historical transaction tables, with no metadata about the missing VMs.  On a hunch, we restarted the Serengeti server, assuming the problem was cache related.  After the restart the cluster was still in the list, but the cluster delete command now worked.

4 Replies
stevehoward2020
Enthusiast

Below is the stack in the log...

<code>
[2014-12-15T14:43:43.088+0000] ERROR SimpleAsyncTaskExecutor-12| com.vmware.bdd.service.impl.ClusteringService: Failed to delete VM CustomerHub-worker1-5
com.vmware.aurora.exception.VcException: Failed to delete VM CustomerHub-worker1-5 for: null
        at com.vmware.aurora.exception.VcException.DELETE_VM_FAILED(VcException.java:101)
        at com.vmware.aurora.vc.VcVirtualMachineImpl.destroy(VcVirtualMachine.java:2323)
        at com.vmware.aurora.vc.VcVirtualMachineImpl.destroy(VcVirtualMachine.java:2307)
        at com.vmware.bdd.service.sp.DeleteVmByIdSP$1.body(DeleteVmByIdSP.java:71)
        at com.vmware.bdd.service.sp.DeleteVmByIdSP$1.body(DeleteVmByIdSP.java:58)
        at com.vmware.aurora.vc.vcservice.VcContext.inVcSessionDo(VcContext.java:548)
        at com.vmware.bdd.service.sp.DeleteVmByIdSP.call(DeleteVmByIdSP.java:58)
        at com.vmware.bdd.service.sp.DeleteVmByIdSP.call(DeleteVmByIdSP.java:38)
        at com.vmware.aurora.composition.concurrent.Scheduler$StoredProcedureCallable.call(Scheduler.java:88)
        at com.vmware.aurora.composition.concurrent.Scheduler$StoredProcedureCallable.call(Scheduler.java:71)
        at com.vmware.aurora.composition.concurrent.PriorityThreadPoolExecutor$2.call(PriorityThreadPoolExecutor.java:186)
        at java.util.concurrent.FutureTask.run(FutureTask.java:262)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:744)
Caused by: com.vmware.vim.binding.vmodl.fault.ManagedObjectNotFound:
obj = com.vmware.vim.binding.vmodl.ManagedObjectReference@fd292872
inherited from com.vmware.vim.binding.vmodl.fault.ManagedObjectNotFound
        at sun.reflect.GeneratedConstructorAccessor443.newInstance(Unknown Source)
        at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
        at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
        at java.lang.Class.newInstance(Class.java:374)
        at com.vmware.vim.vmomi.core.types.impl.ComplexTypeImpl.newInstance(ComplexTypeImpl.java:171)
        at com.vmware.vim.vmomi.core.types.impl.DefaultDataObjectFactory.newDataObject(DefaultDataObjectFactory.java:26)
        at com.vmware.vim.vmomi.core.soap.impl.unmarshaller.ComplexStackContext.<init>(ComplexStackContext.java:33)
        at com.vmware.vim.vmomi.core.soap.impl.unmarshaller.UnmarshallerImpl$UnmarshallSoapFaultContext.parse(UnmarshallerImpl.java:135)
        at com.vmware.vim.vmomi.core.soap.impl.unmarshaller.UnmarshallerImpl$UnmarshallSoapFaultContext.unmarshall(UnmarshallerImpl.java:98)
</code>

stevehoward2020
Enthusiast

I am also assuming this has to do with existing snapshots that can't be found for whatever reason.  The full "Caused by" stack is below...

Caused by: com.vmware.vim.binding.vmodl.fault.ManagedObjectNotFound:
obj = com.vmware.vim.binding.vmodl.ManagedObjectReference@fd292872
inherited from com.vmware.vim.binding.vmodl.fault.ManagedObjectNotFound
        at sun.reflect.GeneratedConstructorAccessor443.newInstance(Unknown Source)
        at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
        at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
        at java.lang.Class.newInstance(Class.java:374)
        at com.vmware.vim.vmomi.core.types.impl.ComplexTypeImpl.newInstance(ComplexTypeImpl.java:171)
        at com.vmware.vim.vmomi.core.types.impl.DefaultDataObjectFactory.newDataObject(DefaultDataObjectFactory.java:26)
        at com.vmware.vim.vmomi.core.soap.impl.unmarshaller.ComplexStackContext.<init>(ComplexStackContext.java:33)
        at com.vmware.vim.vmomi.core.soap.impl.unmarshaller.UnmarshallerImpl$UnmarshallSoapFaultContext.parse(UnmarshallerImpl.java:135)
        at com.vmware.vim.vmomi.core.soap.impl.unmarshaller.UnmarshallerImpl$UnmarshallSoapFaultContext.unmarshall(UnmarshallerImpl.java:98)
        at com.vmware.vim.vmomi.core.soap.impl.unmarshaller.UnmarshallerImpl.unmarshalSoapFault(UnmarshallerImpl.java:84)
        at com.vmware.vim.vmomi.client.common.impl.SoapFaultStackContext.setValue(SoapFaultStackContext.java:37)
        at com.vmware.vim.vmomi.client.common.impl.ResponseUnmarshaller.unmarshal(ResponseUnmarshaller.java:97)
        at com.vmware.vim.vmomi.client.common.impl.ResponseImpl.unmarshalResponse(ResponseImpl.java:245)
        at com.vmware.vim.vmomi.client.common.impl.ResponseImpl.setResponse(ResponseImpl.java:203)
        at com.vmware.vim.vmomi.client.http.impl.HttpExchange.run(HttpExchange.java:126)
        at com.vmware.vim.vmomi.client.http.impl.HttpProtocolBindingImpl.send(HttpProtocolBindingImpl.java:98)
        at com.vmware.vim.vmomi.client.common.impl.MethodInvocationHandlerImpl$CallExecutor.sendCall(MethodInvocationHandlerImpl.java:533)
        at com.vmware.vim.vmomi.client.common.impl.MethodInvocationHandlerImpl$CallExecutor.executeCall(MethodInvocationHandlerImpl.java:514)
        at com.vmware.vim.vmomi.client.common.impl.MethodInvocationHandlerImpl.completeCall(MethodInvocationHandlerImpl.java:302)
        at com.vmware.vim.vmomi.client.common.impl.MethodInvocationHandlerImpl.invokeOperation(MethodInvocationHandlerImpl.java:272)
        at com.vmware.vim.vmomi.client.common.impl.MethodInvocationHandlerImpl.invoke(MethodInvocationHandlerImpl.java:169)
        at com.sun.proxy.$Proxy114.removeAllSnapshots(Unknown Source)
        at com.vmware.aurora.vc.VcVirtualMachineImpl$13.body(VcVirtualMachine.java:2550)
        at com.vmware.aurora.vc.VcTaskMgr.executeInternal(VcTaskMgr.java:260)
        at com.vmware.aurora.vc.VcTaskMgr.execute(VcTaskMgr.java:240)
        at com.vmware.aurora.vc.VcVirtualMachineImpl.removeAllSnapshots(VcVirtualMachine.java:2547)
        at com.vmware.aurora.vc.VcVirtualMachineImpl.removeAllSnapshots(VcVirtualMachine.java:2564)
        at com.vmware.aurora.vc.VcVirtualMachineImpl$9.exec(VcVirtualMachine.java:2316)
        at com.vmware.aurora.vc.VcVirtualMachineImpl$9.exec(VcVirtualMachine.java:2313)
        at com.vmware.aurora.vc.VcVirtualMachineImpl.safeExecVmOp(VcVirtualMachine.java:1367)
        at com.vmware.aurora.vc.VcVirtualMachineImpl.destroy(VcVirtualMachine.java:2313)

...with the following code for this piece...

public void destroy(final boolean removeSnapShot)
    throws Exception
{
  try
  {
    safeExecVmOp(new VmOp()
    {
      public Void exec()
        throws Exception
      {
        if (removeSnapShot) {
          VcVirtualMachineImpl.this.removeAllSnapshots();
        }
        VcVirtualMachineImpl.this.destroyInt();
        return null;
      }
    });
  }
  catch (Exception e)
  {
    throw VcException.DELETE_VM_FAILED(e, getName(), e.getMessage());
  }
}
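
The trailing "for: null" in the error message is consistent with the catch block above: e.getMessage() is passed into the exception text, and a Java exception constructed without a detail message returns null from getMessage(). The sketch below reproduces that effect in isolation. NullMessageDemo, FaultWithoutMessage, describe, and describeWithFallback are hypothetical names, and the message format is only assumed to resemble what VcException.DELETE_VM_FAILED produces.

```java
public class NullMessageDemo {
    // Hypothetical stand-in for a fault that carries no detail message,
    // as ManagedObjectNotFound appears to in the log above.
    static class FaultWithoutMessage extends RuntimeException {
        FaultWithoutMessage() {
            super(); // no detail message, so getMessage() returns null
        }
    }

    // Mirrors the pattern in the catch block: the cause's message is
    // concatenated directly into the error text.
    static String describe(String vmName, Exception cause) {
        return "Failed to delete VM " + vmName + " for: " + cause.getMessage();
    }

    // Falls back to the exception class name when no message is available.
    static String describeWithFallback(String vmName, Exception cause) {
        String reason = cause.getMessage();
        if (reason == null) {
            reason = cause.getClass().getSimpleName();
        }
        return "Failed to delete VM " + vmName + " for: " + reason;
    }

    public static void main(String[] args) {
        Exception cause = new FaultWithoutMessage();
        System.out.println(describe("CustomerHub-worker1-5", cause));
        System.out.println(describeWithFallback("CustomerHub-worker1-5", cause));
    }
}
```

The practical upshot is that the useful detail is in the "Caused by" section of the log rather than in the one-line message the CLI prints.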

jessehuvmw
Enthusiast

Glad to hear your problem is resolved.  If the resources in vCenter change (e.g. datastores or ESX hosts are added, removed, or renamed), you have to restart the Serengeti service (i.e. sudo service tomcat restart) so that Serengeti becomes aware of the change.

Cheers, Jesse Hu
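
As a rough illustration of why a restart can help, here is a toy model, not Serengeti's actual code. It assumes a service that copies object references from the inventory once at startup; such a service keeps using a reference the inventory has since dropped until its cache is rebuilt. Every name here (StaleCacheDemo, Inventory, CachingService) and the reference values "vm-101" and "vm-202" are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

public class StaleCacheDemo {
    // Hypothetical stand-in for the vCenter inventory: VM name -> object reference.
    static class Inventory {
        private final Map<String, String> refs = new HashMap<>();

        void put(String name, String ref) { refs.put(name, ref); }

        boolean exists(String ref) { return refs.containsValue(ref); }

        Map<String, String> snapshot() { return new HashMap<>(refs); }
    }

    // Hypothetical service that copies the inventory once, at construction.
    static class CachingService {
        private final Inventory inventory;
        private Map<String, String> cache;

        CachingService(Inventory inventory) {
            this.inventory = inventory;
            this.cache = inventory.snapshot();
        }

        // Models a restart: the cache is rebuilt from the current inventory.
        void restart() { cache = inventory.snapshot(); }

        // Returns true when the cached reference still resolves in the inventory.
        boolean canResolve(String name) {
            String ref = cache.get(name);
            return ref != null && inventory.exists(ref);
        }
    }

    public static void main(String[] args) {
        Inventory vc = new Inventory();
        vc.put("CustomerHub-worker1-5", "vm-101");
        CachingService service = new CachingService(vc);

        // The object gets a new reference behind the service's back.
        vc.put("CustomerHub-worker1-5", "vm-202");

        System.out.println(service.canResolve("CustomerHub-worker1-5")); // false
        service.restart();
        System.out.println(service.canResolve("CustomerHub-worker1-5")); // true
    }
}
```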