Hi All,
We are still in the testing phase. While we were deleting a cluster, an EMC storage issue caused the delete to fail. The storage issue has since been resolved, but the cluster delete still fails, so it looks like we have some corrupt metadata.
See below...
<pre>
serengeti>cluster delete --name CustomerHub
STARTED 10%
node group: worker2, instance number: 0
roles:[basic]
node group: worker3, instance number: 0
roles:[basic]
node group: worker4, instance number: 0
roles:[basic]
node group: worker5, instance number: 0
roles:[basic]
node group: worker1, instance number: 1
roles:[basic]
NAME IP STATUS TASK
--------------------------------------------
CustomerHub-worker1-5 Powered Off
node group: worker6, instance number: 1
roles:[basic]
NAME IP STATUS TASK
--------------------------------------------
CustomerHub-worker6-0 Powered Off
node group: worker7, instance number: 0
roles:[basic]
node group: master, instance number: 0
roles:[basic]
node group: worker8, instance number: 0
roles:[basic]
FAILED 60%
node group: worker2, instance number: 0
roles:[basic]
node group: worker3, instance number: 0
roles:[basic]
node group: worker4, instance number: 0
roles:[basic]
node group: worker5, instance number: 0
roles:[basic]
node group: worker1, instance number: 1
roles:[basic]
NAME IP STATUS TASK
--------------------------------------------
CustomerHub-worker1-5 Powered Off
node group: worker6, instance number: 1
roles:[basic]
NAME IP STATUS TASK
--------------------------------------------
CustomerHub-worker6-0 Powered Off
node group: worker7, instance number: 0
roles:[basic]
node group: master, instance number: 0
roles:[basic]
node group: worker8, instance number: 0
roles:[basic]
The failed nodes: 2
----------------------------------------------------------------------------
[NAME] CustomerHub-worker1-5
[STATUS] Powered Off
[Error Message] [2014-12-15T13:50:50.886+0000] Failed to delete VM CustomerHub-worker1-5 for: null
----------------------------------------------------------------------------
[NAME] CustomerHub-worker6-0
[STATUS] Powered Off
[Error Message] [2014-12-15T13:50:50.889+0000] Failed to delete VM CustomerHub-worker6-0 for: null
----------------------------------------------------------------------------
cluster CustomerHub delete failed: Failed to delete virtual machine for cluster CustomerHub.
serengeti>
</pre>
Is there a way to force a cluster to be deleted? I know we can delete the VMs, but the metadata would still be in the vSphere database (or at least we think it would).
Thanks,
Steve
Below is the stack in the log...
<code>
[2014-12-15T14:43:43.088+0000] ERROR SimpleAsyncTaskExecutor-12| com.vmware.bdd.service.impl.ClusteringService: Failed to delete VM CustomerHub-worker1-5
com.vmware.aurora.exception.VcException: Failed to delete VM CustomerHub-worker1-5 for: null
at com.vmware.aurora.exception.VcException.DELETE_VM_FAILED(VcException.java:101)
at com.vmware.aurora.vc.VcVirtualMachineImpl.destroy(VcVirtualMachine.java:2323)
at com.vmware.aurora.vc.VcVirtualMachineImpl.destroy(VcVirtualMachine.java:2307)
at com.vmware.bdd.service.sp.DeleteVmByIdSP$1.body(DeleteVmByIdSP.java:71)
at com.vmware.bdd.service.sp.DeleteVmByIdSP$1.body(DeleteVmByIdSP.java:58)
at com.vmware.aurora.vc.vcservice.VcContext.inVcSessionDo(VcContext.java:548)
at com.vmware.bdd.service.sp.DeleteVmByIdSP.call(DeleteVmByIdSP.java:58)
at com.vmware.bdd.service.sp.DeleteVmByIdSP.call(DeleteVmByIdSP.java:38)
at com.vmware.aurora.composition.concurrent.Scheduler$StoredProcedureCallable.call(Scheduler.java:88)
at com.vmware.aurora.composition.concurrent.Scheduler$StoredProcedureCallable.call(Scheduler.java:71)
at com.vmware.aurora.composition.concurrent.PriorityThreadPoolExecutor$2.call(PriorityThreadPoolExecutor.java:186)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:744)
Caused by: com.vmware.vim.binding.vmodl.fault.ManagedObjectNotFound:
obj = com.vmware.vim.binding.vmodl.ManagedObjectReference@fd292872
inherited from com.vmware.vim.binding.vmodl.fault.ManagedObjectNotFound
at sun.reflect.GeneratedConstructorAccessor443.newInstance(Unknown Source)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
at java.lang.Class.newInstance(Class.java:374)
at com.vmware.vim.vmomi.core.types.impl.ComplexTypeImpl.newInstance(ComplexTypeImpl.java:171)
at com.vmware.vim.vmomi.core.types.impl.DefaultDataObjectFactory.newDataObject(DefaultDataObjectFactory.java:26)
at com.vmware.vim.vmomi.core.soap.impl.unmarshaller.ComplexStackContext.<init>(ComplexStackContext.java:33)
at com.vmware.vim.vmomi.core.soap.impl.unmarshaller.UnmarshallerImpl$UnmarshallSoapFaultContext.parse(UnmarshallerImpl.java:135)
at com.vmware.vim.vmomi.core.soap.impl.unmarshaller.UnmarshallerImpl$UnmarshallSoapFaultContext.unmarshall(UnmarshallerImpl.java:98)
</code>
I am also assuming this has to do with existing snapshots that can't be found for whatever reason, since the failing call in the trace is removeAllSnapshots. The full "Caused by" stack is below...
<code>
Caused by: com.vmware.vim.binding.vmodl.fault.ManagedObjectNotFound:
obj = com.vmware.vim.binding.vmodl.ManagedObjectReference@fd292872
inherited from com.vmware.vim.binding.vmodl.fault.ManagedObjectNotFound
at sun.reflect.GeneratedConstructorAccessor443.newInstance(Unknown Source)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
at java.lang.Class.newInstance(Class.java:374)
at com.vmware.vim.vmomi.core.types.impl.ComplexTypeImpl.newInstance(ComplexTypeImpl.java:171)
at com.vmware.vim.vmomi.core.types.impl.DefaultDataObjectFactory.newDataObject(DefaultDataObjectFactory.java:26)
at com.vmware.vim.vmomi.core.soap.impl.unmarshaller.ComplexStackContext.<init>(ComplexStackContext.java:33)
at com.vmware.vim.vmomi.core.soap.impl.unmarshaller.UnmarshallerImpl$UnmarshallSoapFaultContext.parse(UnmarshallerImpl.java:135)
at com.vmware.vim.vmomi.core.soap.impl.unmarshaller.UnmarshallerImpl$UnmarshallSoapFaultContext.unmarshall(UnmarshallerImpl.java:98)
at com.vmware.vim.vmomi.core.soap.impl.unmarshaller.UnmarshallerImpl.unmarshalSoapFault(UnmarshallerImpl.java:84)
at com.vmware.vim.vmomi.client.common.impl.SoapFaultStackContext.setValue(SoapFaultStackContext.java:37)
at com.vmware.vim.vmomi.client.common.impl.ResponseUnmarshaller.unmarshal(ResponseUnmarshaller.java:97)
at com.vmware.vim.vmomi.client.common.impl.ResponseImpl.unmarshalResponse(ResponseImpl.java:245)
at com.vmware.vim.vmomi.client.common.impl.ResponseImpl.setResponse(ResponseImpl.java:203)
at com.vmware.vim.vmomi.client.http.impl.HttpExchange.run(HttpExchange.java:126)
at com.vmware.vim.vmomi.client.http.impl.HttpProtocolBindingImpl.send(HttpProtocolBindingImpl.java:98)
at com.vmware.vim.vmomi.client.common.impl.MethodInvocationHandlerImpl$CallExecutor.sendCall(MethodInvocationHandlerImpl.java:533)
at com.vmware.vim.vmomi.client.common.impl.MethodInvocationHandlerImpl$CallExecutor.executeCall(MethodInvocationHandlerImpl.java:514)
at com.vmware.vim.vmomi.client.common.impl.MethodInvocationHandlerImpl.completeCall(MethodInvocationHandlerImpl.java:302)
at com.vmware.vim.vmomi.client.common.impl.MethodInvocationHandlerImpl.invokeOperation(MethodInvocationHandlerImpl.java:272)
at com.vmware.vim.vmomi.client.common.impl.MethodInvocationHandlerImpl.invoke(MethodInvocationHandlerImpl.java:169)
at com.sun.proxy.$Proxy114.removeAllSnapshots(Unknown Source)
at com.vmware.aurora.vc.VcVirtualMachineImpl$13.body(VcVirtualMachine.java:2550)
at com.vmware.aurora.vc.VcTaskMgr.executeInternal(VcTaskMgr.java:260)
at com.vmware.aurora.vc.VcTaskMgr.execute(VcTaskMgr.java:240)
at com.vmware.aurora.vc.VcVirtualMachineImpl.removeAllSnapshots(VcVirtualMachine.java:2547)
at com.vmware.aurora.vc.VcVirtualMachineImpl.removeAllSnapshots(VcVirtualMachine.java:2564)
at com.vmware.aurora.vc.VcVirtualMachineImpl$9.exec(VcVirtualMachine.java:2316)
at com.vmware.aurora.vc.VcVirtualMachineImpl$9.exec(VcVirtualMachine.java:2313)
at com.vmware.aurora.vc.VcVirtualMachineImpl.safeExecVmOp(VcVirtualMachine.java:1367)
at com.vmware.aurora.vc.VcVirtualMachineImpl.destroy(VcVirtualMachine.java:2313)
</code>
...with the following code for this piece...
<code>
public void destroy(final boolean removeSnapShot)
    throws Exception
{
    try
    {
        safeExecVmOp(new VmOp()
        {
            public Void exec()
                throws Exception
            {
                if (removeSnapShot) {
                    VcVirtualMachineImpl.this.removeAllSnapshots();
                }
                VcVirtualMachineImpl.this.destroyInt();
                return null;
            }
        });
    }
    catch (Exception e)
    {
        throw VcException.DELETE_VM_FAILED(e, getName(), e.getMessage());
    }
}
</code>
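To make the control flow in that method concrete, here is a minimal, self-contained sketch. It is not Serengeti code: `DestroySketch`, `tryDestroy`, and the `steps` list are hypothetical names invented for illustration, and a plain `Exception` with no message stands in for the `ManagedObjectNotFound` fault. It only shows two things that seem consistent with the log: if the snapshot step throws, the destroy step is never reached, and if the cause carries no message, the wrapped error ends in "for: null".

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the VM wrapper; NOT the real VcVirtualMachineImpl.
class DestroySketch {
    // Records which steps actually ran, so the control flow is visible.
    final List<String> steps = new ArrayList<>();
    final String name;
    final boolean snapshotLookupFails;

    DestroySketch(String name, boolean snapshotLookupFails) {
        this.name = name;
        this.snapshotLookupFails = snapshotLookupFails;
    }

    // Stand-in for removeAllSnapshots(). The message-less Exception is an
    // assumption standing in for the ManagedObjectNotFound fault in the log.
    void removeAllSnapshots() throws Exception {
        steps.add("removeAllSnapshots");
        if (snapshotLookupFails) {
            throw new Exception(); // getMessage() returns null here
        }
    }

    // Stand-in for destroyInt(); only records that it was reached.
    void destroyInt() {
        steps.add("destroyInt");
    }

    // Same shape as the quoted destroy(): snapshots first, then destroy,
    // with any exception wrapped using the VM name and the cause's message.
    void destroy(boolean removeSnapShot) throws Exception {
        try {
            if (removeSnapShot) {
                removeAllSnapshots();
            }
            destroyInt();
        } catch (Exception e) {
            throw new Exception(
                "Failed to delete VM " + name + " for: " + e.getMessage(), e);
        }
    }

    // Runs destroy(true) and returns the wrapped message, or "ok" on success.
    static String tryDestroy(DestroySketch vm) {
        try {
            vm.destroy(true);
            return "ok";
        } catch (Exception e) {
            return e.getMessage();
        }
    }

    public static void main(String[] args) {
        DestroySketch vm = new DestroySketch("CustomerHub-worker1-5", true);
        System.out.println(tryDestroy(vm)); // the wrapped error message
        System.out.println(vm.steps);       // the steps that actually ran
    }
}
```

In this sketch the failing VM never reaches `destroyInt`, which would fit with the VMs staying in the cluster as "Powered Off" while the delete keeps failing on the snapshot lookup.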
We ended up searching every column of every table in the vCenter database and found that the only references were in VPX_EVENT and VPX_TEXT_ARRAY. These looked like historical transaction tables, with no metadata about the missing VMs. So, on a hunch, we restarted the Serengeti server, assuming the problem was cache related. After the restart, the cluster was still in the list, but the cluster delete command now worked.
Glad to know your problem is resolved. If the resources in VC change (e.g. datastores or ESX hosts are added, removed, or renamed), you have to restart the Serengeti service (i.e. sudo service tomcat restart) so that Serengeti becomes aware of the change.