HA restart of VMs fails if VM has been storage-vmotioned

iceman76 · ‎12-02-2009

Hi all,

Today we ran into some trouble with our 3-node vSphere 4.0 cluster. Due to a network failure, one node became isolated. Because of this isolation, the VMs on this host were stopped. So far, so good. But only some of the VMs were restarted on the remaining 2 hosts. Here are the logs from the host which tried to restart the VM:

2009-12-02 10:20:40.880 F62E9B90 info 'TaskManager' Task Created : haTask-ha-folder-vm-vim.Folder.registerVm-1853759376

2009-12-02 10:20:40.880 F62E9B90 info 'ha-folder-vm'] Register called: [/vmfs/volumes/484d002e-6b5cea72-25b4-001e0bd1b6ca/PLESK-02/PLESK-02.vmx

2009-12-02 10:20:40.885 F62E9B90 info 'VMFileChecker' Config rules file '/etc/vmware/configrules' loaded and parsed successfully.

2009-12-02 10:20:40.886 F62E9B90 warning 'Vmsvc' RegisterVm file check error: IO error

2009-12-02 10:20:40.888 F62E9B90 info 'App' AdapterServer caught exception: vim.fault.NotFound

2009-12-02 10:20:40.888 F62E9B90 info 'TaskManager' Task Completed : haTask-ha-folder-vm-vim.Folder.registerVm-1853759376 Status error

2009-12-02 10:20:40.888 F62E9B90 info 'Vmomi' Activation N5Vmomi10ActivationE:0x5b538db8 : Invoke done registerVm on vim.Folder:ha-folder-vm

2009-12-02 10:20:40.888 F62E9B90 verbose 'Vmomi' Arg path:

"[]/vmfs/volumes/484d002e-6b5cea72-25b4-001e0bd1b6ca/PLESK-02/PLESK-02.vmx"

2009-12-02 10:20:40.888 F62E9B90 verbose 'Vmomi' Arg name:

2009-12-02 10:20:40.888 F62E9B90 verbose 'Vmomi' Arg asTemplate:

false

2009-12-02 10:20:40.888 F62E9B90 verbose 'Vmomi' Arg pool:

'vim.ResourcePool:ha-root-pool'

2009-12-02 10:20:40.888 F62E9B90 verbose 'Vmomi' Arg host:

'vim.HostSystem:ha-host'

2009-12-02 10:20:40.888 F62E9B90 info 'Vmomi' Throw vim.fault.NotFound

2009-12-02 10:20:40.888 F62E9B90 info 'Vmomi' Result:

(vim.fault.NotFound) {

dynamicType = <unset>,

faultCause = (vmodl.MethodFault) null,

msg = "",

}

The host can't find the VM configuration, and that is true, because it is looking in the wrong place. This VM (and all the ones which couldn't be restarted) had been moved to another storage system with Storage vMotion two weeks ago. But it looks like none of the other hosts in the cluster noticed that change.

After we had the isolated host back in the cluster, we were able to start the affected VMs manually. This time the correct path was used. Here is the log file (from the same host as the first log file):

2009-12-02 11:07:54.263 F62E9B90 info 'TaskManager' Task Created : haTask-ha-folder-vm-vim.Folder.registerVm-1853761779

2009-12-02 11:07:54.263 F62E9B90 info 'ha-folder-vm'] Register called: [/vmfs/volumes/4a69621c-5a16699a-4427-001e0bd1b6ca/PLESK-02/PLESK-02.vmx

2009-12-02 11:07:54.290 F62E9B90 info 'VMFileChecker' Config rules file '/etc/vmware/configrules' loaded and parsed successfully.

2009-12-02 11:07:54.291 F62E9B90 info 'VMFileChecker' VM config file '/vmfs/volumes/4a69621c-5a16699a-4427-001e0bd1b6ca/PLESK-02/PLESK-02.vmx' already belongs to uid 0. Returning.
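
For anyone who wants to check their own hostd logs for the same pattern, here is a rough Python sketch that pulls the datastore ID out of the "Register called" lines and reports VMs whose failed and successful registrations point at different volumes. The function names and the regular expression are my own and only match the line layout shown in the logs above; other log formats would need adjusting.

```python
import re

# Matches the .vmx path in a hostd "Register called" line. It tolerates a
# stray '[' or '--' before the path, as can appear in pasted logs.
REGISTER_RE = re.compile(
    r"Register called: \[?(?:--)?/vmfs/volumes/([^/]+)/(.+\.vmx)"
)

def register_paths(log_text):
    """Return (datastore_id, relative_vmx_path) for each registerVm attempt."""
    return [(m.group(1), m.group(2)) for m in REGISTER_RE.finditer(log_text)]

def stale_registrations(failed_log, working_log):
    """List VMs whose failed attempt used a different datastore than the working one."""
    working = {vmx: ds for ds, vmx in register_paths(working_log)}
    return [
        (vmx, ds, working[vmx])
        for ds, vmx in register_paths(failed_log)
        if vmx in working and working[vmx] != ds
    ]

# The two "Register called" lines from the logs in this post.
failed = (
    "2009-12-02 10:20:40.880 F62E9B90 info 'ha-folder-vm'] Register called: "
    "[/vmfs/volumes/484d002e-6b5cea72-25b4-001e0bd1b6ca/PLESK-02/PLESK-02.vmx"
)
working = (
    "2009-12-02 11:07:54.263 F62E9B90 info 'ha-folder-vm'] Register called: "
    "[/vmfs/volumes/4a69621c-5a16699a-4427-001e0bd1b6ca/PLESK-02/PLESK-02.vmx"
)

for vmx, old_ds, new_ds in stale_registrations(failed, working):
    print(f"{vmx}: HA tried {old_ds}, manual start used {new_ds}")
```

Run against the two lines above, it reports PLESK-02 with the old volume ID from the HA attempt and the new one from the manual start.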

Has anyone else experienced this behaviour? And most important: how can we avoid it? We have some VMs that have been moved with Storage vMotion, and we would like them to restart when a host isolation occurs.

Any help or hints are appreciated.

Thanks

admin · ‎12-02-2009

This is a bug in vSphere 4.0 that has been fixed in 4.1 (which will be released soon, I think). To work around the problem in 4.0, you can try suspending and resuming the VMs after they have been storage vMotioned.

Elisha

admin · ‎12-02-2009

Another workaround is to reconfigure HA on the host that runs the storage vMotioned VMs. Both workarounds will force HA to update its VM information to use the correct config path.
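
If you have many hosts, the reconfigure step could probably be scripted instead of clicked through. The following is only a sketch, assuming you already hold vim.HostSystem objects from a vSphere API binding such as pyVmomi (the connection and host lookup are not shown). The helper name reconfigure_ha is made up; ReconfigureHostForDAS_Task is the HostSystem method that, as far as I know, corresponds to "Reconfigure for VMware HA" in the client.

```python
def reconfigure_ha(hosts):
    """Start an HA reconfiguration on each host and return the started tasks.

    `hosts` is assumed to be an iterable of vim.HostSystem objects (or
    anything exposing `name` and `ReconfigureHostForDAS_Task()`), obtained
    from an existing vCenter session that is not shown here.
    """
    tasks = []
    for host in hosts:
        # The call returns a Task object; this sketch does not wait for it,
        # so completion still has to be checked in vCenter or by the caller.
        tasks.append((host.name, host.ReconfigureHostForDAS_Task()))
    return tasks
```

The function only starts the tasks, so you still need to confirm that each one finished without errors before relying on HA again.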

admin · ‎12-02-2009

Correction: the fix is in 4.0 Update 1.

iceman76 · ‎12-02-2009

Hi,

I've been looking through the release notes of Update 1, but I am not able to find this issue.

I'm going to try the workaround of reconfiguring HA on the other hosts.

Thank you

admin · ‎12-03-2009

Yeah, not all bug fixes are mentioned in the release notes, but this one was fixed for U1.
