VMware Cloud Community
VMKR9
Expert

Power off VM - Operation failed since another task is in progress

I have a new ESX 3.0 environment that has been running for a few weeks now without too many issues. We are now seeing the error above when powering off some VMs. The task shows in VirtualCenter as powering off at 100% but just sits there for hours, and if you try to power the VM off again it says "Operation failed since another task is in progress". The VM is locked and there is no way to get it to respond. Has anyone seen this issue and know how to fix it?

The VM's log just displays this as the last entry:

Nov 03 07:09:23.694: vmx|

Nov 03 07:09:23.694: vmx|

Nov 03 07:09:23.694: vmx| VMXRequestReset

Nov 03 07:09:23.694: vmx| Stopping VCPU threads...

47 Replies
minerat
Enthusiast

I had the same problem as the op. I killed both vmkload_app pids and powered on the vm - still hangs at 95%. I vmotioned the other VMs off and rebooted the host - VM powered on successfully.

minerat
Enthusiast

This has happened again with the same VM. Any progress on the SR?

Fortunately issuing a vmware-cmd /vmfs/..../..vmx stop worked this time and I didn't have to vmotion everything off.

Chris_Price
Contributor

How many of you are running VMware Tools? I ask because I have two VMs built from the same template and running on separate blade servers. One has VMware Tools, the other does not. The VM that does NOT have VMware Tools is the one that's having the same problem as everyone else here.

Regards,

Chris

VMKR9
Expert

All our vms are running vm tools. So I don't think it is that....

Chris_Price
Contributor

I didn't think it was. This issue is new to me so I'm trying to rationalize it all out. I have a feeling it's going to be an ESX issue.

Chris

minerat
Enthusiast

Ditto, all VMs running vmtools.

jfigueroa
Contributor

We are trying to follow up with the vendor, but we think it may be related to how you are backing up the VMs. We are using esXpress, and the problem may be tied to the underlying technology it uses (VBA - Virtual Backup Appliance).

I would be interested in knowing how other folks are doing their backups.

Thanks

VMKR9
Expert

We are not backing up the vms at present....

minerat
Enthusiast

ditto, no vm backup at present. I've manually exported them once, but aside from that they don't change very frequently.

Zak1
Contributor

Thought I'd add that we have this problem too, and none of the above has worked, with the exception of rebooting the host.

Does anyone know how to find the actual PID of the process holding the vmdk or vswp files?

Also, depending on which ps command is used, the PID keeps changing, so I can't kill the process.

Any ideas?

look1976
Contributor

from the host's shell:

# ps axu | grep name_of_the_faulty_VM

# kill -9 above_found_PID
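
One possible reason for the changing PID mentioned a couple of posts up is that the grep in the pipeline matches its own command line, so every run reports a fresh grep PID. The sketch below filters those helper processes out. It assumes `ps axu` output with the PID in column 2 and the command in column 11; the function name `vm_pids` is made up for illustration and is not part of ESX.

```shell
# Print the PIDs of processes whose ps line mentions the VM name,
# skipping the grep/awk helpers themselves.
# Reads `ps axu` output on stdin (assumed layout: PID in $2, command in $11).
vm_pids() {
    awk -v name="$1" '
        index($0, name) == 0 { next }                # line does not mention the VM
        $11 ~ /(^|\/)(grep|awk|gawk)$/ { next }      # ignore the helper processes
        { print $2 }
    '
}

# Usage on the host:
#   ps axu | vm_pids name_of_the_faulty_VM
```

Check the printed PIDs against the full ps line before passing any of them to kill -9.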

lightfighter
Enthusiast

I had the same thing happen and ended up putting the host in maintenance mode, which made my VM come back to life, but it could not migrate to another host in the cluster, so I had to reboot the server anyway.

whynotq
Commander

If they won't power off, then:

ps -ax | grep "VMname"

kill -9 (returned PID)

If they won't start and it's none of the obvious causes (available memory, disk space, etc.), then:

restart the VMware Virtual Infrastructure Server service on the VC Server.

lindqvist
Contributor

Hi,

I hit this issue too, and killing the process worked fine.

But when I try to start the VM, it returns "Failed to power on VM: No swap file".

The file is there, I can see it, but I can't delete it.

So I tried restarting the VC service, but that doesn't help; I get the same error when I try to start the VM.

Any ideas?

//Johan

michael_stan
Contributor

(at the cmd prompt enter) cat /proc/vmware/vm/*/names

This lists the running VMs on the host server you are logged on to.

vmid=1069 pid=-1 cfgFile="/vmfs/volumes/45.../server1/server1.vmx" uuid="50..." displayName="server1"

vmid=1107 pid=-1 cfgFile="/vmfs/volumes/45.../server2/server2.vmx" uuid="50..." displayName="server2"

vmid=1149 pid=-1 cfgFile="/vmfs/volumes/45.../server3/server3.vmx" uuid="50..." displayName="server3"

vmid=1156 pid=-1 cfgFile="/vmfs/volumes/45.../server4/server4.vmx" uuid="50..." displayName="server4"

vmid=1170 pid=-1 cfgFile="/vmfs/volumes/45.../server5/server5.vmx" uuid="50..." displayName="server5"

vmid=1178 pid=-1 cfgFile="/vmfs/volumes/45.../server6/server6.vmx" uuid="50..." displayName="server6"

vmid=1188 pid=-1 cfgFile="/vmfs/volumes/45.../server7/server7.vmx" uuid="50..." displayName="server7"

vmid=1198 pid=-1 cfgFile="/vmfs/volumes/45.../server8/server8.vmx" uuid="50..." displayName="server8"
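
To pick one VM out of a longer listing, something like the following might help. It is only a sketch that assumes the line format shown above (a leading vmid= field and a quoted displayName= field); the function name `vmid_for` is hypothetical.

```shell
# Print the vmid of every VM whose displayName matches exactly.
# Reads lines in the "vmid=... displayName=..." format on stdin.
vmid_for() {
    want="$1"
    # Reduce each line to "<vmid> <displayName>", then compare names.
    sed -n 's/^vmid=\([0-9][0-9]*\) .*displayName="\([^"]*\)".*$/\1 \2/p' |
    while read -r vmid name; do
        if [ "$name" = "$want" ]; then
            echo "$vmid"
        fi
    done
}

# Usage on the host:
#   cat /proc/vmware/vm/*/names | vmid_for server3
```

With the listing above, asking for server3 would print 1149.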

\[-If you are running ESX 2.5 then you can kill the vmx PID-]

If you are running ESX 3.0.x, then you need to find the group ID that controls the PID of the VM.

(at the cmd prompt enter) less -S /proc/vmware/vm/1149/cpu/status

vcpu vm type name uptime status costatus usedsec syssec wait waitsec idlesec (more...)

1149 1149 V vmm0:server3 350042.494 WAIT STOP 15968.954 518.916 COW 325800.734 322397.266 (more...)

Scroll right with the right arrow key to locate the "group" pid. In this case the group pid was 1148 (not shown in this example)
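
Instead of scrolling, the column can be looked up by its header name. This is a rough sketch that assumes the first line of the status file is a whitespace-separated header containing a column literally named "group", and that every data row has the same number of fields; the function name `status_column` is made up for illustration.

```shell
# Print the value of a named column (default: "group") for each data row.
# Reads the contents of a cpu/status file on stdin.
status_column() {
    awk -v col="${1:-group}" '
        NR == 1 {
            # Locate the requested column in the header row.
            for (i = 1; i <= NF; i++) if ($i == col) idx = i
            if (!idx) { print "column not found: " col > "/dev/stderr"; exit 1 }
            next
        }
        { print $idx }
    '
}

# Usage on the host:
#   status_column < /proc/vmware/vm/1149/cpu/status
```

Compare the result with what less -S shows before killing anything, since a row with an empty or multi-word field would shift the columns.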

Now with the group PID you can kill the VM safely without corrupting the VM as posted earlier.

(at the cmd prompt enter) /usr/lib/vmware/bin/vmkload_app -k 9 1148

Warning: Apr 20 16:22:22.710: Sending signal '9' to world 1148.

THIS MEANS SUCCESS... if you receive another line then the process might not have been successful.

Hope this helps!

Michael Stan

mkirchner
Contributor

We started out with a VM that just would not power on. We created a new VMX and pointed it to the old VMDKs. The new VM powered on just fine, but in our particular situation we then needed to delete the old VMX from inventory. We were unable to, because we got the "Operation failed since another task is in progress" message. We used ps -auwx | grep to find the PID and kill (PID) to end the process. When we attempted the delete again, the progress indicator stopped at 95% and timed out, and the machine was then orphaned. After that we were able to delete the vmx from the COS.

jakarth
Contributor

OK I had this problem when deleting snapshots.

It would appear that a break in the snapshot chain caused the task to time out. This resulted in the error as posted above.

Attempting to vmotion the machine to another host fixed the problem, even though it posted the usual "snapshots aren't supported" warning.

jandie
Enthusiast

Just happened to me this morning, I hope it doesn't happen again. I can't seem to find the cause, but SCSI Distributed File Lock popped in my head (since every time I want to do any operation to the VM, it says another task is in progress - can't even VMotion). I'm opening an SR, will keep you posted.

Johan

VMKR9
Expert

We have migrated to a brand new environment and it is still happening on a completely fresh install of 3.0.1. It would be good if someone could finally get an answer as to the cause and an easier solution..... fingers crossed!

nickfretwell
Contributor

I am now having this problem. It only seems to happen over the weekend, when we run our esxRanger Pro backups to an external HD. I have just logged a call for the issue.
