from the host's shell:
# ps axu | grep name_of_the_faulty_VM
# kill -9 above_found_PID
I had the same thing happen. I ended up putting the host in maintenance mode, which brought my VM back to life, but it could not migrate to another host in the cluster, so I had to reboot the server anyway.
If they won't power off, then:
ps -ax | grep "VMname"
kill -9 (returned PID)
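The two-step ps/kill dance above can be wrapped in a small helper. This is a minimal sketch, not verified on a live ESX service console; "Win2003-DB" is a placeholder VM name:

```shell
# Locate the PID of a hung VM's vmware-vmx process so it can be
# killed as a last resort. Reads `ps -ax` output on stdin.
find_vm_pid() {
    # Filter to vmware-vmx lines first, so grep's own process
    # (which matches the VM name but not vmware-vmx) is excluded.
    grep "vmware-vmx" | grep "$1" | awk '{print $1}'
}

# On a real host you would run:
#   PID=$(ps -ax | find_vm_pid "Win2003-DB")
#   kill -9 "$PID"
```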
If they won't start and it's none of the obvious causes (available memory, disk space, etc.), then:
restart the VMware Virtual Infrastructure Server service on the VC Server.
I got this issue too, and killing the process worked fine, but when I try to power the VM back on it fails with "Failed to power on VM: No swap file". The swap file is there; I can see it, but I can't delete it. I also tried restarting the VC service, but that doesn't help; it gives the same error when I try to start the VM.
(at the cmd prompt enter) cat /proc/vmware/vm/*/names
This lists the running VMs on the host server you are logged on to.
vmid=1069 pid=-1 cfgFile="/vmfs/volumes/45.../server1/server1.vmx" uuid="50..." displayName="server1"
vmid=1107 pid=-1 cfgFile="/vmfs/volumes/45.../server2/server2.vmx" uuid="50..." displayName="server2"
vmid=1149 pid=-1 cfgFile="/vmfs/volumes/45.../server3/server3.vmx" uuid="50..." displayName="server3"
vmid=1156 pid=-1 cfgFile="/vmfs/volumes/45.../server4/server4.vmx" uuid="50..." displayName="server4"
vmid=1170 pid=-1 cfgFile="/vmfs/volumes/45.../server5/server5.vmx" uuid="50..." displayName="server6"
vmid=1178 pid=-1 cfgFile="/vmfs/volumes/45.../server6/server6.vmx" uuid="50..." displayName="server6"
vmid=1188 pid=-1 cfgFile="/vmfs/volumes/45.../server7/server7.vmx" uuid="50..." displayName="server7"
vmid=1198 pid=-1 cfgFile="/vmfs/volumes/45.../server8/server8.vmx" uuid="50..." displayName="server8"
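For scripting, the vmid for a given display name can be pulled out of that listing. A sketch that assumes the one-line-per-VM key="value" layout shown above:

```shell
# Print the vmid for a given displayName, reading the output of
# `cat /proc/vmware/vm/*/names` on stdin.
vmid_for() {
    grep "displayName=\"$1\"" | sed 's/^vmid=\([0-9]*\).*/\1/'
}

# On a real host:
#   cat /proc/vmware/vm/*/names | vmid_for server3
```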
If you are running ESX 2.5, you can simply kill the vmx PID.
If you are running ESX 3.0.x, you first need to find the group ID that controls the VM's PID.
(at the cmd prompt enter) less -S /proc/vmware/vm/1149/cpu/status
vcpu vm type name uptime status costatus usedsec syssec wait waitsec idlesec (more...)
1149 1149 V vmm0:server3 350042.494 WAIT STOP 15968.954 518.916 COW 325800.734 322397.266 (more...)
Scroll right with the right arrow key to locate the "group" PID. In this case the group PID was 1148 (not shown in this example).
Now with the group PID you can kill the VM safely without corrupting the VM as posted earlier.
(at the cmd prompt enter) /usr/lib/vmware/bin/vmkload_app -k 9 1148
Warning: Apr 20 16:22:22.710: Sending signal '9' to world 1148.
THIS MEANS SUCCESS... if you receive a different message, the kill may not have worked.
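Instead of scrolling through less -S to find the group column, awk can pick it out. This sketch assumes the column is literally named "group" in the header row (it is off-screen in the sample above), so verify against your own cpu/status file:

```shell
# Print the value of the "group" column from a
# /proc/vmware/vm/<vmid>/cpu/status file read on stdin
# (header row first, then one data row per vcpu).
group_id() {
    awk 'NR==1 { for (i = 1; i <= NF; i++) if ($i == "group") col = i }
         NR==2 { print $col }'
}

# On a real ESX 3.0.x host:
#   group_id < /proc/vmware/vm/1149/cpu/status
#   /usr/lib/vmware/bin/vmkload_app -k 9 <printed group id>
```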
Hope this helps!
We started out with a VM that just would not power on. We created a new VMX and pointed it at the old VMDKs. The new VM powered on just fine; in our particular situation we then needed to delete the old VMX from inventory, but couldn't, because we got the "Operation failed since another task is in progress" message. We used ps -auwx | grep to find the PID and used kill (PID). When attempting the delete, the progress indicator stopped at 95% and timed out, then orphaned the machine. After that we were able to delete the .vmx from the COS.
OK I had this problem when deleting snapshots.
It would appear that a break in the snapshot chain caused the task to time out. This resulted in the error as posted above.
Attempting to VMotion the machine to another host fixed the problem, even though it posted the usual "snapshots aren't supported" warning.
Just happened to me this morning, I hope it doesn't happen again. I can't seem to find the cause, but SCSI Distributed File Lock popped in my head (since every time I want to do any operation to the VM, it says another task is in progress - can't even VMotion). I'm opening an SR, will keep you posted.
We have migrated to a brand new environment and it is still happening on a completely fresh install of 3.0.1. It would be good if someone could finally get an answer as to the cause, and an easier solution. Fingers crossed!
I am now having this problem. It only seems to happen over the weekend when we run our esxRanger Pro backups to an external HD. I have just logged a call for the issue.
We have esxRanger Pro as well, and back up our ESX 3.0.1 hosts (in full) using a dedicated physical server with an HBA disk connected to it. No problems overall with Ranger (a few hiccups), but when we've seen any snapshot-related issue such as an "another task is in progress" error that you can't end, including, rarely, a VM with an old snapshot that just isn't happy for whatever reason, we always, ALWAYS, have to shut down the given VM and then delete its snapshot(s). If that doesn't work (same error even after shutting down the VM), we VMotion the VMs off the host involved, reboot that host, and then delete the snapshot(s); that always works, and the task is then no longer in progress.
Snapshots aren't a perfect technology. Our own internal best practice is to make sure no snapshots are sitting out there on the VMFS volumes with the VMs unless absolutely necessary/expected. Third-party backup solutions such as Ranger merely tap the snapshot API to do their thing, with fancy scripting coupled with a decent GUI... pointing the finger at Ranger (and I don't think you are, right?) isn't going to fix the underlying snapshot technology's stability/reliability (again, in my opinion, not a perfect science/technology just yet).
Sorry that this post is a little longer than expected. My issues may be different from yours, but the symptoms are the same. I do not have VCB, esxRanger, or any snapshot-based backup method running, hence my issues may not be similar to the rest of yours. However, I promised to post the reply from the VMware rep after I opened an SR, so here it is:
"To answer your question, no this issue should not keep happening.
Did you rebuild the VM as I mentioned in my previous email? Did that help if
Another thing you may want to watch for is to prevent your CDROMs / Floppies
from referencing a non-existent ISO or .flp image. Better yet only have
CDROMs and Floppies "Connected" and "Start Connected" options enabled when
using them. Constant and repetitive seeks to the CDROMs and Floppies when
they have "Connected" and / or "Start Connected" enabled, needlessly consumes
CPU and can eventually hang the Guest OS.
Generally most of the time it is possible to kill a hung VM using the
procedures we have already noted. However sometimes the VM becomes
"orphaned" meaning the parent PID has been killed before the children PIDs.
Or the process becomes a "zombie". In both of these instances I have seen
where it becomes necessary to reboot the host to clear the process.
If you use VMotion you could use it to move running VMs off the host prior to
rebooting it so that those VMs do not experience any downtime.
I have not found any reason why the VM hung, as I stated during our phone
conversation. If you experience this problem again, please run the
vm-support script before rebooting the host so that we do not lose
information when the host reboots, and the process IDs are an exact
representation of the currently running system."
Hope that helps a little,
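The CD-ROM/floppy advice in that reply maps to .vmx settings along these lines. A hedged fragment: the key names are from memory of ESX 3.x-era configs, so verify them against your own .vmx before editing:

```
# Keep the virtual CD-ROM present but not connected at power-on,
# so the guest does not seek a missing ISO.
ide1:0.present = "TRUE"
ide1:0.startConnected = "FALSE"

# Same idea for the floppy device.
floppy0.present = "TRUE"
floppy0.startConnected = "FALSE"
```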
We've seen this too, on a Windows Server 2003 and a Solaris 10 VM. We're running VMware ESX Server 3.0.1 build-44686, and VirtualCenter 2.0.1. Killing the PID of the VM on the VI3 host, or using vmkload_app to kill the group PID of the VM, lets us start the VM again, but there is still no clue what puts the VMs in limbo. Since the VM shows up as powered on, the problem is not something that HA will help with. I'll outline what we have looked at, with the hope that it will add to the troubleshooting data and maybe jog others' ideas.
When VMs are in limbo, VMware Tools shows "not installed," and the VM cannot be powered off because "another operation is already in progress" even when none of us has performed an operation on the VM. Perhaps the other operation is something VC has tried to do (DRS?). Clearly the VM cannot be powered off and must be killed.
Could this relate to a "bad host" in the cluster? When a VM gets moved (DRS) to that host, it goes flaky? Half the time we've seen these limbo issues, they have been on a particular host; the other half of the time I was not able to check. I have yet to find VC logs which show which hosts a VM was migrated to by DRS, to see if this host might be involved with all the times we've had VMs go into limbo. Where is the history of which hosts a VM has lived on over time?
Could this be IO related? The Solaris VM is a Solaris jumpstart and NFS server, and the Windows VM is running Microsoft System Center Operations
Manager 2007 with SQL Server 2005 Enterprise Edition. The first time the Solaris VM had this issue, we were transferring a lot of images to it over NFS. The Windows VM does not see much action - it's there for testing and doesn't do much at the moment.
I have two Solaris VMs on this VMware cluster, both running 5.10 Generic_118855-36 (64-bit), and the same version of VMware Tools. Only one of the Solaris VMs has ended up in limbo. The second time it was in limbo, I was able to VMotion it to other hosts, but it stayed in limbo. We could also ping the VM, but could not SSH to it, connect to its serial console, or anything else; this is the first time I have seen any kind of response from the guest OS when a VM was in limbo. I wasn't at a place where I could see its VMware console. I have seen some Solaris VM issues with Sol10 before rev 11/06, or with the 32-bit kernel, but they don't apply to what we're running here. I wonder what makes the other Solaris VM (which has never gone into limbo) special? Admittedly it doesn't do much; it's there for some Samba access and other random testing.
We have two Windows Server 2003 Enterprise SP2 (not R2) VMs, and only one has been in limbo. The VM which hasn't been in limbo is an application server, using IIS. It hasn't seen a lot of action; the app is still being set up.
We could update this VMWare cluster to 3.0.2, and upgrade Virtual Center to 2.0.1 PL2, but I'd like to know something more concrete about this issue before tossing upgrades at the problem.
Has anyone else made headway or gotten additional feedback from VMware? I'm about to open a case, if only to add to the "me too" list.
Found this in another forum...
Apparently, this is a known bug and a fix is coming in the September time frame?
If you open a case with VMWare concerning this issue, reference SR# is 191595084 and you should be able to point your support guru in the right direction on this issue.
If anyone is using the beta of the fix please post.
I got the same error on 2 VMs SuSE-9 (64) and SuSE-10 (64) - Running ESX 3.01 - no snapshots or backups running. The NT guy says he had one hang like this also.
It happens infrequently; I have a dozen or more SuSE VMs and several dozen Windows VMs, running on a farm of 5 ESX servers.
Trying the vm-support -x (to get the ID), then vm-support -X ID.
It took a while (6 minutes), but in the end the VM is down!
# vm-support -x
VMware ESX Server Support Script 1.27
Available worlds to debug:
# vm-support -X 1189
VMware ESX Server Support Script 1.27
Can I include a screenshot of the VM 1189? : y
Can I send an NMI (non-maskable interrupt) to the VM 1189? This might crash the VM, but could aid in debugging : y
Can I send an ABORT to the VM 1189? This will crash the VM, but could aid in debugging : y
Preparing files: /
Grabbing data & core files for world 1189. This will take 5 - 10 minutes.
thx Shawn !