47 Replies. Latest reply on Dec 10, 2009 2:13 AM by mr.anderson
      • 30. Re: Power of VM - Operation Failed Since another task is in progress
        look1976 Novice

        from the host's shell:

         

        # ps axu | grep name_of_the_faulty_VM

         

        # kill -9 above_found_PID
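
The two commands above can be wrapped in a small helper that prints only the candidate PIDs. This is a minimal sketch and not part of the original post: `vm_pids` is a hypothetical name, and it assumes the usual procps layout for `ps axu` (PID in column 2, command starting at column 11). It deliberately stops short of killing anything, so the PID can be checked by eye first.

```shell
# vm_pids NAME: read `ps axu`-style output on stdin and print the PID of
# every process whose line mentions NAME. Hypothetical helper; assumes the
# procps column layout (PID in field 2, command from field 11 onward).
vm_pids() {
    awk -v name="$1" '
        # skip the grep/awk processes themselves, which also mention NAME
        $11 ~ /(^|\/)(grep|awk)$/ { next }
        index($0, name) { print $2 }
    '
}

# On the host (not run here):
#   ps axu | vm_pids name_of_the_faulty_VM
#   kill -9 <PID printed above>
```

Matching with `index` treats the VM name as a fixed string, so dots or brackets in the name are not interpreted as a regular expression.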

         

         

        • 31. Re: Power of VM - Operation Failed Since another task is in progress
          lightfighter Enthusiast

          I had the same thing happen and ended up putting the host in maintenance mode, which brought my VM back to life, but it could not migrate to another host in the cluster, so I ended up rebooting the server anyway.

          • 32. Re: Power of VM - Operation Failed Since another task is in progress
            whynotq Master

            If they won't power off, then:

             

            ps -ax | grep "VMname"

             

            kill -9 (returned PID)

             

            If they won't start and it's none of the obvious causes (available memory, disk space, etc.), then:

             

            Restart the VMware Virtual Infrastructure Server service on the VC Server.

            • 33. Re: Power of VM - Operation Failed Since another task is in progress
              lindqvist Novice

              Hi

               

              I had this issue too, and killing the process worked fine.

              But when I try to start the VM, it returns: failed to power on VM: No swap file

              The file is there, I can see it, but I can't delete it.

               

              So I tried restarting the VC service, but this doesn't help; it gives me the same error when I try to start the VM.

               

              Any ideas?

               

              //Johan

              • 34. Re: Power of VM - Operation Failed Since another task is in progress
                michael_stan Novice

                (at the cmd prompt enter) cat /proc/vmware/vm/*/names

                 

                This lists the running VMs on the host server you are logged on to.

                 

                vmid=1069   pid=-1     cfgFile="/vmfs/volumes/45.../server1/server1.vmx"  uuid="50..."  displayName="server1"

                vmid=1107   pid=-1     cfgFile="/vmfs/volumes/45.../server2/server2.vmx"  uuid="50..."  displayName="server2"

                vmid=1149   pid=-1     cfgFile="/vmfs/volumes/45.../server3/server3.vmx"  uuid="50..."  displayName="server3"

                vmid=1156   pid=-1     cfgFile="/vmfs/volumes/45.../server4/server4.vmx"  uuid="50..."  displayName="server4"

                vmid=1170   pid=-1     cfgFile="/vmfs/volumes/45.../server5/server5.vmx"  uuid="50..."  displayName="server5"

                vmid=1178   pid=-1     cfgFile="/vmfs/volumes/45.../server6/server6.vmx"  uuid="50..."  displayName="server6"

                vmid=1188   pid=-1     cfgFile="/vmfs/volumes/45.../server7/server7.vmx"  uuid="50..."  displayName="server7"

                vmid=1198   pid=-1     cfgFile="/vmfs/volumes/45.../server8/server8.vmx"  uuid="50..."  displayName="server8"
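
To pick one vmid out of a listing like the one above without reading it by eye, something along these lines should work. It is only a sketch under stated assumptions: `vmid_for_name` is a hypothetical helper, and it assumes every line starts with `vmid=<number>` and ends with `displayName="<name>"`, as in the sample output in this post.

```shell
# vmid_for_name NAME: read lines in the /proc/vmware/vm/*/names format on
# stdin and print the vmid of the VM whose displayName is exactly NAME.
# Hypothetical helper; the line layout is assumed from the sample output.
vmid_for_name() {
    awk -v want="displayName=\"$1\"" '
        {
            sub(/[ \t\r]+$/, "")                      # drop trailing whitespace
            tail = substr($0, length($0) - length(want) + 1)
            if (tail == want) {                       # line ends with displayName="NAME"
                sub(/^vmid=/, "", $1)                 # keep only the number
                print $1
            }
        }
    '
}

# On the host (not run here):
#   cat /proc/vmware/vm/*/names | vmid_for_name server3
```

Comparing the end of the line rather than a whitespace-separated field means display names that contain spaces are still matched.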

                 

                [-If you are running ESX 2.5 then you can kill the vmx PID-]

                 

                If you are running ESX 3.0.x then you need to find the group ID that controls the PID of the VM.

                 

                (at the cmd prompt enter) less -S /proc/vmware/vm/1149/cpu/status

                 

                vcpu   vm type name                uptime status   costatus       usedsec     syssec wait           waitsec       idlesec (more...)

                1149 1149 V    vmm0:server3    350042.494 WAIT     STOP         15968.954    518.916 COW         325800.734    322397.266 (more...)

                 

                Scroll right with the right arrow key to locate the "group" pid. In this case the group pid was 1148 (not shown in this example)
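
As an alternative to scrolling, the column can be selected by its header name. This is a rough sketch, not something from the original post: `column_by_header` is a hypothetical helper, and it assumes the first line of the status file is the header and that every column holds a single whitespace-free token. That holds for the columns shown above, but it cannot be confirmed here for the columns hidden behind "(more...)".

```shell
# column_by_header NAME: read a whitespace-separated table on stdin whose
# first line is the header, and print the value in column NAME for each
# data row. Exits non-zero if the header has no such column.
# Hypothetical helper; assumes one token per column.
column_by_header() {
    awk -v want="$1" '
        NR == 1 {
            for (i = 1; i <= NF; i++) if ($i == want) col = i
            if (!col) exit 1      # header not found
            next
        }
        { print $col }
    '
}

# On the host (not run here):
#   column_by_header group < /proc/vmware/vm/1149/cpu/status
```

If the file has one row per virtual CPU, this prints one value per row; they should all agree for a single VM.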

                 

                Now with the group PID you can kill the VM safely without corrupting the VM as posted earlier.

                 

                (at the cmd prompt enter) /usr/lib/vmware/bin/vmkload_app -k 9 1148

                 

                Warning: Apr 20 16:22:22.710: Sending signal '9' to world 1148. 

                THIS MEANS SUCCESS... if you receive another line then the process might not have been successful.

                 

                Hope this helps!

                 

                Michael Stan

                • 35. Re: Power of VM - Operation Failed Since another task is in progress
                  mkirchner Lurker

                  We started out with a VM that just would not power on. We created a new VMX and pointed it to the old VMDKs. The new VM powered on just fine; then, in our particular situation, we needed to delete the old VMX from inventory. We were unable to, because we would get the "Operation failed since another task is in progress" message. We used ps -auwx | grep to find the PID and then kill (PID). When we attempted the delete, the progress indicator stopped at 95% and timed out, and the machine was then orphaned.  After that we were able to delete the vmx from the COS.

                   

                  Message was edited by:

                          mkirchner

                  • 36. Re: Power of VM - Operation Failed Since another task is in progress
                    jakarth Enthusiast

                    OK I had this problem when deleting snapshots.

                     

                    It would appear that a break in the snapshot chain caused the task to time out. This resulted in the error as posted above.

                     

                    Attempting to VMotion the machine to another host fixed the problem, even though it posted the usual "snapshots aren't supported" issue.

                    • 37. Re: Power of VM - Operation Failed Since another task is in progress
                      jandie Enthusiast

                      Just happened to me this morning, I hope it doesn't happen again.  I can't seem to find the cause, but SCSI Distributed File Lock popped in my head (since every time I want to do any operation to the VM, it says another task is in progress - can't even VMotion).  I'm opening an SR, will keep you posted.

                       

                      Johan

                      • 38. Re: Power of VM - Operation Failed Since another task is in progress
                        VMKR9 Expert

                        We have migrated to a brand new environment and it is still happening on a completely fresh install of 3.0.1.  It would be good if someone could finally get an answer as to the cause and an easier solution... fingers crossed!

                        • 39. Re: Power of VM - Operation Failed Since another task is in progress
                          nickfretwell Lurker

                          I am now having this problem.  It only seems to happen over the weekend when we run our esxRanger Pro backups to an external HD.  I have just logged a call for the issue.

                          • 40. Re: Power of VM - Operation Failed Since another task is in progress
                            jftwp Expert

                            We have esxRanger Pro as well, and back up our ESX 3.0.1 hosts (in full) using a dedicated physical server with an HBA disk connected to it.  No problems with Ranger (overall; a few hiccups), but when we've seen any snapshot-related issues, such as an 'another task is in progress' that you can't end, including rarely a VM that has an old snapshot that just isn't happy for whatever reason, we always, ALWAYS have to shut down the given VM and then delete its snapshot/s.  If that doesn't work (same error even after shutting down the VM), then we VMotion VMs off the host involved, reboot that host, and then delete the snapshot/s (always works); the task is 'then' no longer in progress.

                             

                            Snapshots aren't a perfect technology.  Our own internal 'best practice' is to make sure no snapshots are sitting out there on the VMFS volumes with the VMs unless absolutely necessary/expected.  Third-party backup solutions such as Ranger merely tap the snapshot API to do their thing, with fancy scripting coupled with a decent GUI... pointing the finger at Ranger (and I don't think you are, right?) isn't going to fix the underlying technology with regard to snapshot stability/reliability (again, in my opinion, not a perfect science/technology just yet).

                            • 41. Re: Power of VM - Operation Failed Since another task is in progress
                              jandie Enthusiast

                              Sorry this took a little longer to post than expected.  My issues may be different from yours, but the symptoms are the same.  I do not have VCB, ESXRanger, or any other snapshot backup method running, which is why I said that my issues may not be similar to the rest of yours.  However, I promised to post the reply from the VMware rep after I opened an SR, so here it is:

                               

                              "To answer your question, no this issue should not keep happening.

                              Did you rebuild the VM as I mentioned in my previous email?  Did that help if you did?

                              Another thing you may want to watch for is to prevent your CDROMs / Floppies from referencing a non-existent ISO or .flp image.  Better yet only have CDROMs and Floppies "Connected" and "Start Connected" options enabled when using them.  Constant and repetitive seeks to the CDROMs and Floppies when they have "Connected" and / or "Start Connected" enabled, needlessly consumes CPU and can eventually hang the Guest OS.

                              Generally most of the time it is possible to kill a hung VM using the procedures we have already noted.  However sometimes the VM becomes "orphaned" meaning the parent PID has been killed before the children PIDs.  Or the process becomes a "zombie".  In both of these instances I have seen where it becomes necessary to reboot the host to clear the process.

                              If you use VMotion you could use it to move running VMs off the host prior to rebooting it so that those VMs do not experience any downtime.

                              I have not found any reason why the VM hung, as I stated during our phone conversation.  If you experience this problem again, please run the vm-support script before rebooting the host so that we do not lose information when the host reboots, and the process IDs are an exact representation of the currently running system."
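
The "Start Connected" advice in the quoted reply can be checked from the service console by looking at a VM's .vmx file. The following is a hedged sketch rather than anything supplied by VMware support: `start_connected_devices` is a hypothetical helper, and it assumes the settings appear as `ideX:Y.startConnected = "true"` or `floppyN.startConnected = "true"` lines. It reports only devices that set the option explicitly, so a device that relies on a default will not be listed.

```shell
# start_connected_devices FILE: print each IDE or floppy device in the
# given .vmx file that explicitly sets startConnected to "true".
# Hypothetical helper; the key names are an assumption about the .vmx format.
start_connected_devices() {
    awk -F'[ \t]*=[ \t]*' '
        tolower($1) ~ /^[ \t]*(ide[0-9]+:[0-9]+|floppy[0-9]+)\.startconnected$/ &&
        tolower($2) ~ /^"true"/ {
            dev = $1
            sub(/^[ \t]*/, "", dev)     # drop leading whitespace
            sub(/\.[^.]*$/, "", dev)    # drop the ".startConnected" suffix
            print dev
        }
    ' "$1"
}

# On the host (not run here; the path is a placeholder):
#   start_connected_devices /vmfs/volumes/<datastore>/<vm>/<vm>.vmx
```

Any device it prints is a candidate for unticking "Connect at power on" in the VM's settings while the ISO or .flp image is not in use.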

                               

                              Hope that helps a little,

                              jandie

                              • 42. Re: Power of VM - Operation Failed Since another task is in progress
                                ivanfetch Enthusiast

                                Hello,

                                 

                                We've seen this too, on a Windows Server 2003 and a Solaris 10 VM.  We're running VMware ESX Server 3.0.1 build-44686, and Virtual Center 2.0.1.  Killing the PID of the VM on the VI3 host, or using vmkload_app to kill the group PID of the VM, lets us start the VM again, but there is still no clue what puts the VMs in limbo.  Since the VM shows up as powered on, the problem is not something that HA will help with.  I'll outline what we have looked at, in the hope that it will add to the troubleshooting data and maybe jog others' ideas.

                                 

                                 

                                When VMs are in limbo, VMware Tools shows "not installed," and the VM cannot be powered off because "another operation is already in progress", even when none of us has performed an operation on the VM.  Perhaps the other operation is something VC has tried to do (DRS?).  Clearly the VM cannot be powered off and must be killed.

                                 

                                 

                                Could this relate to a "bad host" in the cluster?  When a VM gets moved (DRS) to that host, it goes flaky?  Half the time we've seen these limbo issues, they have been on a particular host; the other half of the time I was not able to check.  I have yet to find VC logs which show which hosts a VM was migrated to by DRS, to see if this host might be involved with all the times we've had VMs go into limbo.  Where is the history of which hosts a VM has lived on over time?

                                 

                                 

                                Could this be IO related?  The Solaris VM is a Solaris jumpstart and NFS server, and the Windows VM is running Microsoft System Center Operations Manager 2007 with SQL Server 2005 Enterprise Edition.  The first time the Solaris VM had this issue, we were transferring a lot of images to it over NFS.  The Windows VM does not see much action - it's there for testing and doesn't do much at the moment.

                                 

                                 

                                I have two Solaris VMs on this VMware cluster; both VMs are running 5.10 Generic_118855-36 (64bit) and the same version of VMware Tools.  Only one of the Solaris VMs has ended up in limbo.  The second time it was in limbo, I was able to VMotion it to other hosts, but it stayed in limbo.  We could also ping the VM, but could not SSH to it, connect to its serial console, or anything else - this is the first time I have seen any kind of response from the guest OS when a VM was in limbo.  I wasn't at a place where I could see its VMware console.  I have seen some Solaris VM issues with Sol10 before rev 11/06, or with the 32bit kernel, but they don't apply to what we're running here.  I wonder what makes the other Solaris VM (which has never gone into limbo) special?  Admittedly it doesn't do much; it's there for some samba access and other random testing.

                                 

                                 

                                We have two Windows Server 2003 Enterprise SP2 (not R2) VMs, and only one has been in limbo.  The VM which hasn't been in limbo is an application server, using IIS.   It hasn't seen a lot of action; the app is still being set up.

                                 

                                 

                                We could update this VMWare cluster to 3.0.2, and upgrade Virtual Center to 2.0.1 PL2, but I'd like to know something more concrete about this issue before tossing upgrades at the problem.

                                 

                                Has anyone else made headway or had additional feedback from VMware?  I'm about to open a case, if only to add to the "me too" list.

                                • 43. Re: Power of VM - Operation Failed Since another task is in progress
                                  shawnporter Lurker

                                  Found this in another forum...

                                   

                                  http://supportforums.vizioncore.com/forums/2/3987/ShowThread.aspx

                                   

                                  Apparently, this is a known bug and a fix is coming in the September time frame?

                                   

                                  If you open a case with VMWare concerning this issue, reference SR# is 191595084 and you should be able to point your support guru in the right direction on this issue. 

                                   

                                  If anyone is using the beta of the fix please post.

                                  • 44. Re: Power of VM - Operation Failed Since another task is in progress
                                    harryc Enthusiast

                                     

                                    I got the same error on 2 VMs, SuSE-9 (64) and SuSE-10 (64), running ESX 3.0.1 - no snapshots or backups running. The NT guy says he had one hang like this also.

                                     

                                     

                                    Happens infrequently. I have a dozen or more SuSE VMs and several dozen Windows VMs, running on a farm of 5 ESX servers.

                                     

                                     

                                    Trying vm-support -x (to get the ID), then vm-support -X ID

                                     

                                     

                                    Took a while (6 minutes), but in the end the VM is down!

                                     

                                     

                                     

                                     

                                    # vm-support -x
                                    VMware ESX Server Support Script 1.27
                                    Available worlds to debug:
                                    vmid=1079   aba-dc-qa
                                    vmid=1089   aba-dc-pub-dev1
                                    vmid=1107   chgbpmdb01dev
                                    vmid=1117   chgsandbox01
                                    vmid=1125   magic-dev
                                    vmid=1133   chgbpmapp01dev
                                    vmid=1143   aba-dc-dev
                                    vmid=1148   timssnet-dev
                                    vmid=1169   rdmcsdev01
                                    vmid=1179   aba-dcqa-pub1
                                    vmid=1189   SuSE10-DMZ
                                    vmid=1198   SuSE10-LAN
                                    # vm-support -X 1189
                                    VMware ESX Server Support Script 1.27

                                    Can I include a screenshot of the VM 1189? : y
                                    Can I send an NMI (non-maskable interrupt) to the VM 1189? This might crash the VM, but could aid in debugging : y
                                    Can I send an ABORT to the VM 1189? This will crash the VM, but could aid in debugging : y
                                    Preparing files: /
                                    Grabbing data & core files for world 1189. This will take 5 - 10 minutes.

                                     

                                     

                                    thx Shawn !