cmoorehsu
Contributor

How do you kill a hung virtual machine in ESXi?

I have a test/dev environment set up on an ESXi host. One of the virtual machines keeps locking up, and short of rebooting the ESXi host, I can't figure out how to kill the VM.

The VM is locked up, so I can't use the VM console or RDP. Remote shutdown commands from the DOS command line don't work either. Telling the VM to shut down from VI Client leaves the shutdown task hanging at 95% complete; it's been stuck there for almost 8 hours now. I attempted to use `vmware-cmd ... stop hard` from the RCLI, but learned that ESXi isn't licensed to let you do that. I've accessed the (unsupported) console on ESXi, but none of the normal ESX commands I've read about are included in the base ESXi image. I tried the `vm-support -X ...` command, but I apparently don't have enough storage available to create the .tar file from that process. I managed to get the following process info from the command line, but I'm too chicken to kill the processes:

~ # ps -c | grep TESTSQL

3868 3868 vmx /bin/vmx -ssched.group=host/user -# name=VMware ESX Server;version=3.5.0;licensename=VMware ESX Server;licenseversion=2.0 build-153875; -@ pipe=/tmp/vmhsdaemon-0/vmx0f568522a56f55e3; /vmfs/volumes/f9d1a505-304cbd7b/TESTSQL/TESTSQL.vmx

3869 vmm0:TESTSQL

3925 3868 vmx /bin/vmx -ssched.group=host/user -# name=VMware ESX Server;version=3.5.0;licensename=VMware ESX Server;licenseversion=2.0 build-153875; -@ pipe=/tmp/vmhsdaemon-0/vmx0f568522a56f55e3; /vmfs/volumes/f9d1a505-304cbd7b/TESTSQL/TESTSQL.vmx

3926 3868 mks:TESTSQL /bin/vmx -ssched.group=host/user -# name=VMware ESX Server;version=3.5.0;licensename=VMware ESX Server;licenseversion=2.0 build-153875; -@ pipe=/tmp/vmhsdaemon-0/vmx0f568522a56f55e3; /vmfs/volumes/f9d1a505-304cbd7b/TESTSQL/TESTSQL.vmx

3927 3868 vcpu-0:TESTSQL /bin/vmx -ssched.group=host/user -# name=VMware ESX Server;version=3.5.0;licensename=VMware ESX Server;licenseversion=2.0 build-153875; -@ pipe=/tmp/vmhsdaemon-0/vmx0f568522a56f55e3; /vmfs/volumes/f9d1a505-304cbd7b/TESTSQL/TESTSQL.vmx

help!?

Environment Details:

  • VMware ESXi 3.5.0 Build 153875

  • VMware Infrastructure Client 2.5.0 Build 147633

  • Dell PowerEdge R805 with dual Quad-core AMD Opteron 2356

  • Openfiler 2.3 NFS datastore

VM Details:

  • Microsoft 2003 Server Enterprise Edition (64-bit)

  • 1 vCPU

  • 2GB RAM

  • Microsoft SQL 2005 Standard Edition

5 Replies
Dave_Mishchenko
Immortal

At the console you'll have the `vim-cmd` command. List the VMs to find the ID, then power off by ID:

vim-cmd vmsvc/getallvms

vim-cmd vmsvc/power.off <vmid>

You can also use the kill command.
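If you'd rather not read the ID off the listing by eye, here is a minimal sketch of the lookup. The helper name `vmid_for_name` is made up for illustration, and it assumes the VM name contains no spaces and appears as the second column of the `getallvms` output.

```shell
# Hypothetical helper: print the Vmid (first column) of the VM whose
# Name (second column) equals $1. Reads `vim-cmd vmsvc/getallvms`
# output on stdin. Assumes the VM name has no spaces in it.
vmid_for_name() {
    awk -v name="$1" '$2 == name { print $1; exit }'
}

# Usage on the host (left commented out so nothing is powered off by accident):
# vmid=$(vim-cmd vmsvc/getallvms | vmid_for_name TESTSQL)
# [ -n "$vmid" ] && vim-cmd vmsvc/power.off "$vmid"
```

Matching on the whole second column rather than using `grep` avoids picking up another VM whose name merely contains the same string.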

cmoorehsu
Contributor

Thanks for the tip, Dave, but it doesn't work for me:

~ # vim-cmd vmsvc/getallvms | grep TESTSQL

384 TESTSQL TESTSQL/TESTSQL.vmx winNetEnterprise64Guest vmx-04

~ # vim-cmd vmsvc/power.off 384

Powering off VM:

(vim.fault.TaskInProgress) {

dynamicType = <unset>,

task = 'vim.Task:haTask-384-vim.VirtualMachine.powerOff-2862',

msg = "Operation failed since another task is in progress."

}

The other task is the previously hung Power Off attempt made from the VI Client. You mention that `kill` will work as well, but how exactly do I go about doing that? Is there a special order in which to kill the processes? If I do kill the processes, will the system reclaim all of the old resources and remain stable? I'm looking for a "clean" way to kill this VM.

timw18
Enthusiast

Here are a couple of links I found on this. I haven't tried any of these methods, but the links may help.

cmoorehsu
Contributor

Thanks for the tips guys. I ended up having to kill the processes with the `kill` command. Based on the `ps` results, I killed the parent process (the one with the lowest PID) without using the `-9` argument. The VM process ended immediately, and the Shutdown task in VI Client finally showed 100%. I had to delete the stale .vswp file, but the VM appears to be running okay right now.
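For anyone who lands here later, this is roughly what I did, written up as a sketch rather than a supported procedure. The function names `lowest_vm_pid` and `term_and_wait` are my own, and the "lowest PID is the parent" rule is just what held true in my `ps` listing, so check the printed PID against the listing before killing anything.

```shell
# Print the lowest PID among `ps -c` lines (read on stdin) that mention
# the VM name in $1. In my listing that was the parent vmx process.
lowest_vm_pid() {
    awk -v name="$1" '
        index($0, name) && $1 ~ /^[0-9]+$/ {
            if (min == "" || $1 + 0 < min) min = $1 + 0
        }
        END { if (min != "") print min }
    '
}

# Send a plain SIGTERM (no -9), then poll once a second, up to $2 times
# (default 10), for the process to disappear.
# Returns 0 if it is gone, 1 if it is still there; escalating to
# `kill -9` is deliberately left as a manual decision.
term_and_wait() {
    pid=$1
    tries=${2:-10}
    kill "$pid" 2>/dev/null
    while [ "$tries" -gt 0 ]; do
        kill -0 "$pid" 2>/dev/null || return 0
        sleep 1
        tries=$((tries - 1))
    done
    return 1
}

# Usage on the host (commented out on purpose):
# pid=$(ps -c | lowest_vm_pid TESTSQL)
# echo "about to signal PID $pid"
# term_and_wait "$pid" 30 || echo "still running; consider kill -9"
```

Trying a plain `kill` first gives the vmx process a chance to exit on its own, which is what happened in my case.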

As a sidebar, it looks like our Openfiler box is unable to keep up with the IO demand, which caused the VM to lock up because its virtual SCSI device appeared unready for IO.

PeteLong
Contributor

I know this is an old thread, but I saw this today on a VM that had a failed hardware upgrade (version 4 to 7). Once I cleared that, it was OK.

Cannot Delete a Virtual Machine (Another task is already in progress)

Pete
