VMware Cloud Community
btok
Contributor

Corrupt redolog

I have a VM that is prompting the following message:

"msg.hbacommon.corruptredo: The redolog of server1-000001.vmdk has been detected to be corrupt. The virtual machine needs to be powered on. If the problem still persists, you need to discard the redolog."

This prompt is an endless loop: if you select OK, the message reappears over and over. The machine is stuck powering on. Every attempt to power off this VM results in "The attempted operation cannot be performed in the current state."

To get this VM powered off, the ESX server it resides on has to be power-cycled.

Is there an easier way to shut down a VM caught in this state?

7 Replies
enterpriseda
Contributor

Hi, I had this kind of problem too after a SAN copy. Try searching the forum for a ps command with some options to kill the process of the restarting VM, then remove the redo logs. If I find it, I'll post it for you.

btok
Contributor

Found the process

Instructions on how to forcibly terminate a VM if it is unresponsive to the VI Client.

In this procedure you will terminate the Master World and User Worlds for the VM, which in turn terminates the VM's processes.

1. First list the running VMs to determine the VM ID for the affected VM:

# cat /proc/vmware/vm/*/names

vmid=1076 pid=-1 cfgFile="/vmfs/volumes/50823edc-d9110dd9-8994-9ee0ad055a68/vc using sql/vc using sql.vmx" uuid="50 28 4e 99 3d 2b 8d a0-a4 c0 87 c9 8a 60 d2 31" displayName="vc using sql-192.168.1.10"

vmid=1093 pid=-1 cfgFile="/vmfs/volumes/50823edc-d9110dd9-8994-9ee0ad055a68/esx_template/esx_template.vmx" uuid="50 11 7a fc bd ec 0f f4-cb 30 32 a5 c0 3a 01 09" displayName="esx_template"

For this example we will terminate the VM at vmid='1093'
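
With many VMs registered, picking the right vmid out of that output by eye is error-prone. Here is a minimal sketch of a filter, assuming each line has the vmid=... displayName="..." layout shown above. The function name vmid_for_display_name is made up for illustration and is not a VMware tool.

```shell
# Hypothetical helper: print the vmid of the VM whose displayName equals $1.
# Reads lines in the format shown above from stdin.
vmid_for_display_name() {
    # Keep only the line with the exact displayName="..." field,
    # then extract the number that follows vmid= at the start of the line.
    grep -F -e "displayName=\"$1\"" | sed -n 's/^vmid=\([0-9][0-9]*\) .*/\1/p'
}

# On the ESX service console (path as in step 1):
# cat /proc/vmware/vm/*/names | vmid_for_display_name "esx_template"
```

It only reads text, so it is safe to try before running any kill command.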

2. Next, find the Master World ID. To do this, type:

# less -S /proc/vmware/vm/1093/cpu/status

Expand the terminal or scroll until you can see the right-most column. This is labelled 'group'. Underneath the column heading you will find: vm.1092.

In this example '1092' is the ID of the Master World.
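
Instead of scrolling to the right-most column, the group value can be pulled out with awk. This is only a sketch: it assumes the group column is the last whitespace-separated field on the line and has the form vm.<number>, as described above. The helper name master_world_id is hypothetical.

```shell
# Hypothetical helper: read cpu/status text on stdin and print the number
# from the first line whose last field looks like "vm.<number>".
# Assumption: 'group' is the last whitespace-separated column.
master_world_id() {
    awk '$NF ~ /^vm\.[0-9]+$/ { id = $NF; sub(/^vm\./, "", id); print id; exit }'
}

# On the ESX service console (path taken from step 2):
# master_world_id < /proc/vmware/vm/1093/cpu/status
```

If nothing is printed, the layout of the status file differs from this assumption, so fall back to reading it with less as in step 2.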

3. Run this command to terminate the Master World and the VM running in it:

# /usr/lib/vmware/bin/vmkload_app -k 9 1092

4. This should kill all the VM's User Worlds and also the VM's processes.

If successful, you will see output similar to:

# /usr/lib/vmware/bin/vmkload_app --kill 9 1070

Warning: Jul 12 07:24:06.303: Sending signal '9' to world 1070.

If the Master World ID is wrong, you may see:

# /usr/lib/vmware/bin/vmkload_app --kill 9 1071

Warning: Jul 12 07:21:05.407: Sending signal '9' to world 1071.

Warning: Jul 12 07:21:05.407: Failed to forward signal 9 to cartel 1071: 0xbad0061

mrupright
Contributor

In your earlier post, what are the special characters "–S" and "–k" in the following commands?

less –S /proc/vmware/vm/1093/cpu/status

/usr/lib/vmware/bin/vmkload_app –k 9 1092

mrbill007
Contributor

THANK YOU!

This worked perfectly.

Cybershift
Contributor

I had the same issue and this worked for me as well. Thanks!

BTW, the special symbol in the commands is a dash ( - )

anandv
Contributor

Thanks for the solution; it worked for me as well.

anuchit2402
Contributor

Hello btok,

I have the same problem, but when I run the command in step 2 it does not show the Master World ID. Instead it shows "No such file or directory".

Can you advise me on how to fix this problem?
