VMware Cloud Community
kgill
Contributor

Corrupt redo log freezing VM

Hello,

We use an ESX 4 environment with 3 hosts and fiber-attached datastores. We have a VM that keeps freezing. When I click on it in vCenter, this message comes up: “Virtual Machine Message msg.hbacommon.corruptredo: The redo log of JUSBLD01SRV13-000005.vmdk is corrupted.  Power off the virtual machine.  If the problem still persists, discard the redo log.”  When I hit “OK” on this message, the VM powers off and comes back up a few minutes later.  It works until the message comes back later with a different VMDK file listed as corrupted; it’s up to JUSBLD01SRV13-000016.vmdk now.  We usually use VDR 2.0 to back up the VM, but I’ve removed it from the jobs and it’s still happening daily, sometimes more than once a day.

When I check the settings of the VM, it shows the disk pointing to JUSBLD01SRV13-000016.vmdk when it should be JUSBLD01SRV13.vmdk.  Browsing the datastore shows a bunch of files called JUSBLD01SRV13-00000x.vmdk and JUSBLD01SRV13-00000x-ctk.vmdk, where x is a number from 1 to 16, 32 files in all.

No snapshots are listed in the Snapshot Manager for the VM. I can take a new snapshot and it works.  When I try to delete all snapshots, it completes successfully, but the -00000x.vmdk files are still there.

I’ve tried migrating the VM to a different datastore, but it fails with the error: Error caused by file [JUS_vd01_BACK] JUSBLD01SRV13/JUSBLD01SRV13-000005.vmdk

When I try and clone the machine I get a similar error.

I tried running the standalone Converter on the VM over the weekend, and it failed with a network error when I tried to do a straight copy.  When I tried to change the disk size to force a block-by-block copy, it failed with a disk error.

I also tried cloning the vmdk files through the command line by running “vmkfstools -i JUSBLD01SRV13-000016.vmdk JUSBLD01SRV13-recovered.vmdk”.  It fails with the message “Clone:54% done.  Failed to clone disk : bad file descriptor (589833)”.  I tried it on most of the files in the folder and got the same error.

I’m not sure what to do with it next, any help would be appreciated.

Thanks

4 Replies
a_p_
Leadership

To get an overview, please use PuTTY to get a full list of the files in the VM's folder by running ls -lisa > filelist.txt. As a second step, use WinSCP to download the newly created filelist.txt, all the vmware*.log files, and all the .vmdk descriptor files (the small .vmdk files without flat, delta or ctk in their names). Then compress/zip the downloaded files and attach the .zip file to a reply post.
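If it helps, the collection step above could be scripted roughly as follows. This is only a sketch: the function name collect_vm_info and the output directory are made up for illustration, and it assumes a POSIX shell with cp and ls available on the host. It only reads from the VM folder and copies the small files elsewhere.

```shell
# Hypothetical helper (not a VMware tool): gather the diagnostic files
# requested above into one directory that can then be downloaded and zipped.
# Usage: collect_vm_info <vm_dir> <out_dir>
collect_vm_info() {
    vm_dir=$1
    out_dir=$2
    mkdir -p "$out_dir" || return 1

    # Full listing of the VM folder, same as running ls -lisa inside it.
    ( cd "$vm_dir" && ls -lisa ) > "$out_dir/filelist.txt"

    # All VM log files.
    for f in "$vm_dir"/vmware*.log; do
        [ -f "$f" ] && cp "$f" "$out_dir/"
    done

    # Descriptor .vmdk files only: skip the large flat, delta and ctk files.
    for f in "$vm_dir"/*.vmdk; do
        [ -f "$f" ] || continue
        case "$f" in
            *-flat.vmdk|*-delta.vmdk|*-ctk.vmdk) continue ;;
        esac
        cp "$f" "$out_dir/"
    done
    return 0
}
```

It would be called with the VM's folder and a scratch directory, for example collect_vm_info . /tmp/vminfo when run from inside the VM folder (the /tmp/vminfo path is just a placeholder).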

André

kgill
Contributor

Here you go, thanks for the help!

continuum
Immortal

Can you run the following?
vmkfstools -i JUSBLD01SRV13-000004.vmdk new.vmdk
The snapshot 000005 is quite small. If that is the only corrupt file, you could get away with excluding it from the chain.
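
Before touching the chain, it may be worth listing how the descriptors currently link to each other. The sketch below is read-only and prints the CID, parentCID and parentFileNameHint fields from each descriptor file; in a consistent chain, each descriptor's parentCID should match the CID of the file named in its parentFileNameHint. The function name show_chain_links is made up for illustration, and the sketch assumes the descriptors are plain text files with one field per line.

```shell
# Hypothetical helper: print the chain-related fields of every descriptor
# .vmdk in a folder. It does not modify any file.
# Usage: show_chain_links <vm_dir>
show_chain_links() {
    for f in "$1"/*.vmdk; do
        [ -f "$f" ] || continue
        # Skip the data files; only the small descriptors carry these fields.
        case "$f" in
            *-flat.vmdk|*-delta.vmdk|*-ctk.vmdk) continue ;;
        esac
        cid=$(sed -n 's/^CID=//p' "$f")
        parent_cid=$(sed -n 's/^parentCID=//p' "$f")
        parent=$(sed -n 's/^parentFileNameHint="\(.*\)"/\1/p' "$f")
        # A base disk has no parentFileNameHint; show "-" in that case.
        echo "${f##*/} CID=$cid parentCID=$parent_cid parent=${parent:--}"
    done
}
```

Running show_chain_links . from inside the VM folder prints one line per descriptor, which makes a broken or mismatched link easier to spot before deciding which snapshot to leave out.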


________________________________________________
Do you need support with a VMFS recovery problem ? - send a message via skype "sanbarrow"
I do not support Workstation 16 at this time ...

a_p_
Leadership

Try to download JUSBLD01SRV13-000005-delta.vmdk from the datastore to your local system using WinSCP. If this works, run the attached Excel macro, which checks the file's metadata and shows the results in different tabs.

André
