GlenB
Contributor

Corrupt redolog but no snapshots

I seem to have a messed up WinXP Guest. I can boot it OK and it runs OK ... for a while. In the boot up process it hangs at 95% for an uncommonly long time before showing the Windows logo and the cylon flashing lights. But it does boot and runs for an indeterminate time. I do different things and eventually I get the VM error message:

msg.hbacommon.corruptredo:The redolog of -000001.vmdk has been detected to be corrupt. The virtual machine needs to be powered off. If the problem still persists, you need to discard the redolog.

And the machine is dead. I click on OK and the guest powers off. I can immediately reboot it and the behaviour continues.

I'd love to "discard the redolog" but I don't think I have one. The Snapshot Manager says I have no snapshots, nothing to delete. So now what?

PS - everyone seems to equate this error message with some "out of storage" problem, but the datastore containing this VM is on a local RAID array and the datastore has 570 GB free, so that's not it.






Regards - Glen

19 Replies
AndreTheGiant
Immortal

Try creating a new snapshot and then choose Delete All in the Snapshot Manager.

Andre

Andre | http://about.me/amauro | http://vinfrastructure.it/ | @Andrea_Mauro
petedr
Virtuoso

Do you see a snapshot file on disk (a -delta file) in the VM's folder?

As Andre indicated, first try to add a new snapshot and then delete all snapshots.

If that fails, then you may want to contact VMware support.

www.phdvirtual.com, makers of esXpress

www.thevirtualheadline.com www.liquidwarelabs.com
GlenB
Contributor

There is definitely no -delta file visible when I use the datastore browser from Virtual Center. I don't know enough to say whether such a file would be visible from there, or whether VMware "hides" it and I'd have to use a Linux command to see it.

I did create a Snapshot - no complaints were issued. I could see the snapshot file in the datastore browser. I have just deleted the Snapshot and we'll have to wait and see if there are any complaints or any subsequent redolog issues. I'll post again when I know more.




Regards - Glen

athlon_crazy
Virtuoso

And if there is a delta disk, as petedr suggested, you may need to modify the parentCID manually:

- Verify the delta disk petedr mentioned (000001.vmdk) from the service console of the ESX host where the VM is running ($ ls -lah)

- Create a new snapshot (000002.vmdk)

- Review the parentCID of the new snapshot

- Modify the parentCID of the new snapshot so that it refers to the first delta (000001.vmdk), i.e. matches that delta's CID

p/s: You can try the above method to remove the snapshot, but since delta 000001.vmdk is probably already corrupted, please back up your VM first.






vcbMC-1.0.6 Beta

vcbMC-1.0.7 Lite

http://www.no-x.org

GlenB
Contributor

I told it to remove all snapshots at 9:51 and at 10:15 the guest machine powered itself off. The message that I saw briefly before it disappeared said that the "delete all snapshots" operation timed out. If I look at the Snapshot Manager now, there is no snapshot listed. The guest has 2 disks according to the Edit Settings panel:

Hard Disk 1: Main/Main-000003.vmdk (100 GB)

Hard Disk 2: Main/Main_2-000002.vmdk (256 GB)

Looking at the datastore browser:

Main.vmdk

Main_1.vmdk

Main_2.vmdk

Main_2-000001.vmdk

Main_2-000002.vmdk

Main-000001-delta.vmdk

Main-000003.vmdk

Main-Snapshot3.vmsn

Of those, Main_1.vmdk has a datestamp a few days old, so I don't think it is actually being used anymore, which confirms what the Edit Settings said. Think I can delete it? It's a 256 GB file so I'd like to recover the space.

Regards - Glen
GlenB
Contributor

Output from:

/vmfs/volumes/4ab6c4ff-de1e6ea7-d316-0024e8734364/Main # ls -al

-rw------- 1 root root 18253611008 Apr 4 01:56 Main-000001-delta.vmdk

-rw------- 1 root root 50539008 Apr 4 02:15 Main-000003-delta.vmdk

-rw------- 1 root root 218 Apr 4 02:15 Main-000003.vmdk

-rw------- 1 root root 19418 Apr 4 01:51 Main-Snapshot3.vmsn

-rw------- 1 root root 107389255680 Apr 4 02:15 Main-flat.vmdk

-rw------- 1 root root 399 Apr 4 01:51 Main.vmdk

-rw------- 1 root root 8684 Apr 4 02:15 Main.nvram

-rw------- 1 root root 560 Apr 4 02:15 Main.vmsd

-rwxr-xr-x 1 root root 2417 Apr 4 01:51 Main.vmx

-rw------- 1 root root 259 Apr 2 23:23 Main.vmxf

-rw------- 1 root root 274877906944 Feb 13 18:48 Main_1-flat.vmdk

-rw------- 1 root root 401 Apr 1 05:22 Main_1.vmdk

-rw------- 1 root root 9345435648 Apr 4 01:51 Main_2-000001-delta.vmdk

-rw------- 1 root root 222 Apr 3 14:44 Main_2-000001.vmdk

-rw------- 1 root root 252184576 Apr 4 02:15 Main_2-000002-delta.vmdk

-rw------- 1 root root 229 Apr 4 01:51 Main_2-000002.vmdk

-rw------- 1 root root 274877906944 Apr 3 14:44 Main_2-flat.vmdk

-rw------- 1 root root 401 Apr 3 13:51 Main_2.vmdk

-rw-r--r-- 1 root root 21235 Apr 4 02:39 vmware.log

Analysis of the linkages of the disk files:

Main.vmx:

scsi0:0.filename="Main-000003.vmdk" (exists)

scsi0:1.filename="Main_2-000002.vmdk" (exists)

Main.vmdk: (this looks OK)

CID=1917c538

parentCID=ffffffff

Extent "Main-flat.vmdk" (exists)

Main-000003.vmdk: (pointed to by .vmx but parent does not exist)

CID=9bd507c2

parentCID=358a55d6

parentFileNameHint="Main.vmdk" (exists)

Extent "Main-000003-delta.vmdk" (exists)

Main_1.vmdk: (looks OK, but not referenced in the .vmx file so probably leftover turds)

CID=668f77d0

parentCID=ffffffff

Extent "Main_1-flat.vmdk" (exists)

Main_2.vmdk: (this looks OK)

CID=3af6bcae

parentCID=ffffffff

Extent "Main_2-flat.vmdk" (exists)

Main_2-000001.vmdk: (looks correctly linked to the parent)

CID=1660e0c8

parentCID=3af6bcae

parentFileNameHint="Main_2.vmdk" (exists)

Extent "Main_2-000001-delta.vmdk" (exists)

Main_2-000002.vmdk: (looks correctly linked to the parent)

CID=9cb51666

parentCID=1660e0c8

parentFileNameHint="Main_2-000001.vmdk" (exists)

Extent "Main_2-000002-delta.vmdk" (exists)

My Conclusions:

1) Main_2 is all OK and linked in correctly

2) Main is severely troubled:

a) .vmx points to Main-000003.vmdk which exists BUT

b) Main-000003.vmdk has a parentCID that points to no file, even though the Hint is correct

c) one would presume that Main-000003 would point to -000002 then that points to -000001 then to Main

but there is no -000002 and the Main-000001.vmdk does not exist.

d) and what is the Main-Snapshot3.vmsn good for?
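
The linkage analysis above can also be checked mechanically. What follows is a minimal Python sketch, not a VMware tool. It assumes the descriptor fields have been copied by hand out of the .vmdk text files (only CID, parentCID and parentFileNameHint, as listed above), and the names parse_descriptor and check_chain are made up for illustration. It walks from the disk named in the .vmx towards the base disk and reports any child whose parentCID differs from its parent's CID.

```python
import re

# Descriptor fields transcribed from the analysis above (disk "Main" only).
DESCRIPTORS = {
    "Main.vmdk": "CID=1917c538\nparentCID=ffffffff\n",
    "Main-000003.vmdk": (
        "CID=9bd507c2\n"
        "parentCID=358a55d6\n"
        'parentFileNameHint="Main.vmdk"\n'
    ),
}

NO_PARENT = "ffffffff"  # parentCID value carried by a base disk


def parse_descriptor(text):
    """Return the key=value fields of a descriptor as a dict, quotes stripped."""
    fields = {}
    for line in text.splitlines():
        match = re.match(r'^\s*([\w.]+)\s*=\s*"?([^"]*)"?\s*$', line)
        if match:
            fields[match.group(1)] = match.group(2)
    return fields


def check_chain(start, descriptors):
    """Walk from `start` towards the base disk and collect linkage problems."""
    problems = []
    seen = set()
    name = start
    while name not in seen:
        seen.add(name)
        fields = parse_descriptor(descriptors[name])
        if fields.get("parentCID") == NO_PARENT:
            break  # reached the base disk
        parent_name = fields.get("parentFileNameHint")
        if parent_name not in descriptors:
            problems.append(f"{name}: parent {parent_name} is missing")
            break
        parent = parse_descriptor(descriptors[parent_name])
        if parent.get("CID") != fields.get("parentCID"):
            problems.append(
                f"{name}: parentCID {fields.get('parentCID')} does not match "
                f"CID {parent.get('CID')} of {parent_name}"
            )
        name = parent_name
    return problems


for problem in check_chain("Main-000003.vmdk", DESCRIPTORS):
    print(problem)
```

With the values from this post, the sketch reports a single mismatch: Main-000003.vmdk expects a parent with CID 358a55d6, while Main.vmdk carries CID 1917c538. It only compares the text fields and says nothing about whether the -delta files themselves are intact.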

So, what manual edits can I make to these text files to relink everything correctly? I don't know how it got like this, but all I have tried to do to fix it is to take snapshots and delete them, and the deletes have failed. I am prepared to hear that I will lose changes made to that disk associated with -000001 and -000003. I'd like to merge the -000003 back into -000001 and that back into Main if that's possible. And if this all blows up, I guess I'm no worse off than I am today because the last message I got said:

Error: Cannot open the disk '/vmfs/ ...... /Main/Main-000003.vmdk' or one of the snapshot disks it depends on.

Reason: the parent virtual disk has been modified since the child was created.

And then it fails to power on the guest machine, so I'm dead in the water!



Regards - Glen

athlon_crazy
Virtuoso

So, let me summarize for you:

Main.vmdk

- Main-000001-delta.vmdk & NO Main-000001.vmdk

- Main-000003-delta.vmdk & Main-000003.vmdk

Main_1.vmdk

- Left over vmdk

Main_2.vmdk

- Main_2-000001-delta.vmdk & Main_2-000001.vmdk

- Main_2-000002-delta.vmdk & Main_2-000002.vmdk

- Correct parentCID

What you can do now :

1) Recreate a new descriptor file for Main-000001.vmdk

2) Edit Main-000003.vmdk so that its parentCID points to the new Main-000001.vmdk

3) Create a new snapshot and then delete all.

Notes :

- I do not know why you don't have Main-000002-delta.vmdk & Main-000002.vmdk; I am assuming you never had them or have already lost them.

- Google sanbarrow's thread on how to replace/recreate a descriptor file if you don't know how to do it.






GlenB
Contributor

athlon_crazy: Because I'm a nervous fellow, I'd appreciate your cold eyes look at my changes BEFORE I try to snapshot or power on the machine:

Main.vmx

#NOTE - I just kept the lines relating to the HDDs

scsi0.present = "true"

scsi0.sharedBus = "none"

snapshot.action = "keep"

scsi0:0.present = "true"

scsi0:0.fileName = "Main-000003.vmdk"

scsi0:0.deviceType = "scsi-hardDisk"

scsi0:0.redo = ""

sched.scsi0:0.shares = "normal"

scsi0:1.present = "true"

scsi0:1.fileName = "Main_2-000002.vmdk"

scsi0:1.deviceType = "scsi-hardDisk"

scsi0:1.redo = ""

Main-000003.vmdk

#Disk DescriptorFile

version=1

CID=9bd507c2

parentCID=358a55d6

createType="vmfsSparse"

parentFileNameHint="Main-000001.vmdk"

#Extent description

RW 209744640 VMFSSPARSE "Main-000003-delta.vmdk"

#The Disk Data Base

#DDB

Main-000001.vmdk

#Disk DescriptorFile

version=1

CID=358a55d6

parentCID=1917c538

createType="vmfsSparse"

parentFileNameHint="Main.vmdk"

#Extent description

RW 209744640 VMFSSPARSE "Main-000001-delta.vmdk"

#The Disk Data Base

#DDB

Main.vmdk

#Disk DescriptorFile

version=1

CID=1917c538

parentCID=ffffffff

createType="vmfs"

#Extent description

RW 209744640 VMFS "Main-flat.vmdk"

#The Disk Data Base

#DDB

ddb.adapterType = "buslogic"

ddb.geometry.sectors = "63"

ddb.geometry.heads = "255"

ddb.geometry.cylinders = "13056"

ddb.uuid = "60 00 C2 96 74 87 b8 dc-73 82 af e3 7d 47 c3 40"

ddb.virtualHWVersion = "4"

ddb.toolsVersion = "7303"
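
One sanity check that can be done on these descriptors without touching the VM is the size arithmetic. The sketch below is only illustrative Python: it assumes the extent count is given in 512-byte sectors, and the helper name extent_bytes is invented here. It compares the extent line of Main.vmdk with the size of Main-flat.vmdk from the ls -al listing and with the ddb.geometry values.

```python
SECTOR_BYTES = 512  # assumption: extent sizes are counted in 512-byte sectors


def extent_bytes(extent_line):
    """Return the byte size declared by an extent line such as
    'RW 209744640 VMFS "Main-flat.vmdk"'."""
    _access, sectors, _kind, _filename = extent_line.split(maxsplit=3)
    return int(sectors) * SECTOR_BYTES


declared = extent_bytes('RW 209744640 VMFS "Main-flat.vmdk"')
on_disk = 107389255680  # size of Main-flat.vmdk in the ls -al listing

# cylinders * heads * sectors-per-track from the ddb.geometry lines
geometry_sectors = 13056 * 255 * 63

print(declared == on_disk)            # True
print(geometry_sectors == 209744640)  # True
```

Both comparisons hold for the base disk, so the Main.vmdk descriptor is at least consistent with its flat file. The same check does not apply to the -delta files: they are sparse and grow as needed, so their size on disk is not expected to match the sector count in their descriptors.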


Regards - Glen

athlon_crazy
Virtuoso

- Main.vmx looks okay to me

- Main-000003.vmdk looks okay too, and the same goes for Main-000001.vmdk, which correctly refers to Main.vmdk

Instead of taking a new snapshot, just try to power on the VM. That way, if something goes wrong, not much will have changed in the descriptor files.






GlenB
Contributor

I powered on the VM; it got to about 40% and then I received this error message:

Cannot open the disk '/vmfs/volumes/..../Main/Main-000003.vmdk' or one of the snapshot disks it depends on.

Reason: data corruption detected.

So I may now have it all linked together correctly, but one of the -delta files is "corrupted", whatever that means. Do you agree? Any suggestions now? Any more data I can provide that would help figure it out?



Regards - Glen

athlon_crazy
Virtuoso

I suspect Main-000001.vmdk is the culprit.

Try modifying the parentCID in Main-000003.vmdk so that it points straight to Main.vmdk (that is, to Main.vmdk's CID) instead of Main-000001.vmdk. Then try to power on the VM.
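
As a rough illustration of that edit, here is a small Python sketch that rewrites the two parent fields in a descriptor's text. The function name repoint is hypothetical, the descriptor text is the Main-000003.vmdk content posted earlier in this thread, and 1917c538 is the CID posted for Main.vmdk. It assumes the descriptor has been backed up first and only changes the text; it does nothing about the data in the -delta files.

```python
import re


def repoint(descriptor_text, parent_cid, parent_hint):
    """Return descriptor text with parentCID and parentFileNameHint replaced."""
    text = re.sub(r"(?m)^parentCID=.*$",
                  f"parentCID={parent_cid}", descriptor_text)
    text = re.sub(r"(?m)^parentFileNameHint=.*$",
                  f'parentFileNameHint="{parent_hint}"', text)
    return text


# Main-000003.vmdk as posted above, currently chained to Main-000001.vmdk.
main_000003 = (
    "#Disk DescriptorFile\n"
    "version=1\n"
    "CID=9bd507c2\n"
    "parentCID=358a55d6\n"
    'createType="vmfsSparse"\n'
    'parentFileNameHint="Main-000001.vmdk"\n'
    "#Extent description\n"
    'RW 209744640 VMFSSPARSE "Main-000003-delta.vmdk"\n'
)

# Skip Main-000001 and chain straight to the base disk Main.vmdk.
print(repoint(main_000003, "1917c538", "Main.vmdk"))
```

Only the parentCID and parentFileNameHint lines change; the disk's own CID and the extent line are left alone. Skipping a delta this way discards whatever changes were stored in it.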






GlenB
Contributor

Power on succeeded. But I never saw the Windows logo, and the VM alarms on high CPU use. The VM Tools never started running. It sounds like it logically gets past linking all the disk parts together, but the resulting image is not bootable or at least doesn't run properly. I powered it off and on again to see if anything got automatically fixed by the first attempt, but no change.

Next step I think is to drop all the -00000x bits by making the Main.vmx point to the Main.vmdk directly. That ought to boot and run, but it will have lost a lot of changes. No harm in trying. I guess I'll have to reapply program changes and patches. Might be better to restart from a clean template.

Any thoughts?



Regards - Glen

athlon_crazy
Virtuoso

Losing some changes is not the worst case. Make a copy of this VM's *.vmdk files plus the snapshot images for future use, and play around with it.






GlenB
Contributor

We tried - thanks for your help. I think I'm giving up. I can't fiddle forever and I need this machine running Monday or Tuesday at the latest.

All the data that I care about is on the Main_2 disk and that one looks OK. I did back up a few things from Main.* when it was still kind of functional. I'm going to create a new VM from my template and then bring that Main_2 disk in to it. Then I can reload all the programs and patches and end up with a clean system again.

Is there some way to "condense" all the Main_2.* files down to just a Main_2.vmdk and Main_2-flat.vmdk? And whether I can do that or not, what's the "right" way to start a new clean VM from template and then "add" that disk into it (copy or move or whatever, using the Datastore browser or Unix or whatever)?



Regards - Glen

athlon_crazy
Virtuoso

I think it is enough to copy the *-flat.vmdk to your new VM folder, then open Edit Settings, add a new virtual disk and choose the existing one from your new VM folder. You can browse the datastore and copy the *-flat.vmdk to the new VM folder, or use a Unix command:

$ cp -p /vmfs/volumes/.../old/Main_2-flat.vmdk /vmfs/volumes/.../new/Main_2-flat.vmdk






GlenB
Contributor

OK, but perhaps I was not clear. At the present time, in the VM that cannot start, there is a data disk and the contents of it are represented in Linux as:

Main_2.vmdk (401 bytes)

Main_2-flat.vmdk (~256 GB)

Main_2-000001.vmdk (222 bytes)

Main_2-000001-delta.vmdk (~9 GB)

Main_2-000002.vmdk (252 bytes)

Main_2-000002-delta.vmdk (~252 MB)

I have confidence that the contents of this disk are not corrupted and I wanted to move it and reuse it. I certainly can move Main_2*.vmdk to a new VM directory. But there are 2 more things I'd like to do:

1 - condense all of this into just 2 files - Main_2.vmdk and Main_2-flat.vmdk

2 - rename all of this from Main_2 to something else that suits the new destination better

I guess I could accomplish all that with some Linux commands and editing the textual vmdk's, but I was hoping the VI center had a more elegant way of accomplishing the same thing.



Regards - Glen

athlon_crazy
Virtuoso

If everything is fine now (parentCID etc.), you should be able to delete all snapshots for Main_2.vmdk and remove the delta files. Then just bring Main_2-flat.vmdk (the actual disk) over to the new VM folder and forget about the others.






GlenB
Contributor

I cannot "delete all" snapshots because the VM (that will not power on) thinks there are no snapshots (according to the VI Center Snapshot Manager). That is what I meant about "collapsing" all the deltas back into the root disk. I expect there is data in those deltas that are updates to the root and I want those updates.

At the moment, I have created a new VM with just a C: drive, then I copied (not moved) all the Main_2*.vmdk over to the new VM's directory, renamed all those Main_2 files to User_1 and corrected the textual vmdk files to refer to the right filenames. I did not change the CIDs. Then I added a disk to the VM through Edit Settings. Now the vmx file points to the new User_1.vmdk. I'm wondering if I should edit that to point it at User_1-000002.vmdk instead so that it picks up all the changes in the -000001 and -000002 deltas.
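
For anyone repeating the rename step, the descriptor edits can be sketched in Python as below. This is only an illustration: the function name rename_references is invented, and the sample descriptor is reconstructed from the Main_2-000002.vmdk fields posted earlier, with the sector count 536870912 being an assumption derived from the 274877906944-byte flat file divided by 512. It rewrites only the quoted .vmdk file names and leaves the CIDs untouched.

```python
import re


def rename_references(descriptor_text, old_prefix, new_prefix):
    """Rewrite quoted .vmdk file names that begin with old_prefix."""
    pattern = re.compile('"' + re.escape(old_prefix) + r'([^"]*\.vmdk)"')
    return pattern.sub(
        lambda m: '"' + new_prefix + m.group(1) + '"', descriptor_text
    )


# Reconstructed from the fields posted for Main_2-000002.vmdk; the sector
# count is an assumption (274877906944 bytes / 512), not copied from the file.
main_2_000002 = (
    "CID=9cb51666\n"
    "parentCID=1660e0c8\n"
    'parentFileNameHint="Main_2-000001.vmdk"\n'
    'RW 536870912 VMFSSPARSE "Main_2-000002-delta.vmdk"\n'
)

print(rename_references(main_2_000002, "Main_2", "User_1"))
```

The parent hint and the extent name both end up with the User_1 prefix, while CID and parentCID stay as they were, which matches the manual edit described above. The files on disk still have to be renamed separately.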

Your opinion?



Regards - Glen

GlenB
Contributor

It happened again. Another domestic power interruption and another corrupt redolog. It seems like the VMware code is badly enough designed that it leaves open such windows. It should come with a product warning that says "do not use unless you have 100% backup power". Every couple of months I am wasting hours or days rebuilding damaged machines.

So, to start with the simple stuff, I tried to create a snapshot and then delete all snapshots. Creating was no problem. Deleting all was a problem. It got to 95% and timed out in 15 minutes. I tried twice. The Snapshot Manager first thought the snapshot was still there, but after the second attempt it appears to be gone. Restarting the guest begins OK, but then it issues the corrupt redolog message again and cancels.

So what does the vmfs think is there?

/vmfs/volumes/4ab6c4ff-de1e6ea7-d316-0024e8734364/User # ls -al

drwxr-xr-x 1 root root 3220 Jul 1 16:20 .

drwxr-xr-t 1 root root 3500 May 7 22:27 ..

-rw------- 1 root root 107389255680 Jul 1 16:17 User-flat.vmdk

-rw------- 1 root root 8684 Jul 1 16:11 User.nvram

-rw------- 1 root root 399 Jul 1 16:12 User.vmdk

-rw------- 1 root root 482 Jul 1 15:25 User.vmsd

-rwxr-xr-x 1 root root 2416 Jul 1 16:11 User.vmx

-rw------- 1 root root 259 Apr 5 05:06 User.vmxf

-rw------- 1 root root 20451952640 Jul 1 16:05 User_1-000001-delta.vmdk

-rw------- 1 root root 248 Jul 1 15:41 User_1-000001.vmdk

-rw------- 1 root root 216711825408 Jul 1 15:41 User_1-000002-delta.vmdk

-rw------- 1 root root 255 Jul 1 15:04 User_1-000002.vmdk

-rw------- 1 root root 17303552 Jul 1 16:16 User_1-000003-delta.vmdk

-rw------- 1 root root 255 Jul 1 16:12 User_1-000003.vmdk

-rw------- 1 root root 274877906944 Apr 3 14:44 User_1-flat.vmdk

-rw------- 1 root root 401 Jul 1 15:41 User_1.vmdk

-rw-r--r-- 1 root root 33607 Jul 1 14:29 vmware-10.log

-rw-r--r-- 1 root root 33109 Jul 1 15:22 vmware-11.log

-rw-r--r-- 1 root root 32550 Jun 5 03:05 vmware-6.log

-rw-r--r-- 1 root root 40331 Jun 10 04:58 vmware-7.log

-rw-r--r-- 1 root root 360448 Jul 1 03:38 vmware-8.log

-rw-r--r-- 1 root root 31845 Jul 1 11:38 vmware-9.log

-rw-r--r-- 1 root root 33914 Jul 1 16:20 vmware.log

So that indicates a disk structure that looks like this:

User.vmx

C: User.vmdk + User-flat.vmdk (100 GB)

D: User_1.vmdk + User_1-flat.vmdk (256 GB)

+ -000001.vmdk and -delta ( 20 GB)

+ -000002.vmdk and -delta (210 GB)

+ -000003.vmdk and -delta ( 17 MB)

One of the deltas appears to be corrupted - how do I know which one?

I can edit the vmdk files to relink around the damaged one, but I lose a lot of edits in the process, don't I? Some of those I can probably recover from backups, but I'll never know for sure if I've lost anything. The bad VMware design is becoming very annoying!

Regards - Glen
