VMware Cloud Community
staverts
Contributor

Orphaned Snapshots after Consolidation?

My peers left one of our servers with a snapshot on it for OVER 6 months, until the hypervisor ran out of space. I believe that at this point the delta file became corrupt, since it could no longer be extended once the datastore was full. Being limited on space, I quickly SHUT DOWN the VM and tried the trick of making a SMALL snapshot with the machine powered down, then went to Snapshot Manager and chose "DELETE ALL". Of course the free space on the datastore WAS NOT 1.5 times the size of the snapshots, so the removal failed.

I thought at this point it would be best to migrate the VM to a datastore large enough to perform the consolidation. I mounted an NFS datastore on the hypervisor, removed the current VM from inventory, and performed a COPY of the VM to the new datastore, which was successful. I then tried to boot the VM, and at first it failed, because of course the snapshot the virtual hard disk was pointing at was corrupt. At this point I analyzed the data and looked through the VMDKs to determine which was the last good snapshot VMDK. I found it, edited the .vmx file to point at it, and sure enough, the VM booted. A good result. I then powered it down, made a small snapshot again, and chose DELETE ALL. This time it went through successfully.

My VM is not pointed at the BASE vmdk-flat file. BUT, I seem to have ORPHANED vmdk delta files. What is the best way to deal with them? If you need logs, please let me know.

My goal is to clean up any unneeded orphaned files, and then copy the VM back to the original datastore.

PS - My only real concern at this point is that there seem to be some delta files hanging around, although I DO NOT believe they are being used, since I am pointed at my base vmdk after consolidation via Snapshot Manager.

Looking forward to everyone's advice.  Thanks in advance.

a_p_
Leadership

With what you did, you may have lost the data in the snapshot, i.e. all changes of the last 6 months!

Is the original datastore with its files still in place?

Which version/build of ESXi do you currently use?

Please post a screen shot of the datastore browser window showing all the VM's files with all details (sizes, time stamps, ...) in the original folder.

André

staverts
Contributor

Okay, well, that's good to know. I do indeed have ALL the original files still; this is why I made a copy, in case I royally FU, like it sounds I did. Anyway, here is a screenshot of the original datastore and all files in it, untouched:

staverts
Contributor

Sorry, one last note that may be causing you concern :) The statement "My VM is not pointed at the BASE vmdk-flat file" should read "My VM is NOW pointed at the BASE vmdk-flat file". It is INDEED pointed at the base vmdk flat file after consolidation.

staverts
Contributor

In addition to the VM screenshot, please find listed my version of the hypervisor:

esxi   4.0.0 Build 208167

Andre, if you look at that screenshot, you will see that all the snapshots AFTER PowerSchoolTestWeb-000004.vmdk are the small ones I made with the VM powered OFF, trying to consolidate with DELETE ALL and realizing there is just not enough space to do it. Hence, once I copied the whole virtual machine to a new datastore, I proceeded to find out which vmdk would boot, and that was PowerSchoolTestWeb-000004.vmdk, since it was the last known good snapshot that I believe succeeded, but maybe I'm out to lunch on that. It was the last one with data that would boot and that seemed to have a significant size, NOT just the standard 8MB. It seemed logical at the time, but what do I know :)

a_p_
Leadership

There are two reasons which may have caused the datastore to run out of disk space. Firstly, the VM is thin provisioned, and secondly, you are using ESXi 4.0 Update 1. With ESXi 4.0 Update 2, VMware changed the way multiple snapshots are merged into the base disk, so that the merge does not require additional disk space if the VM is powered off (for thick provisioned virtual disks only).

Before we go into details though, please explain what exactly you did after copying the files to the NAS (i.e. which snapshot you used for consolidation) and which files are still in the NAS datastore. Do you miss any current files in the copied VM?

André

PS: I just saw the additions to your latest post. To see whether the snapshot you used was the correct one, please compress/zip the original VM's vmware*.log files and attach them to your next post. These log files contain the proper snapshot chain. The Snapshot numbers in the file names do not necessarily have to be in an ascending order.

staverts
Contributor

Andre, Please find attached the powerschool VMware log files, for your review.

a_p_
Leadership

According to the log file the snapshot chain - at the time you ran out of disk space - was:

...000001.vmdk -> 000003.vmdk -> 000004.vmdk -> 000002.vmdk -> PowerSchoolTestWeb.vmdk

where the ...000001.vmdk snapshot was created as the "consolidate helper" snapshot. You were close with using the ...000004.vmdk as the VM's disk, but actually missed the deltas in ...000003.vmdk and ...000001.vmdk.

What you may do is to clone the VM's virtual disk (which merges all snapshots in the chain to the target disk) to a new folder (you need to create this folder prior to running the command) with sufficient free disk space, running the following command (without line breaks):

vmkfstools -i /vmfs/volumes/<current-datastore>/PowerSchoolTestWeb/PowerSchoolTestWeb-000001.vmdk /vmfs/volumes/<target-datastore>/PowerSchoolTestWeb/PowerSchoolTestWeb.vmdk -d thin

If there's an issue with ...000001.vmdk, use ...000003.vmdk as the source.
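The fallback order described above can be written down as a small Python sketch that only builds the vmkfstools command lines, newest snapshot first, without running anything. The helper name clone_commands and its parameters are hypothetical; the datastore placeholders are kept exactly as in the command above and must be filled in by hand.

```python
def clone_commands(vm_dir, target_dir, disk_name, chain, disk_format="thin"):
    """Build one 'vmkfstools -i' clone command per candidate source disk.

    chain holds the snapshot suffixes to try, newest first. Only the first
    command that succeeds is needed; the rest are fallbacks.
    """
    target = f"{target_dir}/{disk_name}.vmdk"
    commands = []
    for suffix in chain:
        source = f"{vm_dir}/{disk_name}-{suffix}.vmdk"
        commands.append(["vmkfstools", "-i", source, target, "-d", disk_format])
    return commands


# Placeholders are intentionally left unresolved, as in the post above.
cmds = clone_commands(
    "/vmfs/volumes/<current-datastore>/PowerSchoolTestWeb",
    "/vmfs/volumes/<target-datastore>/PowerSchoolTestWeb",
    "PowerSchoolTestWeb",
    ["000001", "000003"],
)
for cmd in cmds:
    print(" ".join(cmd))
```

The target folder still has to exist before the first command is run, and a failed attempt may leave a partial target disk that should be removed before trying the next source.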

André

staverts
Contributor

Hi Andre, once I copied the VM to the new datastore (i.e. the PowerSchoolTestWeb VM), which vmdk should I have used? And in the future, how do I check that chain myself? :) One last thing: here are the logs for the next and last VM I have to do this with (i.e. PowerSchoolTestDatabase). Which snapshot or disk from the chain should I be using with this one before consolidation? :) Also, is it safe to UPGRADE the hypervisor by booting from the boot disk (i.e. ESXi 4.0 Update 2) and choosing upgrade or something to that effect? I imagine it tries to keep the datastore intact if one does this. Thanks in advance!!

Please find attached a screenshot of PowerSchoolTestDatabase and its log files.

PS - It makes sense that you say I missed 3 and 1, because I noticed after consolidation that 3 and 1 are left with MUCH data in them, not consolidated!

a_p_
Leadership

To determine the proper snapshot chain, check the current .vmdk entry in the .vmx file and then find the most recent DISKLIB... entries in the latest vmware.log file. In the example below, "PowerSchoolTestDatabase-000001.vmdk" should match the current entry in the VM's .vmx file, and the snapshot chain is 000001, 000003, 000004, 000002, base.vmdk. In this case, start restoring/cloning the VM with 000001.vmdk. If this file is corrupt, continue with the next one (000003.vmdk), and so on. This way you will lose as little data (if any) as possible.

The reason I mentioned the vmkfstools command to clone the virtual disk, is to save disk space and complete the disk consolidation a lot faster with the ESXi build you are using.

André

Oct 24 12:55:20.652: vmx| DISK: OPEN scsi0:0 '/vmfs/volumes/4cf364a4-1416b0c7-37dd-60eb696ef005/PowerSchoolTestDatabase/PowerSchoolTestDatabase-000001.vmdk' persistent R[]
Oct 24 12:55:20.653: vmx| DISKLIB-VMFS  : "/vmfs/volumes/4cf364a4-1416b0c7-37dd-60eb696ef005/PowerSchoolTestDatabase/PowerSchoolTestDatabase-000001-delta.vmdk" : open successful (10) size = 17188864, hd = 254327622. Type 8
Oct 24 12:55:20.653: vmx| DISKLIB-DSCPTR: Opened [0]: "PowerSchoolTestDatabase-000001-delta.vmdk" (0xa)
Oct 24 12:55:20.653: vmx| DISKLIB-LINK  : Opened '/vmfs/volumes/4cf364a4-1416b0c7-37dd-60eb696ef005/PowerSchoolTestDatabase/PowerSchoolTestDatabase-000001.vmdk' (0xa): vmfsSparse, 419430400 sectors / 200 GB.
Oct 24 12:55:20.667: vmx| DISKLIB-VMFS  : "/vmfs/volumes/4cf364a4-1416b0c7-37dd-60eb696ef005/PowerSchoolTestDatabase/PowerSchoolTestDatabase-000003-delta.vmdk" : open successful (14) size = 190304372736, hd = 247790407. Type 8
Oct 24 12:55:20.667: vmx| DISKLIB-DSCPTR: Opened [0]: "PowerSchoolTestDatabase-000003-delta.vmdk" (0xe)
Oct 24 12:55:20.667: vmx| DISKLIB-LINK  : Opened '/vmfs/volumes/4cf364a4-1416b0c7-37dd-60eb696ef005/PowerSchoolTestDatabase/PowerSchoolTestDatabase-000003.vmdk' (0xe): vmfsSparse, 419430400 sectors / 200 GB.
Oct 24 12:55:20.678: vmx| DISKLIB-VMFS  : "/vmfs/volumes/4cf364a4-1416b0c7-37dd-60eb696ef005/PowerSchoolTestDatabase/PowerSchoolTestDatabase-000004-delta.vmdk" : open successful (14) size = 78350010368, hd = 247544648. Type 8
Oct 24 12:55:20.678: vmx| DISKLIB-DSCPTR: Opened [0]: "PowerSchoolTestDatabase-000004-delta.vmdk" (0xe)
Oct 24 12:55:20.678: vmx| DISKLIB-LINK  : Opened '/vmfs/volumes/4cf364a4-1416b0c7-37dd-60eb696ef005/PowerSchoolTestDatabase/PowerSchoolTestDatabase-000004.vmdk' (0xe): vmfsSparse, 419430400 sectors / 200 GB.
Oct 24 12:55:20.698: vmx| DISKLIB-VMFS  : "/vmfs/volumes/4cf364a4-1416b0c7-37dd-60eb696ef005/PowerSchoolTestDatabase/PowerSchoolTestDatabase-000002-delta.vmdk" : open successful (14) size = 7650822144, hd = 386759497. Type 8
Oct 24 12:55:20.698: vmx| DISKLIB-DSCPTR: Opened [0]: "PowerSchoolTestDatabase-000002-delta.vmdk" (0xe)
Oct 24 12:55:20.698: vmx| DISKLIB-LINK  : Opened '/vmfs/volumes/4cf364a4-1416b0c7-37dd-60eb696ef005/PowerSchoolTestDatabase/PowerSchoolTestDatabase-000002.vmdk' (0xe): vmfsSparse, 419430400 sectors / 200 GB.
Oct 24 12:55:20.711: vmx| DISKLIB-VMFS  : "/vmfs/volumes/4cf364a4-1416b0c7-37dd-60eb696ef005/PowerSchoolTestDatabase/PowerSchoolTestDatabase-flat.vmdk" : open successful (14) size = 214748364800, hd = 378878794. Type 3
Oct 24 12:55:20.711: vmx| DISKLIB-DSCPTR: Opened [0]: "PowerSchoolTestDatabase-flat.vmdk" (0xe)
Oct 24 12:55:20.712: vmx| DISKLIB-LINK  : Opened '/vmfs/volumes/4cf364a4-1416b0c7-37dd-60eb696ef005/PowerSchoolTestDatabase/PowerSchoolTestDatabase.vmdk' (0xe): vmfs, 419430400 sectors / 200 GB.
Oct 24 12:55:20.712: vmx| DISKLIB-CHAINESX : ChainESXOpenSubChain: numLinks = 5, numSubChains = 1
Oct 24 12:56:58.146: vmx| DISKLIB-LIB   : Opened "/vmfs/volumes/4cf364a4-1416b0c7-37dd-60eb696ef005/PowerSchoolTestDatabase/PowerSchoolTestDatabase-000001.vmdk" (flags 0xa).
Oct 24 12:56:58.146: vmx| DISK: OPEN '/vmfs/volumes/4cf364a4-1416b0c7-37dd-60eb696ef005/PowerSchoolTestDatabase/PowerSchoolTestDatabase-000001.vmdk' Geo (26108/255/63) BIOS Geo (0/0/0)
Oct 24 12:56:58.147: vmx| Creating virtual dev for scsi0:0
Oct 24 12:56:58.147: vmx| DumpDiskInfo: scsi0:0 createType=11, capacity = 419430400, numLinks = 5, allocationType = 0
staverts
Contributor

Okay, that makes sense. I will review the log files myself as well, to make sure I understand and am in line with what you are saying. So I should have pointed the VM at 000001 and then performed the consolidation. Makes sense. The good thing is, I have all the data still. Just copying takes time.

Does the process still work the same from Snapshot Manager? First, after checking the chain, I would force the VM to look at blabla-000001.vmdk (OR whatever the last one was before running out of space), then I would make sure the VM booted. After this I would shut it down and create a quick offline snapshot, then go to Snapshot Manager and say "Delete All", and hopefully all is well?? Is that a reasonable process?

a_p_
Leadership

Yes, provided you have enough free disk space you can delete the snapshots from the snapshot manager. Keep in mind that after manually editing the .vmx file you need to reload the VM (e.g. remove from inventory and add to inventory) to force ESXi to re-read the .vmx file.

For the current VM you may temporarily need a few hundred GB for the "Delete All" process because with ESXi 4.0 Update 1 the snapshots are merged to their parent starting from the latest one. This is: 000001 will be merged into 000003, then 000003 will be merged into 000004, 000004 into 000002 and finally 000002 into the base.vmdk. With this procedure all the delta files/snapshots will grow and will also not be deleted until the last snapshot has been merged into the base disk.
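To get a feeling for the numbers, the merge order described above can be turned into a rough worst-case estimate. This is only a sketch under simplifying assumptions that are not from ESXi documentation: each parent delta grows by at most the size of the child merged into it, no delta grows beyond the virtual disk's capacity, and nothing is freed until the end. The function name is hypothetical, and the sizes are the delta sizes from the DISKLIB-VMFS lines of the PowerSchoolTestDatabase log quoted earlier in this thread. Growth of a thin provisioned base disk is not included.

```python
def merge_growth_upper_bound(delta_sizes, disk_capacity):
    """Rough upper bound (bytes) on how much the delta files may grow.

    delta_sizes lists the delta file sizes newest snapshot first, i.e. in
    the order they are merged into their parents. This is a simplified
    model, not the exact ESXi behaviour.
    """
    if not delta_sizes:
        return 0
    total_growth = 0
    carried = delta_sizes[0]  # size of the data being pushed down the chain
    for parent_size in delta_sizes[1:]:
        merged = max(parent_size, min(disk_capacity, parent_size + carried))
        total_growth += merged - parent_size
        carried = merged
    return total_growth


# Delta sizes in chain order 000001, 000003, 000004, 000002 (bytes).
sizes = [17188864, 190304372736, 78350010368, 7650822144]
capacity = 214748364800  # 200 GB virtual disk, size of the -flat.vmdk

extra = merge_growth_upper_bound(sizes, capacity)
print(f"worst-case delta growth: {extra / 2**30:.1f} GiB")
```

The result is a ceiling rather than a prediction: blocks that were overwritten in several snapshots only need space once, so the real requirement can be considerably lower.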

André

Josh26
Virtuoso

André Pett wrote:

For the current VM you may temporarily need a few hundred GB for the "Delete All" process because with ESXi 4.0 Update 1 the snapshots are merged to their parent starting from the latest one.

André

I note this issue constantly raised on this forum.

For people impacted, is it a better solution to upgrade their hypervisor and THEN run a delete all?

Or is it too late now that the snapshots have become problematic?

a_p_
Leadership

It depends on the specific situation. Basically it's possible to update/patch ESXi to a newer build to benefit from the modified "Delete All" functionality. However, if disk space is already too low and the base disk is thin provisioned it does not help, because the base disk will most likely grow with merging the deltas into it.

André

staverts
Contributor

Thanks Josh and Andre for your thoughts and input. Josh, I was thinking the same thing, but given the stage it's at now, I think my best course of action is to copy the machines to a large SAN (I have an 8TB one), study the snapshot chains as Andre suggested, choose the last best snapshot from before I ran out of space, and then run the consolidation. If all goes well, I can copy them back to the original datastore afterwards.

PS - Andre, once all the snapshots that are in the chain are consolidated, what do I do with the leftover or orphaned ones I created when the machine was off (the small snapshots that were created to try to run DELETE ALL from Snapshot Manager)? Can they just be deleted at that point? I know one thing: from now on I will ALWAYS, ALWAYS, ALWAYS make sure that my staff DELETE THEIR SNAPSHOTS immediately and DO NOT USE them as a backup. I was using vcbghetto, and this seems like a way better solution for backups, on the affordable side that is. I think I should look at vranger, or the acronis solution for VM backups. Anyway, craziness, can't wait until I've got this fixed.

a_p_
Leadership

Snapshots which are not in the chain anymore will not be touched by the ESXi host and need to be deleted manually using e.g. the datastore browser. I'd recommend you ensure the base .vmdk file shows up in the VM's configuration after "Delete All" finishes successfully, then create a sub-folder and move all the remaining ...00000x.vmdk files to this sub-folder. If the VM powers on without issues you can safely delete the sub-folder.
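Picking out the files to move can be sketched in Python as a pure function over a directory listing. It assumes the usual naming seen in this thread (name-00000x.vmdk descriptors with matching name-00000x-delta.vmdk files); the function name orphan_candidates and the sample file names are hypothetical. It only lists candidates and does not move or delete anything.

```python
import re

# Snapshot descriptor or delta file: <disk>-<6 digits>[-delta].vmdk
SNAPSHOT_RE = re.compile(r"^(?P<disk>.+)-(?P<num>\d{6})(?P<delta>-delta)?\.vmdk$")


def orphan_candidates(file_names, chain):
    """Return snapshot files whose descriptor is not part of the active chain.

    chain is the list of descriptor names currently in use, e.g. only the
    base disk after a successful "Delete All".
    """
    active = set(chain)
    orphans = []
    for name in sorted(file_names):
        match = SNAPSHOT_RE.match(name)
        if not match:
            continue  # base disk, -flat file, .vmx, logs, ...
        descriptor = f"{match.group('disk')}-{match.group('num')}.vmdk"
        if descriptor not in active:
            orphans.append(name)
    return orphans


# Hypothetical folder content after consolidation.
files = [
    "VM.vmdk", "VM-flat.vmdk", "VM.vmx", "vmware.log",
    "VM-000001.vmdk", "VM-000001-delta.vmdk",
    "VM-000005.vmdk", "VM-000005-delta.vmdk",
]
for name in orphan_candidates(files, chain=["VM.vmdk"]):
    print(name)
```

Keeping this as a list-only step fits the advice above: move the candidates to a sub-folder first, power on the VM, and delete the sub-folder only once the VM runs without issues.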

Btw, you may want to take a look at RVTools (http://www.robware.net/) which gives you a great overview of your environment, including active snapshots.

André

staverts
Contributor

Andre, thanks for ALL your help to date. Using the proper snapshot chain you suggested for my PowerSchoolTestWeb, I was able to fully consolidate this machine, and it is running PERFECTLY!!! I am still not that confident determining the snapshot chain from the log files, and I have one more server to consolidate this way, my PowerSchoolTestDatabase. I have attached this VM's log files; do you mind confirming the snapshot chain order for me again? Thank you SOOOOO much in advance. You've been a real cult-like savior to me ha ha ha.

Please see attached log files.

a_p_
Leadership

Didn't we discuss this VM already?

If the previous post about this VM didn't help, please post (attach) the VM's .vmx file as well as all the VM's .vmdk header files (the small ones) to ensure the proper chain.
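The small .vmdk header (descriptor) files are plain text, and a snapshot's descriptor normally names its parent in a parentFileNameHint line, so the chain can also be followed from the files themselves. A minimal Python sketch, assuming the descriptor texts have already been read into a dictionary; the function name and the trimmed sample descriptors are hypothetical.

```python
import re

PARENT_RE = re.compile(r'^parentFileNameHint\s*=\s*"([^"]+)"', re.MULTILINE)


def chain_from_descriptors(start, descriptors):
    """Follow parentFileNameHint from 'start' down to the base disk.

    descriptors maps descriptor file name -> descriptor text. The start
    name is the disk referenced in the VM's .vmx file.
    """
    chain, seen = [], set()
    current = start
    while current is not None:
        if current in seen:
            raise ValueError(f"loop in snapshot chain at {current}")
        if current not in descriptors:
            raise KeyError(f"descriptor missing: {current}")
        seen.add(current)
        chain.append(current)
        match = PARENT_RE.search(descriptors[current])
        # The hint may be a full path; only the file name is used here.
        current = match.group(1).rsplit("/", 1)[-1] if match else None
    return chain


# Trimmed, hypothetical descriptor contents.
sample = {
    "VM-000001.vmdk": 'createType="vmfsSparse"\nparentFileNameHint="VM-000003.vmdk"\n',
    "VM-000003.vmdk": 'createType="vmfsSparse"\nparentFileNameHint="VM.vmdk"\n',
    "VM.vmdk": 'createType="vmfs"\n',
}
print(" -> ".join(chain_from_descriptors("VM-000001.vmdk", sample)))
```

A descriptor that is missing or that points back to an earlier link raises an error instead of returning a partial chain, since a partial chain is exactly what led to lost deltas earlier in this thread.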

André

staverts
Contributor

Yes we did, and my problem is fixed on the first one now; it is consolidated and running. However, I have a second one that suffered the same problem at the same time. I can fix the problem, no issue there. As stated though, the part where I am not confident is determining the snapshot chain. This is a second VM now. We determined the snapshot chain for the first one, and it is indeed fixed now, but I need a little help determining the chain for the second VM as it was before that one ran out of space. I am confident in every category with this now, except determining the snapshot chain from the log files. Hope that makes sense, and sorry if I am making you impatient. INDEED you have helped me so much: I was able to easily fix the first VM based on knowing what the proper chain was before the lack of space occurred, and now I just need to do the same for the second. The log files listed above are for the second VM. What might have been a little confusing is that the names are similar, i.e.:

PowerschoolTestWeb

PowerschoolTestDatabase

Thank you.

UPDATE  I have reattached the logs with the VMX file sorry for the delay.

UPDATE2:  I have now added a picture of the directory Structure as well.

a_p_
Leadership

I'm still confused!? This is the same VM we discussed at Re: Orphaned Snapshots after Consolidation?.

Since the snapshot chain (1, 3, 4, 2, base) is the same as the one for "PowerSchoolTestWeb.vmdk", you can use the same command as mentioned in Re: Orphaned Snapshots after Consolidation?, just replacing the .vmdk name, to clone the virtual disk.

André
