Hi VMware community,
I'm having a huge problem with a VM hosted on a free ESXi 5.1 host.
I deleted two old snapshots, then created a new one before doing a major upgrade of the CRM installed on that VM.
But after taking the new snapshot (with the VM powered off), the VM no longer started!
All I get is a white, non-blinking underscore in the top-left corner.
And OF COURSE, I have no up-to-date backup!
Looking at the events, I saw that there is an issue with the deleted snapshots. The files are still there (SRV-SAG-00001.vmdk, SRV-SAG-00002.vmdk …), but they no longer appear in the snapshot UI!
And now ESXi reports that consolidation is needed for that VM.
So I tried to run the consolidation, both from the UI and from the command line, but it failed at the end each time.
I also tried to create and delete a new snapshot (UI and command line), but that failed too.
I installed a vCenter appliance to manage the ESXi host and tried to clone the VM, but that failed as well.
I noticed that the *.vmsd file was nearly empty, so I recreated it, with no more success.
My situation is really bad now, I need some help.
Anyone?
Hi Remysweb,
To fix any snapshot issue you need to keep three things in mind.
1. Snapshot files must not be locked or used by any other program.
2. The snapshot chain must be valid.
3. There must not be any invalid snapshot or snapshot database file.
I wish you had shared the exact error, but below are steps you can try.
Identify locks:
# If you find locks, check whether your backup proxy server is holding those files: check each disk mounted on the backup server, and if one matches this server, disconnect that disk from the proxy server.
Check that the chain is correct by running the command below:
vmkfstools -e SRV-SAG-00005.vmdk
Invalid snapshots:
Snapshot-4 and snapshot-5 look suspicious because they are very small (the command above will help you tell whether they are invalid).
# If you find an invalid snapshot, move it out of that folder, then edit the VM and point it to the last healthy snapshot.
Try deleting the .vmsd file and then consolidating. (Deleting the .vmsd file does not delete any data; it only removes the record of the snapshot stages.)
Let me know if you still get any errors.
Hi AjayChananaVMware and thanks for answering so fast, I really appreciate that.
I've checked your points:
Do you have any other valuable ideas?
Hi remysweb,
Alternatively, you can try to clone the VM or use VMware vCenter Converter Standalone to consolidate this VM.
Although it shows the VM pointing to vmdk5 only, there is one odd possibility (and it is only a possibility): these snapshots are not associated with the VM. You can check this by doing a Storage vMotion to a different datastore; any files that are not associated with the VM will be left behind on the old storage.
Please check the release notes before downloading the converter - the right version depends on the VM's guest OS version.
Download link below - you can change the version using the drop-down.
Download Center for VMware Products
Hi remysweb,
Sorry, I missed the screenshot you shared.
This looks like a storage or file corruption issue, since we are getting a system input/output error.
Try to copy the VM files to another folder and check which ones you can save. (Use the command shell only, not the GUI, as the GUI will not show all the files.)
I would recommend avoiding that datastore if you find a similar issue with other VMs, and moving the files to a newly created datastore.
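To make the "check which ones you can save" step concrete, here is a minimal sketch in Python. It assumes a Python interpreter is available on whatever machine can read the VM folder, and the function name copy_what_you_can is made up for illustration. It copies file by file and keeps going when one file fails with a read error, so you end up with a list of what was saved and what was not.

```python
import shutil
from pathlib import Path

def copy_what_you_can(src_dir, dst_dir):
    """Copy each file in src_dir to dst_dir; return (copied, failed) lists.

    A read error on one file (for example an input/output error) is
    recorded and the loop moves on to the next file.
    """
    dst = Path(dst_dir)
    dst.mkdir(parents=True, exist_ok=True)
    copied, failed = [], []
    for src in sorted(Path(src_dir).iterdir()):
        if not src.is_file():
            continue
        try:
            # copy2 copies the data and also tries to preserve timestamps
            shutil.copy2(src, dst / src.name)
            copied.append(src.name)
        except OSError as exc:
            failed.append((src.name, str(exc)))
    return copied, failed
```

Calling copy_what_you_can on the VM folder and an empty target folder returns two lists; the second one holds the file names that could not be read, together with the error text.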
"Looking at the events, I saw that there is an issue with the deleted snapshots. The files are still there (SRV-SAG-00001.vmdk, SRV-SAG-00002.vmdk …), but they no longer appear in the snapshot UI!"
Hi,
I've learned this lesson the hard way. You need to understand the concept of a snapshot. Just because you delete a snapshot does not mean the files are deleted. Only when you run Consolidate are they merged back into the original image, and only then are the snapshot files deleted. If you do not consolidate, you will eventually run out of space, depending on the traffic on the VMs you have on the host. As a result, you won't be able to run backups (they create their own snapshots), your server will be shut down, you won't be able to create new snapshots, etc.
If you run out of space, you won't be able to run Consolidate; you will have to add storage first or move/clone/copy your VM to another host where you have enough space to run Consolidate.
Then you can move your VM back to your originating VM Host.
If you have another datastore on the same host, move the VM there. To move it to another host, use only the VMware tools for that, provided the host is in vCenter.
I hope this explains what a snapshot is. The biggest lesson is: do not keep snapshots forever. Once you have completed your task, delete them and run Consolidate.
Do not keep snapshots "forever"; they are not backups.
Richard
Hi remysweb
The VM does not need to be powered on to use the converter. You can use it while the VM is powered off too.
As for powering on the VM and which state you can save, that depends on which files you are able to copy.
If you are able to copy all five .vmdk files, you should be fine.
Example:
If you are able to copy only up to vmdk3, then you need to edit the VM and add a hard disk pointing to vmdk3 (the snapshot). Make sure you remove the disk that points to vmdk5.
Also, make sure to move unnecessary files out of the VM folder.
Note: Each .vmdk consists of two files, the .vmdk (descriptor file) and the vm-flat.vmdk (data file); both are required.
Even if you are able to save only the base .vmdk, you can still point to the base and power on the VM. (Caution: once you power on from the base .vmdk, you cannot point to a snapshot later.)
This is our last attempt to see what data we can save, based on which .vmdk files you are able to copy successfully.
It is good practice to do the above testing on the copied data.
> I’m having a white non-blinking underscore in the top left corner.
Did you delete the 2 old snapshots with the snapshot manager or with the datastore browser?
You deleted the delta.vmdk that holds ntldr / the boot manager ... that's why you get the underscore in the top-left corner.
For further troubleshooting, install WinSCP, download all vmdk descriptor files, all vmware.log files and the vmx file, and attach them to your next reply.
You can't do that with the datastore browser, so installing WinSCP is non-negotiable.
We also need to see a file list that includes the timestamps.
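One way to produce such a file list without touching the files is sketched below in Python. This is only an illustration: it assumes a Python interpreter is available where the VM folder (or a copy of it) can be read, and the function name list_with_timestamps is invented here. It only reads metadata through stat, so it does not change any timestamp.

```python
import time
from pathlib import Path

def list_with_timestamps(folder):
    """Return (name, size_in_bytes, modified_time) for every file in folder.

    Only file metadata is read; nothing is opened for writing, so the
    timestamps stay as they are.
    """
    rows = []
    for entry in sorted(Path(folder).iterdir()):
        if entry.is_file():
            st = entry.stat()
            stamp = time.strftime("%Y-%m-%d %H:%M:%S",
                                  time.localtime(st.st_mtime))
            rows.append((entry.name, st.st_size, stamp))
    return rows
```

Printing each row gives a plain-text list of names, sizes and modification times that can be pasted into a reply.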
Ulli
Hi RyszardM,
I did know that snapshots aren't a backup solution and aren't meant to be kept forever. But I should also have put a reminder somewhere to delete them...
On the other hand, I was quite sure the snapshot files disappear when you delete the snapshot entry in the UI.
I've just tested this in my lab, and the file did disappear, merging with the previous snapshot!
So sometimes it merges automatically, sometimes not ... ouch!
Anyway, I will always triple check in the future...
The VM is 140 GB and I have about 700 GB free, so that should have been enough to run the consolidation process.
I'm downloading the converter to test the convert process right now. I'll get back to you as soon as I have results.
> Note: Each .vmdk is comprised of two files .vmdk (descriptor file) and vm-flat.vmdk (datafile) both required.
I only have one flat file. If I understand you correctly, I should have 6 flat files? (see screenshot).
I've checked in my lab environment, and I saw the same file structure: only one flat file for the main *.vmdk, not one for every *.vmdk snapshot file.
Hi continuum,
> Did you delete the 2 old snapshots with the snapshot manager or with the datastore browser ?
Fortunately, with the snapshot manager.
Here is a screenshot of the file list:
Thanks for helping.
Looks like you are using the base disk without the associated snapshots at the moment.
The longer you do that, the lower the chance of recovering the data inside the snapshots.
In other words - stop what you are doing now and provide:
- the descriptor vmdks
- all vmware.logs
- the vmx-file
Hello continuum and thank you for your advice,
I'm not doing all those tests on the production ESXi host, but in my lab environment, with a copy of the original VM.
Reading the previous messages, I realised I needed a copy made with WinSCP and not with the VMware datastore browser. So I've made this copy and brought it to my lab.
It's currently uploading to the lab datastore (it takes around 10 hours!), but I can provide you with the requested files.
Thank you very much again for your help.
Your advice to use touch to identify locks is a bad idea - please don't suggest that again in similar cases.
It spoils the timestamps, which are necessary for in-depth troubleshooting.
remysweb
You must have run into 2 different issues - but you only mentioned one of them.
One problem was that the VM did not boot up completely but stopped with a black screen.
The other problem was when you received the "Parent has been changed since the child was created" error.
What EXACTLY did you do to fix that problem?
Anyway I think that your best option is to try to launch the VM with
scsi0:0.fileName = "SRV-SAG-000002.vmdk"
in the vmx-file.
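For anyone who prefers to script that change on a copy of the vmx file instead of editing it by hand, here is a small Python sketch. The helper name set_vmx_disk is made up for this example, and the sample vmx lines in the usage note are assumptions, not taken from the real file. It replaces the value of one key and refuses to do anything if the key is not found.

```python
import re

def set_vmx_disk(vmx_text, key, new_file):
    """Return vmx_text with the line `key = "..."` pointing at new_file.

    Raises KeyError if the key is not present, instead of silently
    appending a new line.
    """
    pattern = re.compile(
        r'^([ \t]*' + re.escape(key) + r'[ \t]*=[ \t]*)"[^"\n]*"[ \t]*$',
        re.MULTILINE,
    )
    # A function is used as the replacement so that backslashes in
    # new_file are never interpreted as regex escapes.
    new_text, count = pattern.subn(
        lambda m: m.group(1) + '"' + new_file + '"', vmx_text
    )
    if count == 0:
        raise KeyError(key)
    return new_text
```

Usage would look like set_vmx_disk(text, "scsi0:0.fileName", "SRV-SAG-000002.vmdk"), where text is the content of a copied vmx file; write the result to a new file and keep the original untouched.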
Next time you post a similar problem - wait for a reply from André Pett or me.
Ulli
Message edited by a.p.: Fixed typo in my name 😉
Hello remysweb,
Thanks for sharing the logs. I found that the issue is due to duplicate or mismatched CID and parentCID values in a few of the .vmdk files, which need to be edited and fixed manually.
Reference KB 1007969: https://kb.vmware.com/s/article/1007969
I will update you shortly.
Hi remysweb,
The issue occurred due to a break in the parent-child sequence recorded in the .vmdk files.
The sequence in which the snapshots point to each other:
7-6-5-4-2-3-1-base.
Explained:
The parent of disk 7 is disk 6, so the CID of disk 6 must match the parentCID recorded in disk 7, and so on for the other disks.
Steps to perform:
1) Detach all the disks from the virtual machine. This helps refresh the configuration once you attach them again.
2) Edit the .vmdk files and change CID and parentCID as given below.
Or you can do it yourself as explained in KB 1007969.
3) Attach all the disks to the VM again and start the consolidation.
The required changes are marked in red in the respective .vmdk.
SRV-SAG-000007.vmdk
CID=39b1c7d7
parentCID=39b1c7d6
parentFileNameHint="SRV-SAG-000006.vmdk"
SRV-SAG-000006.vmdk
CID=39b1c7d6
parentCID=39b1c7d8
parentFileNameHint="SRV-SAG-000005.vmdk"
SRV-SAG-000005.vmdk
CID=39b1c7d8
parentCID=39b1c7d69
parentFileNameHint="SRV-SAG-000004.vmdk"
SRV-SAG-000004.vmdk
CID=39b1c7d69
parentCID=39b1c710
parentFileNameHint="SRV-SAG-000002.vmdk"
SRV-SAG-000002.vmdk
CID=39b1c710
parentCID=10305b03
isNativeSnapshot="no"
createType="vmfsSparse"
parentFileNameHint="SRV-SAG-000003.vmdk"
SRV-SAG-000003.vmdk
CID=10305b03
parentCID=bba5762d
isNativeSnapshot="no"
createType="vmfsSparse"
parentFileNameHint="SRV-SAG-000001.vmdk"
SRV-SAG-000001.vmdk
CID=bba5762d
parentCID=066c550b
isNativeSnapshot="no"
createType="vmfsSparse"
parentFileNameHint="SRV-SAG.vmdk"
SRV-SAG.vmdk
CID=066c550b
parentCID=ffffffff
# Extent description
RW 167772160 VMFS "SRV-SAG-flat.vmdk"
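The parent-child rule can also be checked mechanically on copies of the descriptor files. The Python sketch below is an illustration only; the function names parse_descriptor and check_chain are invented here, and it assumes the descriptors have been read into a dict that maps file name to descriptor text. Starting from the top disk it follows parentFileNameHint and compares each parentCID with the CID of the parent, expecting ffffffff on the base disk.

```python
import re

def parse_descriptor(text):
    """Extract CID, parentCID and parentFileNameHint from descriptor text."""
    fields = {}
    for key in ("CID", "parentCID", "parentFileNameHint"):
        m = re.search(r'^' + key + r'[ \t]*=[ \t]*"?([^"\r\n]*)"?[ \t]*$',
                      text, re.MULTILINE)
        fields[key] = m.group(1).strip() if m else None
    return fields

def check_chain(descriptors, top):
    """Walk from `top` down to the base disk; return a list of problems."""
    problems, seen, name = [], set(), top
    while True:
        if name in seen:
            problems.append("loop detected at " + name)
            break
        seen.add(name)
        if name not in descriptors:
            problems.append("missing descriptor: " + name)
            break
        child = parse_descriptor(descriptors[name])
        parent_name = child["parentFileNameHint"]
        if parent_name is None:
            # No parent hint: treat this as the base disk.
            if (child["parentCID"] or "").lower() != "ffffffff":
                problems.append(name + ": base disk parentCID is not ffffffff")
            break
        if parent_name not in descriptors:
            problems.append("missing descriptor: " + parent_name)
            break
        parent = parse_descriptor(descriptors[parent_name])
        if (child["parentCID"] or "").lower() != (parent["CID"] or "").lower():
            problems.append(
                f"{name}: parentCID {child['parentCID']} does not match "
                f"CID {parent['CID']} of {parent_name}"
            )
        name = parent_name
    return problems
```

An empty list means every link from the top disk down to the base is consistent; each entry otherwise names the descriptor where the chain breaks.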
continuum,
You're totally right,
I had 2 issues.
The first one was the non-blinking underscore; the second one, a parent-child mismatch, appeared while I was trying to find a solution to the first issue.
I changed the parentCID in "SRV-SAG-000003.vmdk" from 074ad7aa to bba5762d, so that it matches the CID of the disk named in parentFileNameHint="SRV-SAG-000001.vmdk".
That cancelled the parent-child mismatch, so I thought I was done with that point.
In my lab, I've just tried to boot with scsi0:0.fileName = "SRV-SAG-000002.vmdk", but I still get the underscore.
There was indeed a CID mismatch once - but it has been fixed already - so your suggestion makes no sense at all.
In the current state of the vmdks there is no CID mismatch !!!