VMware Cloud Community
remysweb
Contributor

Consolidation failed on a VM hosted on ESXi 5.1 free

Hi VMware community,

I'm having a huge problem with a VM hosted on a free ESXi 5.1 host.

I deleted 2 old snapshots, then created a new one before doing a major upgrade of the CRM installed on that VM.

But after I took the new snapshot (with the VM powered off), the VM would not start anymore!

I’m having a white non-blinking underscore in the top left corner.

And OF COURSE, I have no up-to-date backup!!!

Looking at the events, I saw I’m having an issue with the deleted snapshots. The files are still there (SRV-SAG-000001.vmdk, SRV-SAG-000002.vmdk …), but they no longer appear in the snapshot UI!

And now I can see ESXi complaining that consolidation is needed for that VM.

So I tried to run that consolidation, both from the UI and from the command line, but it failed each time at the end.

I tried to create and delete a new snapshot (UI/cmd), but that failed too.

I installed a vCenter appliance to manage the ESXi host and tried to clone the VM, but that failed as well.

I realized that the *.vmsd file was nearly empty, so I recreated it, with no more success.

My situation is really bad now; I need some help.

Anyone?

pastedImage_0.png

pastedImage_1.png

35 Replies
AjayChananaVMwa
VMware Employee

Hi Remysweb,

To fix any snapshot issue, you need to keep three things in mind:

1. The snapshot files must not be locked or in use by any other program.

2. The snapshot chain must be valid.

3. There must not be any invalid snapshot or snapshot database file.

It would have helped if you had shared the error, but below are steps you can try.

Identify locks:

  • Log in to the host where the VM resides.
  • Go to the VM directory.
  • Type touch *        (This will tell you if there are any locks on the files)

# If you find locks, check whether your backup proxy server is holding those files: check each disk mounted on the backup server, and if one belongs to this VM, disconnect that disk from the proxy server.

Verify that the chain is correct by running the command below:

vmkfstools -e SRV-SAG-000005.vmdk

Invalid snapshots:

Snapshot-4 and snapshot-5 look suspicious as they are very small (the command above will tell you whether they are invalid).

# If you find an invalid snapshot, you need to move it out of that folder, then edit the VM and point it to the last healthy snapshot.

Try deleting the .vmsd file and then consolidating. (Deleting the .vmsd file will not delete any data; it only removes the snapshot metadata.)

Let me know if you still get any errors.

Sincerely,
Ajay Chanana
Skyline Support Moderator
MCSE-2003/2008|RHCA|VCP-5/6/VCAP-6
remysweb
Contributor

Hi AjayChananaVMware, and thanks for answering so fast; I really appreciate that.

I've checked your points:

  • The "touch *" command: it doesn't give any output. I assume that's because nothing is locked. By the way, I'm not using any backup server (I didn't find one that works with the free version of ESXi).
  • The "vmkfstools -e SRV-SAG-000005.vmdk" command: I ran it on every *.vmdk file. Each time the answer is "Disk chain is consistent".
  • The SRV-SAG-000004.vmdk and SRV-SAG-000005.vmdk snapshots: while trying to solve the original issue, I created and deleted these 2 snapshots with the UI and over SSH (VM powered OFF). But that didn't delete all the files as expected. On the contrary, I now have 2 more (valid) snapshot files...
  • I've just tried deleting the *.vmsd file and then consolidating, but unfortunately it failed...
  • pastedImage_2.png

Do you have any other valuable ideas?

AjayChananaVMwa
VMware Employee

Hi remysweb,

Alternatively, you can try to clone the VM or use vCenter Converter Standalone to consolidate this VM.

Although it shows the VM pointing to vmdk5 only, there is one odd possibility: these snapshots may not be associated with the VM. This can be checked by doing a Storage vMotion to a different datastore; all files that are not associated with the VM will be left behind on the old storage.

Please check the release notes before downloading vCenter Converter Standalone; the right version depends on the VM's guest OS version.

Download link below; you can change the version using the drop-down.

Download Center for VMware Products

Sincerely,
Ajay Chanana
AjayChananaVMwa
VMware Employee

Hi remysweb,

Sorry, I missed the screenshot you shared.

This looks like a storage or file corruption issue, as we are getting a system input/output error.

Try to copy the VM files to another folder and check which ones you can save. (Use the command shell only, not the GUI, as the GUI will not show all the files.)

I would recommend avoiding that datastore if you are seeing a similar issue with other VMs, and moving the files to a newly created datastore.

Sincerely,
Ajay Chanana
remysweb
Contributor

  • Regarding VMware Converter: I would have loved to use it, but as my VM doesn't start anymore, I don't see how to do it.
  • For the storage corruption: I thought about that too, so I copied the VM to my own ESXi host in a lab environment, and I get exactly the same error.
RyszardM
Contributor

"Looking at the events, I saw I’m having an issue with the deleted snapshots. The files are still there (SRV-SAG-000001.vmdk, SRV-SAG-000002.vmdk …), but they no longer appear in the snapshot UI!"

Hi,

I've learned this lesson the hard way. You need to understand the concept of the snapshot. Just because you delete a snapshot does not mean the files are deleted. Only when you run Consolidate are they merged back into the original image, and only then are the snapshot files deleted. If you do not Consolidate, you will eventually run out of space, depending on how much write activity the VMs on your host generate. As a result, you won't be able to run backups (they create their own snapshots), your server will be shut down, you won't be able to create new snapshots, etc.

If you run out of space, you won't be able to run Consolidate; you will have to add storage first, or move/clone/copy your VM to another host that has enough space to run Consolidate.

Then you can move your VM back to the original host.

If you have another datastore on the same host, move the VM there. To move it to another host, only use VMware's own tools, if the host is managed by vCenter.

I hope this explains what a snapshot is. The biggest lesson is: do not keep snapshots forever; once you have completed your tasks, delete them and run Consolidate.

Do not keep snapshots "Forever", they are not backups.

Richard

AjayChananaVMwa
VMware Employee

Hi remysweb,

You don't need to have the VM powered on to use the converter. You can use it when the VM is powered off too.

Whether the VM will power on, and which state you can save, depends on which files you are able to copy.

If you are able to copy all five .vmdk files, it should be fine.

Example:

If you are able to copy only up to .vmdk3, then you need to edit the VM and add a hard disk pointing to vmdk3 (snapshot). Make sure you remove the disk that is pointing to vmdk5.

Also, make sure to move unnecessary files out of the VM folder.

Note: Each .vmdk is comprised of two files .vmdk (descriptor file) and vm-flat.vmdk (datafile) both required.

Even if you are only able to save the base .vmdk, you can still point to the base and power on the VM. (Caution: once you power on from the base .vmdk, you cannot point to a snapshot later.)

This is our last attempt to see what data we can save, based on which .vmdk files you are able to copy successfully.

It is good practice to do the above testing on the copied data.

Sincerely,
Ajay Chanana
continuum
Immortal

> I’m having a white non-blinking underscore in the top left corner.

Did you delete the 2 old snapshots with the snapshot manager or with the datastore browser?

You deleted the delta.vmdk which has ntldr / bootmanager ... that's why you get the underscore in the top left corner.

For further troubleshooting, install WinSCP, download all vmdk descriptor files, all vmware.log files and the vmx file, and attach them to your next reply.

You can't do that with the datastore browser, so installing WinSCP is not up for debate.

We also need to see a file-list that includes the time-stamps.
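A file list with time-stamps can also be produced from the downloaded copy with a few lines of Python. This is only a minimal sketch: the helper name `list_files_with_timestamps` is made up for illustration, and it assumes the files have already been copied to a local folder in a way that preserved their modification times.

```python
import time
from pathlib import Path


def list_files_with_timestamps(directory):
    """Return (name, size_in_bytes, modification_time) for each regular file, sorted by name."""
    rows = []
    for entry in sorted(Path(directory).iterdir(), key=lambda p: p.name):
        if entry.is_file():
            info = entry.stat()  # stat() only reads metadata, it does not change the time-stamps
            stamp = time.strftime("%Y-%m-%d %H:%M:%S", time.localtime(info.st_mtime))
            rows.append((entry.name, info.st_size, stamp))
    return rows


if __name__ == "__main__":
    # "." is a placeholder: point this at the folder holding the copied VM files.
    for name, size, stamp in list_files_with_timestamps("."):
        print(f"{stamp}  {size:>15}  {name}")
```

Unlike touch, this only reads the file metadata, so the time-stamps stay as they were.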

Ulli


________________________________________________
Do you need support with a VMFS recovery problem ? - send a message via skype "sanbarrow"
I do not support Workstation 16 at this time ...

remysweb
Contributor

Hi RyszardM,

I know that snapshots aren't a backup solution and aren't meant to be kept forever. But I should also have put a reminder somewhere to delete them...

On the other hand, I was quite sure the snapshot files disappear as soon as you delete the snapshot entry in the UI.

I've just tested this in my lab, and the file did disappear, merging with the previous snapshot!

So sometimes it merges automatically, sometimes not... ouch!

Anyway, I will always triple-check in the future...

The VM is 140 GB and I have about 700 GB free, so that should have been enough to run the consolidation process.

remysweb
Contributor

I'm downloading the converter to test the conversion process right now. I'll come back to you as soon as I have results.

Note: Each .vmdk is comprised of two files .vmdk (descriptor file) and vm-flat.vmdk (datafile) both required.

I only have one flat file. If I understand you correctly, I should have 6 flat files? (see screenshot).

I've checked in my lab environment, and I see the same file structure: only one flat file for the main *.vmdk, not one for every snapshot *.vmdk file.

pastedImage_1.png

remysweb
Contributor

Hi continuum​,

Did you delete the 2 old snapshots with the snapshot manager or with the datastore browser?

Fortunately, with the snapshot manager.

Here is a screenshot of the file list:

pastedImage_2.png

Thanks for helping.

continuum
Immortal

Looks like you are using the base disk without the associated snapshots at the moment.

The longer you do that, the lower the chance of recovering the data inside the snapshots.

In other words - stop what you are doing now and provide:

- the descriptor vmdks

- all vmware.logs

- the vmx-file


remysweb
Contributor

Hello continuum, and thank you for your advice.

I'm not running all these tests on the production ESXi host, but in my lab environment, with a copy of the original VM.

Reading the previous messages, I realised I needed a copy made with WinSCP and not with the VMware datastore browser. So I made this copy and brought it to my lab.

It's currently uploading to the lab datastore (it takes around 10 hours!), but I can provide you with the requested files.

Thank you very much again for your help.

continuum
Immortal

AjayChanana

Your advice to use touch to identify locks is a bad idea - please don't suggest that again in similar cases.
It spoils the timestamps, which are necessary for in-depth troubleshooting.


continuum
Immortal

remysweb

You must have run into 2 different issues - but you only mentioned one of them.

One problem was that the VM did not boot up completely but stopped with a black screen.

The other problem was when you received the "Parent has been changed since the child was created" error.

What EXACTLY did you do to fix that problem?

Anyway I think that your best option is to try to launch the VM with

scsi0:0.fileName = "SRV-SAG-000002.vmdk"

in the vmx-file.
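For anyone who prefers to script that change on a copy of the vmx-file rather than edit it by hand, a rough sketch could look like this. The helper `set_vmx_value` is hypothetical, and the sample vmx content below is illustrative only, not taken from the real file.

```python
def set_vmx_value(vmx_text, key, value):
    """Return vmx_text with the line `key = "value"` replaced, or appended if the key is absent."""
    new_line = f'{key} = "{value}"'
    lines = vmx_text.splitlines()
    replaced = False
    for i, line in enumerate(lines):
        name, sep, _ = line.partition("=")
        if sep and name.strip() == key:  # exact key match, everything else is left untouched
            lines[i] = new_line
            replaced = True
    if not replaced:
        lines.append(new_line)
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Illustrative vmx content (assumption): a VM currently pointing at the base disk.
    sample = 'displayName = "SRV-SAG"\nscsi0:0.fileName = "SRV-SAG.vmdk"\n'
    print(set_vmx_value(sample, "scsi0:0.fileName", "SRV-SAG-000002.vmdk"), end="")
```

The function works on text only, so the original vmx-file stays untouched until the result is deliberately written back to a copy.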

Next time you post a similar problem - wait for a reply from André Pett or me.

Ulli

Message edited by a.p.: Fixed typo in my name 😉


AjayChananaVMwa
VMware Employee

Hello remysweb,

Thanks for sharing the logs. I found that the issue is due to duplicate or mismatched CID and parentCID values in a few of the .vmdk files, which need to be edited and fixed manually.

Reference: KB 1007969, https://kb.vmware.com/s/article/1007969

I will update you shortly.

Sincerely,
Ajay Chanana
AjayChananaVMwa
VMware Employee

Hi remysweb,

The issue occurred due to a break in the parent-child sequence recorded in the .vmdk files.

The sequence in which the snapshots point to each other:

7-6-5-4-2-3-1-base.

To explain:

Disk 7's parent is disk 6, so the CID of disk 6 should match the parentCID recorded in disk 7, and so on for the other disks.
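The parent/child rule can be checked offline against copies of the descriptor files. The following is only a sketch: `parse_descriptor` and `check_chain` are hypothetical helpers, and the three sample descriptors are cut down to the fields that matter, using values quoted elsewhere in this thread (including the 074ad7aa parentCID that SRV-SAG-000003.vmdk had before it was edited).

```python
import re

# Matches lines such as  CID=39b1c7d7  or  parentFileNameHint="SRV-SAG-000006.vmdk"
FIELD_RE = re.compile(
    r'^\s*(CID|parentCID|parentFileNameHint)\s*=\s*"?([^"\r\n]*)"?\s*$',
    re.MULTILINE,
)


def parse_descriptor(text):
    """Extract CID, parentCID and parentFileNameHint from the text of a descriptor file."""
    return {key: value.strip() for key, value in FIELD_RE.findall(text)}


def check_chain(descriptors, top):
    """Walk from descriptor `top` down to the base disk and return a list of problems."""
    problems = []
    seen = set()
    current = top
    while True:
        if current in seen:
            problems.append(f"{current}: loop in parent chain")
            break
        seen.add(current)
        fields = parse_descriptor(descriptors[current])
        parent_cid = fields.get("parentCID", "").lower()
        if parent_cid == "ffffffff":
            break  # the base disk has no parent
        parent = fields.get("parentFileNameHint")
        if parent is None or parent not in descriptors:
            problems.append(f"{current}: parent descriptor {parent!r} not available")
            break
        actual = parse_descriptor(descriptors[parent]).get("CID", "").lower()
        if actual != parent_cid:
            problems.append(f"{current}: parentCID={parent_cid} but {parent} has CID={actual}")
        current = parent
    return problems


if __name__ == "__main__":
    descriptors = {
        "SRV-SAG.vmdk": "CID=066c550b\nparentCID=ffffffff\n",
        "SRV-SAG-000001.vmdk": 'CID=bba5762d\nparentCID=066c550b\nparentFileNameHint="SRV-SAG.vmdk"\n',
        "SRV-SAG-000003.vmdk": 'CID=10305b03\nparentCID=074ad7aa\nparentFileNameHint="SRV-SAG-000001.vmdk"\n',
    }
    for problem in check_chain(descriptors, "SRV-SAG-000003.vmdk"):
        print(problem)
```

With parentCID=bba5762d in SRV-SAG-000003.vmdk instead, `check_chain` returns an empty list for this three-file sample.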

Steps to perform:

1) Remove all the disks from the virtual machine. This will help refresh the configuration once you add them back.

2) Edit the .vmdk files and change CID and parentCID as given below,

or you can do it yourself as explained in KB 1007969.

3) Add all the disks back to the VM and initiate consolidation.

Changes required are marked in red in the respective .vmdk.

SRV-SAG-000007.vmdk
CID=39b1c7d7
parentCID=39b1c7d6
parentFileNameHint="SRV-SAG-000006.vmdk"

SRV-SAG-000006.vmdk
CID=39b1c7d6
parentCID=39b1c7d8
parentFileNameHint="SRV-SAG-000005.vmdk"

SRV-SAG-000005.vmdk
CID=39b1c7d8
parentCID=39b1c7d69
parentFileNameHint="SRV-SAG-000004.vmdk"

SRV-SAG-000004.vmdk
CID=39b1c7d69
parentCID=39b1c710
parentFileNameHint="SRV-SAG-000002.vmdk"

SRV-SAG-000002.vmdk
CID=39b1c710
parentCID=10305b03
isNativeSnapshot="no"
createType="vmfsSparse"
parentFileNameHint="SRV-SAG-000003.vmdk"

SRV-SAG-000003.vmdk
CID=10305b03
parentCID=bba5762d
isNativeSnapshot="no"
createType="vmfsSparse"
parentFileNameHint="SRV-SAG-000001.vmdk"

SRV-SAG-000001.vmdk
CID=bba5762d
parentCID=066c550b
isNativeSnapshot="no"
createType="vmfsSparse"
parentFileNameHint="SRV-SAG.vmdk"

SRV-SAG.vmdk
CID=066c550b
parentCID=ffffffff
# Extent description
RW 167772160 VMFS "SRV-SAG-flat.vmdk"
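The extent line of the base descriptor also tells you which data file it expects and how large the virtual disk is. As a rough sketch (the helper `parse_extents` is made up for illustration, and it assumes the usual 512-byte sector unit of VMDK descriptors), it can be read like this:

```python
import re

# Extent line layout: <access> <size in sectors> <type> "<data file>"
EXTENT_RE = re.compile(r'^(RW|RDONLY|NOACCESS)\s+(\d+)\s+(\S+)\s+"([^"]+)"', re.MULTILINE)
SECTOR_BYTES = 512  # assumption: extent sizes are counted in 512-byte sectors


def parse_extents(descriptor_text):
    """Return one dict per extent line found in the descriptor text."""
    extents = []
    for access, sectors, kind, filename in EXTENT_RE.findall(descriptor_text):
        extents.append({
            "access": access,
            "type": kind,
            "file": filename,
            "size_bytes": int(sectors) * SECTOR_BYTES,
        })
    return extents


if __name__ == "__main__":
    text = '# Extent description\nRW 167772160 VMFS "SRV-SAG-flat.vmdk"\n'
    for extent in parse_extents(text):
        print(f'{extent["file"]}: {extent["size_bytes"] / 2**30:.1f} GiB')
    # prints: SRV-SAG-flat.vmdk: 80.0 GiB
```

Running the same function over each snapshot descriptor shows which data file every descriptor refers to, which is a quick way to confirm that nothing it needs is missing from the folder.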

Sincerely,
Ajay Chanana
remysweb
Contributor

continuum​,

You're totally right, I had 2 issues.

The first was the non-blinking underscore; then the second, a parent-child mismatch, appeared while I was trying to find a solution to the first issue :(

I modified the parentCID of "SRV-SAG-000003.vmdk" from 074ad7aa to bba5762d, in order to match the parentFileNameHint="SRV-SAG-000001.vmdk".

That cleared the parent-child mismatch, so I thought I was done with that point.

In my lab, I've just tried to boot with scsi0:0.fileName = "SRV-SAG-000002.vmdk", but I still get the non-blinking underscore.

continuum
Immortal

There indeed was a CID mismatch once, but it has already been fixed, so your suggestion makes no sense at all.

In the current state of the vmdks there is no CID mismatch!!!

