VMware Cloud Community
statikregimen
Contributor

Accidentally added snapshot vmdk to VM

Greetings, here's my first post.


Two reasons for posting:
1- Find out whether I should file a bug report
2- Share the solution that worked for me

I'll try to be as brief as possible in explaining this, but it was one of those white-knuckle situations, so I'm probably fuzzy on some details.

Let me first make it perfectly clear that I did NOT create a snapshot at any point during this, or at any point in the past. There is ONE snapshot in existence for this VM, and it was not created in a timeframe that would have interfered with this event.

Basically, I needed to give a VM more disk space, but our entire system is low on space, so I don't have enough "scratch" storage to clone things for a short-term backup. Plus, actually taking a snapshot would have made this even worse (although snapshots do somehow come into play later).

While looking at the files associated with the VM, I noticed a second large, apparently unused virtual disk in its folder. I wanted to mount it to see what it was and whether I could delete it, so I added it to the VM as a secondary drive and tried to boot. It failed with "File system specific implementation LookupAndOpen[file] failed". No big deal, right? Obviously something is wrong with that file, and that's probably why it was unused in the first place. So I'll just remove it from the VM and it will be like nothing ever happened, right?

WRONG. When I tried to remove it, I got "Cannot remove virtual disk from the virtual machine because it or one of its parent disks is part of a snapshot of the virtual machine."...WTF!? So now I have a non-booting machine because this stupid drive I JUST added won't go away... I mean, at no point had I taken any snapshots.

[SOLUTION] So now what? I Googled my butt off for a while, to no avail. Luckily, I remembered having a very similar issue just a couple of hours earlier with another VM I was trying to TAKE space from to give to this one (Robin Hood style). I had to move the drive to a different virtual SCSI port and save my changes; then it let me remove the drive just fine. The same trick worked again, and with that drive removed, I was able to boot normally *WHEW*.

So unless someone can give me a rational explanation of why the drive I added (again: without my having actually taken any snapshots) suddenly threw ANY snapshot-related error, let alone one indicating that it was part of a snapshot, I think this is a major bug. There is only one snapshot of this VM in existence, and it appears to have been created by VMware autoupdate a few days prior. The weird, (apparently) abandoned virtual disk I tried to load was from a year before that, so there's no way that automated snapshot would have included any reference to it.

Hopefully that all makes sense. My colleague and I just recently took over the entire IT department from our previous manager, who did not make documentation any sort of priority, so there are a lot of unknowns. But at least now I can tell the story of the time I nearly lost my company's entire accounting database doing something that seemed like it should have been entirely harmless and routine. Sure beats the story of the time I accidentally typed "rm -rf ." as root from the root partition on my personal machine...

Cheers,

MK

p.s. This is the most confusing forum software I've ever used. Wtf is a "place", and why do I need to add one now? (Edit: I did obviously sort of figure it out, but I'm still confused, and I have no clue whether I posted in the right "place".) Normally I would have selected a category to post in first. We're mostly IT people here, very busy and overworked, and the last thing we want to do at the end of a long day is figure out convoluted forum software just to get some quick, low-level support...

Accepted Solutions
continuum
Immortal

I have seen hundreds of variants of this story.
The following explanation usually applies, and all the details you have mentioned so far confirm this theory.
The VM had been running for a year with
scsi0:0.fileName = "name-000001.vmdk"
One day a new admin appears on the scene.
Finding an apparently unused vmdk of significant size, he tries to add
scsi0:1.fileName = "name.vmdk"
And then the situation goes south, because name.vmdk is the parent (base disk) of name-000001.vmdk and is therefore already part of the snapshot chain behind scsi0:0.
To me this is expected behaviour, not a weird bug at all.
Please post more details.
To avoid this in the future, I recommend installing WinSCP, connecting to an ESXi host, and spending about an hour reading some vmdk and vmx files with the embedded WinSCP editor.
You will run into similar issues again, and you need to be able to check whether a VM uses a snapshot VMDK or a base-disk VMDK.
In other words - I am pretty sure you misjudged the scenario.
Anyway - we need more details to decide.
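As a rough illustration of what to look for when reading those files, here is a minimal Python sketch (not a VMware tool) that takes the text of a .vmx file, reports which disk entries point at snapshot deltas, and flags any attached base disk that is already the parent of an attached delta, which is the situation described above. It assumes the usual `<base>-000001.vmdk` naming for delta descriptors, that you supply the `parentFileNameHint` value read from each delta's descriptor, and that all files sit in the VM's own folder. The function names are hypothetical, made up for this example, so treat a clean result as a hint, not proof.

```python
import posixpath
import re

# Matches vmx disk entries such as: scsi0:1.fileName = "name.vmdk"
DISK_LINE = re.compile(
    r'^\s*((?:scsi|sata|ide|nvme)\d+:\d+)\.fileName\s*=\s*"([^"]+)"', re.IGNORECASE
)
# Assumption: snapshot delta descriptors follow the <base>-000001.vmdk, -000002.vmdk, ... naming.
DELTA_NAME = re.compile(r"-\d{6}\.vmdk$", re.IGNORECASE)


def classify_disks(vmx_text):
    """Map each disk device to (file, kind), where kind is 'snapshot-delta' or 'base'."""
    disks = {}
    for line in vmx_text.splitlines():
        m = DISK_LINE.match(line)
        if m:
            device, filename = m.group(1), m.group(2)
            kind = "snapshot-delta" if DELTA_NAME.search(filename) else "base"
            disks[device] = (filename, kind)
    return disks


def parent_of(descriptor_text):
    """Return parentFileNameHint from a vmdk descriptor, or None if the descriptor has none."""
    m = re.search(r'^\s*parentFileNameHint\s*=\s*"([^"]+)"', descriptor_text, re.MULTILINE)
    return m.group(1) if m else None


def find_conflicts(disks, parents):
    """List devices whose file is already the parent of another attached delta disk.

    `parents` maps a delta's file name to the parentFileNameHint read from its descriptor.
    Names are compared by basename, assuming all files live in the VM's own folder.
    """
    in_chain = {
        posixpath.basename(parents[f])
        for f, kind in disks.values()
        if kind == "snapshot-delta" and f in parents
    }
    return [dev for dev, (f, _) in disks.items() if posixpath.basename(f) in in_chain]


if __name__ == "__main__":
    # The two vmx lines from the scenario above.
    vmx = 'scsi0:0.fileName = "name-000001.vmdk"\nscsi0:1.fileName = "name.vmdk"\n'
    disks = classify_disks(vmx)
    parents = {"name-000001.vmdk": parent_of('parentFileNameHint="name.vmdk"\n')}
    print(find_conflicts(disks, parents))  # ['scsi0:1']
```

Comparing against the descriptor's parent hint, rather than relying on file names alone, is what catches the case where the "unused" base disk is actually the bottom of a live snapshot chain.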

Ulli


________________________________________________
Do you need support with a VMFS recovery problem ? - send a message via skype "sanbarrow"
I do not support Workstation 16 at this time ...

statikregimen
Contributor

Thanks for the reply. Sorry for the weird subject line; I'll fix that next. It should read [PARTIALLY SOLVED], but the forum software automatically added the word "Solved", and when I took mine out, I didn't realize how awkward the result was.

Anyway, you hit the nail on the head: that is exactly what happened. I'm not sure what other details would be valuable, but I'm happy to provide whatever I have. I thought the situation was over last night, but I realized later that the somewhat similar issue I had run into earlier that day had actually caused data loss (on a thankfully non-critical VM). The situations were quite different, though. For one, that one happened when I added a brand-new vmdk and tried to remove the old one. It wouldn't let me, saying the old one was part of a snapshot, and while my "fix" got it out of there, I later realized that the data on the NEW vmdk was somehow data from the snapshot timestamp: not the latest data on the old vmdk, nor the data I had copied to the new vmdk. I never asked VMware to do ANYTHING with the NEW vmdk; it should never have touched that file.

So I came in early to see whether the same thing had happened in the second instance, and sure enough: it's all from the timestamp of the snapshot, and a year of accounting data may be gone now. I believe we have enough backups on hand that I can piece most of it back together, but if this happens often to other newbies, it seems like they would have put a safeguard in place by now to say "Hey, wait a minute, buddy - this image looks like a snapshot... you sure about this?". And the VM actually did NOT power on (the vSphere Web Client threw the error messages mentioned above), so how data could have been lost is still beyond me.

I assure you I will not run into this issue again (or at least won't cause it), because I'm never attaching a virtual device to a VM again without a complete backup first. It's easy to get lulled in and take for granted all the warnings and safeguards software often has in place to prevent the user from doing something that may SEEM innocent but may, in fact, be completely insane. So, lesson learned (and maybe job lost if I can't fix this).

Thanks again, and please let me know what additional information may be of value. I'll be contacting customer support later to make sure every possible avenue is exhausted. Right now, I'm taking a backup so I can investigate further before calling. Perhaps along the way I'll find more useful information to post here.

continuum
Immortal

For a solid evaluation of your problem I would need a list of all the files inside the VM's directory and a copy of all the vmware.log files from the same directory.
It is still possible that the only guy misjudging the case is me.
So if you run into any problem with data loss - attach the logs to your next reply or feel free to call via skype.
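If the VM's directory is reachable as an ordinary path (for example, a copy pulled down with WinSCP), a small helper like the hypothetical `inventory` function below can produce the requested file list with sizes and pick out the log files to attach. It is only a sketch: it assumes the logs follow the usual vmware.log / vmware-N.log naming and does not read anything VMFS-specific.

```python
import os


def inventory(vm_dir):
    """Return (files, logs): every regular file as (name, size_in_bytes), plus the vmware*.log names."""
    files = []
    for name in sorted(os.listdir(vm_dir)):
        path = os.path.join(vm_dir, name)
        if os.path.isfile(path):
            files.append((name, os.path.getsize(path)))
    # Assumption: the current log is vmware.log and rotated ones are vmware-1.log, vmware-2.log, ...
    logs = [name for name, _ in files if name.startswith("vmware") and name.endswith(".log")]
    return files, logs
```

For example, `files, logs = inventory("/path/to/vm_folder")` (placeholder path) returns the full listing for the reply and the names of the logs to attach.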
Ulli


statikregimen
Contributor

Thank you. I will do that henceforth, but until just an hour ago, I didn't even know how to pull logs. My only modern experience with virtualization is KVM/QEMU and VirtualBox (I used VMware back in the early 2000s, but didn't work at this level of IT until this job).

ANYWAY, I am pleased to report that the issue has (somehow magically) resolved itself. I contacted customer support just to see whether there was anything they could do, but to my dismay they said everything looked good, so I powered up the VM again to prove that the data was in fact missing, and it had suddenly returned.

Neither I nor the rep I spoke to has any explanation, so I have to chalk it up to some weird behavior emerging from the complexity of this system and/or my own error or oversight somewhere. I have no idea. I was admittedly in a bit of a panic, as well as in a fog, because I came in 2 hours earlier than normal to work on it.

Once my heart stops racing and I calm down a bit, I'll revisit this, see if I can spot what happened, and post back with my findings, in the hope that it saves someone else's ass some day. Funny how a lot of my most terrifying moments in IT have turned out to be nothing at all, while many of the most innocent, routine things I do sometimes cause the greatest harm...

EDIT: I changed the subject once more, now that I have learned the proper terminology to describe what happened - and it is now officially solved either way! Cheers.
