VMware Communities
tomscase
Contributor

Windows machines in repair loop after deleting/merging snapshots

Hi all,

On a number of recent occasions, I have deleted snapshots of a Windows server and Windows 10 desktop only to find that when I boot up the VMs they immediately start the Windows repair/recovery process. The deletion/merge process always completes without error.

The most recent one was a Domain Controller, which went into the repair wizard with an error stating that the ntoskrnl.exe file was missing or corrupt.

Windows repair can't fix it either.

I've been forced to buy a backup drive to hive off my VMs before I try to do any snapshot cleanup, in case it kills my VM :(

All my VMs are shut down when I take the snapshots.

15 Replies
tomscase
Contributor

Is there an official support contact for this type of issue?

scott28tt
VMware Employee

VMware Support Offerings & Services


-------------------------------------------------------------------------------------------------------------------------------------------------------------

Although I am a VMware employee I contribute to VMware Communities voluntarily (ie. not in any official capacity)
VMware Training & Certification blog
wila
Immortal

Smells like a hardware error to me.

A few things to look into:

- run a disk check on your VM before you run the snapshot.

- test your disk (e.g. run the snapshot/commit on a copy of your VM on another disk)

- run a test on your RAM at your host. Beware that malfunctioning RAM can cause file corruption even when copying. It is, however, more likely to show up when you run a process that uses a lot of memory and then hits the bad chips. I've seen this in the past and it was pretty scary.
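One way to act on the copy test above is to copy the powered-off VM folder to another disk and confirm that every file arrived bit-for-bit identical; a mismatch on a plain copy points toward RAM or disk trouble rather than the snapshot code. A minimal sketch in Python, assuming you supply both folder paths yourself (the function and script names are hypothetical, not part of any VMware tool):

```python
import hashlib
import sys
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in 1 MiB chunks so large .vmdk files never need to fit in RAM."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def compare_vm_folders(original: Path, copy: Path) -> list[str]:
    """Return relative paths that are missing from the copy or differ in content."""
    problems = []
    for src in sorted(p for p in original.rglob("*") if p.is_file()):
        rel = src.relative_to(original)
        dst = copy / rel
        if not dst.is_file():
            problems.append(f"missing: {rel}")
        elif sha256_of(src) != sha256_of(dst):
            problems.append(f"differs: {rel}")
    return problems


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage: python compare_vm.py <original VM folder> <copied VM folder>
    issues = compare_vm_folders(Path(sys.argv[1]), Path(sys.argv[2]))
    for line in issues:
        print(line)
    print("OK: copies match" if not issues else f"{len(issues)} problem(s) found")
```

Hashing the contents, rather than just comparing file sizes, is what catches silent bit flips; on healthy hardware two copies of a shut-down VM should match exactly.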

--

Wil

| Author of Vimalin. The virtual machine Backup app for VMware Fusion, VMware Workstation and Player |
| More info at vimalin.com | Twitter @wilva
DanielLBenway
Contributor

I'm experiencing the same problem.

DanielLBenway
Contributor

Similar report:

Active Directory corruption after deleting previous snapshot in Workstation

https://communities.vmware.com/t5/VMware-Workstation-Pro/Active-Directory-corruption-after-deleting-...

wila
Immortal

Hi,

Same suggestion... test your disk and test your RAM.
Not saying it is impossible that a bug has slipped in that area, but it is highly unlikely and instead my bet is on faulty hardware.

--
Wil

DanielLBenway
Contributor

I find it highly unlikely that we're looking at a hardware problem. I'm running Windows 10, VMware 16, email, video conferencing, IM, Word, Excel, Visio, web browsers, and many others with no problems whatsoever on the physical host, its applications, or the virtual machines *except* when deleting a snapshot. If there were disk or memory issues, I'd see blue screens and application crashes all over the place on the physical host and the virtual machines. Add to this the fact that others have posted about this exact problem, and I conclude we're looking at a bug, a configuration flaw, or a usage problem.

Reply
0 Kudos
wila
Immortal

Hi,

Yes, someone else has pointed out a similar problem, 9 months ago. That's a single post, and I'm sorry, but that is not the number of reports you'd expect if this were a snapshot bug. Your problems match exactly with hardware issues, more likely RAM than disk.
It is up to you whether you believe that assessment; I'm bowing out of this discussion.

FWIW, here's an article with hints on how to test that. I prefer MemTest86+ myself, but you can do it in Windows too.
https://www.howtogeek.com/260813/how-to-test-your-computers-ram-for-problems/

Edit: I stand corrected; there is one other post, also from months ago.

--
Wil

DanielLBenway
Contributor

Wil,

I really appreciate your suggestions!

I moved the VM over to a completely different hardware platform, and it booted up just fine. Then I deleted the old snapshot, and the problem occurred.

As such I'm concluding this is not a hardware problem.

If you have other ideas, I'm happy to try them.

Thank you!

🙂

wila
Immortal

Hi,

Sorry, but when you say "then I deleted the old snapshot", was that snapshot made on the old hardware?
Because if so then it still does not exclude that the other hardware has problems. A snapshot made on a host with RAM problems is not a reliable snapshot.

I agree that if you create a snapshot on the new hardware _and_ delete it there, and still see the issue, then the suspect moves away from the hardware and on to VMware.
In that scenario I would be interested in seeing the logs, preferably with debugging turned up all the way.

Note that it would be pretty huge if the snapshot functionality is not reliable (for a lot of use cases) as it is one of the core functionalities used with virtualisation.

Btw, I might have missed it, but what exact version of VMware Workstation are you running, what is the host OS, what is the guest OS, what exactly are the symptoms (boot repair loop, or like the other post?)
--
Wil

DanielLBenway
Contributor

Wil,

I'm going to run the memory tests now, in order to confirm and rule that out.

Thanks!

🙂

DanielLBenway
Contributor

Wil,

I've run chkdsk on the host and the VM; no errors found.

I've run the MDSched memory check twice on the host; no errors found.

I'm currently using VMware Workstation 16.1. The problems seem to have started with the last few versions of 15, I think. All of the machines are cloned from a Microsoft Windows Server 2019 base-disk VM.

So far the snapshot deletions have rendered 3 DCs and 1 non-domain Microsoft proxy server unbootable, and have prevented the AD CS databases on 2 domain Microsoft CAs from starting (so it messes pretty hard with AD's NTDS Jet database, the CAs' Jet databases, and the Microsoft proxy server).

Which logs should I focus on?

I'm also hoping to open an official case with VMware and just upload the most problematic DC to them.

Thank you!

🙂

wila
Immortal

Hi,

Good news for your hardware I suppose and less good news for isolating the problem.

FWIW, I'm a very heavy user of snapshots, and all the customers of my product Vimalin depend on them as well.

I cannot reproduce your issue so far, but for reproducing it, details are key.
OK, so you've seen this with Workstation 16.1 and Windows Server 2019, and the problem is that you lose the ability to boot and that it breaks the MS Jet databases. Those are all very serious issues if they can be reproduced.

Please, let's eliminate the old host completely from this picture and see if you can reproduce this on the new host without using old snapshots from the previous host.

Change the debug information setting from "Default" to "Full" (see screenshot below), and then see if you can reproduce the problem.

[Screenshot: debug information setting changed from "Default" to "Full"]

Supply as much detail as you think is needed for anyone else to follow your steps and reproduce the issue.

Once you've reproduced it, upload the vmware.log files here after you shut down the VM (zip them all up before attaching, as the new forum limits uploads to 1 file per post (sigh)).
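To get around the one-file-per-post limit, a small Python sketch that bundles every vmware*.log file in the VM's folder into a single zip. It assumes the logs sit in the VM directory with names like vmware.log, vmware-0.log and so on (check your own folder first); the function and script names are hypothetical.

```python
import sys
import zipfile
from pathlib import Path


def zip_vm_logs(vm_dir: Path, archive: Path) -> list[str]:
    """Add every vmware*.log file in vm_dir to one zip; return the names added."""
    logs = sorted(vm_dir.glob("vmware*.log"))
    with zipfile.ZipFile(archive, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for log in logs:
            # Store only the file name so the archive carries no host-specific paths.
            zf.write(log, arcname=log.name)
    return [log.name for log in logs]


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage: python zip_logs.py <VM folder> <output.zip>  (run after the VM is shut down)
    added = zip_vm_logs(Path(sys.argv[1]), Path(sys.argv[2]))
    print(f"Added {len(added)} log file(s): {', '.join(added)}")
```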

Re. opening a support ticket. You can only do that with VMware if you have a support contract with them. I don't have one.

But there is a reasonable chance that a VMware developer is reading along.. and if so then I expect them to be interested.

--
Wil

DanielLBenway
Contributor

I'm currently working this case (20180242312) with VMware. Thanks so much for the help so far! 🙂

 

wila
Immortal

You're welcome.

Please keep us in the loop if they find something!

--
Wil
