citybird
Contributor

Creating a snapshot with guest quiescing succeeds but triggers a VSS error in Event Viewer on a Server 2008 R2 DC

Hi

I'm trying to back up my virtual DC, which runs Server 2008 R2. The computer has all Windows updates and VMware Tools installed. When I create a snapshot with "Quiesce guest file system" enabled but "Snapshot the virtual machine's memory" disabled, the snapshot is created successfully. However, I get an error and a warning in the VM's Event Viewer. Both are logged while the snapshot is being processed.

The error and warning are followed by a series of informational ESENT events about freezing all the other shadow copy instances. So the lsass (AD) instance is the only one raising an error.

-


Error 489, Source ESENT:

lsass (480) An attempt to open the file "c:\Windows\NTDS\ntds.dit" for read only access failed with system error 32 (0x00000020): "The process cannot access the file because it is being used by another process. ". The open file operation will fail with error -1032 (0xfffffbf8).
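The two error numbers in this message are easier to compare once you see how they are printed. System error 32 is simply 0x20. ESENT error -1032 shows up as 0xfffffbf8 because the signed 32-bit value is printed as its unsigned two's-complement bit pattern. Here is a small Python sketch that converts in both directions (the helper names are made up for illustration):

```python
def to_hex32(value: int) -> str:
    """Format a signed 32-bit error code the way the event log prints it."""
    # Masking with 0xFFFFFFFF gives the two's-complement bit pattern of negatives.
    return f"0x{value & 0xFFFFFFFF:08x}"


def from_hex32(text: str) -> int:
    """Interpret an 8-digit hex word as a signed 32-bit integer."""
    raw = int(text, 16)
    return raw - (1 << 32) if raw & 0x80000000 else raw


print(to_hex32(32))              # → 0x00000020 (system error 32)
print(to_hex32(-1032))           # → 0xfffffbf8 (ESENT error -1032)
print(from_hex32("0xfffffbf8"))  # → -1032
```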

-


Warning 8229, Source VSS:

A VSS writer has rejected an event with error 0x800423f4, The writer experienced a non-transient error. If the backup process is retried, the error is likely to reoccur. Changes that the writer made to the writer components while handling the event will not be available to the requester. Check the event log for related events from the application hosting the VSS writer.

Operation:

PostSnapshot Event

Context:

Execution Context: Writer

Writer Class Id: {b2014c9e-8711-4c5c-a5a9-3cf384484757}

Writer Name: NTDS

Writer Instance ID: {8231a194-b132-41b1-97e9-7c3f8333780d}

Command Line: C:\Windows\system32\lsass.exe

Process ID: 480

-
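The writer error 0x800423f4 is an HRESULT. Splitting it into its fields shows three things: the failure bit is set, the facility is 4 (FACILITY_ITF, used for interface-specific codes such as the VSS ones), and the code is 0x23f4. The Windows SDK header vsserror.h names this value VSS_E_WRITERERROR_NONRETRYABLE, which matches the "non-transient error" text in the warning. Below is a minimal decoding sketch; the function and table names are hypothetical, and the table lists only this one code.

```python
from typing import NamedTuple

# Only the code seen in this thread; vsserror.h defines many more.
VSS_ERROR_NAMES = {0x800423F4: "VSS_E_WRITERERROR_NONRETRYABLE"}


class HResult(NamedTuple):
    failed: bool   # severity bit 31
    facility: int  # bits 16 and up, masked like the HRESULT_FACILITY macro
    code: int      # low 16 bits


def decode_hresult(value: int) -> HResult:
    """Split a 32-bit HRESULT into severity, facility and code."""
    return HResult(
        failed=bool(value & 0x80000000),
        facility=(value >> 16) & 0x1FFF,
        code=value & 0xFFFF,
    )


err = 0x800423F4
print(decode_hresult(err), VSS_ERROR_NAMES.get(err, "unknown"))
```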


Because of that, I'm afraid that the backup of my AD might be inconsistent.

Any ideas on how to resolve this?

Thanks in advance!

Paul_Brannon
Contributor

There is a ghost "VMware Virtual disk SCSI Disk Device". You can uninstall it, but it reappears during the VSS volume snapshot, one "ghost" for each virtual drive. If you show hidden and non-present devices, you can watch these disk devices appear in Device Manager during the snapshot.

Thanks MKguy for the troubleshooting help, but mismatched UUIDs aren't my problem. During the initial ParseUUIDs, the disk UUIDs from the host match the UUIDs in my .vmdk files. Only the second set of ParseUUIDs cannot be matched to VMware disks. I suspect those are the UUIDs of the shadow drives that appear momentarily in Device Manager. This mismatch seems to be normal, since the VMs that can quiesce successfully during a snapshot also show this behavior.

My VSS failure on the vCenter server is logged in DebugView as:

SnapRequestor: writer ADAM (VMwareVCMSDS) Writer in failed state: failure = 800423f4

Looks like it relates to Application event ID 482 from ESENT, "An attempt to write to the file "?\Volume{...}\Program Files\VMware\Infrastructure\Virtual Center Server\VMwareVCMSDS\edb.log"....failed: "The media is write protected".

My VSS failure on a domain controller is logged in DebugView as:

SnapRequestor: writer NTDS in failed state: failure = 800423f4

This also relates to Application event ID 482 from ESENT, "An attempt to write to the file "?\Volume{...}\Windows\NTDS\edb.log"....failed: "The media is write protected".

Anyone else seeing this? ta
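When comparing several guests, it can help to pull the failing writer and its error code out of a DebugView capture. The sketch below assumes the log lines always take the form of the two quoted above; other VMware Tools builds may phrase them differently, and the pattern and function names are hypothetical.

```python
import re
from typing import Iterable, Iterator, Tuple

# Pattern based only on the two DebugView lines quoted in this post.
FAILED_WRITER = re.compile(
    r"SnapRequestor: writer (?P<writer>.+?) in failed state: "
    r"failure = (?P<code>[0-9a-fA-F]{8})"
)


def failed_writers(lines: Iterable[str]) -> Iterator[Tuple[str, int]]:
    """Yield (writer name, error code) for every failed-writer line."""
    for line in lines:
        match = FAILED_WRITER.search(line)
        if match:
            yield match.group("writer"), int(match.group("code"), 16)


log = [
    "SnapRequestor: writer ADAM (VMwareVCMSDS) Writer in failed state: failure = 800423f4",
    "SnapRequestor: writer NTDS in failed state: failure = 800423f4",
]
for writer, code in failed_writers(log):
    print(f"{writer}: 0x{code:08x}")
```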

HendersonD
Hot Shot

The ghost disk and ghost volumes are exactly what I see when I view hidden devices, so your explanation is a good one. Disk Management in the MMC console shows one disk. After I snapshot the VM, it shows two disks, one of them not connected. This happens for three of my VMs. All three are Server 2008 R2, and the kicker is that all three were brought up from a template under ESXi 4.1. I have many other Server 2008 R2 VMs that were brought up under ESX 4.0 before I upgraded to ESXi 4.1. Those VMs snapshot fine and do not leave the extra disk behind in Disk Management.

The extra not-connected disk shown in Disk Management makes the next snapshot fail completely. If I delete this disk in Disk Management and restart the server, the next snapshot works fine, but any one after that bombs. I looked at the VMware Tools log file for two VMs, one with this problem and one without. In both cases the disk UUIDs do not match. Yet somehow this does not affect some VMs and does affect three of them (the only ones brought up under ESXi 4.1)! In both cases I still get a VSS error in the logs, something about a generic_floppy_drive, even though none of my VMs has a floppy drive attached.

I have opened a case with VMware. A few days ago I realized that my VC database is hosted on Windows Server 2008 R2, which is not a supported configuration. I am hoping this is not the root cause of my problem.

MaxHeadRoom1
Contributor

Hi Folks,

My environment is ESX 4.1, and I'm using NetBackup 7.0 to try to back up the vCenter server itself. The vCenter server is Windows 2008 R2 Standard (64-bit). The virtual machine hardware version is 4 (I know, I haven't upgraded it to 7 yet).

I'm having exactly the same issue, with the ADAM VSS writer bombing out when NetBackup tries to quiesce and take a snapshot. A plain VMware snapshot is fine. Setting the disk.EnableUUID attribute to "false" lets NetBackup complete without error, although naturally the backup won't be consistent. So it looks like I have the UUID issue. My problem is setting up VMware Tools debugging so I can capture the two UUIDs.
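disk.EnableUUID is a per-VM option stored in the VM's .vmx file. It can also be added as an advanced configuration parameter in the vSphere Client. The sketch below sets it in the .vmx text under three assumptions: the VM is powered off, you are working on a backed-up copy of the file, and option names compare case-insensitively. set_vmx_option is a hypothetical helper, not a VMware tool.

```python
def set_vmx_option(vmx_text: str, key: str, value: str) -> str:
    """Return .vmx contents with `key = "value"` set, replacing any existing entry."""
    new_line = f'{key} = "{value}"'
    out = []
    replaced = False
    for line in vmx_text.splitlines():
        name = line.split("=", 1)[0].strip()
        # Assumption: option names are matched case-insensitively.
        if "=" in line and name.lower() == key.lower():
            if not replaced:
                out.append(new_line)
                replaced = True
            continue  # drop any duplicate entries for the same key
        out.append(line)
    if not replaced:
        out.append(new_line)
    return "\n".join(out) + "\n"


vmx = 'displayName = "vcenter"\nscsi0.present = "TRUE"\n'
print(set_vmx_option(vmx, "disk.EnableUUID", "false"), end="")
```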

According to KB Article 1007873, on Windows Server 2008 the tools.conf file should be located in C:\Users\All Users\VMware\VMware Tools\tools.conf. On my vCenter server this file does not exist. The only things in that folder are the subfolder "Unity Filters" and two files, manifest.txt and vss_manifests.zip. I have uninstalled and reinstalled VMware Tools, but there is still no tools.conf.

I think debugging is set up differently in different versions of VMware Tools. The problem server has version 8.3.2, build 257589, and no tools.conf. On another Windows 2008 server with VMware Tools version 3.5.0, build-110268, tools.conf does exist. Strange?

I did notice that the VM's Advanced settings include an option to enable logging and record debugging information. I've enabled it, but I don't know where to find the log file or even whether it's generating one. I also don't know if these settings still require the Sysinternals debugger. The help is fairly useless, and KB Article 1007873 is the only thing I've found about setting up debugging.

At the moment I'm trying to get VMware Tools version 3.5.0, build-110268 onto this machine so I can at least debug it that way. Otherwise, my only choice is to set the disk.EnableUUID attribute to "false" so that backups work.

Does anyone have comments on this, or more info on how to set up debugging in the newer version of VMware Tools?

Many Thanks,

Steve

HendersonD
Hot Shot

Steve,

I am not in front of my server right now (I'm doing this from home), but if you show hidden files you should see a folder called ProgramData at the root of the C: drive. Dig in there and you will find tools.conf.

Dave

MaxHeadRoom1
Contributor

Hi Dave,

Thanks for the update. I've had a look, but unfortunately "C:\ProgramData\VMware\VMware Tools" has no tools.conf. It has exactly the same structure as "C:\Users\All Users\VMware\VMware Tools": only the subfolder "Unity Filters" and two files, manifest.txt and vss_manifests.zip.

I've also searched the entire machine - no tools.conf. Looks like something with the version of VM Tools.
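Both folders checked above are probably the same location. On Windows Server 2008, C:\Users\All Users is a link to C:\ProgramData, which would explain why they show identical contents. The small sketch below checks the candidate folders from this thread for tools.conf. find_tools_conf is a hypothetical helper, and the list covers only the paths mentioned here.

```python
from pathlib import Path
from typing import Iterable, Optional

# Folders mentioned in this thread; extend the list if your Tools version differs.
CANDIDATE_DIRS = [
    r"C:\ProgramData\VMware\VMware Tools",
    r"C:\Users\All Users\VMware\VMware Tools",
]


def find_tools_conf(folders: Iterable[str]) -> Optional[Path]:
    """Return the first tools.conf found in the given folders, or None."""
    for folder in folders:
        candidate = Path(folder) / "tools.conf"
        if candidate.is_file():
            return candidate
    return None


found = find_tools_conf(CANDIDATE_DIRS)
print(found if found is not None else "tools.conf not found in the checked folders")
```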

Thx, Steve

HendersonD
Hot Shot

A VMware engineer did a WebEx session with me today, and his solution was to install VMware Tools without the VSS driver. I asked whether I would still get operating system quiescing without the VMware Tools VSS driver, and he said I would. He told me that the VSS driver built into Windows Server 2008 R2 would do the quiescing and I should be fine.

Is this really the case?

MKguy
Virtuoso

Hey HendersonD, today I noticed the same issue you described, with the ghost disk being left behind in Disk Management on 2008 R2.

I opened a case with the following info:

On several of our Windows 2008 R2 VMs, we're having issues creating quiesced snapshots:

“Cannot create a quiesced snapshot because the create snapshot operation exceeded the time limit for holding off I/O in the frozen virtual machine.”

We were able to track the issue down to the following cause:

When a quiesced snapshot is created on 2008 R2, the disks of the VM briefly appear duplicated and mounted concurrently. This can be observed in the Windows Disk Management MMC. Normally, the duplicate disk disappears entirely after the snapshot is done. However, on our VMs where the issue occurs, the duplicate disks remain registered. When another quiesced snapshot is created while this "ghost disk" is still present in Disk Management, the snapshot fails with the above error. The Windows event log of the VM also logs errors like:

"The device, \Device\Harddisk1\DR2, is not ready for access yet."

A manual disk-rescan in Windows is required to “cleanup” the ghost-disk of the snapshot. After a manual rescan, creating a new quiesced snapshot succeeds, but leaves the same ghost-disk behind yet again, causing the next quiesced snapshot to fail.

Rebooting the VM, or powering it off and back on, did not help either. The first quiesced snapshot succeeded (because the reboot cleaned up the disks), but all following quiesced snapshots failed.

Creating a non-quiesced snapshot always works.
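The failure pattern described above can be replayed as a tiny state machine. This is a toy model of the reported symptoms only, not VMware's actual logic. It assumes that a failed snapshot leaves the ghost disk in place and that a reboot clears it just as a rescan does.

```python
from typing import List


def replay(steps: List[str]) -> List[str]:
    """Replay quiesced snapshot/rescan/reboot steps under the reported symptoms."""
    ghost_disk = False
    outcomes = []
    for step in steps:
        if step in ("rescan", "reboot"):
            ghost_disk = False  # both clean up the leftover duplicate disk
            outcomes.append(f"{step}: ghost disk cleared")
        elif step == "snapshot":
            if ghost_disk:
                # Assumption: the failed attempt leaves the ghost disk in place.
                outcomes.append("snapshot: FAILED (I/O freeze time limit exceeded)")
            else:
                ghost_disk = True  # the duplicate disk stays registered afterwards
                outcomes.append("snapshot: ok, ghost disk left behind")
        else:
            raise ValueError(f"unknown step: {step}")
    return outcomes


for line in replay(["snapshot", "snapshot", "rescan", "snapshot", "snapshot"]):
    print(line)
```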

Like your problem, this happened on all VMs that we deployed from a 2008R2 template since the 4.1 upgrade.

I was able to get it to work permanently on all the VMs I have tried so far by simply doing a repair installation of VMware Tools, including the VSS option.

-- http://alpacapowered.wordpress.com
HendersonD
Hot Shot

MKGuy,

So the fix is just a repair of VMware Tools? Did you uninstall and reinstall, or just run the installer manually and choose the Repair option?

Did VMWare come up with this fix once you opened the case?

Dave

MKguy
Virtuoso

I just used the plain, default repair option of the VMware Tools setup and rebooted the system. No manual uninstallation or anything.

I came up with it on my own after opening the case; I haven't had a response yet.

HendersonD
Hot Shot

I will give it a go on one of my problem VMs and let you know how I make out.

MKguy
Virtuoso

Hm, at first it looked like the template was messed up or something. After I converted it to a VM and took two quiesced snapshots, it showed the same behaviour we saw in its "children". The repair installation also worked for the template itself. However, every VM deployed from this template still has the issue, even now that the original source is fixed.

For now, our only workarounds seem to be either performing a manual rescan (no reboot required) each time after a snapshot has been created or performing a repair installation of the tools.
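If you go with the rescan workaround, it can be scripted inside the guest. This sketch assumes diskpart's rescan command has the same effect as the manual rescan used in the thread, which the thread does not confirm. It also assumes it runs from an elevated prompt in the Windows guest. rescan_disks is a hypothetical helper, and the runner is injectable so the logic can be exercised without diskpart.

```python
import os
import subprocess
import tempfile
from typing import Callable


def rescan_disks(run: Callable[..., object] = subprocess.run) -> None:
    """Write a one-line diskpart script containing 'rescan' and run it with diskpart /s."""
    fd, script_path = tempfile.mkstemp(suffix=".txt")
    try:
        with os.fdopen(fd, "w") as script:
            script.write("rescan\n")
        # Needs an elevated prompt inside the Windows guest; check=True raises on failure.
        run(["diskpart", "/s", script_path], check=True)
    finally:
        os.remove(script_path)


if __name__ == "__main__" and os.name == "nt":
    rescan_disks()
```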

MKguy
Virtuoso

HendersonD, did you give it a try in the meantime?

After performing some tests today, the issue appears to be related to the sysprep step of the guest customization process:

Deploying a VM from the template without customization, or cloning the template or a repair-installed VM, worked without error every time.

But whenever a VM was sysprep'd, be it by the guest-customization during template deployment, the vCenter Converter "Reconfigure" Option with the "Customize Guest" checkbox, or even a manual run of "sysprep /generalize" in the guest, it would leave behind the duplicate disk and cause the next snapshot to fail.

Repair-installing the VMware Tools always solved it. Waiting for a response from support again.

HendersonD
Hot Shot

I have three VMs that I have made from a template since upgrading to ESXi 4.1. For all three I used guest customization. I did the tools repair on one of them yesterday and included it in my backup routine last night. It worked fine, doing the VMWare snapshot and not leaving behind any phantom disks. I am going to do the other two today and let you know the results.

What if I:

1. Change my template into a VM

2. Do a tools repair on this VM

3. Change it back into a template

4. Bring up a new VM off this template, using guest customization

The question is whether the Tools repair on the template will prevent the new VM from having the same problem. Or is guest customization the culprit, so that I will need to do another Tools repair on the newly created VM?

MKguy
Virtuoso

What if I:

1. Change my template into a VM

2. Do a tools repair on this VM

3. Change it back into a template

4. Bring up a new VM off this template, using guest customization

As I wrote before, I already did that and fixed my template. But even now that my template is fixed, it doesn't matter. Any sysprep, whether manual or through the guest customization wizard, causes the issue again.

HendersonD
Hot Shot

No sense fixing my template, then. When I deploy from my template, I will have to remember to do the Tools repair on the newly created VM. Is VMware aware of this bug? Do they have a time frame for a fix?

Dave

HendersonD
Hot Shot

The other two VMs I was having trouble with are now working fine. The VMWare Tools repair did the trick!

MKguy
Virtuoso

That's good to hear. Just a question: were those 3 VMs also DCs, or do they run any other applications that really require VSS writers (SQL, ADAM, etc.)?

Support told me they saw the issue with the remaining ghost-disk in VMs with dynamic disks before, but not with basic disks like in our case. I supplied logs today and it seems like it will be escalated to engineering.

HendersonD
Hot Shot

My centralized storage is NetApp, and we use a piece of their software called SnapManager for Virtual Infrastructure. I have two backup jobs created in this product:

1. A VMware snapshot is taken of each VM, followed by a volume snapshot of the entire datastore. Each VMware snapshot is then removed.

2. A VMWare snapshot is NOT taken first, only a volume snapshot of the datastore

Most of my VMs are in Job 1 with a few exceptions:

1. Domain Controllers - I have been told many times never to restore a DC from a snapshot. My domain controllers are not in either of the jobs described above. I have a backup agent installed on each of my DCs (we happen to use Commvault backup software) and use that for backup.

2. Exchange - The C: drives of my Exchange 2007 servers are stored under VMware. I present two LUNs to my mailbox server, one for the Exchange database and one for logs. VMware does not support taking snapshots of VMs that have LUNs attached via Microsoft's iSCSI initiator, so my Exchange mailbox server is in Job 2. My Exchange client access server is in Job 1. I back up my Exchange database and logs with NetApp's SnapManager for Exchange product.

3. SQL - My SQL 2008 R2 server has its C: drive stored on this VMWare datastore. I have 4 other disks on this VM to store the SQL databases, logs, etc. These 4 disks come off of 4 separate volumes, not my VMWare datastore. I use Netapp's Snapmanager for SQL to backup my SQL databases and log files. I could never get the SQL VM to snapshot properly so it is in Job 2.

4. VirtualCenter - I could never get my VC server to snapshot properly so it is in Job 2
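The job assignment above boils down to a few rules. The sketch below encodes them with made-up VM names and a hypothetical VmProfile type. "job 1" and "job 2" refer to the two SnapManager for Virtual Infrastructure jobs described above.

```python
from dataclasses import dataclass


@dataclass
class VmProfile:
    name: str
    is_domain_controller: bool = False
    has_guest_iscsi_luns: bool = False  # LUNs attached through the in-guest Microsoft iSCSI initiator
    snapshots_cleanly: bool = True      # False for VMs that never snapshot properly (SQL, VC here)


def backup_method(vm: VmProfile) -> str:
    """Return where a VM goes under the rules in the post above."""
    if vm.is_domain_controller:
        return "backup agent"  # DCs are kept out of both jobs
    if vm.has_guest_iscsi_luns or not vm.snapshots_cleanly:
        return "job 2"         # datastore volume snapshot only, no VMware snapshot
    return "job 1"             # VMware snapshot, then datastore volume snapshot


fleet = [
    VmProfile("dc01", is_domain_controller=True),
    VmProfile("mailbox01", has_guest_iscsi_luns=True),
    VmProfile("sql01", snapshots_cleanly=False),
    VmProfile("cas01"),
]
for vm in fleet:
    print(f"{vm.name}: {backup_method(vm)}")
```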

Have you had any luck with doing VMWare snapshots on either SQL or your VirtualCenter server?

mdgunther
Contributor

FYI, the release of 4.1U1 contains an update to VMware-tools that fixes this problem for me.
