VMware

This Question is Answered

25 Replies. Last post: Aug 6, 2009 8:09 AM by ODOCChuck
mike.laspina (Virtuoso, 2,270 posts since May 26, 2006)
I can see why we are not on the same page. If VMware would provide a clear description of the functional parts of LGTOsync we would not have these issues or this discussion.

Here is how I understand the functional side of this driver.

Since ESX cannot determine what the VM is doing with disk writes and memory cache at the time a snapshot is requested, it cannot correctly freeze the I/O state of the vmdk for that snapshot, and so cannot guarantee the integrity of any cached SCSI operations.

VMware needed to monitor this SCSI disk write and memory cache activity, feed that information back to the ESX physical hardware, and complete those disk I/O operations in order to meet the integrity requirements of backup functions. They provided this capability by developing the LGTOsync driver with Legato. So when I think of the LGTOsync driver, I am not looking at the VM flushing its writes; it's the underlying host that needs to perform this function. If you disable the driver you lose this capability altogether. So unless you can tell me that you know for certain that the VMware SCSI drivers support forced unit access (FUA) down to the host OS, it is better to run the driver.

So I consider the following when I build VMs for AD or any DBs:

http://support.microsoft.com/kb/888794

http://communities.vmware.com/servlet/JiveServlet/downloadImage/1953/lgtosync.PNG

Justin King (Enthusiast, 85 posts since Oct 26, 2006)

Huh, I think we're saying the same thing but coming to different conclusions :)

The meat of what I'm saying is that AD is simply stored in a Jet database. Just like any other modern database, it keeps log files of data to be committed, so this level of data security is already partially accounted for. A partially committed DB change would still have a log present, even in a circular logging setup, and thus I consider the issue minor. Perhaps I'm unique in this area, but I've had a number of problems occur when a snapshot is taken on a DC that has the sync driver installed. Observational data implies the entire VM is paused while the host commits vmdisk writes. Simply uninstall the driver, do a snap, and look at the difference in time. At least on any of my ESX hosts it's easily visible, and the entire host is essentially "paused" for a few seconds.

The only VMware doc I can find that references FUA is this one: http://www.vmware.com/files/pdf/usenix07.pdf
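To illustrate the log-based argument above, here is a toy write-ahead-log sketch in Python. It is not how Jet/ESE is actually implemented; the file names, the JSON record format, and the `crash_before_data` flag are all made up for illustration. It only shows the general idea: if the log record reaches disk before the data file is touched, a snapshot taken in between can still be repaired by replaying the log.

```python
import json
import os
import tempfile


def load(data_path):
    """Read the data file, or return an empty store if it does not exist yet."""
    if not os.path.exists(data_path):
        return {}
    with open(data_path, encoding="utf-8") as f:
        return json.load(f)


def save(data_path, data):
    """Rewrite the data file and ask the OS to flush it."""
    with open(data_path, "w", encoding="utf-8") as f:
        json.dump(data, f)
        f.flush()
        os.fsync(f.fileno())


def logged_write(log_path, data_path, key, value, crash_before_data=False):
    """Append the change to the log first, then apply it to the data file."""
    with open(log_path, "a", encoding="utf-8") as log:
        log.write(json.dumps({"key": key, "value": value}) + "\n")
        log.flush()
        os.fsync(log.fileno())  # log record is flushed before the data changes
    if crash_before_data:
        return  # hypothetical flag: simulate a snapshot taken at this instant
    data = load(data_path)
    data[key] = value
    save(data_path, data)


def recover(log_path, data_path):
    """Replay every logged change; re-applying an applied change is harmless."""
    data = load(data_path)
    if os.path.exists(log_path):
        with open(log_path, encoding="utf-8") as log:
            for line in log:
                record = json.loads(line)
                data[record["key"]] = record["value"]
    save(data_path, data)
    return data


def demo():
    with tempfile.TemporaryDirectory() as tmp:
        log_path = os.path.join(tmp, "changes.log")
        data_path = os.path.join(tmp, "data.json")
        logged_write(log_path, data_path, "user1", "alice")
        logged_write(log_path, data_path, "user2", "bob", crash_before_data=True)
        before = load(data_path)            # state captured by the "snapshot"
        after = recover(log_path, data_path)  # state after log replay
        return before, after
```

Calling `demo()` returns the data file as it looked at the simulated snapshot and as it looks after replay. The sketch ignores torn writes to the data file itself and log truncation, both of which a real database has to handle.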

Still need to read through it though.

mike.laspina (Virtuoso, 2,270 posts since May 26, 2006)

Yes. Now we are on the same page. This paper identifies the issue of not knowing which writes need to be committed at the hypervisor layer. Now the part we need to know is whether the VMware SCSI driver works with the hypervisor to write FUA-flagged requests immediately or not. Once that is known, we can determine whether we can trust the underlying host OS with this task or whether this only occurs with the LGTOsync driver. And I completely agree that today's DBs will deal with these issues much better than 1st and 2nd gen ones did and will recover to a usable state. They do still get in trouble under extreme stress, as the requests start saturating cache and can bottleneck with a transaction log and data items in the write-back cache at the failure point.

That's a good paper, thanks for sharing it.
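For reference, this is roughly what a forced write looks like from inside the guest, as a minimal Python sketch. The function name and file name are hypothetical. `os.fsync` only asks the guest OS to push the data to the device; how that request is translated (a cache flush, FUA-tagged writes, or neither) depends on the OS, the SCSI driver, and everything underneath, which is exactly the open question here. The code cannot show what the hypervisor does with the request.

```python
import os
import tempfile


def durable_write(path, payload):
    """Write payload and return only after the OS reports the flush complete."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    try:
        written = 0
        while written < len(payload):
            # os.write may write fewer bytes than requested, so loop
            written += os.write(fd, payload[written:])
        os.fsync(fd)  # request that buffered data be forced to the device
    finally:
        os.close(fd)
    return written


def demo():
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "txn.log")
        count = durable_write(path, b"commit record 42")
        with open(path, "rb") as f:
            return count, f.read()
```

A database typically issues this kind of write for its transaction log, so the behaviour of the layers below that call determines whether a "committed" record is really on disk when a snapshot is taken.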


FunkyD (Enthusiast, 113 posts since Sep 1, 2006)

It seems to me that the golden rule with any server that hosts a database is to back up the database first, either locally or with your file backup solution, and then snapshot the server. To recover, restore the snapshot and then restore the databases following whichever method suits. You cannot reliably snapshot a server with a database, simply restore it, and expect it to work. It might work, but it's better not to risk it.

On that basis my strategy is to do a daily snapshot of file servers and SQL (they all have a local backup of the database) and sftp them to the remote site. For database servers (AD, Exchange) I do a monthly snapshot, sftp it to the remote site, and to recover use last night's backup-to-disk files for the databases.
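The ordering described above (database backup first, snapshot second, and no snapshot if the backup failed) can be sketched as a small Python wrapper. The two callables are placeholders for whatever backup tool and snapshot command you actually use; nothing here calls a real VMware or backup API.

```python
from typing import Callable, List


def protect_server(backup_database: Callable[[], bool],
                   take_snapshot: Callable[[], bool],
                   log: List[str]) -> bool:
    """Run the database backup first; snapshot only if the backup succeeded."""
    if not backup_database():
        log.append("backup failed; snapshot skipped")
        return False
    log.append("backup ok")
    if not take_snapshot():
        log.append("snapshot failed")
        return False
    log.append("snapshot ok")
    return True
```

Recovery runs in the mirror order: restore the snapshot to get the server back, then restore the databases from the separate backup.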

I am hoping to improve things by upgrading Veritas 9 to version 11 so I can use the VCB module.

mike.laspina (Virtuoso, 2,270 posts since May 26, 2006)
Yes it is, I completely concur and that is what I suggested on the first post I made on this thread.
vmcms (Enthusiast, 26 posts since Aug 23, 2007)

Justin,

It looks like you've got a process for recovering from a corrupted DC such that AD corruption seems a non-event should it transpire.

Does this take into account DNS and AD cleanup? My experience is that although these objects are supposed to clean themselves up, they do not always. And occasionally there is a DC setting or two in Microsoft Exchange 2000/2003/2007 that gets stuck on an old DC and has to be manually tracked down and corrected.

With these things explicitly considered, how significant an event do you feel AD corruption and recovery to be?

VMCMS

Tounet (Lurker, 2 posts since Mar 7, 2008)

An alternative method could be to use the VCB pre-freeze script (C:\windows\pre-freeze-script.bat) and post-thaw script (C:\windows\post-thaw-script.bat) inside the virtual machine.

You could easily stop the database services, or run a local backup of the database if you can't stop the services for a few seconds or minutes.

For an AD domain controller, you could use these scripts to run a system state backup locally with ntbackup. Don't forget that a system state backup is the only backup method Microsoft supports and recommends for an AD domain controller.
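As a rough sketch of what those two scripts might contain, the Python helper below only builds the command lines; it does not run anything. The service name and backup path are hypothetical, and the ntbackup arguments are written as I recall the documented syntax (`ntbackup backup systemstate /J "job" /F "file"`), so check them against `ntbackup /?` on your own system before relying on them.

```python
import subprocess


def pre_freeze_commands(service=None, systemstate_file=None):
    """Commands to run before the snapshot: optional backup, optional stop."""
    commands = []
    if systemstate_file:
        # Assumed ntbackup syntax; verify on the target system
        commands.append(["ntbackup", "backup", "systemstate",
                         "/J", "pre-freeze system state",
                         "/F", systemstate_file])
    if service:
        commands.append(["net", "stop", service])
    return commands


def post_thaw_commands(service=None):
    """Commands to run after the snapshot: restart the stopped service."""
    return [["net", "start", service]] if service else []


def as_batch_lines(commands):
    """Render each command as one line suitable for pasting into a .bat file."""
    return [subprocess.list2cmdline(cmd) for cmd in commands]
```

For a database server you would pass something like `service="MyDatabaseService"` (a made-up name) to both functions. For a domain controller you would pass only `systemstate_file`, for example `r"C:\backup\systemstate.bkf"` (also made up), and leave the services running.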


Jabadakkas (Novice, 5 posts since Jul 15, 2008)
A few months ago I had a chat with a VMware pre-sales consultant. He told me that the upcoming ESX 3.5 Update 2, which is due in Q3 2008, will include Volume Shadow Copy Service (VSS) support. This will hopefully eliminate the problems ESX admins are experiencing with snapshots of AD VMs. I googled for VSS support and found that the Virtual Server 2.0 betas already include VSS support. Of course I realize that Virtual Server is a hosted virtualization product and accesses storage in a different way than ESX does, but you could still give it a try. :8}

Have any forum readers tested VSS on Virtual Server? Were you able to snapshot an AD VM without the corruption problems on ESX that alastairc identified at the start of this forum topic?

piyush1414 (Lurker, 1 post since Apr 3, 2009)

Try Edb Repair tool to repair corrupted Active Directory.

Download the free Trial from here http://www.edbrepair.org/edb-active-directory-recovery.php

Josh26 (Hot Shot, 208 posts since Mar 15, 2009)

alastairc wrote:

Hi Mike,

. Effective immediately I'm removing the sync driver from all our VMs which host databases of any type.

Could this be described as a bug, or is it just bad practice to install the filesystem sync driver in VMs which host databases?


I see this advice thrown around a lot, with no regard for the consequences. Basically, what the sync driver does is freeze I/O in a "best effort" attempt at getting an application-consistent snapshot. Most databases, e.g. Exchange and Oracle, have major issues with this write lag.

The "just disable the driver" approach can be just as dangerous. It means nothing is remotely consistent in the snapshot. Try restoring one of those snapshots, and doing a chkdsk. You have about a 50% chance of something requiring a repair.

The bigger warning is, don't try restoring a snapshot of your domain controller. It's not pretty.

ODOCChuck (Lurker, 3 posts since Mar 3, 2008)

Is this still an issue? Have updates to the driver or an Update release resolved the issue with the sync driver, or is it "works as designed" and we shouldn't expect a fix? I am not so concerned about a corrupted AD, but it does cause issues when the DB is corrupted. There are more reasons to take a snapshot than restoring a full VM.

