VMware Cloud Community
gdesmo
Enthusiast

W2K3 Domain controller on VM

We ran dcpromo to create our first VM-based DC here. It did not function well for the 3 weeks it was a DC. Every few days the DC would stop accepting any (domain admin) account logon requests at the console, and it had to be rebooted to get it working again. To avoid risking AD corruption, we have since used dcpromo to demote it out of the domain. Time sync on the DC looked good every time I checked it.

There are several recommendations in the link below, but I am unsure whether I should implement some of the steps, because I lack the context in which they were presented: I was not at the VMworld presentation.

http://download3.vmware.com/vmworld/2006/tac9710.pdf

Environment

ESX 3.0.1 32039

Windows 2003 enterprise SP1 vm 2 vcpu

Single vmdk file for OS, logs, sysvol etc....

Some steps I took to try and resolve the issue:

Changed disk to independent/persistent mode

Disabled the sync driver, even though we were not using VCB to back it up.

http://kb.vmware.com/KanisaPlatform/Publishing/947/5962168_f.SAL_Public.html

Neither change had a positive effect.

Thanks for your thoughts....

Some events that we noticed when the DC started having problems, which may or may not be related:

Event Type: Information

Event Source: LGTO_Sync

Event Category: None

Event ID: 1

Date: 2/16/2007

Time: 5:16:08 PM

User: N/A

Computer: CEINVDC03

Description:

The description for Event ID ( 1 ) in Source ( LGTO_Sync ) cannot be found. The local computer may not have the necessary registry information or message DLL files to display messages from a remote computer. You may be able to use the /AUXSOURCE= flag to retrieve this description; see Help and Support for details. The following information is part of the event: , Flush Completed.

Data:

Event Type: Warning

Event Source: W32Time

Event Category: None

Event ID: 22

Date: 2/16/2007

Time: 5:16:13 PM

User: N/A

Computer: CEINVDC03

Description:

The time provider NtpServer encountered an error while digitally signing the NTP response for peer 172.30.107.129:123. NtpServer cannot provide secure (signed) time to the client and will ignore the request. The error was: A device attached to the system is not functioning. (0x8007001F)

Event Type: Error

Event Source: KDC

Event Category: None

Event ID: 7

Date: 2/16/2007

Time: 5:16:14 PM

User: N/A

Computer: CEINVDC03

Description:

The Security Account Manager failed a KDC request in an unexpected way. The error is in the data field. The account name was CEINVBH01B$@CORP.CENT.ORG and lookup type 0x20.

Event Source: NTDS Replication

Event Category: Replication

Event ID: 1084

Internal event: Active Directory could not update the following object with changes received from the following source domain controller. This is because an error occurred during the application of the changes to Active Directory on the domain controlle

Additional Data

Error value: 1127 While accessing the hard disk, a disk operation failed even after retries

Event Source: NTDS KCC

Event Category: Knowledge Consistency Checker

Event ID: 1435

The Knowledge Consistency Checker (KCC) encountered an unexpected error while performing an Active Directory operation.

Operation type:

KccModifyEntry

Object distinguished name:

CN=NTDS Site Settings,CN=DEN,CN=Sites,CN=Configuration,DC=cent,DC=org

The operation will be retried at the next KCC interval.

Additional Data

Error value:

7 00000070: SysErr: DSID-02020E6E, problem 28 (No space left on device), data -510

Internal ID: f08051c
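For anyone reading the numeric codes in the events above, here is a minimal Python sketch that splits the W32Time HRESULT (0x8007001F) into its fields and looks up the other values. The lookup tables are not from the event log itself; they are filled in from the standard Win32 and ESE (Jet) error lists as I understand them, so treat the names as an assumption to verify against Microsoft's documentation. The function names are hypothetical.

```python
# Meanings taken from the standard Win32 / ESE error lists (assumption: verify
# against Microsoft's documentation before relying on them).
WIN32_ERRORS = {
    31: "ERROR_GEN_FAILURE: A device attached to the system is not functioning.",
    1127: "ERROR_DISK_OPERATION_FAILED: While accessing the hard disk, "
          "a disk operation failed even after retries.",
}
ESE_ERRORS = {
    -510: "JET_errLogWriteFail: failure writing to the log file",
}

FACILITY_WIN32 = 7


def decode_hresult(hresult):
    """Split a 32-bit HRESULT into (is_failure, facility, code)."""
    value = hresult & 0xFFFFFFFF
    is_failure = bool(value >> 31)       # top bit is the severity flag
    facility = (value >> 16) & 0x7FF     # 11-bit facility field
    code = value & 0xFFFF                # low 16 bits carry the error code
    return is_failure, facility, code


def describe_hresult(hresult):
    """Return a readable description when the HRESULT wraps a known Win32 error."""
    _, facility, code = decode_hresult(hresult)
    if facility == FACILITY_WIN32 and code in WIN32_ERRORS:
        return WIN32_ERRORS[code]
    return "unknown (facility %d, code %d)" % (facility, code)


if __name__ == "__main__":
    print(describe_hresult(0x8007001F))  # W32Time event 22
    print(WIN32_ERRORS[1127])            # NTDS Replication event 1084
    print(ESE_ERRORS[-510])              # "data -510" in KCC event 1435
```

If those readings are right, all three point at failed writes to the disk or the database log rather than at a disk that is actually full.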

acr
Champion

I've implemented many DCs as VMs, and generally follow similar lines.

Before we build the first DC as a VM we always:

Check the current error logs, then do extensive DNS testing of the infrastructure first (including SRV records), and run dcdiag and netdiag to see if all is OK.

Use repadmin to test the status of the directory service.

There are more, but we want to check DNS, replication topology, replication connectivity, etc.

There is no magic with virtualization; just about every issue that is present in the physical world will be present in the virtual world.

I can't really say why your implementation had issues, but I've certainly never experienced those types of problems.

What was the state of your AD before introducing the new DC? What else was running on your ESX Server, and where had you placed the DC in relation to other VMs? How is DNS set up in your environment?
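The pre-checks listed above could be wrapped in a small script. This is only a sketch in Python: the function names are made up for illustration, the domain name is a parameter you supply, and it assumes dcdiag, netdiag, repadmin and nslookup are on the PATH (the first three come with the Windows Support Tools). The output still has to be read by a person; the script does not try to judge pass or fail.

```python
import subprocess


def health_check_commands(domain):
    """Build the command lines for the AD/DNS pre-checks (hypothetical helper)."""
    srv_name = "_ldap._tcp.dc._msdcs." + domain  # SRV record DCs register in DNS
    return [
        ["dcdiag", "/v"],                     # domain controller diagnostics
        ["netdiag"],                          # network/DNS client diagnostics
        ["repadmin", "/showrepl"],            # replication partners and status
        ["nslookup", "-type=SRV", srv_name],  # confirm the SRV records resolve
    ]


def run_checks(domain, runner=subprocess.run):
    """Run each check and return {command line: captured output}."""
    results = {}
    for cmd in health_check_commands(domain):
        completed = runner(cmd, capture_output=True, text=True)
        results[" ".join(cmd)] = completed.stdout
    return results
```

Calling `run_checks("corp.cent.org")` on an existing DC before promoting the new one would collect all four outputs in one place for review.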

cheeko
Expert

I think the last two events are interesting:

"1127 While accessing the hard disk, a disk operation failed even after retries"

"7 00000070: SysErr: DSID-02020E6E, problem 28 (No space left on device), data -510

Internal ID: f08051c"

Do you have enough space on your disk? How is the disk behaving beside AD things? Do you see any other events related to disk?
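A quick way to answer the free-space question from inside the guest is something like the sketch below. It is plain Python using only the standard library; the function name and the 20% threshold are arbitrary choices for illustration. It only rules out a genuinely full volume and says nothing about I/O errors underneath.

```python
import shutil


def free_space_report(path, min_free_fraction=0.2):
    """Report total/free space for the volume holding `path` (hypothetical helper)."""
    usage = shutil.disk_usage(path)
    free_fraction = usage.free / usage.total
    return {
        "total_gb": usage.total / 1024 ** 3,
        "free_gb": usage.free / 1024 ** 3,
        "free_fraction": free_fraction,
        "low": free_fraction < min_free_fraction,  # True when below the threshold
    }


if __name__ == "__main__":
    # "C:\\" is an assumption; point it at the volume holding NTDS.dit and the logs.
    print(free_space_report("C:\\"))
```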

kix1979
Immortal

The Legato sync driver (LGTO_Sync) is the part of VMware Tools that quiesces the file system. I would strongly suggest either NOT allowing it to quiesce or removing that driver. It currently does not play well with AD or any other database.

Thomas H. Bryant III
gdesmo
Enthusiast

We did check the health, logs, DNS, replication, etc. on all sites in our parent and child domains before adding our first DC on a VM. We had several other VMs running on this ESX server; I am not sure how those would have affected this DC.

After seeing the LGTO event we disabled the sync driver, but the DC still experienced the same problem(s), so removing it did not help.

Our disk was 20 GB, which was more than enough: AD was only about 2 GB, and we had about 11 GB free on the disk.

Sounds like I should do these for sure:

Separate vmdks for sysvol, logs and OS

Independent persistent disks

What other steps, if any, from the presentation above has anyone implemented?

clock sync reg change on pdc emulator?

create a port group just for this dc?

modify weight/priority of dns srv records?

Thanks

zaznet
Enthusiast

Time may be in sync, but you are generating time sync errors which may force the DC to think it is out of sync.

ThompsG
Virtuoso

Hello there,

We have several (okay, 5) DCs running under ESX 3.x without any errors (touching plastic ;) ). We configure all our DCs with 3 vDisks (C: 8GB, D: 4GB and E: 4GB) but don't bother making them independent, as we would not use snapshots on a DC. I guess we could ensure no snapshot was ever run by taking this step, but we have not done so at the moment.

The process we follow on these VMs is as follows:

1) Ensure ESX server is getting time from an external NTP source

2) Disable Windows Time Service from within the VM

3) Enable VMware Tools to sync time from the host

This seems to work.

If the VM is a PDC then we do the following:

1) Ensure ESX server is getting time from an external NTP source

2) Modify time service so that the PDC believes it is the root time source, i.e. no sync

3) Enable VMware Tools to sync time from the host
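As a rough illustration of steps 2 and 3 in both lists, the Python sketch below just builds the command lines one might run inside the guest, plus the .vmx line for Tools time sync. These are my assumptions about one way to do it, not necessarily what was actually used: `sc`/`net` to disable the Windows Time service on ordinary DCs, the W32Time `Type` registry value set to `NoSync` on the PDC emulator, and `tools.syncTime` in the VM's configuration file. The function names are hypothetical, and nothing here executes the commands.

```python
# Registry key holding the Windows Time service parameters.
W32TIME_PARAMS = r"HKLM\SYSTEM\CurrentControlSet\Services\W32Time\Parameters"

# Step 3 (both cases): VMware Tools host time sync, as a .vmx setting.
# Assumption: the same option can also be ticked in the VMware Tools control panel.
VMX_TIME_SYNC_LINE = 'tools.syncTime = "TRUE"'


def member_dc_commands():
    """Step 2 for an ordinary DC: stop and disable the Windows Time service."""
    return [
        "net stop w32time",
        "sc config w32time start= disabled",  # the space after 'start=' is required
    ]


def pdc_commands():
    """Step 2 for the PDC emulator: keep the service, but stop it syncing to anything."""
    return [
        'reg add "%s" /v Type /t REG_SZ /d NoSync /f' % W32TIME_PARAMS,
        "net stop w32time",
        "net start w32time",                  # restart so the new Type is picked up
    ]


if __name__ == "__main__":
    for line in member_dc_commands() + pdc_commands():
        print(line)
    print(VMX_TIME_SYNC_LINE)
```

Whether `NoSync` alone is enough for the PDC to advertise itself as the root time source depends on the other W32Time settings, so check the result with the time service tools before trusting it.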

In regards to the following:

Making a port group | No, the DCs run on the same port group as our other VMs.

Modifying weight/priority on DNS records | No, this only adjusts whether more clients connect to your vDCs or pDCs; in our case we don't care. On our main network we are planning on continuing to run the FSMO roles on physical rather than virtual DCs. Why, you ask? No real reason; we would just like to keep a couple of physical DCs around.

Trust this makes sense and helps you out. Let us know.

Kind regards,

Glen

JaySMX
Hot Shot

We are seeing the same "No space left on device" messages on virtual 2003 DCs as well. The disks are not out of space, so I'm wondering if this has something to do with FC I/O. Does anyone have any ideas as to the cause of this? Thanks!

-Justin