We ran dcpromo on our first virtual DC here. It did not work well for the 3 weeks it was a DC. Every few days it would stop accepting account logon requests (even domain admin) at the console, and it had to be rebooted to get it working again. To avoid risking AD corruption, we have since run dcpromo to demote it out of the domain. Time sync on the DC looked good every time I checked it.
There are several recommendations in the link below, but I am unsure whether I should implement some of the steps, because I only read them out of context. I was not at the VMworld presentation.
http://download3.vmware.com/vmworld/2006/tac9710.pdf
Environment
ESX 3.0.1 32039
Windows 2003 Enterprise SP1 VM, 2 vCPUs
Single vmdk file for OS, logs, sysvol, etc.
Some steps I took to try and resolve the issue:
Changed disk to independent/persistent mode
Disabled the sync driver, even though we were not using VCB to back it up.
http://kb.vmware.com/KanisaPlatform/Publishing/947/5962168_f.SAL_Public.html
Neither change had a positive effect.
Thanks for your thoughts....
Some events that we noticed when the DC started having problems, which may or may not be related:
Event Type: Information
Event Source: LGTO_Sync
Event Category: None
Event ID: 1
Date: 2/16/2007
Time: 5:16:08 PM
User: N/A
Computer: CEINVDC03
Description:
The description for Event ID ( 1 ) in Source ( LGTO_Sync ) cannot be found. The local computer may not have the necessary registry information or message DLL files to display messages from a remote computer. You may be able to use the /AUXSOURCE= flag to retrieve this description; see Help and Support for details. The following information is part of the event: , Flush Completed.
Data:
Event Type: Warning
Event Source: W32Time
Event Category: None
Event ID: 22
Date: 2/16/2007
Time: 5:16:13 PM
User: N/A
Computer: CEINVDC03
Description:
The time provider NtpServer encountered an error while digitally signing the NTP response for peer 172.30.107.129:123. NtpServer cannot provide secure (signed) time to the client and will ignore the request. The error was: A device attached to the system is not functioning. (0x8007001F)
Event Type: Error
Event Source: KDC
Event Category: None
Event ID: 7
Date: 2/16/2007
Time: 5:16:14 PM
User: N/A
Computer: CEINVDC03
Description:
The Security Account Manager failed a KDC request in an unexpected way. The error is in the data field. The account name was CEINVBH01B$@CORP.CENT.ORG and lookup type 0x20.
Event Source: NTDS Replication
Event Category: Replication
Event ID: 1084
Internal event: Active Directory could not update the following object with changes received from the following source domain controller. This is because an error occurred during the application of the changes to Active Directory on the domain controlle
Additional Data
Error value: 1127 While accessing the hard disk, a disk operation failed even after retries
Event Source: NTDS KCC
Event Category: Knowledge Consistency Checker
Event ID: 1435
The Knowledge Consistency Checker (KCC) encountered an unexpected error while performing an Active Directory operation.
Operation type:
KccModifyEntry
Object distinguished name:
CN=NTDS Site Settings,CN=DEN,CN=Sites,CN=Configuration,DC=cent,DC=org
The operation will be retried at the next KCC interval.
Additional Data
Error value:
7 00000070: SysErr: DSID-02020E6E, problem 28 (No space left on device), data -510
Internal ID: f08051c
I've implemented many DCs as VMs, and generally follow similar lines.
Before we build the first DC as a VM, we always:
Check the current error logs, then do extensive DNS testing of the infrastructure (including SRV records), and run dcdiag and netdiag to see if all is OK.
Use repadmin to check the status of the directory service.
There are more, but the point is to check DNS, replication topology, replication connectivity, etc.
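The checks listed above could be scripted so they run the same way before each promotion. This is only a rough sketch in Python: it assumes dcdiag, netdiag and repadmin (from the Windows Support Tools) are on the PATH, the choice of commands is my own guess at a minimal set, and the function names are made up for illustration.

```python
import subprocess

# Commands for the pre-promotion checks mentioned above.
# The selection and order are an assumption, not a complete health check.
PRECHECK_COMMANDS = [
    ["dcdiag"],                 # general domain controller diagnostics
    ["netdiag"],                # network and DNS client diagnostics
    ["repadmin", "/showrepl"],  # replication status with partner DCs
]


def run_prechecks(commands=PRECHECK_COMMANDS, runner=subprocess.run):
    """Run each command and return a list of (command_line, exit_code, output)."""
    results = []
    for cmd in commands:
        try:
            proc = runner(cmd, capture_output=True, text=True)
            results.append((" ".join(cmd), proc.returncode, proc.stdout))
        except FileNotFoundError:
            # The tool is not installed on this machine.
            results.append((" ".join(cmd), None, "tool not found"))
    return results


def checks_to_review(results):
    """Return the command lines that are missing or exited with a non-zero code."""
    return [name for name, code, _ in results if code != 0]
```

I would not rely on the exit code alone as a pass/fail signal for these tools; the captured output still needs to be read, and this only saves you from forgetting a step.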
There is no magic with virtualization; just about every issue present in the physical world will also be present in the virtual world.
I can't really say why your implementation had issues, but I've certainly never experienced those types of problems.
What was the state of your AD before introducing the new DC? What else was running on your ESX server, and where had you placed the DC in relation to the other VMs? How is DNS set up in your environment?
I think the last two events are interesting:
"1127 While accessing the hard disk, a disk operation failed even after retries"
"7 00000070: SysErr: DSID-02020E6E, problem 28 (No space left on device), data -510
Internal ID: f08051c"
Do you have enough space on your disk? How is the disk behaving apart from AD? Do you see any other disk-related events?
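To answer the free-space question from inside the guest, something like the following could be run on the DC. It is a minimal sketch using Python's standard library; the function name and the 10% threshold are arbitrary choices of mine, and it only reports what the guest OS sees, not the state of the storage underneath the vmdk.

```python
import shutil


def disk_report(path, min_free_fraction=0.10):
    """Report total and free space for the volume containing `path`."""
    usage = shutil.disk_usage(path)
    # Guard against pseudo file systems that report a total size of zero.
    free_fraction = usage.free / usage.total if usage.total else 0.0
    return {
        "total_gb": round(usage.total / 1024**3, 1),
        "free_gb": round(usage.free / 1024**3, 1),
        "free_fraction": free_fraction,
        "low": free_fraction < min_free_fraction,  # True when below the threshold
    }
```

On the DC it could be called as `disk_report("C:\\")`. If it shows plenty of free space while AD still logs "No space left on device", that would point away from a genuinely full disk.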
The Legato sync driver is the part of VMware Tools that quiesces the file system. I would strongly suggest either not allowing it to quiesce or removing that driver. It does not currently play well with AD or any other database.
We did check the health, logs, DNS, replication, etc. on all sites in our parent and child domains before adding our first DC on a VM. We had several other VMs running on this ESX server. I am not sure how those would have affected this DC.
After seeing the LGTO event we disabled the sync driver, but the DC still experienced the same problem(s), so removing it did not help.
Our disk was 20 GB, which was more than enough. AD was only about 2 GB, and we had about 11 GB free on the disk.
Sounds like I should do these for sure:
Separate vmdks for sysvol, logs and OS
Independent persistent disks
What other steps, if any, from the presentation above has anyone implemented?
Clock sync registry change on the PDC emulator?
Create a port group just for this DC?
Modify the weight/priority of the DNS SRV records?
Thanks
Time may be in sync, but you are generating time sync errors which may force the DC to think it is out of sync.
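For what it is worth, the error code in the W32Time warning posted earlier (0x8007001F) can be taken apart to see where it comes from. The snippet below is a small sketch that splits an HRESULT into its standard fields; the function name is my own.

```python
def decode_hresult(value):
    """Split a 32-bit HRESULT into its failure bit, facility and error code."""
    value &= 0xFFFFFFFF
    return {
        "failure": bool(value >> 31),       # top bit set means failure
        "facility": (value >> 16) & 0x7FF,  # 11-bit facility field
        "code": value & 0xFFFF,             # low 16 bits: the underlying error
    }


FACILITY_WIN32 = 7  # the HRESULT wraps a plain Win32 error code

info = decode_hresult(0x8007001F)
print(info)
```

Facility 7 is FACILITY_WIN32 and code 31 is ERROR_GEN_FAILURE, which is the same "A device attached to the system is not functioning" text shown in the event. So the warning is reporting a generic device-level failure rather than a clock offset, although it does not say which device.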
Hello there,
We have several (okay, 5) DCs running under ESX 3.x without any errors (touching plastic). We configure all our DCs with 3 vDisks (C: 8GB, D: 4GB and E: 4GB) but don't bother making them independent, as we would not use snapshots on a DC. I guess we could ensure no snapshot was ever run by taking this step, but we have not done so at the moment.
The process we follow on these VMs is as follows:
1) Ensure ESX server is getting time from an external NTP source
2) Disable Windows Time Service from within the VM
3) Enable VMware Tools to sync time from the host
This seems to work.
If the VM is a PDC then we do the following:
1) Ensure ESX server is getting time from an external NTP source
2) Modify time service so that the PDC believes it is the root time source, i.e. no sync
3) Enable VMware Tools to sync time from the host
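The two procedures above could be written down as a small checklist generator. This is a sketch only: the post does not say which commands were actually used, so the `sc`, `net`, and `reg` lines and the `tools.syncTime` .vmx entry below are my assumption of one way to carry out each step, and the function name is made up.

```python
# .vmx setting that turns on VMware Tools time sync from the host (assumed method).
VMX_TIME_SYNC = 'tools.syncTime = "TRUE"'

# Registry key holding the Windows Time service parameters.
W32TIME_PARAMS = r"HKLM\SYSTEM\CurrentControlSet\Services\W32Time\Parameters"


def time_sync_plan(is_pdc_emulator):
    """Return the ordered steps for a virtual DC, as plain strings."""
    steps = ["ESX host: sync time from an external NTP source"]
    if is_pdc_emulator:
        # The PDC emulator keeps W32Time running but stops syncing from anyone.
        steps.append(f"reg add {W32TIME_PARAMS} /v Type /t REG_SZ /d NoSync /f")
        steps.append("net stop w32time && net start w32time")
    else:
        # Other DCs: disable the Windows Time service inside the guest.
        steps.append("sc config w32time start= disabled")
        steps.append("net stop w32time")
    steps.append("VM .vmx: " + VMX_TIME_SYNC)
    return steps


for line in time_sync_plan(is_pdc_emulator=True):
    print(line)
```

Treat the output as a reminder list to review, not something to paste into a production DC blindly.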
In regards to the following:
Making a port group | No, the DCs run on the same port group as our other VMs.
Modifying weight/priority on DNS records | No. This only adjusts whether more clients connect to your vDCs or pDCs, and in our case we don't care. On our main network we are planning to continue running the FSMO roles on physical rather than virtual DCs. Why, you ask? No real reason; we would just like to keep a couple of physical DCs around.
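To illustrate what changing SRV priority and weight does on the client side, here is a simplified sketch of the selection rule from RFC 2782: the lowest priority value wins, and weight only spreads load among records sharing that priority. It is not the exact algorithm a Windows client runs, and the host names and values are hypothetical.

```python
import random


def pick_srv_target(records, rng=random):
    """Pick a target from a list of (priority, weight, target) SRV records."""
    lowest = min(priority for priority, _, _ in records)
    # Only records with the lowest priority value are considered at all.
    candidates = [(w, t) for p, w, t in records if p == lowest]
    total = sum(w for w, _ in candidates)
    if total == 0:
        return rng.choice(candidates)[1]
    # Weighted random choice: a larger weight gets a larger share of picks.
    threshold = rng.uniform(0, total)
    running = 0
    for weight, target in candidates:
        running += weight
        if threshold <= running:
            return target
    return candidates[-1][1]


# Hypothetical records: two physical DCs preferred over one virtual DC.
records = [(0, 100, "pdc1"), (0, 100, "pdc2"), (10, 100, "vdc1")]
print(pick_srv_target(records))
```

With these values the virtual DC is only used if clients cannot reach the priority 0 servers; giving it the same priority and a lower weight would instead send it a smaller share of requests.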
Trust this makes sense and helps you out. Let us know.
Kind regards,
Glen
We are seeing the same "No space left on device" messages on virtual 2003 DCs as well. The disks are not out of space, so I'm wondering if this has something to do with FC I/O. Does anyone have any ideas as to the cause? Thanks!