VMware Cloud Community
JeffHarrison
Contributor

NTP Time Sync Issue for DC when keeping NTP on a Router.

Hello,

I've read through all of the many Time Sync issue threads but can't seem to find anything that applies to my particular environment.

I'm looking to virtualize one of my domain controllers and I'm concerned about the Time Sync issue. Most of the other threads describe having the DC get its time sync from an ESX Host. Right now my NTP server is not a DC; it is our core router.

My question is: Do I need to configure the ESX Host to sync its time with the NTP (Cisco) core router, or do I configure the virtual machine itself to sync with the NTP router?

Any insight would be much appreciated. Thanks!

23 Replies
mikepodoherty
Expert

We do both - the ESX hosts are configured to use timeservers but the Windows DCs are also configured to use timeservers. The member servers get their time from the DCs.

AD is so sensitive to time issues that we don't take chances.

HTH

Mike

JeffHarrison
Contributor

Thanks for the reply.

So do you have the DCs getting their time through the ESX hosts using VMware Tools? Do you also have the W32Time parameters configured to use a different NTP server, or is this also configured to get time from the ESX host?

Dollar
Enthusiast

As a contrast in methods, my VM domain controllers get their time from other domain controllers using Windows time (the default, standard way). I DO NOT set VM domain controllers to time synch with the host server. Doing so requires you disable the W32Time service on the domain controller and this causes a backlash within Active Directory. As an example, domain controllers will not advertise that they are a time service if you disable the W32Time service. I am aware of the "NoSync" option, but this also causes domain controllers to not advertise as a time server. If you run DCDIAG on any DC on which you've done one of these two things (use the "NoSync" option or disable the W32Time service), it will fail on the FSMO Role check. If you do one of these two things to a PDC, all DCs will fail the FSMO Role check.

Regardless, despite the documentation from VMware on "Best Practices" for domain controller time management, I have production VM DCs supporting thousands of users, even doing wireless authentication, and I am not synching their time with the host via VMTools. What I am doing is synching host time with the same source as the Forest Root DC (which is the default time server for a domain). I have not experienced any issues with the servers or time drift. One of these virtual DCs has been in production operation for 18 months, the others for about 6 months.

Whenever I build an ESX Host, I process the following commands on that host to ensure proper time synch (with 10.1.1.1 being the time source):

esxcfg-firewall --enableService ntpClient

esxcfg-firewall -o 123,udp,out,ntp

ntpdate 10.1.1.1

service mgmt-vmware restart

chkconfig ntpd on

hwclock --systohc

service ntpd start

What you cannot do (as of last check about 18 months ago) is use a Windows 2003 Server as your NTP Source for ESX Servers. For some stupid reason, Windows 2003 will only provide time to Domain Members. A Windows 2000 Box will provide time to an ESX Server.

mikepodoherty
Expert

I guess I wasn't clear - the DCs are configured to use separate time servers from the ESX hosts. None of the virtual servers use the host as a time source.

nabsltd
Enthusiast

Dollar: "Doing so requires you disable the W32Time service on the domain controller and this causes a backlash within Active Directory."

You do not need to disable the W32Time service to allow it to sync with the ESX host. You just set VMware tools to sync with the host and change the config of W32Time to "NoSync". You can do this via the registry, but an easier way is to install the Windows Time Agent.

Mark_Brophy
Contributor

By default, Domain Controllers are not configured with an explicit NTP server. They sync through the Windows domain time hierarchy. You have to configure one of your DCs to be the authoritative time server, from which the rest of the DCs will get their time in case of a discrepancy. You can set the time service on that DC to sync with any other NTP device; the most common choice is an atomic clock source, but it can be anything else that you want it to get time synch from.

You do not have to have your DCs get time from the ESX host; in fact, it is a discouraged practice to do so.

Hairyman
Enthusiast

My DCs sync with a master DC, which then syncs with a 3rd party host that uses an atomic clock. Our ESX 3.0.2 hosts also sync their time externally. Don't ask me where, but the Dell guys set it all up for us.

gorto
Enthusiast

Please be aware that ALL guests force-sync their time with the ESX host under special circumstances (such as VMotion and DRS migration), irrespective of VMTools settings, unless this is specifically disabled with:

tools.syncTime = "FALSE"

time.synchronize.continue = "FALSE"

time.synchronize.restore = "FALSE"

time.synchronize.resume.disk = "FALSE"

time.synchronize.shrink = "FALSE"

time.synchronize.tools.startup = "0"

or

time.synchronize.tools.startup = "FALSE"
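As a rough aid for auditing these settings, here is a minimal Python sketch that scans the text of a .vmx file and reports which of the keys listed above are missing or not disabled. The helper names (parse_vmx, sync_not_disabled) are made up for illustration, and the parser assumes simple key = "value" lines with case-sensitive keys; it is not a VMware tool.

```python
# Keys taken from the list above; each is expected to be explicitly disabled.
SYNC_KEYS = [
    "tools.syncTime",
    "time.synchronize.continue",
    "time.synchronize.restore",
    "time.synchronize.resume.disk",
    "time.synchronize.shrink",
    "time.synchronize.tools.startup",
]


def parse_vmx(text):
    """Parse simple 'key = "value"' lines into a dict (hypothetical helper)."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        settings[key.strip()] = value.strip().strip('"')
    return settings


def sync_not_disabled(text):
    """Return the sync keys that are missing or not set to FALSE or 0."""
    settings = parse_vmx(text)
    return [
        key for key in SYNC_KEYS
        if settings.get(key, "").upper() not in ("FALSE", "0")
    ]


if __name__ == "__main__":
    example = 'tools.syncTime = "FALSE"\ntime.synchronize.continue = "TRUE"\n'
    for key in sync_not_disabled(example):
        print("still enabled or missing:", key)
```

To check a real guest, read its .vmx file into a string and pass it to sync_not_disabled; an empty list means every key above is explicitly turned off.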

Read the latest:

or http://communities.vmware.com/message/1069201

wilson94t
Enthusiast

Short answer: Go to the Cisco device.

A few more details....

There is no reason whatsoever that you need to have your VM talk to your ESX host to get time. In fact, I would request that you do not do this.

Do not use VMware Tools for time sync. It's not very good at it. At best, it will be forward stepping only, at a 60-second resolution. Turn off the VMware Tools time sync option and go with NTP. Type in: net time /setsntp:ciscobox.yourcompany.tld,0x1 and restart the Windows Time service.

You might consider syncing the host via NTP from your Cisco device and then syncing via NTP from your VM to the host, but why? Your stratum has just gotten one level worse by doing this, and there is no reason for it, unless you believe the Cisco device is unable to handle another NTP poll. Plus, why give ESX yet another thing to do?

Best bet is to point your VMs to the source furthest up the chain that is acceptable.

FWIW, our design is to point all our VMs (and our ESX servers, physical servers, routers, appliances, workstations, AD servers, etc.) to a common source, which we call "time.company.tld". This box syncs with internet time servers. Windows 2003 and beyond actually has a reasonable NTP implementation, so this works OK.

It's not the best idea to keep passing time around from box to box. The fewer devices you rely on for time, the better. It makes troubleshooting much easier.

gorto
Enthusiast

I agree with having a common authoritative time source, but having all your devices (including VM guests) point to it seems wasteful of resources.

Why not tier the time delegation? Have the ESX hosts get their time from the time authority, in this case the Cisco router, then allow the guests to get their time from the ESX host. It uses fewer network packets and keeps guest time simple.

Especially knowing that on some events guests are forced to time-sync with ESX hosts, unless specifically masked in the vmx file.

wilson94t
Enthusiast

Which resources are wasted in a common configuration? If the Cisco device is up to the task, then it is a simple approach. One configuration to maintain for a time source, one configuration to maintain for ESX hosts, one configuration to maintain for Windows servers (physical or VM).

If you use an ESX host, then which? VMs on ESX host A will use ESX host A as their NTP server? For what advantage? What about when ESX server B is installed? Should VMs on ESX host B (or ESX Cluster B for that matter) then contact ESX host A? Or should they contact another source? Where should the physical Windows servers get time?

An NTP query is a low-impact communication. Having a single approach to time ensures consistency and predictability across the environment.

If there is a problem using the Cisco router for NTP, then use something else, but have everything go to one place. We're able to sync ~70,000 endpoints off a couple of older Linux-powered HP DL360s... and the only reason we have 2 is for redundancy. One handles the load just fine. Really, it's not that much impact, and the single approach will save man hours, which, in my mind, are the most important resources.

As for network packets, if NTP traffic is really that much of a problem, it's time to get a CDMA Stratum 1 receiver and avoid the network altogether. It's hard for me to imagine that NTP is the thing that will damage your network infrastructure. Original poster: Is this a concern?

gorto
Enthusiast

Well, network bandwidth for a start. Having ALL network devices (physical and virtual) go to the same time source is wasteful when a tiered delegation (to the virtual ones) would cut down on that same traffic. Tell me, when you do a DNS look-up do you go directly to the (internet) Root Name Servers? Of course not. Same with network time: a tiered approach would be more bandwidth efficient, surely?

No one is suggesting using the ESX host as an NTP service. Each ESX host is an NTP client of the main network source and keeps time on its hosted guests using the VMTools option. This is all done neatly and internally within the confines of the ESX host without the need for network bandwidth.

This approach is considered Best Practice, I believe.

Erik_Zandboer
Expert

Hi,

I too am running the entire domain virtual, with a Cisco router as the ultimate time source. The way I have it configured:

1) The ESX hosts use the Cisco router as their time source. So all ESX hosts are kept in sync.

2) The domain controllers sync their time using VMware Tools with the hosts

3) All domain members do NOT sync their time using VMware Tools, since the DCs take care of that

4) All VMs which are not part of the domain, have timesync enabled in the VMware Tools.

Using this setup, all VMs are kept in sync all the time. I have had no issues with it whatsoever. It is clean and simple, and has (as far as I can tell) no issues or risks. The single point of failure, the Cisco router, still poses no problem because the ESX hosts will easily keep correct time for days on their own. And therefore the VMs are kept in correct sync as well.

Visit my blog at http://www.vmdamentals.com

wilson94t
Enthusiast

Frequency of DNS lookups is quite unrelated to NTP in terms of utilization. Don't be ridiculous.

NTP will reduce how often it polls once the local calculation understands the clock skew. Ultimately, a UDP exchange once or twice a day will be sufficient.

The use of VMware Tools to keep time, while recommended in VMware documentation, is not ideal if you wish to keep accurate time. Why? VMware Tools will correct for clock-too-slow only after the clock has fallen 60 seconds behind, and then abruptly jump forward 60 seconds. That's a full minute off! Perhaps that is not important to you; it is important to me.

Furthermore, VMware Tools will do nothing for clock-too-fast. Once the clock gets ahead, it will not be slowed down. NTP, however, will slow a clock moving too fast, and speed up a clock moving too slow. It will work out the mean clock rate over time and allow the system to work in a predictable and reliable fashion.
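To make the contrast concrete, here is a small Python sketch of the two correction styles as they are described in this post. It is only a toy model: the function names are made up, the 60-second forward-only step is this post's description of VMware Tools rather than documented behaviour, and the 500 ppm slew cap is borrowed from the classic ntpd limit as an assumption (W32Time behaves differently). Offsets are guest time minus true time in seconds, so negative means the guest is behind.

```python
def tools_style_step(offset_s, step_s=60):
    """Forward-only step as described above: jump ahead by step_s once the
    guest is at least step_s behind; a clock that is ahead is left alone."""
    if offset_s <= -step_s:
        return offset_s + step_s
    return offset_s


def ntp_style_slew(offset_s, interval_s, max_slew_ppm=500):
    """Move the offset toward zero in either direction, limited to
    max_slew_ppm of the elapsed interval (assumed ntpd-like cap)."""
    max_correction = interval_s * max_slew_ppm / 1_000_000
    correction = min(abs(offset_s), max_correction)
    if offset_s > 0:
        return offset_s - correction
    return offset_s + correction


if __name__ == "__main__":
    for offset in (-75.0, -30.0, 20.0):
        print(offset, tools_style_step(offset), ntp_style_slew(offset, 3600))
```

In this model a guest that is 30 seconds slow or 20 seconds fast is never touched by the step function, while the slew function nudges both toward zero on every interval.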

I recommend disabling VMware Tools time sync and using NTP (or the Windows Time service) for anyone who wishes to keep accurate time, or for anyone who wishes to avoid a clock-too-fast problem. The VMware Tools option would be a nice one for time sync, if it worked correctly. I find that many Windows VMs have a reasonable success rate with VMware Tools, but about 10% do not. My 10% for Windows is about 400 VMs, and that is an unacceptable amount of error.

Erik_Zandboer
Expert

Hi,

Ever seen a really busy ESX environment without the VMware Tools timesync? I have seen VMs running one hour slow every 24 hours. Especially because the clock slowdown depends on the ESX host CPU usage, I doubt that NTP is able to keep up in such environments. Speeding up the clock ends somewhere... If you need accurate timing under all possible circumstances, you might consider doing both: NTP will try to keep correct timing, and if it fails and the VM runs behind by more than one minute, VMware Tools will "brutally" fix the issue. Best of both worlds in my opinion.

Visit my blog at http://www.vmdamentals.com

Craig_Baltzer
Expert

The two possible "best practices" from the "Best Practices for Virtualizing AD" presentation at VMworld 2008 were:

  1. Set the "top" of the Windows time hierarchy (the PDC emulator FSMO role holder in the forest root domain) to use an authoritative source of external time. Allow the remaining DCs to use the Windows time hierarchy. This is the standard MS recommendation for time service configuration. By default this will result in time synchronization approximately every hour to the external time source (can be changed by setting SpecialPollInterval).

  2. Set Windows Time Service on all servers to "NoSync" in the registry, and enable time sync in the VMware Tools.

They were presented as "mutually exclusive" and you shouldn't mix both approaches.

One of the examples in the session was a heavily loaded ESX system running the PDC FSMO role holder with no synchronization to an external source of time. Over an 18-hour period it lost 28 minutes, or around 1.5 minutes per hour. If you choose option #1, the PDC would get time from the external source once per hour (i.e. the default "SpecialPollInterval" setting for w32time is 3600 seconds). That would cause some "catching up" of a minute or two every hour when the ESX box was heavily loaded. It's likely not significant enough to cause Kerberos issues, but it will lead to some inaccurate logging. The descheduled time accounting feature of the VMware Tools is supposed to help avoid this "losing time" on heavily loaded systems, but right now it's still "experimental" and only works on uniprocessor VMs.
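The arithmetic behind that example can be sketched in a few lines of Python. The function names are made up for illustration, and the model assumes the loss is spread evenly over the period, which a real host under varying load will not do.

```python
def drift_per_hour(lost_minutes, period_hours):
    """Average minutes of clock loss per hour of wall time."""
    return lost_minutes / period_hours


def max_error_between_syncs(rate_min_per_hour, poll_interval_s):
    """Minutes the clock can fall behind before the next sync corrects it."""
    return rate_min_per_hour * poll_interval_s / 3600


if __name__ == "__main__":
    rate = drift_per_hour(28, 18)                 # figures from the session example
    worst = max_error_between_syncs(rate, 3600)   # default SpecialPollInterval
    print(f"{rate:.2f} min/hour, up to {worst:.2f} min behind before each sync")
```

With the figures quoted above this works out to roughly a minute and a half of catching up at each hourly sync.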

A possible problem could arise if the load of the ESX box varies over time. w32time will synchronize less frequently after it detects things are in sync (drops to once every 8 hours between DCs). This could be problematic if an ESX box experiences, say, a busy 4-5 hour period after a low-activity period. It's possible that the DCs within the domain would not be updating for 8 hours, and thus lose in excess of 5 minutes during the high-load period, causing Kerberos issues.
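A quick Python check of that scenario, as a sketch only: it assumes the busy-period loss rate equals the 28-minutes-in-18-hours figure from the session example, that no sync happens during the busy window, and that the Kerberos tolerance is the 5 minutes mentioned above. The names are hypothetical.

```python
KERBEROS_LIMIT_MIN = 5      # skew tolerance referred to in the post
LOSS_RATE = 28 / 18         # minutes lost per hour, from the session example


def minutes_lost(rate_min_per_hour, busy_hours):
    """Total minutes lost over a busy window with no sync in between."""
    return rate_min_per_hour * busy_hours


def hours_until_limit(rate_min_per_hour, limit_min=KERBEROS_LIMIT_MIN):
    """Busy hours needed before the skew exceeds the limit."""
    return limit_min / rate_min_per_hour


if __name__ == "__main__":
    for busy in (4, 5):
        lost = minutes_lost(LOSS_RATE, busy)
        print(f"{busy} busy hours: {lost:.1f} min lost,",
              "over limit" if lost > KERBEROS_LIMIT_MIN else "within limit")
    print(f"limit crossed after about {hours_until_limit(LOSS_RATE):.1f} hours")
```

Under these assumptions the limit is crossed a little over three hours into the busy period, well inside an 8-hour gap between syncs.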

Setting time.synchronize.resume.disk to false could really play havoc with time under certain circumstances. The obvious event that would cause problems would be a "suspend" of the VM or a slow vmotion...

As far as the VMware Tools only moving time forward, IMO that's not a bad thing at all; there is nothing more deadly in AD than having time go backwards...

JeffHarrison
Contributor

Just as a follow-up... I have configured my domain controller to get its time from the ESX hosts through VMware Tools as recommended by VMware. Today I am working on that domain controller and the time is 1 hour off... that can't be good. I checked the time on the ESX hosts and it is correct. Looks like there may be some issues with doing it this way.

wilson94t
Enthusiast

Configuring VMware Tools to keep your time-sensitive applications or infrastructure services such as Active Directory up to date is really a bad idea. VMware claims it works, but it does nothing other than step the time forward on your VM after it gets 60 seconds behind. While that may work for some, it's not really acceptable for this type of application.

In your Windows VM, disable VMware Tools time sync! Use the Windows Time service. Check to make sure that you are getting good time with the following:

w32tm /monitor /computers:timesource.domain.tld.

Check that you are syncing with your other domain controller with:

w32tm /monitor

You can also adjust your NTP polling rate if you need to increase your resolution.

wilson94t
Enthusiast

Erik: "Ever seen a really busy ESX environment without the VMware tools timesync?"

Erik, I'm not sure what you mean by "really busy". We run our ESX servers at ratios of 20:1 and beyond, using up to 70% memory on average, and we try to keep CPU averages below 40%, but do not always succeed. (Note that these are average values.)

If you use the VMTools time sync to "brutally" fix the problem, you will also be periodically resetting NTP's statistics. Each time that happens, you prevent NTP from doing its job properly. NTP requires that time stats be collected regularly, and it then makes assumptions based on those stats. If you jump ahead 1 minute, you're going to skew things.

If you are continually having time problems, check for the latest BIOS too. Our HP servers had an update which seemed to help with this, and time became a bit more stable.
