Hello,
I am having problems getting our three ESX 3.5 U3 hosts to sync time with the firewall. The ESX hosts are consistently running 4 minutes faster than the firewall. I have checked the time sync on our NetApp and it stays in line with the firewall. The NetApp also has a simple setting for Maximum Skew. I cannot find the cause of this problem on the hosts or how to correct it.
Anyone seen this before?
Thanks,
-G
I've noticed sometimes you need one successful (manual) sync to get ntpd to work correctly. Try this:
service ntpd stop
ntpdate <ip.of.ntp.peer>
service ntpd start
That ntpdate command will manually sync the clock with your ntp peer/server.
Have you opened up the ESX firewall?
esxcfg-firewall -e ntpClient
So I take it your firewall is acting as the NTP time source, correct?
Try the following commands from the command line:
Type "ntpq" and hit enter (this will put you in an interactive shell).
Type "peer", which should show your NTP time source.
Type "rv", which should show NTP update statistics.
The source IP has been changed in the output.
ntpq> peer
     remote           refid      st t when poll reach   delay   offset  jitter
==============================================================================
 192.168.1.1     .LOCL.           1 u   32   64  377    0.426  -223755  42.675
ntpq> rv
status=c011 sync_alarm, sync_unspec, 1 event, event_restart,
version="ntpd 4.1.2@1.892 Thu Feb 15 04:31:47 EST 2007 (1)",
processor="i686", system="Linux2.4.21-57.ELvmnix", leap=11, stratum=16,
precision=-17, rootdelay=0.000, rootdispersion=1020.540, peer=0,
refid=0.0.0.0, reftime=00000000.00000000 Thu, Feb 7 2036 1:28:16.000,
poll=4, clock=cd2c468c.8d18e325 Thu, Jan 29 2009 10:15:24.551, state=1,
offset=0.000, frequency=500.000, jitter=0.008, stability=0.000
Yeah, try a manual ntpd restart.
The offset returned by ntpq is about 4 minutes (223755 milliseconds = 3.72925 minutes).
If that does not work, manually set the time on the host and check for drift.
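For anyone checking the arithmetic: the offset column in the ntpq peers output is in milliseconds, so dividing by 60000 gives minutes. A minimal sketch, assuming awk is available on the service console; ms_to_minutes is a made-up helper name, not an ntp tool:

```shell
#!/bin/sh
# Hypothetical helper: convert an ntpq offset (milliseconds) to minutes.
# The sign is dropped because only the size of the error matters here.
ms_to_minutes() {
    awk -v ms="$1" 'BEGIN { if (ms < 0) ms = -ms; printf "%.5f\n", ms / 60000 }'
}

ms_to_minutes -223755   # offset from the peers output above, prints 3.72925
```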
Updated output after running the commands you mentioned.
ntpq> peer
     remote           refid      st t when poll reach   delay   offset  jitter
==============================================================================
 192.168.1.1     .LOCL.           1 u    7   64   17    0.403  544.082  32.524
ntpq> rv
status=c011 sync_alarm, sync_unspec, 1 event, event_restart,
version="ntpd 4.1.2@1.892 Thu Feb 15 04:31:47 EST 2007 (1)",
processor="i686", system="Linux2.4.21-57.ELvmnix", leap=11, stratum=16,
precision=-17, rootdelay=0.000, rootdispersion=3.255, peer=0,
refid=0.0.0.0, reftime=00000000.00000000 Thu, Feb 7 2036 1:28:16.000,
poll=4, clock=cd2c5417.d0cccccc Thu, Jan 29 2009 11:13:11.815, state=1,
offset=0.000, frequency=500.000, jitter=0.008, stability=0.000
544 millisecond offset. Looks like you are good. Keep an eye out for drift.
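If you want something to do the watching for you, here is a rough sketch that reads a peers table on stdin and complains when the offset column is over a limit. It assumes the column layout shown in the outputs above (offset is the ninth field, in milliseconds); check_offset and the 60000 ms limit are made up for illustration, and this only reports, it does not change anything in ntpd.

```shell
#!/bin/sh
# Hypothetical helper: read an ntpq peers table on stdin and report every
# peer whose absolute offset (field 9, milliseconds) is above the limit.
# Returns non-zero if at least one peer is over the limit.
check_offset() {
    limit_ms=$1
    awk -v limit="$limit_ms" '
        # Skip the column header, the ==== rule, and short lines such as prompts.
        $1 == "remote" || /^=+$/ || NF < 10 { next }
        {
            off = $9 + 0
            if (off < 0) off = -off          # compare magnitude only
            if (off > limit) {
                printf "%s offset %s ms exceeds %s ms\n", $1, $9, limit
                bad = 1
            }
        }
        END { exit bad + 0 }
    '
}

# Example (on a host with ntpd running): ntpq -p | check_offset 60000
```

With the first peers output in this thread it would flag 192.168.1.1 at -223755 ms; with the second (544.082 ms) it stays quiet.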
Super. I am getting the hang of this now.
Is there a setting to tell the hosts to resync if the drift is greater than, say, 1 minute?
Also, I only made this change on hosts 1 & 2, and then noticed host 3 had already changed to the correct time...????
They should stay in sync without the need for a cron job or anything of the like.
As to your other point, you might want to look into the configuration of your firewall's ntpd daemon. Does it use an upstream provider, and if so, which one?
I tend to use us.pool.ntp.org as an external time source. Take a look at http://www.pool.ntp.org/en/use.html
Essentially this provides a round robin of accurate external time sources.
Wanted to add an update:
Looks like the same problem is back. The time on the hosts has drifted out again by about 3 minutes. The NetApp filers are correctly syncing time to the firewall but the ESX hosts are drifting too far out.
Question: Is there a drift allowance setting? So say I only want the time to drift to a max of 30 seconds. Can this be done?
This is atypical behavior. Try setting your ESX hosts to use a different external time source. I know there may be security considerations but ntp should be keeping these systems synced with the firewall.
Try pool.ntp.org
I'll have to work with the security guys on that one...
In the meantime I made some changes to the three hosts' ntp.conf files based on the timekeeping doc.
One of the hosts now is running on PST time according to the date command. The other two are correct on EST time.
I ran ntpdate a few times and restarted the ntpd service but no luck.
Ever seen this?
NTP won't correct your timezone. It assumes your timezone is set correctly and only adjusts the underlying clock; your server then displays that time in whatever timezone it is configured for. So the host on PST should show a time 3 hours behind the hosts on EST, but the time itself should be correct.
-KjB
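To illustrate the timezone point, here is a small sketch that prints one fixed instant in three zones. It assumes a date command that accepts "-d @SECONDS" (GNU coreutils does); show_in_zone is a made-up helper, and EST5 / PST8 are plain POSIX TZ strings without daylight saving rules. The epoch value 1233245591 corresponds to the clock field in the second rv output, taking that host as being on EST.

```shell
#!/bin/sh
# Hypothetical helper: print one instant (Unix epoch seconds) as seen
# from a given timezone. The instant itself never changes, only how it
# is displayed.
show_in_zone() {
    TZ=$1 date -d "@$2" '+%Y-%m-%d %H:%M:%S'
}

show_in_zone UTC0 1233245591   # 2009-01-29 16:13:11
show_in_zone EST5 1233245591   # 2009-01-29 11:13:11
show_in_zone PST8 1233245591   # 2009-01-29 08:13:11
```

Same clock, three displays. If a host shows the right minutes and seconds but is off by whole hours, that points at the timezone setting rather than at NTP.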
I would be inclined to look at the external time source, as we have gone over the hosts' NTP setup pretty well. It might still be the hosts, but when I get down to the bone on a problem and there is another possible cause outside my control (which you can check by testing with another time source), I tend not to waste too much effort until that niggling doubt is settled.
What does the drift file do?
Is this the same as "allowed skew"?
Post your ntp.conf. ESX is drifting too much. Delete the drift file and restart ntpd. Do all hosts have the same issue?
tinker panic 0
restrict 127.0.0.1
restrict kod nomodify notrap noquery nopeer
server 10.1.2.1
driftfile /var/lib/ntp/drift
All the hosts do have the same issue.
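On the earlier drift file question: as far as I know the drift file is not an "allowed skew" setting. ntpd uses it to remember its estimate of how fast or slow the local clock runs, in parts per million (PPM), so it can pick up from that value after a restart. The rv outputs above show frequency=500.000, and 500 PPM is the most ntpd will correct for. A quick sketch of what a PPM figure means per day; ppm_to_seconds_per_day is a made-up helper name:

```shell
#!/bin/sh
# Hypothetical helper: convert a frequency error in PPM to seconds of
# clock error accumulated per day (86400 seconds).
ppm_to_seconds_per_day() {
    awk -v ppm="$1" 'BEGIN { printf "%.1f\n", ppm * 86400 / 1000000 }'
}

ppm_to_seconds_per_day 500   # prints 43.2
```

So a clock that needs more than about 43 seconds of correction per day is beyond what ntpd can steer by frequency adjustment alone, which may be worth keeping in mind when comparing the hosts.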