Solved: Redhat 4 Guest OS Clock running slow

choffman1 · ‎07-13-2007

Ok, I know this has probably been discussed before, but after reading the KBs and calling VMware support I am not any closer to an answer.

I am running Redhat 4 update 5 Kernel 2.6.9-55.02 ELsmp 64bit on ESX 3.0.1 with all current patches applied.

The ESX is running on a Sun x4100 with two dual core AMD 275 cpu's.

The ESX server is running NTP and has no problem keeping time. None of my Debian-based Linux guest OSes have had any trouble with time either. They are all running NTP.

The Redhat 4 Guest OS has 4 cpu's assigned to it. It has VMWare tools installed. The clock on the system continues to run slower than normal clock speed. I have tried running NTP on the Redhat Guest OS to see if that would just keep it in sync, but even then it runs slower than normal.

I have looked at KB 1420, 1518, and 1591 but feel that these really do not fix my issue.

The only thing I found useful was turning on Time synchronization between the virtual machine and the host operating system in VMWare Tools and turning NTP off.

My question is: why does the clock lose time even when NTP is running? I know there could be a number of reasons why the clock runs slow when NTP is not running, such as a different HZ value or CPU throttling issues, but I thought NTP was designed to accelerate and decelerate the clock to stay close to NTP time.

Next question: let's say running the time sync in VMware Tools fixes my problem. How does this work? To me it sounded like it just checks the time every so often and resets it.

That seems like it could be a bad thing if I rely on timestamps.

What is the best solution, and what are current VMWare users doing to fix their time sync problems?

Thanks in advance,

Curtis

tsightler · ‎07-17-2007

Have you at least tried the fix for this as posted in KB2219? This is basically the same issue that happened with ESX 2.x and RHEL4 and we've had great success with lowering the Misc.TimerMinHardPeriod to values as low as 100. This is especially true if you have a system with 4 VCpu's because typically a Linux system appears to need at least 1000 interrupts per CPU. With 4 CPU's that means you need 4000 interrupts/sec but the default settings on ESX 3 seem to provide, at most, 2500 interrupts/sec.

We've changed this setting to 200 (max of 5000 interrupts/sec) or even 125 (max of 8000 interrupts/sec) on our servers and found that it can improve timekeeping on RHEL4 guests tremendously.
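The arithmetic behind those numbers can be sketched roughly as follows. This assumes Misc.TimerMinHardPeriod is expressed in microseconds, which is what the 200 -> 5000 and 125 -> 8000 figures imply, and that the guest kernel ticks at 1000 Hz per vCPU. The function names are made up for illustration, and 400 is simply the period that corresponds to the 2500 interrupts/sec ceiling mentioned above, not a confirmed default.

```shell
#!/bin/bash
# Illustrative arithmetic only; the function names are hypothetical.

# Upper bound on timer interrupts per second for a minimum timer period
# given in microseconds (assumed unit of Misc.TimerMinHardPeriod).
max_interrupts_per_sec() {
    echo $(( 1000000 / $1 ))
}

# Interrupts per second a guest needs: vCPUs times the kernel tick rate (HZ).
required_interrupts_per_sec() {
    echo $(( $1 * $2 ))
}

need=$(required_interrupts_per_sec 4 1000)   # 4 vCPUs at an assumed HZ of 1000
for period in 400 200 125 100; do
    max=$(max_interrupts_per_sec "$period")
    if [ "$max" -ge "$need" ]; then
        verdict="enough"
    else
        verdict="too few"
    fi
    echo "period=${period}us max=${max}/s need=${need}/s -> ${verdict}"
done
```

For a 4 vCPU guest this shows why a ceiling of 2500 interrupts/sec falls short of the 4000 needed, while periods of 200 or lower leave headroom.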

We run about a dozen RHEL4 servers on IBM hardware, a mix of 32 and 64-bit, and with these changes we get very accurate timekeeping in the guest, accurate enough that we can run ntpd in the guests and it rarely requires step adjustments, and when it does it's usually only a few tenths of a second, which is good enough for our requirements.

The only exception is when the host is under significant disk load; then we seem to lose interrupts, and thus time, no matter what. Still usually not much, but some. For example, one of our hosts which gets significant load at night has accumulated about a second's worth of adjustments in the last week, a few tenths of a second each night during the heavy disk processing.

Anyway, if you haven't already tried it, it's probably worth trying to see if it helps with your case.

Later,

Tom

Texiwill · ‎07-13-2007

Hello,

NTP will only update the time in 10s increments so if there is a massive clock skew (like you are seeing) it will never catch up. Read through http://www.ntp.org for more details.

To solve this problem I usually either use vmware tools or run ntpdate within the VM every 5 minutes. That usually works very well for me.

If the vmdesched driver is available for your version of the OS, which it should be, you can use it to help with the clock cycle problem that affects the NTP daemon.

Best regards,

Edward

--
Edward L. Haletky
vExpert XIV: 2009-2023,
VMTN Community Moderator
vSphere Upgrade Saga: https://www.astroarch.com/blogs
GitHub Repo: https://github.com/Texiwill

choffman1 · ‎07-13-2007

Looks like vmware-vmdesched is not available. There is a symbolic link, but there is nothing at /usr/lib/vmware-tools/sbin64/vmware-vmdesched.

Guess my real question is: will running the VMware Tools time sync make my timestamps funky, since it really just resets the clock every so often and will leave time holes or overlaps?

UPDATE!!

Looks like even with the VMware Tools time sync enabled, the guest OS will drift all the way to 2 minutes slow, then be reset when, I guess, VMware Tools syncs the time. This is just not a realistic solution for business uses that depend on timestamps.

What the heck? Gotta be a fix for this other than having to run time update every X seconds.

beckmana · ‎07-16-2007

Hi,

add the clock=pit boot parameter and enable vmware tools time sync

Andreas

Texiwill · ‎07-16-2007

Hello,

In order to get desched compiled you need to use the 'vmware-config-tools.pl -e' option to get the experimental drivers as that is what this is.

Best regards,

Edward

choffman1 · ‎07-16-2007

pit is for 32 bit, notsc would be for 64 bit.

And I have tried using that and made no difference, but that option is generally used if clock is running too fast.

Also, I tried the experimental features in VMware Tools, but it said there were none available for my kernel.
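When experimenting with clock=pit (32-bit) or notsc (64-bit), it can help to confirm that the guest really booted with the parameter. The following is a minimal sketch: the helper name has_boot_param is made up for illustration, and it only inspects the kernel command line the guest was booted with, which Linux exposes in /proc/cmdline.

```shell
#!/bin/bash
# Hypothetical helper: succeeds if the kernel command line in $1
# contains the exact, space-separated parameter in $2.
has_boot_param() {
    case " $1 " in
        *" $2 "*) return 0 ;;
        *)        return 1 ;;
    esac
}

# On a live guest, the booted command line is in /proc/cmdline.
if [ -r /proc/cmdline ]; then
    cmdline=$(cat /proc/cmdline)
    for param in clock=pit notsc; do
        if has_boot_param "$cmdline" "$param"; then
            echo "$param is set"
        else
            echo "$param is not set"
        fi
    done
fi
```

If neither parameter shows up after a reboot, the boot loader entry that was edited is probably not the one being booted.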

UPDATE ----

After getting a senior technician from VMware, he told me that there is pretty much no real fix besides running a 2.6.18 or later kernel. You can run NTP updates on a frequent basis, but this is not an ideal solution. The VMware Tools time sync does work; however, it is basically just like running ntpdate every so often, so it will leave time holes.

VMWare says they are working on a fix, but I think the real fix is to just wait for Redhat to officially release 2.6.18 kernel for redhat4. I have an issue open with Redhat to see when this will occur and if there is any workaround for the 2.6.9 kernel now.

Texiwill · ‎07-16-2007

Hello,

The RedHat4 kernel will always be a variant of 2.6.9. If you want 2.6.18 you will need to upgrade to RHEL5. That is just the way RedHat works when there is a new release of their OS. RHEL4 is entering 'critical fix only' mode now. I do not expect that to change.

I use 'ntpdate' as my solution and there are no major time holes in my system when it runs. I think it is the only solution currently, granted it is not elegant but it does work very well.

You could run Fedora7 or FC6 to get your 2.6.18 or .20 kernel but then you run the risk of needing patches to the VMwareTools just to have them compile. There is no patch for some of the tools yet.

Best regards,

Edward

formulator · ‎07-16-2007

I also use ntpdate in cron every 5 minutes on a handful of PRD RHEL4 VMs, been doing so since ESX 2.5.x.
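A five-minute ntpdate job of the kind described here might look like the sketch below. The server name ntp.example.com and the /usr/sbin/ntpdate path are assumptions, and cron_entry is a made-up helper that only prints the crontab line. The -s option makes ntpdate log to syslog instead of standard output.

```shell
#!/bin/bash
# Hypothetical values; substitute your own time server.
NTP_SERVER="ntp.example.com"

# Print a crontab line that steps the clock from the given server
# every 5 minutes, logging through syslog (-s).
cron_entry() {
    printf '*/5 * * * * /usr/sbin/ntpdate -s %s\n' "$1"
}

cron_entry "$NTP_SERVER"
```

The printed line can be pasted into root's crontab with crontab -e.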

choffman1 · ‎07-17-2007

You will still have time holes, though. ntpdate sets the time, so whatever the offset was, you just made a hole of that size. At least with ntpd it will be closer to accurate, since it is just playing catch-up or slowing down.

Basically, don't run Redhat4 on VMware if your server is very timestamp-critical.
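Whether ntpd can "play catch-up" depends on how fast the clock drifts. ntpd slews the clock by at most 500 parts per million (about 43 seconds per day), so a clock that loses time faster than that cannot be corrected by slewing alone. The sketch below does that comparison. The 2 minutes per hour figure is a hypothetical, since the thread does not say how long it took the guest to fall 2 minutes behind; the one second per week figure is the scale mentioned earlier in the thread. The function names are made up for illustration.

```shell
#!/bin/bash
MAX_SLEW_PPM=500   # ntpd's maximum slew rate, in parts per million

# Drift in ppm for a clock that lost $1 seconds over $2 elapsed seconds.
drift_ppm() {
    echo $(( $1 * 1000000 / $2 ))
}

# Succeeds if slewing alone could keep up with that drift.
can_slew() {
    [ "$(drift_ppm "$1" "$2")" -le "$MAX_SLEW_PPM" ]
}

# Hypothetical: 120 s lost in one hour.
echo "120 s/hour is $(drift_ppm 120 3600) ppm"
can_slew 120 3600 && echo "slewing keeps up" || echo "slewing cannot keep up"

# Roughly one second lost per week.
echo "1 s/week is $(drift_ppm 1 604800) ppm"
can_slew 1 604800 && echo "slewing keeps up" || echo "slewing cannot keep up"
```

Under these assumptions the first case is far outside what slewing can absorb, which is consistent with ntpd falling behind on the affected guest.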

Texiwill · ‎07-17-2007

Hello,

Well I have been looking through the VMware Tools for the vmdesched and for Linux systems it will not load as an experimental driver unless you are NOT using an SMP kernel. You can however compile it yourself with a few small changes and use it. I am working on patches at the moment for later kernels.

ntpd will NEVER catch up when the clock skew is consistently > 10s. There is a great article discussing this at http://www.ntp.org. That is why ntpdate is the better solution of the two.

But I agree, until there is a vmdesched that works, timestamp crucial VMs should not use RedHat4.

Best regards,

Edward

trothen · ‎11-25-2007

If you are using a Windows host and RHEL4/CentOS4 as the guest, you can use the following workaround. It assumes that your host is syncing its time from an authoritative source, and it saves you from having to abuse a public NTP server with constant requests from the Linux virtual machine.

Make sure your Windows host is syncing its clock with an internet time server (Hint: Date and Time control panel > Internet Time).

Download rfc868time-1.3.exe to a Windows box that you intend to act as an authoritative time source for your linux boxes.

Open a command prompt on the Windows host that will run the rfc868time server and do the following.

c:

cd "\program files"

mkdir rfc868time

copy c:\some\path\rfc868time-1.3.exe "c:\program files\rfc868time"

cd "\program files\rfc868time"

rfc868time-1.3.exe -install

Make an exception in the Windows firewall to allow inbound traffic to TCP port 37 from the hosts or subnet that will be sending requests to the Windows server.

Windows Firewall exception details:

Name: rdate-tcp-37

Port number: 37

Protocol: TCP

Scope (example for subnet): 192.168.58.0/255.255.255.0

Start the rfc868time service

net start rfc868time

From the linux client, test getting the time from the Windows host. Log in to the linux box as root via ssh. Replace myhost with the hostname of the Windows box hosting the rfc868time service. Note that this doesn't actually change the time; if successful, it will only show the time differential between the guest OS and the host OS.

rdate myhost

Run the following commands to create a script that will be used to sync the time.

mkdir -p /root/jobs

echo '#!/bin/bash' > /root/jobs/timesync.sh

echo 'rdate -s -t 15 myhost >/dev/null 2>&1' >> /root/jobs/timesync.sh

chmod -R 750 /root/jobs

Open the crontab editor as root.

crontab -e

In the resulting editor window, press i to enter insert mode and add the crontab entry for root. Enter the following text, press escape, type :wq, and press enter.

*/5 * * * * /root/jobs/timesync.sh

You should get a message that the system is installing a new crontab.

Done! Your time should now be syncing with your Windows host every 5 minutes. So long as your Windows box is keeping its time in sync with an authoritative source, your linux box should keep its time as well.
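For background on what the rdate calls above exchange: the RFC 868 time protocol on port 37 returns a single 32-bit number, the count of seconds since 1900-01-01 00:00 UTC. Subtracting 2208988800 converts it to Unix time. The sketch below shows that conversion and the resulting offset. The function names and the sample values are made up for illustration, and this is not how rdate itself is implemented.

```shell
#!/bin/bash
# Seconds between 1900-01-01 and 1970-01-01 (UTC), per RFC 868.
RFC868_EPOCH_OFFSET=2208988800

# Convert an RFC 868 timestamp to Unix time.
rfc868_to_unix() {
    echo $(( $1 - RFC868_EPOCH_OFFSET ))
}

# Offset in seconds between a server's RFC 868 reply ($1) and the
# local Unix time ($2). Positive means the local clock is behind.
offset_seconds() {
    echo $(( $(rfc868_to_unix "$1") - $2 ))
}

# Made-up sample: the server reply is 120 s ahead of the guest clock.
echo "server unix time: $(rfc868_to_unix 3392000000)"
echo "guest is behind by: $(offset_seconds 3392000000 1183011080) s"
```

Each run of rdate -s steps the guest clock by that whole offset in one go, so the size of any jump in timestamps is bounded by how far the guest drifts between cron runs.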