Description
chrony_tracking_root_dispersion_seconds tracks the root dispersion, which on a default Chrony instance can have wildly inaccurate values. The root dispersion includes an assumption of at least 1 PPM in local clock drift (see the Chrony FAQ). Since Chrony defaults to polling NTP servers every 2^10 seconds (1024 s, or ~17 minutes), that 1 PPM drift appears as a sawtooth wave in chrony_tracking_root_dispersion_seconds, ranging from ~0 (at poll time) to 1.024 ms (just before the next poll).
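To make the arithmetic concrete, here's a minimal sketch of that sawtooth, assuming the 1 PPM skew floor and the default 2^10 s polling interval (real values depend on configuration and on the skew Chrony actually measures):

```python
# Minimal sketch: how the dispersion term grows between polls, assuming
# a 1 PPM minimum skew and the default 2^10 s (1024 s) polling interval.
SKEW_PPM = 1.0           # assumed minimum frequency error, parts per million
POLL_INTERVAL_S = 2**10  # default maximum polling interval (1024 s)

def dispersion_growth(seconds_since_poll: float) -> float:
    """Dispersion accumulated since the last poll, in seconds."""
    return SKEW_PPM * 1e-6 * seconds_since_poll

# Just before the next poll, the accumulated term reaches ~1.024 ms:
print(f"{dispersion_growth(POLL_INTERVAL_S) * 1e3:.3f} ms")  # 1.024 ms
```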
That's a lot of error if you have a local NTP source that you're syncing to.
I've been trying to improve local NTP accuracy, and root dispersion really threw me for a loop. Here's how the 3 factors listed in the README performed with a default config over 12h:
Using the formula from the README gave a ~1 ms P99 error estimate for this system.
Looking at just the last-offset time gave a totally different view:
(The server from the first graph is purple in this one.) It shows less than a 30-microsecond delta each time Chrony adjusts its clock, versus the ~1 ms error implied by the combined metrics.
Chrony's own accuracy examples use a variant of last-update for their metrics, while also comparing against a second clock. Where does the recommendation for root-delay + root-dispersion + last-offset actually come from? I can't find it by searching the Chrony man pages at the link provided.
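For clarity, here is the combination I'm referring to, written as a small sketch. The names and the exact weighting are mine (I don't know, for example, whether root delay is meant to be halved), so treat this as my reading of the README rather than its confirmed recommendation:

```python
# My reading of the README's error estimate:
# root delay + root dispersion + |last offset|.
# Names and weighting are illustrative, not a confirmed formula.
def readme_error_estimate(root_delay_s: float,
                          root_dispersion_s: float,
                          last_offset_s: float) -> float:
    """Worst-case clock error estimate, in seconds."""
    return root_delay_s + root_dispersion_s + abs(last_offset_s)

# Hypothetical values for a LAN server just before a poll:
print(readme_error_estimate(root_delay_s=0.0002,
                            root_dispersion_s=0.001024,
                            last_offset_s=0.00003))  # ~0.00125 s (~1.25 ms)
```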
Background: I'm trying to get my clocks accurate enough that time skew isn't a problem for distributed traces. 1 ms of error is too much; 10 microseconds is perfectly fine. I spent quite a bit of time chasing errors and really just ended up learning a whole lot about root dispersion metrics, which is fine, but I'd like to save others from making the same trip.

