Commit 44396be

backtesting version un/faithful clarification
1 parent 5229654 commit 44396be

1 file changed, with 4 additions and 2 deletions.

vignettes/backtesting.Rmd

@@ -387,17 +387,19 @@ Now let's look at Florida.
 In the version faithful case, the three late-2021 forecasts (purples and pinks) starting in September predict very low values, near 0.
 The trend leading up to each forecast shows a substantial decrease, so these forecasts seem appropriate and we would expect them to score fairly well on various performance metrics when compared to the versioned data.
 
-In hindsight, we know that early versions of the data systematically under-reported COVID-related doctor visits such that these forecasts don't actually perform well compared to _finalized_ data.
+However in hindsight, we know that early versions of the data systematically under-reported COVID-related doctor visits such that these forecasts don't actually perform well compared to _finalized_ data.
 In this example, version faithful forecasts predicted values at or near 0 while finalized data shows values in the 5-10 range.
 As a result, the version un-faithful forecasts for these same dates are quite a bit higher, and would perform well when scored using the finalized data and poorly with versioned data.
 
 In general, the longer ago a forecast was made, the worse its performance is compared to finalized data. Finalized data accumulates revisions over time that make it deviate more and more from the non-finalized data a model was trained on.
-Forecasts trained solely on finalized data will of course appear to perform better when scored on finalized data, but will have unknown performance on the non-finalized data we need to use if we want timely predictions.
+Forecasts _trained_ on finalized data will of course appear to perform better when _scored_ on finalized data, but will have unknown performance on the non-finalized data we need to use if we want timely predictions.
 
 Without using data that would have been available on the actual forecast date,
 you have little insight into what level of performance you
 can expect in practice.
 
+Good performance of a version un-faithful model is a mirage; it is only achievable if the training data has no revisions.
+If a data source has any revisions, version un-faithful-level performance is unachievable when making forecasts in real time.
 
 
 [^1]: For forecasting a single day like this, we could have actually just used
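
The scoring asymmetry described in the changed paragraphs can be illustrated with a minimal base R sketch. Every number below is hypothetical and chosen only to mirror the ranges mentioned in the text (version faithful forecasts near 0, finalized values in the 5-10 range). The vector names and the `mae()` helper are assumptions made for illustration and are not part of the vignette.

```r
# Hypothetical values only; none of these numbers come from the vignette's data.
finalized  <- c(6, 8, 9)        # finalized values, in the "5-10 range"
as_of      <- c(0.5, 0.3, 0.4)  # early, under-reported versions of the same dates
faithful   <- c(0.2, 0.1, 0.0)  # forecasts from a model that saw only as-of data
unfaithful <- c(5.5, 7.5, 9.5)  # forecasts from a model that saw finalized data

# Mean absolute error, used here as a simple stand-in for a performance metric.
mae <- function(pred, truth) mean(abs(pred - truth))

# Rows are forecasters; columns are the data each forecast is scored against.
scores <- rbind(
  faithful   = c(vs_versioned = mae(faithful, as_of),
                 vs_finalized = mae(faithful, finalized)),
  unfaithful = c(vs_versioned = mae(unfaithful, as_of),
                 vs_finalized = mae(unfaithful, finalized))
)
print(round(scores, 2))
```

Under these made-up numbers, the version faithful forecasts score well against versioned data and poorly against finalized data, while the version un-faithful forecasts show the reverse pattern, which is the situation the added sentences call a mirage.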
