
[QUESTION] Backtesting Using Global ML Models #2613

Open
jrodenbergrheem opened this issue Dec 7, 2024 · 4 comments
Labels
question Further information is requested

Comments

@jrodenbergrheem

My question: when backtesting, how is it that my model fits in 2 minutes using `.fit()`, but when I call historical forecasts, in which my model should make 4 fits, it seems to take 1-2 hours to complete?

I notice that when using `.historical_forecasts` and printing the progress, it seems to be generating forecasts for one series at a time... Is there a way to backtest globally? I wasn't under the impression that historical forecasts would have to refit the model for each series, given that I am using a global model.

Here is my code; `data_transformed` is a list of `TimeSeries` objects that I created using `from_group_dataframe`:

```python
ml_backtested = catboost_model.historical_forecasts(
    series=data_transformed,
    start=start_point,
    stride=horizon,
    forecast_horizon=horizon,
    show_warnings=False,
    last_points_only=False,
    verbose=True,
)
```

jrodenbergrheem added the `question` (Further information is requested) and `triage` (Issue waiting for triaging) labels on Dec 7, 2024
@jrodenbergrheem
Author

A follow-up question: I have found that if I fit the model and then call historical forecasts with `retrain=False`, it seems to produce these forecasts quite quickly. How is this the case? And are these historical forecasts true backtested forecasts (no data leakage of future values)? Is this the way to backtest ML models globally in Darts: first fit, then call historical forecasts with `retrain=False`?
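
For concreteness, the pattern I mean is roughly the following (same variables as in my snippet above; the model configuration itself is omitted):

```python
# Fit the global model once, across all series...
catboost_model.fit(series=data_transformed)

# ...then backtest without re-training at each simulation step.
ml_backtested = catboost_model.historical_forecasts(
    series=data_transformed,
    start=start_point,
    stride=horizon,
    forecast_horizon=horizon,
    retrain=False,           # reuse the already-fitted model
    last_points_only=False,
    show_warnings=False,
    verbose=True,
)
```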

@dennisbader
Collaborator

dennisbader commented Dec 8, 2024

Hi @jrodenbergrheem, we updated the historical forecast documentation a while ago and it will be released in the next few weeks. Here is the updated description, which I hope answers your questions :)

"""Generates historical forecasts by simulating predictions at various points in time throughout the history of
the provided (potentially multiple) `series`. This process involves retrospectively applying the model to
different time steps, as if the forecasts were made in real-time at those specific moments. This allows for an
evaluation of the model's performance over the entire duration of the series, providing insights into its
predictive accuracy and robustness across different historical periods.

There are two main modes for this method:

- Re-training Mode (Default, `retrain=True`): The model is re-trained at each step of the simulation, and
  generates a forecast using the updated model.
- Pre-trained Mode (`retrain=False`): The forecasts are generated at each step of the simulation without
  re-training. It is only supported for pre-trained global forecasting models. This mode is significantly
  faster as it skips the re-training step.

By choosing the appropriate mode, you can balance between computational efficiency and the need for up-to-date
model training.

**Re-training Mode:** This mode repeatedly builds a training set by either expanding from the beginning of
the `series` or by using a fixed-length `train_length` (the start point can also be configured with `start`
and `start_format`). The model is then trained on this training set, and a forecast of length `forecast_horizon`
is generated. Subsequently, the end of the training set is moved forward by `stride` time steps, and the process
is repeated.

**Pre-trained Mode:** This mode is only supported for pre-trained global forecasting models. It uses the same
simulation steps as in the *Re-training Mode* (ignoring `train_length`), but generates the forecasts directly
without re-training.

By default, with `last_points_only=True`, this method returns a single time series (or a sequence of time
series) composed of the last point from each historical forecast. This time series will thus have a frequency of
`series.freq * stride`.
If `last_points_only=False`, it will instead return a list (or a sequence of lists) of the full historical
forecast series each with frequency `series.freq`.
"""

And regarding your question:

  • it's so much faster because the model is not re-trained, and we have implemented optimized historical forecast routines for all our global models (global naive baselines, regression, and torch models).
    • we generate the forecasts for all series at once (or batches of them).
    • there is no data leakage, we extract the same input / output windows as in the re-train mode.
  • currently, when setting retrain=True in historical forecasts, the models will be re-trained on each time series separately. The global training mode is not yet supported (see the ongoing thread [FEATURE] Add support for global training in historical forecasts, backtest, residuals #1538).

And yes, backtesting without re-training is a good idea:

  • You want your model to generalize well even over time.
  • If you look at the historical forecast errors over time (backtest, or also residuals), and they are "constant" over time, then you don't need to re-train the model.
  • If your model performance starts shifting away from the expected, then it's time to re-train.
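
As a rough sketch of how you could look at the errors over time, using the `ml_backtested` output from your snippet (with `last_points_only=False`; `mape` is just an example metric):

```python
from darts.metrics import mape

# With last_points_only=False, ml_backtested is a list (one entry per input series)
# of lists of horizon-long forecast series, ordered in time.
errors_over_time = [
    [mape(actual, fc) for fc in fc_list]
    for actual, fc_list in zip(data_transformed, ml_backtested)
]

# Roughly constant errors over time -> no need to re-train;
# a clear upward drift -> time to re-train the model.
```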

@jrodenbergrheem
Author

jrodenbergrheem commented Dec 8, 2024

Interesting, so global backtesting (with refit) is not currently implemented, from what I understand. Got it. That explains the long training time.

But if I fit my model on all of the data and then backtest on all of the data without refitting, I am not sure I understand how it is not leaking data...

Essentially, under the hood, is it still fit/predict for each input/output window? If so, then I don't quite grasp why a retrain mode is needed; if the result of historical forecasts with `retrain=False` is the same as multiple fit/predict calls on different slices of data, that is what I am after.

Appreciate the response and the work you do,

Jack

@dennisbader
Collaborator

Hi @jrodenbergrheem. Yes, if you use the same series that you trained your model on then you would have data leakage.

You can look at historical_forecasts(), backtest() and residuals() as evaluation methods.
So you would compute them on series or time frames (after the training window) that have not been used to train the model.
Since the models are trained globally, they can be used to forecast any (new) input series.
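
A minimal sketch of that approach, reusing the variables from your snippet (the split point is a placeholder):

```python
import pandas as pd

split_point = pd.Timestamp("2024-01-01")  # placeholder train/evaluation split

# Train the global model only on data before the split point...
train_parts = [s.split_before(split_point)[0] for s in data_transformed]
catboost_model.fit(series=train_parts)

# ...and generate historical forecasts only after the training window.
ml_backtested = catboost_model.historical_forecasts(
    series=data_transformed,
    start=split_point,         # forecasts begin after the data seen in training
    forecast_horizon=horizon,
    stride=horizon,
    retrain=False,
    last_points_only=False,
)
```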

The re-train mode is required, for example, for local models, which must be re-trained in order to forecast at each step of the historical simulation.

Hope it helps. Let me know if you need more information.

madtoinou removed the `triage` (Issue waiting for triaging) label on Dec 9, 2024