
[Suggestion] Purging and embargoing to deal with unintended data leaks in cross validation. #1589

Open
@cryptocoinserver


These approaches are often used in financial ML, but they can benefit a wide variety of ML tasks.

In short: add a safety gap between the k folds, or between the train, test, and validation splits, so that samples near a split boundary cannot leak information across it.
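To make the idea concrete, here is a minimal sketch of a k-fold splitter with such a gap. It assumes the samples are already in time order and that a fixed number of samples on each side of the test fold is enough to prevent leakage. The function name `purged_embargo_kfold` and the parameters `purge` and `embargo` are hypothetical and not part of any existing library API.

```python
def purged_embargo_kfold(n_samples, n_splits=5, purge=0, embargo=0):
    """Yield (train_idx, test_idx) lists for contiguous, time-ordered folds.

    purge:   samples dropped from training right before each test fold
    embargo: samples dropped from training right after each test fold
    (Both names are illustrative assumptions, not an existing API.)
    """
    if not 2 <= n_splits <= n_samples:
        raise ValueError("need 2 <= n_splits <= n_samples")

    # Contiguous folds of near-equal size; the first folds absorb the remainder.
    base, extra = divmod(n_samples, n_splits)
    start = 0
    for i in range(n_splits):
        stop = start + base + (1 if i < extra else 0)
        test_idx = list(range(start, stop))

        # Training data excludes the test fold plus the gap on either side.
        gap_start = max(0, start - purge)
        gap_stop = min(n_samples, stop + embargo)
        train_idx = list(range(0, gap_start)) + list(range(gap_stop, n_samples))

        yield train_idx, test_idx
        start = stop


if __name__ == "__main__":
    for train_idx, test_idx in purged_embargo_kfold(20, n_splits=4, purge=2, embargo=3):
        print("test:", test_idx, "train:", train_idx)
```

A fixed-width gap is a simplification. In the approach described in the linked articles, purging depends on how far each sample's label looks into the future, so the gap width would normally be derived from the label horizon rather than hard-coded.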

These articles explain it in detail:

https://medium.com/mlearning-ai/why-k-fold-cross-validation-is-failing-in-finance-65c895e83fdf

https://blog.quantinsti.com/cross-validation-embargo-purging-combinatorial/

Combinatorial Purged Cross-Validation, mentioned there (and explained a little better here: https://towardsai.net/p/l/the-combinatorial-purged-cross-validation-method), helps create more walk-forward paths that are purely out-of-sample, for increased statistical significance. It was proposed by Marcos Lopez de Prado in his book “Advances in Financial Machine Learning”.
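As a rough illustration of where the extra out-of-sample paths come from, the sketch below enumerates the combinatorial splits at the group level only. It assumes the data has been cut into `n_groups` contiguous groups and that `n_test_groups` of them are held out in each split. The function names are hypothetical, and purging and embargoing would still have to be applied to each training set separately.

```python
from itertools import combinations
from math import comb


def cpcv_splits(n_groups, n_test_groups):
    """Yield (train_groups, test_groups) for every choice of test groups."""
    groups = range(n_groups)
    for test_groups in combinations(groups, n_test_groups):
        train_groups = tuple(g for g in groups if g not in test_groups)
        yield train_groups, test_groups


def n_backtest_paths(n_groups, n_test_groups):
    """Number of complete out-of-sample paths that can be assembled.

    Each group is a test group in comb(N - 1, k - 1) splits, which equals
    (k / N) * comb(N, k).
    """
    return comb(n_groups - 1, n_test_groups - 1)


if __name__ == "__main__":
    splits = list(cpcv_splits(6, 2))
    print(len(splits), "splits,", n_backtest_paths(6, 2), "paths")
```

With 6 groups and 2 test groups per split, this gives 15 splits and 5 paths, compared with the single path that a plain walk-forward scheme produces.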

Labels: enhancement
