[WIP | First Contribution] add first pass at TimeSeriesInitialSplit #23931

alexhallam · 2022-07-16T21:12:28Z

Reference Issues/PRs

What does this implement/fix? Explain your changes.

Add functionality to do time series cross validation with an initial window.

Any other comments?

I have added the code, but I would like to add tests as well. I am just having difficulty setting up the test environment.

thomasjpfan · 2022-07-20T15:06:07Z

sklearn/model_selection/_split.py

+    TRAIN: [0] TEST: [1]
+    TRAIN: [0 1] TEST: [2]
+    TRAIN: [0 1 2] TEST: [3]
+    TRAIN: [0 1 2 3] TEST: [4]
+    TRAIN: [0 1 2 3 4] TEST: [5]


I do not understand this API with initial=21, what is the initial=21 doing here?

Also the default test size is 7 and there is only one sample in the test set.

Thanks for the response. I should explain this better.

Assume the blue dots represents dates to train on and the red dots are for testing.

In this picture the initial size of the blue dots are much larger than the size of the testing dots. (6 blue dots vs 1 red). With each iteration train size increases by 1 blue dot.

My PR allows this kind of flexibility. A user may set a larger initial training period, which is not bound by the size that the training set is increases by on each iteration. It decouples the initial size from the increment size.

Oh and 21 represents 7 days a week times 3 (a three week training period by default). I am assuming people could use this function in a forecast setting for daily forecasts.

I am assuming people could use this function in a forecast setting for daily forecasts.

This assumption is not always valid. I've seen the time series splitter used for hourly data, etc.

In that case with initial=21, should the first split have training indices [0, 1, 2, 3, 4, ..., 20], and the test set be [21, 22, .., 27] because test_size=7?

It looks like you want a walk forward as in #23780 but with an additional min_train_size.

That is fair enough. I am not married to initial=21. You let me know what is preferred and I will make the change. The functionality is much more important to me than the initial args.

It looks like you want a walk forward as in #23780 but with an additional min_train_size.

I am afraid that min_train_size means that I want a consistent train size. That is what I do not want. I want one train size for the initial training then a second train size for each iteration.

If you sign off on this then I can keep working. I remember seeing a contributors document. I would have to see what the expectation is for writing tests and documentation.

I am not signing off on this because the API is not finalized and unclear. Let's move the discussion to the issue.

After the API is clear, we can evaluate if we want this in scikit-learn.

glemaitre · 2024-03-13T10:28:51Z

Closing this PR such that we have a clear API decision before to go forward.

add first pass at TimeSeriesInitialSplit

1aa6181

github-actions bot added the module:model_selection label Jul 16, 2022

thomasjpfan reviewed Jul 20, 2022

View reviewed changes

thomasjpfan mentioned this pull request Jul 20, 2022

TimeSeriesSplit Needs a version that allows users to specify initial time #23923

Open

Update _split.py

aa594bf

glemaitre closed this Mar 13, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Uh oh!

[WIP | First Contribution] add first pass at TimeSeriesInitialSplit #23931

[WIP | First Contribution] add first pass at TimeSeriesInitialSplit #23931

Uh oh!

alexhallam commented Jul 16, 2022

Uh oh!

thomasjpfan Jul 20, 2022 •

edited

Loading

Uh oh!

alexhallam Jul 20, 2022

Uh oh!

alexhallam Jul 20, 2022 •

edited

Loading

Uh oh!

thomasjpfan Jul 20, 2022

Uh oh!

alexhallam Jul 20, 2022

Uh oh!

alexhallam Jul 20, 2022

Uh oh!

alexhallam Jul 20, 2022 •

edited

Loading

Uh oh!

thomasjpfan Jul 20, 2022 •

edited

Loading

Uh oh!

glemaitre commented Mar 13, 2024

Uh oh!

Uh oh!

Uh oh!

[WIP | First Contribution] add first pass at TimeSeriesInitialSplit #23931

[WIP | First Contribution] add first pass at TimeSeriesInitialSplit #23931

Uh oh!

Conversation

alexhallam commented Jul 16, 2022

Reference Issues/PRs

What does this implement/fix? Explain your changes.

Any other comments?

Uh oh!

thomasjpfan Jul 20, 2022 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

alexhallam Jul 20, 2022

Choose a reason for hiding this comment

Uh oh!

alexhallam Jul 20, 2022 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

thomasjpfan Jul 20, 2022

Choose a reason for hiding this comment

Uh oh!

alexhallam Jul 20, 2022

Choose a reason for hiding this comment

Uh oh!

alexhallam Jul 20, 2022

Choose a reason for hiding this comment

Uh oh!

alexhallam Jul 20, 2022 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

thomasjpfan Jul 20, 2022 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

glemaitre commented Mar 13, 2024

Uh oh!

Uh oh!

thomasjpfan Jul 20, 2022 •

edited

Loading

alexhallam Jul 20, 2022 •

edited

Loading

alexhallam Jul 20, 2022 •

edited

Loading

thomasjpfan Jul 20, 2022 •

edited

Loading