Fix series length ordering for string[python] IDs in dataframe validation/conversion by dario-fumarola · Pull Request #470 · amazon-science/chronos-forecasting

dario-fumarola · 2026-02-26T14:55:29Z

Summary

Fixes #440 by making series-length extraction deterministic and aligned with row order, including when id_column uses pandas string[python] dtype.

Root cause

After sorting by (id_column, timestamp_column), the code used:
value_counts(sort=False).to_list()
to derive per-series lengths. For some ID dtypes (notably string[python]), this can produce an order that does not match contiguous row blocks, which then misaligns timestamp slicing and can trigger false frequency inference failures.
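To illustrate why the length list has to follow row order, here is a minimal sketch on a toy frame. The column names `item_id` and `timestamp` and the data are assumptions made for illustration, not taken from the library or its tests. It derives lengths with the groupby form used in this PR and slices the sorted frame into contiguous blocks by cumulative offset.

```python
import pandas as pd

# Toy frame (hypothetical data); the ID column uses the string[python] dtype.
df = pd.DataFrame(
    {
        "item_id": pd.array(["b", "a", "b", "c", "a", "b"], dtype="string[python]"),
        "timestamp": pd.to_datetime(
            [
                "2024-01-01",
                "2024-01-01",
                "2024-01-02",
                "2024-01-01",
                "2024-01-02",
                "2024-01-03",
            ]
        ),
    }
)

# Sort so that each series occupies one contiguous block of rows.
df = df.sort_values(["item_id", "timestamp"]).reset_index(drop=True)

# groupby(sort=False) keeps groups in order of first appearance, which after
# the sort above is the same order as the contiguous row blocks.
lengths = df.groupby("item_id", sort=False).size().to_list()

# Slice the frame into per-series blocks using the lengths as offsets.
blocks = []
start = 0
for n in lengths:
    blocks.append(df.iloc[start : start + n])
    start += n

# Each block should contain exactly one ID if the lengths are aligned.
assert all(block["item_id"].nunique() == 1 for block in blocks)
```

If the lengths came back in any other order, the same slicing would mix rows from neighbouring series, which is the kind of misalignment described above.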

Changes

In validate_df_inputs, replaced df[id_column].value_counts(sort=False).to_list() with df.groupby(id_column, sort=False).size().to_list().
Applied the same fix in convert_df_input_to_list_of_dicts_input when validate_inputs=False for consistency.
Added regression tests:
- test_validate_df_inputs_accepts_string_python_ids_with_unequal_lengths
- test_validate_df_inputs_has_consistent_metadata_for_object_and_string_python_ids
- test_convert_df_with_validate_inputs_false_handles_string_python_ids
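The idea behind the dtype-consistency test can be sketched without calling the library. The helper `series_lengths` below is hypothetical and only mirrors the groupby expression from this PR; it is not the actual test body or the signature of validate_df_inputs. The sketch checks that object and string[python] ID columns yield the same length list for series of unequal length.

```python
import pandas as pd


def series_lengths(df: pd.DataFrame, id_column: str) -> list[int]:
    """Hypothetical helper: per-series lengths in row (first-appearance) order."""
    return df.groupby(id_column, sort=False).size().to_list()


# Already-sorted IDs with unequal series lengths (made-up data).
ids = ["s1"] * 4 + ["s2"] * 2 + ["s3"] * 3

lengths_by_dtype = {}
for dtype in ("object", "string[python]"):
    frame = pd.DataFrame({"item_id": pd.Series(ids, dtype=dtype)})
    lengths_by_dtype[dtype] = series_lengths(frame, "item_id")

# Both dtypes should report the same lengths, in block order.
assert lengths_by_dtype["object"] == lengths_by_dtype["string[python]"]
```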

Validation

pytest test/test_df_utils.py (36 passed)
mypy src test (no issues)

Compatibility

No public API changes. Behavior is unchanged except for correcting dtype-dependent ordering/misalignment.

Fix string[python] id ordering in dataframe frequency validation

e0ccdf5

dario-fumarola wants to merge 1 commit into amazon-science:main from dario-fumarola:fix/issue-440-string-python-id-order
