Normalize pyro-dataset imports via dvc import #32

Merged
Chouffe merged 2 commits into main from arthur/normalize-pyro-dataset-imports on Apr 1, 2026

Chouffe commented Apr 1, 2026

Summary

  • Standardize how experiments import the pyro-dataset: always use dvc import from https://github.com/pyronear/pyro-dataset with a pinned version tag (v2.2.0)
  • Convert tracking-fsm-baseline and mtb-change-detection from S3-only .dvc files to proper dvc import with a truncate pipeline stage for the 20-frame-max preprocessing
  • Add "Dataset Import" section to experiments/GUIDELINES.md and update the experiment template with import instructions and a truncate stage scaffold
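
As a rough illustration of the pinned import described above, the following Python sketch builds the `dvc import` command line. The `--rev` and `-o` options are standard `dvc import` flags; the helper name `dvc_import_argv` and the source path `path/in/pyro-dataset` are placeholders invented for this example, since the PR description does not state the path inside pyro-dataset.

```python
import shlex

PYRO_DATASET_URL = "https://github.com/pyronear/pyro-dataset"
PINNED_REV = "v2.2.0"  # version tag named in this PR


def dvc_import_argv(src_path: str, out: str, rev: str = PINNED_REV) -> list[str]:
    """Build the argv for a `dvc import` pinned to a specific revision.

    Hypothetical helper: it only assembles the command and does not run it.
    """
    return ["dvc", "import", PYRO_DATASET_URL, src_path, "--rev", rev, "-o", out]


# "path/in/pyro-dataset" is a placeholder, not the real path in the dataset repo.
argv = dvc_import_argv("path/in/pyro-dataset", out="datasets_full")
print(shlex.join(argv))
# To execute it for real: subprocess.run(argv, check=True)
```

Pinning `--rev` to a tag rather than a branch is what keeps the import reproducible until someone bumps the tag on purpose.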

tracking-fsm-baseline changes

  • Add truncate stage to dvc.yaml (import full dataset to datasets_full/, truncate to datasets/)
  • Update list_sequences to handle nested {wildfire,fp}/ layout (matching pyro-detector-baseline)
  • Update is_wf_sequence to use parent directory name instead of label format heuristic
  • Add find_sequence_dir helper, used in track.py, sweep.py, ablation.py, build_fiftyone_dataset.py
  • Fix load_label_boxes float class_id parsing (int(float(...)))
  • Update tests for new is_wf_sequence, add tests for find_sequence_dir and list_sequences
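
The helpers listed above could look roughly like the sketch below. Only the function names and the `{wildfire,fp}/` layout come from this PR; the signatures, return types, and the standalone `parse_class_id` function are assumptions made for illustration and may differ from the merged code.

```python
from __future__ import annotations

from pathlib import Path

CLASS_DIRS = ("wildfire", "fp")  # nested layout named in this PR


def list_sequences(split_dir: Path) -> list[Path]:
    """Return every sequence directory under split_dir/{wildfire,fp}/, sorted."""
    sequences: list[Path] = []
    for class_name in CLASS_DIRS:
        class_dir = split_dir / class_name
        if class_dir.is_dir():
            sequences.extend(p for p in class_dir.iterdir() if p.is_dir())
    return sorted(sequences)


def is_wf_sequence(sequence_dir: Path) -> bool:
    """Infer ground truth from the parent directory name, not the label format."""
    return sequence_dir.parent.name == "wildfire"


def find_sequence_dir(split_dir: Path, sequence_name: str) -> Path | None:
    """Locate a sequence by name in either class directory, or return None."""
    for class_name in CLASS_DIRS:
        candidate = split_dir / class_name / sequence_name
        if candidate.is_dir():
            return candidate
    return None


def parse_class_id(token: str) -> int:
    """Parse a class id written as "0" or "0.0" (assumed shape of the fix)."""
    return int(float(token))
```

Going through `float` first matters because a bare `int("0.0")` raises `ValueError`, which is presumably what the `load_label_boxes` fix addresses.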

mtb-change-detection changes

  • Add truncate stage to dvc.yaml
  • Replace prepare_dataset.py with truncate_sequences.py (single-split interface for DVC foreach)
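
A minimal sketch of what a single-split truncation step might do is shown below. It is not the actual `truncate_sequences.py`: it assumes each sequence is a flat directory of frame files under `{class}/{sequence}/`, and that the first 20 frames by sorted filename are the ones kept. The real script may select frames differently and also has to handle label files.

```python
from __future__ import annotations

import shutil
from pathlib import Path

MAX_FRAMES = 20  # the 20-frame maximum named in this PR


def truncate_sequence(src: Path, dst: Path, max_frames: int = MAX_FRAMES) -> int:
    """Copy at most max_frames files (sorted by name) from src into dst."""
    dst.mkdir(parents=True, exist_ok=True)
    frames = sorted(p for p in src.iterdir() if p.is_file())[:max_frames]
    for frame in frames:
        shutil.copy2(frame, dst / frame.name)
    return len(frames)


def truncate_split(src_split: Path, dst_split: Path,
                   max_frames: int = MAX_FRAMES) -> None:
    """Truncate every sequence under src_split/{class}/{sequence}/ into dst_split."""
    for seq in sorted(p for p in src_split.glob("*/*") if p.is_dir()):
        truncate_sequence(seq, dst_split / seq.parent.name / seq.name, max_frames)
```

With a one-split-per-call interface like this, a DVC `foreach` stage can invoke the script once per split, for example mapping `datasets_full/<split>` to `datasets/<split>`.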

Metrics comparison

Metrics are consistent with the previous baselines. Any slight differences come from the complete v2.2.0 dataset containing a few more sequences.

tracking-fsm-baseline val/pyronear: P=0.926 R=0.966 F1=0.945 FPR=0.060
mtb-change-detection val/pyronear: P=0.682 R=0.922 F1=0.784 FPR=0.331 (exact match)
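
As a quick sanity check on the numbers above, the sketch below recomputes F1 from the reported precision and recall. It assumes F1 is the usual harmonic mean of P and R, which the PR does not state explicitly. Since P and R are rounded to three decimals, the comparison uses a small tolerance.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (P, R, reported F1) copied from the val/pyronear lines in this PR.
reported = {
    "tracking-fsm-baseline": (0.926, 0.966, 0.945),
    "mtb-change-detection": (0.682, 0.922, 0.784),
}

for name, (p, r, f1) in reported.items():
    # Inputs are rounded, so only agreement within 1e-3 is expected.
    assert abs(f1_score(p, r) - f1) < 1e-3, name
```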

Test plan

  • make lint + make format pass for both experiments
  • make test passes (106/106 tracking-fsm, 88/88 mtb)
  • uv run dvc repro evaluate produces correct metrics for both experiments
  • make fiftyone builds datasets successfully (tracking-fsm-baseline)
  • New .dvc import files have frozen: true and repo.rev: v2.2.0
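
The last check in the list above can be automated along these lines. The sketch assumes the usual layout of a `.dvc` import file, where `frozen` is a top-level key and `rev` sits under each dependency's `repo` mapping. The function names are hypothetical, and loading the file relies on the third-party PyYAML package.

```python
from __future__ import annotations


def check_pinned_import(meta: dict, expected_rev: str = "v2.2.0") -> list[str]:
    """Return a list of problems found in parsed .dvc import metadata."""
    problems: list[str] = []
    if meta.get("frozen") is not True:
        problems.append("frozen is not true")
    deps = meta.get("deps") or []
    if not deps:
        problems.append("no deps entry found")
    for dep in deps:
        rev = (dep.get("repo") or {}).get("rev")
        if rev != expected_rev:
            problems.append(f"repo.rev is {rev!r}, expected {expected_rev!r}")
    return problems


def load_dvc_file(path: str) -> dict:
    """Parse a .dvc file; requires PyYAML (imported lazily)."""
    import yaml  # third-party: pip install pyyaml

    with open(path, encoding="utf-8") as handle:
        return yaml.safe_load(handle)
```

An empty list from `check_pinned_import(load_dvc_file(...))` means the import is frozen and pinned to the expected tag.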

Chouffe added 2 commits April 1, 2026 18:48
Replace S3-only .dvc files with dvc import from pyro-dataset v2.2.0
in tracking-fsm-baseline and mtb-change-detection. Full dataset is
imported to datasets_full/, then a truncate pipeline stage produces
the 20-frame-max datasets/ that downstream stages consume.

- Add "Dataset Import" section to experiments/GUIDELINES.md
- Update experiment template with dvc import instructions and
  truncate stage scaffold
- tracking-fsm-baseline: add truncate stage, update list_sequences
  and is_wf_sequence for nested {wildfire,fp}/ layout, fix
  find_sequence_dir in track/sweep/ablation/fiftyone scripts,
  fix float class_id parsing in load_label_boxes
- mtb-change-detection: add truncate stage, replace prepare_dataset.py
  with truncate_sequences.py
- Ground truth is now inferred from parent directory name, not label format
- Update tracking-fsm-baseline sequence counts to match v2.2.0 dataset
Chouffe merged commit a6c851d into main on Apr 1, 2026
7 checks passed