Normalize pyro-dataset imports via dvc import #32
Merged
Replace S3-only .dvc files with dvc import from pyro-dataset v2.2.0
in tracking-fsm-baseline and mtb-change-detection. The full dataset is
imported to datasets_full/, then a truncate pipeline stage produces
the 20-frame-max datasets/ that downstream stages consume.
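A minimal Python sketch of what such a truncate step could look like. It assumes (hypothetically) that every directory directly containing `.jpg` frames is one sequence, that filename order is temporal order, and that "20-frame-max" means keeping the first 20 frames. The function name `truncate_dataset` and the file layout are illustrative, not the actual script from this PR.

```python
import shutil
from pathlib import Path

# The "20-frame-max" limit described above.
MAX_FRAMES = 20


def truncate_dataset(src: Path, dst: Path, max_frames: int = MAX_FRAMES) -> int:
    """Mirror `src` into `dst`, keeping at most `max_frames` frames per sequence.

    Hypothetical assumptions: each directory that directly contains .jpg files
    is one sequence, and sorting filenames gives temporal order.
    Returns the number of sequences written.
    """
    seq_dirs = sorted({frame.parent for frame in src.rglob("*.jpg")})
    for seq_dir in seq_dirs:
        # Preserve the relative layout (e.g. a nested {wildfire,fp}/ tree).
        out_dir = dst / seq_dir.relative_to(src)
        out_dir.mkdir(parents=True, exist_ok=True)
        # Keep only the first `max_frames` frames in filename order.
        for frame in sorted(seq_dir.glob("*.jpg"))[:max_frames]:
            shutil.copy2(frame, out_dir / frame.name)
    return len(seq_dirs)
```

Called as `truncate_dataset(Path("datasets_full"), Path("datasets"))`, it mirrors the directory tree, so downstream stages only ever see the truncated copy while the imported full dataset stays untouched.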
- Add "Dataset Import" section to experiments/GUIDELINES.md
- Update experiment template with dvc import instructions and
truncate stage scaffold
- tracking-fsm-baseline: add truncate stage, update list_sequences
and is_wf_sequence for nested {wildfire,fp}/ layout, fix
find_sequence_dir in track/sweep/ablation/fiftyone scripts,
fix float class_id parsing in load_label_boxes
- mtb-change-detection: add truncate stage, replace prepare_dataset.py
with truncate_sequences.py
- Ground truth is now inferred from the parent directory name, not the label format
- Update tracking-fsm-baseline sequence counts to match the v2.2.0 dataset
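The layout-related changes can be illustrated with a short sketch. It assumes a hypothetical `<split>/{wildfire,fp}/<sequence>/` tree and YOLO-style label lines (`class_id cx cy w h`); the signatures below are illustrative and may differ from the real helpers in tracking-fsm-baseline. The label parser shows why `int(float(...))` is needed: `int("0.0")` raises `ValueError`, whereas `int(float("0.0"))` returns `0`.

```python
from pathlib import Path

# Hypothetical nested layout: <split_dir>/{wildfire,fp}/<sequence_name>/
CLASS_DIRS = ("wildfire", "fp")


def is_wf_sequence(seq_dir: Path) -> bool:
    """Ground truth comes from the parent directory name, not the label format."""
    return seq_dir.parent.name == "wildfire"


def list_sequences(split_dir: Path) -> list[Path]:
    """Return every sequence directory found under the class folders, sorted by name."""
    seqs: list[Path] = []
    for class_dir in CLASS_DIRS:
        root = split_dir / class_dir
        if root.is_dir():
            seqs.extend(p for p in root.iterdir() if p.is_dir())
    return sorted(seqs, key=lambda p: p.name)


def find_sequence_dir(split_dir: Path, name: str) -> Path:
    """Locate a sequence by name without knowing which class folder holds it."""
    for class_dir in CLASS_DIRS:
        candidate = split_dir / class_dir / name
        if candidate.is_dir():
            return candidate
    raise FileNotFoundError(f"sequence {name!r} not found under {split_dir}")


def parse_label_line(line: str) -> tuple[int, float, float, float, float]:
    """Parse one 'class_id cx cy w h' line; class_id may be written as '0.0'."""
    cls, cx, cy, w, h = line.split()[:5]
    # int("0.0") raises ValueError, so go through float first.
    return int(float(cls)), float(cx), float(cy), float(w), float(h)
```

Centralizing the lookup in one helper means scripts such as track.py or sweep.py no longer need to know which class folder a sequence lives in.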
Summary
- Normalize all pyro-dataset imports to `dvc import` from https://github.com/pyronear/pyro-dataset with a pinned version tag (`v2.2.0`)
- Convert S3-only `.dvc` files to proper `dvc import` with a `truncate` pipeline stage for the 20-frame-max preprocessing
- Add a "Dataset Import" section to `experiments/GUIDELINES.md` and update the experiment template with import instructions and a truncate stage scaffold
tracking-fsm-baseline changes
- Add a `truncate` stage to `dvc.yaml` (import the full dataset to `datasets_full/`, truncate to `datasets/`)
- Update `list_sequences` to handle the nested `{wildfire,fp}/` layout (matching pyro-detector-baseline)
- Update `is_wf_sequence` to use the parent directory name instead of a label-format heuristic
- Fix the `find_sequence_dir` helper, used in `track.py`, `sweep.py`, `ablation.py` and `build_fiftyone_dataset.py`
- Fix float class_id parsing in `load_label_boxes` (`int(float(...))`)
- Update tests for `is_wf_sequence`; add tests for `find_sequence_dir` and `list_sequences`
mtb-change-detection changes
- Add a `truncate` stage to `dvc.yaml`
- Replace `prepare_dataset.py` with `truncate_sequences.py` (single-split interface for DVC `foreach`)
Metrics comparison
Metrics are consistent with the baseline; the slight differences come from the complete v2.2.0 dataset containing a few more sequences.
tracking-fsm-baseline val/pyronear: P=0.926 R=0.966 F1=0.945 FPR=0.060
mtb-change-detection val/pyronear: P=0.682 R=0.922 F1=0.784 FPR=0.331 (exact match)
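F1 is the harmonic mean of precision and recall, F1 = 2PR / (P + R), so the reported F1 values can be cross-checked against P and R. In this small sketch (the helper name `f1_score` is illustrative), recomputing from the rounded P and R gives about 0.946 and 0.784, which agrees with the reported F1 up to the rounding of P and R to three decimals.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Reported (P, R, F1) on val/pyronear, copied from the metrics above.
REPORTED = {
    "tracking-fsm-baseline": (0.926, 0.966, 0.945),
    "mtb-change-detection": (0.682, 0.922, 0.784),
}

for name, (p, r, f1) in REPORTED.items():
    # P and R are already rounded, so only approximate agreement is expected.
    recomputed = f1_score(p, r)
    print(f"{name}: reported F1={f1:.3f}, recomputed={recomputed:.4f}")
```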
Test plan
- `make lint` + `make format` pass for both experiments
- `make test` passes (106/106 tracking-fsm, 88/88 mtb)
- `uv run dvc repro evaluate` produces correct metrics for both experiments
- `make fiftyone` builds datasets successfully (tracking-fsm-baseline)
- `.dvc` import files have `frozen: true` and `repo.rev: v2.2.0`
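The `frozen: true` / `repo.rev: v2.2.0` check on the `.dvc` import files can be scripted. This sketch assumes the usual layout of a `dvc import` `.dvc` file, where the pinned tag sits under `deps[].repo.rev`, and that the file has already been parsed into a dict with a YAML loader (for example PyYAML's `yaml.safe_load`). `check_import_spec` is a hypothetical helper, not part of DVC.

```python
from typing import Any


def check_import_spec(spec: dict[str, Any], expected_rev: str = "v2.2.0") -> list[str]:
    """Return the problems found in a parsed `dvc import` .dvc file (empty if OK)."""
    problems: list[str] = []
    # Imported data should be frozen so `dvc update` is an explicit action.
    if spec.get("frozen") is not True:
        problems.append("frozen is not true")
    deps = spec.get("deps") or []
    repo_deps = [d for d in deps if isinstance(d, dict) and "repo" in d]
    if not repo_deps:
        problems.append("no deps entry with a repo section (not a dvc import?)")
    for dep in repo_deps:
        # The pinned tag is assumed to live under deps[].repo.rev.
        rev = dep["repo"].get("rev")
        if rev != expected_rev:
            problems.append(
                f"{dep.get('path', '?')}: repo.rev is {rev!r}, expected {expected_rev!r}"
            )
    return problems
```

Running it over every import file and failing on a non-empty result turns this manual test-plan item into a repeatable check.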