Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix ignoring sessions that have no data #1035

Merged
merged 13 commits into from
Jan 24, 2025
25 changes: 19 additions & 6 deletions mne_bids_pipeline/_config_utils.py
Original file line number Diff line number Diff line change
Expand Up @@ -142,14 +142,27 @@ def _get_sessions(config: SimpleNamespace) -> tuple[str, ...]:
return tuple(str(x) for x in sessions)


def get_subjects_sessions(config: SimpleNamespace) -> dict[str, list[str] | list[None]]:
subj_sessions: dict[str, list[str] | list[None]] = dict()
def get_subjects_sessions(config: SimpleNamespace) -> dict[str, list[str]]:
subj_sessions: dict[str, list[str]] = dict()
cfg_sessions = _get_sessions(config)
# find which tasks to ignore when deciding if a subj has data for a session
if config.task is None:
ignore_tasks = None
else:
all_tasks = _get_entity_vals_cached(
root=config.bids_root,
entity_key="task",
ignore_datatypes=_get_ignore_datatypes(config),
)
ignore_tasks = tuple(set(all_tasks) - set([config.task]))
for subject in get_subjects(config):
# Only traverse through the current subject's directory
# Check current subject's directory for available sessions
valid_sessions_subj = _get_entity_vals_cached(
config.bids_root / f"sub-{subject}",
entity_key="session",
ignore_tasks=ignore_tasks,
ignore_acquisitions=("calibration", "crosstalk"),
ignore_suffixes=("scans", "coordsystem"),
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These two lines are the important bit, for getting MNE-BIDS to only tell us if data files are present.
This ignore_suffixes line requires upstream mne-tools/mne-bids@41ce36a to work, so we'll need to pin MNE-BIDS to that commit, or wait for the next MNE-BIDS release before merging here.

I'm also not really sure if coordsystems.json and scans.tsv are the only possible sidecars that could have the undesirous effect (of having get_entity_vals return a false positive when no data files are actually present for the requested session --- cf. discussion at mne-tools/mne-bids#1351). If you can think of other sidecar suffixes to exclude, speak now.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so we'll need to pin MNE-BIDS to that commit, or wait for the next MNE-BIDS release before merging here.

This is a tough req since MNE-BIDS does not release super frequently. And in the meantime anyone doing pip install -ve . will have a valid usable (sufficiently up-to-date) editable install overwritten by the pinned install etc.

Since this is a bit of a niche bug (right?), could we instead use inspect to see if the right kwarg is present and pass it if so? Then once MNE-BIDS releases we bump our min req to the one that actually has it and remove the kwarg thing here. If we wait to release MNE-BIDS-Pipeline until MNE-BIDS releases then users are in good shape, and in the meantime devs are also in (sufficiently) good shape. We'd "just" need to make CIs install the dev version of MNE-BIDS, which they might already be doing.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

... i.e., we don't have to wait to merge this PR and we also don't have to explicitly pin below in pyproject.toml

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can do that, but if someone sets allow_missing_sessions (and actually is missing some sessions) and they don't have a new enough mne_bids, then mne_bids will report that they have data for a given subj+session that they don't actually have data for, and then the pipeline will bomb later.

Maybe I should use inspect here, but also check it during config import, to make sure that if they want to allow_missing_sessions, that they also have a sufficiently capable version of mne_bids?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah that would work, if they have it set and the kwarg doesn't exist then raise an error, or at least emit a warning.

Should be a transient problem anyway since hopefully we'll release after MNE-BIDS, and then pin in pyproject.toml to >= a released version

ignore_datatypes=_get_ignore_datatypes(config),
)
missing_sessions = set(cfg_sessions) - set(valid_sessions_subj)
Expand All @@ -159,9 +172,9 @@ def get_subjects_sessions(config: SimpleNamespace) -> dict[str, list[str] | list
f"{tuple(sorted(missing_sessions))}, and "
"`config.allow_missing_sessions` is False"
)
subj_sessions[subject] = sorted(set(cfg_sessions) & set(valid_sessions_subj))
if subj_sessions[subject] == []:
subj_sessions[subject] = [None]
Copy link
Member Author

@drammock drammock Jan 7, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this wasn't doing what was intended. We want [None] (or (None,)) for datasets where sessions aren't tracked (ie there's just one session). But for datasets where sessions are tracked, we should omit an entry from this subj_sessions dict if there aren't any sessions found for that subj.

keep_sessions = sorted(set(cfg_sessions) & set(valid_sessions_subj))
if len(keep_sessions):
subj_sessions[subject] = keep_sessions
return subj_sessions


Expand Down
Loading