Hi! When running the tune_reconstruction task, the program raises an IndexError caused by a shape mismatch between the mask and the dataset. The failure depends on dataset size: smaller datasets run without problems, but larger ones (e.g., 5000 or 10000 samples) fail.
I’m not entirely sure what’s causing the error, but I’ve run several tests with different configurations and still encounter the same issue. I’ve included all details below to help with troubleshooting. Do you have any idea what could be causing this?
Thanks in advance for your help!
hydra:
  mode: MULTIRUN
  sweeper:
    params:
      task.batch_size: 10, 50
      task.model.num_hidden: "[500],[1000]"
      task.training_loop.num_epochs: 40, 60, 100
The error is:
[INFO - tune_model]: Job name: task.batch_size=50;task.model.num_hidden=[1000];task.training_loop.num_epochs=100
Error executing job with overrides: ['task.batch_size=10', 'task.model.num_hidden=[500]', 'task.training_loop.num_epochs=40', 'experiment=gtex__tune_reconstruction_10000_samples']
Traceback (most recent call last):
  File "/home/local/tools/anaconda3-24.10.1/bin/move-dl", line 8, in <module>
    sys.exit(main())
             ^^^^^^
  File "/home/local/tools/anaconda3-24.10.1/lib/python3.12/site-packages/hydra/main.py", line 94, in decorated_main
    _run_hydra(
  File "/home/local/tools/anaconda3-24.10.1/lib/python3.12/site-packages/hydra/_internal/utils.py", line 394, in _run_hydra
    _run_app(
  File "/home/local/tools/anaconda3-24.10.1/lib/python3.12/site-packages/hydra/_internal/utils.py", line 465, in _run_app
    run_and_report(
  File "/home/local/tools/anaconda3-24.10.1/lib/python3.12/site-packages/hydra/_internal/utils.py", line 223, in run_and_report
    raise ex
  File "/home/local/tools/anaconda3-24.10.1/lib/python3.12/site-packages/hydra/_internal/utils.py", line 220, in run_and_report
    return func()
           ^^^^^^
  File "/home/local/tools/anaconda3-24.10.1/lib/python3.12/site-packages/hydra/_internal/utils.py", line 466, in <lambda>
    lambda: hydra.multirun(
            ^^^^^^^^^^^^^^^
  File "/home/local/tools/anaconda3-24.10.1/lib/python3.12/site-packages/hydra/_internal/hydra.py", line 162, in multirun
    ret = sweeper.sweep(arguments=task_overrides)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/local/tools/anaconda3-24.10.1/lib/python3.12/site-packages/hydra/_internal/core_plugins/basic_sweeper.py", line 181, in sweep
    _ = r.return_value
        ^^^^^^^^^^^^^^
  File "/home/local/tools/anaconda3-24.10.1/lib/python3.12/site-packages/hydra/core/utils.py", line 260, in return_value
    raise self._return_value
  File "/home/local/tools/anaconda3-24.10.1/lib/python3.12/site-packages/hydra/core/utils.py", line 186, in run_job
    ret.return_value = task_function(task_cfg)
                       ^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/local/tools/anaconda3-24.10.1/lib/python3.12/site-packages/move/__main__.py", line 38, in main
    move.tasks.tune_model(config)
  File "/home/local/tools/anaconda3-24.10.1/lib/python3.12/site-packages/move/tasks/tune_model.py", line 250, in tune_model
    _tune_reconstruction(task_config)
  File "/home/local/tools/anaconda3-24.10.1/lib/python3.12/site-packages/move/tasks/tune_model.py", line 170, in _tune_reconstruction
    train_dataloader = make_dataloader(
                       ^^^^^^^^^^^^^^^^
  File "/home/local/tools/anaconda3-24.10.1/lib/python3.12/site-packages/move/data/dataloaders.py", line 188, in make_dataloader
    dataset = make_dataset(cat_list, con_list, mask)
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/local/tools/anaconda3-24.10.1/lib/python3.12/site-packages/move/data/dataloaders.py", line 157, in make_dataset
    con_all = con_all[mask]
              ~~~~~~~^^^^^^
IndexError: The shape of the mask [100] at index 0 does not match the shape of the indexed tensor [10000, 53336] at index 0
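To make the mismatch easier to spot, here is a minimal sketch of a length check that could run just before the `con_all[mask]` indexing. The helper `check_mask` is hypothetical and not part of MOVE. It only compares lengths along the first axis, and plain lists stand in for the tensors, using the sizes from the traceback (a mask of 100 entries against 10000 rows).

```python
def check_mask(mask, data, name="data"):
    """Raise a descriptive error if a row mask and a dataset differ in length.

    Hypothetical helper, not part of MOVE. It relies only on len(), so it
    accepts lists as well as array-like objects whose len() is the row count.
    """
    n_mask, n_rows = len(mask), len(data)
    if n_mask != n_rows:
        raise ValueError(
            f"mask has {n_mask} entries but {name} has {n_rows} rows; "
            "both must have the same length along the first axis"
        )
    return mask


# Sizes taken from the traceback; plain lists stand in for the tensors.
mask = [True] * 100
con_all = [[0.0]] * 10000

try:
    check_mask(mask, con_all, name="con_all")
except ValueError as err:
    message = str(err)
    print(message)
```

A check like this does not explain where the 100-entry mask comes from, but it would report both sizes before the indexing fails.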
The log for the reconstruction:
[2025-01-28 20:22:24] [INFO - tune_model]: Beginning task: tune model reconstruction 1
[2025-01-28 20:22:24] [INFO - tune_model]: Job name: task.batch_size=10;task.model.num_hidden=[500];task.training_loop.num_epochs=40
[2025-01-28 20:22:24] [DEBUG - tune_model]: Reading data
[2025-01-28 20:22:28] [INFO - tune_model]: Beginning task: tune model reconstruction 2
[2025-01-28 20:22:28] [INFO - tune_model]: Job name: task.batch_size=10;task.model.num_hidden=[500];task.training_loop.num_epochs=60
[2025-01-28 20:22:28] [DEBUG - tune_model]: Reading data
[2025-01-28 20:22:32] [INFO - tune_model]: Beginning task: tune model reconstruction 3
[2025-01-28 20:22:32] [INFO - tune_model]: Job name: task.batch_size=10;task.model.num_hidden=[500];task.training_loop.num_epochs=100
[2025-01-28 20:22:32] [DEBUG - tune_model]: Reading data
[2025-01-28 20:22:36] [INFO - tune_model]: Beginning task: tune model reconstruction 4
[2025-01-28 20:22:36] [INFO - tune_model]: Job name: task.batch_size=10;task.model.num_hidden=[1000];task.training_loop.num_epochs=40
[2025-01-28 20:22:36] [DEBUG - tune_model]: Reading data
[2025-01-28 20:22:39] [INFO - tune_model]: Beginning task: tune model reconstruction 5
[2025-01-28 20:22:39] [INFO - tune_model]: Job name: task.batch_size=10;task.model.num_hidden=[1000];task.training_loop.num_epochs=60
[2025-01-28 20:22:39] [DEBUG - tune_model]: Reading data
[2025-01-28 20:22:43] [INFO - tune_model]: Beginning task: tune model reconstruction 6
[2025-01-28 20:22:43] [INFO - tune_model]: Job name: task.batch_size=10;task.model.num_hidden=[1000];task.training_loop.num_epochs=100
[2025-01-28 20:22:43] [DEBUG - tune_model]: Reading data
[2025-01-28 20:22:47] [INFO - tune_model]: Beginning task: tune model reconstruction 7
[2025-01-28 20:22:47] [INFO - tune_model]: Job name: task.batch_size=50;task.model.num_hidden=[500];task.training_loop.num_epochs=40
[2025-01-28 20:22:47] [DEBUG - tune_model]: Reading data
[2025-01-28 20:22:51] [INFO - tune_model]: Beginning task: tune model reconstruction 8
[2025-01-28 20:22:51] [INFO - tune_model]: Job name: task.batch_size=50;task.model.num_hidden=[500];task.training_loop.num_epochs=60
[2025-01-28 20:22:51] [DEBUG - tune_model]: Reading data
[2025-01-28 20:22:55] [INFO - tune_model]: Beginning task: tune model reconstruction 9
[2025-01-28 20:22:55] [INFO - tune_model]: Job name: task.batch_size=50;task.model.num_hidden=[500];task.training_loop.num_epochs=100
[2025-01-28 20:22:55] [DEBUG - tune_model]: Reading data
[2025-01-28 20:22:59] [INFO - tune_model]: Beginning task: tune model reconstruction 10
[2025-01-28 20:22:59] [INFO - tune_model]: Job name: task.batch_size=50;task.model.num_hidden=[1000];task.training_loop.num_epochs=40
[2025-01-28 20:22:59] [DEBUG - tune_model]: Reading data
[2025-01-28 20:23:03] [INFO - tune_model]: Beginning task: tune model reconstruction 11
[2025-01-28 20:23:03] [INFO - tune_model]: Job name: task.batch_size=50;task.model.num_hidden=[1000];task.training_loop.num_epochs=60
[2025-01-28 20:23:03] [DEBUG - tune_model]: Reading data
[2025-01-28 20:23:06] [INFO - tune_model]: Beginning task: tune model reconstruction 12
[2025-01-28 20:23:06] [INFO - tune_model]: Job name: task.batch_size=50;task.model.num_hidden=[1000];task.training_loop.num_epochs=100
[2025-01-28 20:23:06] [DEBUG - tune_model]: Reading data
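For context, the twelve job names in the log match the Cartesian product of the three sweep parameters (2 × 2 × 3 = 12). The sketch below reproduces that enumeration with `itertools.product`. It assumes the sweeper iterates the parameters in the order they are listed, and the `job_names` helper is only an illustration that imitates the name format seen in the log.

```python
from itertools import product

# Sweep values copied from the config; num_hidden entries are kept as strings.
sweep = {
    "task.batch_size": [10, 50],
    "task.model.num_hidden": ["[500]", "[1000]"],
    "task.training_loop.num_epochs": [40, 60, 100],
}


def job_names(params):
    """Build one 'key=value;key=value' name per parameter combination."""
    keys = list(params)
    return [
        ";".join(f"{key}={value}" for key, value in zip(keys, combo))
        for combo in product(*params.values())
    ]


names = job_names(sweep)
print(len(names))   # number of jobs in the sweep
print(names[0])     # first combination
print(names[-1])    # last combination
```

The first and last names produced this way correspond to reconstruction 1 and reconstruction 12 in the log.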