I am trying to reproduce the MusicBERT pre-training.
CUDA info: NVIDIA-SMI 515.105.01, Driver Version: 515.105.01, CUDA Version: 11.7
I installed the exact package versions pinned in `requirements.txt`.
```
2023-12-22 11:46:44 | INFO | fairseq.trainer | loaded checkpoint checkpoints/checkpoint_last_musicbert_small.pt (epoch 2 @ 14 updates)
2023-12-22 11:46:44 | INFO | fairseq.trainer | loading train data for epoch 2
2023-12-22 11:46:44 | INFO | fairseq.data.data_utils | loaded 3785 examples from: sub_data_bin/train
2023-12-22 11:46:44 | INFO | fairseq.tasks.masked_lm | loaded 3480 blocks from: sub_data_bin/train
2023-12-22 11:46:44 | INFO | fairseq.trainer | begin training epoch 2
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/home/fyf/.pyenv/versions/anaconda3-5.2.0/envs/musicbert/lib/python3.6/multiprocessing/spawn.py", line 105, in spawn_main
    exitcode = _main(fd)
  File "/home/fyf/.pyenv/versions/anaconda3-5.2.0/envs/musicbert/lib/python3.6/multiprocessing/spawn.py", line 119, in _main
    self = reduction.pickle.load(from_parent)
ModuleNotFoundError: No module named 'fairseq_user_dir_48782'
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/home/fyf/.pyenv/versions/anaconda3-5.2.0/envs/musicbert/lib/python3.6/multiprocessing/spawn.py", line 105, in spawn_main
    exitcode = _main(fd)
  File "/home/fyf/.pyenv/versions/anaconda3-5.2.0/envs/musicbert/lib/python3.6/multiprocessing/spawn.py", line 119, in _main
    self = reduction.pickle.load(from_parent)
ModuleNotFoundError: No module named 'fairseq_user_dir_79710'
```
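Both tracebacks fail inside `reduction.pickle.load`, that is, while a process started with the `spawn` method (see the `spawn_main` frames) unpickles data sent by its parent. A minimal sketch of that failure mode, assuming the data references a class from a module that is registered only in the parent's `sys.modules`. The module name `fairseq_user_dir_demo` and the helper functions are hypothetical and only mimic the names in the log; this is not fairseq's code.

```python
import pickle
import subprocess
import sys
import types

# Hypothetical module name chosen to mimic "fairseq_user_dir_<N>" in the log.
# This only illustrates the failure mode; it is NOT fairseq's code.
MODULE_NAME = "fairseq_user_dir_demo"


def make_parent_only_module():
    """Create a module in memory and register it only in this process."""
    mod = types.ModuleType(MODULE_NAME)
    # Executing in the module's namespace sets Sample.__module__ to MODULE_NAME,
    # so pickle records the class by reference as "fairseq_user_dir_demo.Sample".
    exec("class Sample:\n    pass\n", mod.__dict__)
    sys.modules[MODULE_NAME] = mod
    return mod


def unpickle_in_fresh_interpreter(payload):
    """Unpickle `payload` in a brand-new Python interpreter (as a spawn-started
    child does) and return what that interpreter wrote to stderr."""
    result = subprocess.run(
        [sys.executable, "-c",
         "import pickle, sys; pickle.loads(sys.stdin.buffer.read())"],
        input=payload,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
    )
    return result.stderr.decode()


mod = make_parent_only_module()
payload = pickle.dumps(mod.Sample())  # works: the module is in this process's sys.modules
restored = pickle.loads(payload)      # unpickling in the parent works too
err = unpickle_in_fresh_interpreter(payload)
print(err.strip().splitlines()[-1])
```

Running it prints `ModuleNotFoundError: No module named 'fairseq_user_dir_demo'` from the fresh interpreter, the same error pattern as in the log, while the parent process pickles and unpickles the object without trouble.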
Pre-training runs fine on a single GPU, but in the distributed setting it fails as shown above, and the fairseq data loader seems to be the problem. Do you have any idea what is going on?
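The two failing processes report different suffixes (`48782` and `79710`). Purely as an assumption, not confirmed from fairseq's source: if the suffix were derived from Python's built-in `hash()` of a string such as the user-dir path, it would differ between interpreters, because string hashing is salted per process unless `PYTHONHASHSEED` is fixed. A sketch that checks this behaviour of `hash()`; the path `/path/to/musicbert`, the `% 100000` reduction, and the helper name are hypothetical.

```python
import os
import subprocess
import sys

# Hypothetical path standing in for a --user-dir argument.
USER_DIR = "/path/to/musicbert"


def suffix_in_new_interpreter(hash_seed=None):
    """Return hash(USER_DIR) % 100000 as computed by a fresh interpreter.

    ASSUMPTION: this only mimics a generated name like 'fairseq_user_dir_<N>';
    it is not fairseq code.
    """
    env = dict(os.environ)
    if hash_seed is None:
        env.pop("PYTHONHASHSEED", None)  # default: string hashing randomized per process
    else:
        env["PYTHONHASHSEED"] = str(hash_seed)  # fixed seed: reproducible hashes
    out = subprocess.run(
        [sys.executable, "-c",
         "import sys; print(hash(sys.argv[1]) % 100000)", USER_DIR],
        stdout=subprocess.PIPE,
        env=env,
        check=True,
    )
    return int(out.stdout.decode().strip())


fixed = [suffix_in_new_interpreter(0) for _ in range(2)]
randomized = [suffix_in_new_interpreter() for _ in range(2)]
print("PYTHONHASHSEED=0:", fixed)      # both values identical
print("randomized seed:", randomized)  # usually two different values
```

With the default randomized seed, separate interpreters usually compute different values for the same string, while a fixed seed such as `0` makes them match.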