Hi, I downloaded the WMT14 En-De dataset, and when I run the script "sh sh_train.sh" I get the following error:
Process 0 terminated with the following error:
Traceback (most recent call last):
  File "/home/luxiuzhi/luxiuzhi/Data-Rejuvenation/fairseq/fairseq/distributed_utils.py", line 177, in all_gather_list
    result.append(pickle.loads(bytes(out_buffer[header_size:header_size + enc_size].tolist())))
_pickle.UnpicklingError: invalid load key, '\xe2'.
Exception: Unable to unpickle data from other workers. all_gather_list requires all workers to enter the function together, so this error usually indicates that the workers have fallen out of sync somehow. Workers can fall out of sync if one of them runs out of memory, or if there are other conditions in your training script that can cause one worker to finish an epoch while other workers are still iterating over their portions of the data.
Could you please help me look into this?
I haven't run into this problem before, but you could check whether increasing "--all-gather-list-size" to a larger value helps.
Posting your training script and the training log here would also make it easier to spot the problem.
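To illustrate what the error message is describing, here is a simplified sketch of the general pattern behind a fixed-size gather buffer: each worker pickles its data, writes a length header followed by the payload into its own slot, and every slot is unpickled after the gather. This is not fairseq's actual code. The names `pack`, `unpack`, and `HEADER_SIZE`, and the 4-byte big-endian header, are assumptions made for the illustration. It shows two failure modes: a payload that does not fit in the buffer, and a slot holding bytes that are not a pickle (which is roughly what a worker out of sync would produce).

```python
import pickle
import struct

HEADER_SIZE = 4  # assumed: 4-byte big-endian length prefix (illustrative layout)


def pack(data, max_size):
    """Pickle `data` into a fixed-size slot: [length header][payload][zero padding]."""
    enc = pickle.dumps(data)
    size = HEADER_SIZE + len(enc)
    if size > max_size:
        # A buffer that is too small is the case a larger size setting would address.
        raise ValueError(f"encoded data size ({size}) exceeds max_size ({max_size})")
    buf = bytearray(max_size)
    buf[:HEADER_SIZE] = struct.pack(">I", len(enc))
    buf[HEADER_SIZE:size] = enc
    return buf


def unpack(buf):
    """Read the length header, then unpickle exactly that many payload bytes."""
    (enc_size,) = struct.unpack(">I", bytes(buf[:HEADER_SIZE]))
    return pickle.loads(bytes(buf[HEADER_SIZE:HEADER_SIZE + enc_size]))


if __name__ == "__main__":
    # Worker in sync: its slot round-trips cleanly.
    stats = {"loss": 1.5, "ntokens": 4096}
    print(unpack(pack(stats, max_size=256)))

    # Payload larger than the slot: rejected before anything is sent.
    try:
        pack(list(range(1000)), max_size=64)
    except ValueError as err:
        print("ValueError:", err)

    # Slot filled with bytes that are not a pickle, as if a worker had written
    # something else (or nothing valid) into its slot.
    bad = bytearray(64)
    bad[:HEADER_SIZE] = struct.pack(">I", 3)
    bad[HEADER_SIZE:HEADER_SIZE + 3] = b"\xe2\x80\x99"
    try:
        unpack(bad)
    except pickle.UnpicklingError as err:
        print("UnpicklingError:", err)
```

In this sketch, an oversized payload fails loudly on the sending side, while corrupted or mismatched slot contents only surface at unpickling time. That is why sharing the training script and full log helps narrow down which case applies.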