When I run "python3 main.py", the following error occurs:
main.py:57: UserWarning:
The version_base parameter is not specified.
Please specify a compatability version level, or None.
Will assume defaults for version 1.1
@hydra.main(config_path="conf", config_name="config")
/usr/local/lib/python3.8/dist-packages/hydra/_internal/hydra.py:119: UserWarning: Future Hydra versions will no longer change working directory at job runtime by default.
See https://hydra.cc/docs/next/upgrades/1.1_to_1.2/changes_to_job_working_dir/ for more information.
ret = run_job(
=================== save_dir /root/container_tools/NeMo-Megatron-Launcher/launcher_scripts/data/bpe file_name vocab.json
=================== save_dir /root/container_tools/NeMo-Megatron-Launcher/launcher_scripts/data/bpe file_name vocab.json
File /root/container_tools/NeMo-Megatron-Launcher/launcher_scripts/data/bpe/vocab.json already exists, skipping download.
=================== save_dir /root/container_tools/NeMo-Megatron-Launcher/launcher_scripts/data/bpe file_name merges.txt
=================== save_dir /root/container_tools/NeMo-Megatron-Launcher/launcher_scripts/data/bpe file_name merges.txt
File /root/container_tools/NeMo-Megatron-Launcher/launcher_scripts/data/bpe/merges.txt already exists, skipping download.
Job nemo-megatron-download_gpt3_pile submission file created at '/root/container_tools/NeMo-Megatron-Launcher/launcher_scripts/results/download_gpt3_pile/download/nemo-megatron-download_gpt3_pile_submission.sh'
sbatch: error: Batch job submission failed: Invalid generic resource (gres) specification
Error executing job with overrides: []
subprocess.CalledProcessError: Command '['sbatch', '/root/container_tools/NeMo-Megatron-Launcher/launcher_scripts/results/download_gpt3_pile/download/nemo-megatron-download_gpt3_pile_submission.sh']' returned non-zero exit status 1.
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
  File "main.py", line 87, in <module>
    main()
  File "/usr/local/lib/python3.8/dist-packages/hydra/main.py", line 90, in decorated_main
    _run_hydra(
  File "/usr/local/lib/python3.8/dist-packages/hydra/_internal/utils.py", line 389, in _run_hydra
    _run_app(
  File "/usr/local/lib/python3.8/dist-packages/hydra/_internal/utils.py", line 452, in _run_app
    run_and_report(
  File "/usr/local/lib/python3.8/dist-packages/hydra/_internal/utils.py", line 216, in run_and_report
    raise ex
  File "/usr/local/lib/python3.8/dist-packages/hydra/_internal/utils.py", line 213, in run_and_report
    return func()
  File "/usr/local/lib/python3.8/dist-packages/hydra/_internal/utils.py", line 453, in <lambda>
    lambda: hydra.run(
  File "/usr/local/lib/python3.8/dist-packages/hydra/_internal/hydra.py", line 132, in run
    _ = ret.return_value
  File "/usr/local/lib/python3.8/dist-packages/hydra/core/utils.py", line 260, in return_value
    raise self._return_value
  File "/usr/local/lib/python3.8/dist-packages/hydra/core/utils.py", line 186, in run_job
    ret.return_value = task_function(task_cfg)
  File "main.py", line 75, in main
    job_id = stage.run()
  File "/root/container_tools/NeMo-Megatron-Launcher/launcher_scripts/nemo_launcher/core/data_stages.py", line 70, in run
    job_id = launcher.launch(command_groups=command_groups)
  File "/root/container_tools/NeMo-Megatron-Launcher/launcher_scripts/nemo_launcher/core/launchers.py", line 58, in launch
    job_id = self._launcher.launch(command_groups)
  File "/root/container_tools/NeMo-Megatron-Launcher/launcher_scripts/nemo_launcher/core/launchers.py", line 96, in launch
    job_id = self._submit_command(submission_file_path)
  File "/root/container_tools/NeMo-Megatron-Launcher/launcher_scripts/nemo_launcher/core/launchers.py", line 393, in _submit_command
    output = job_utils.CommandFunction(command_list, verbose=False)() # explicit errors
  File "/root/container_tools/NeMo-Megatron-Launcher/launcher_scripts/nemo_launcher/utils/job_utils.py", line 124, in __call__
    raise OSError(stderr) from subprocess_error
OSError: sbatch: error: Batch job submission failed: Invalid generic resource (gres) specification
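The log does not show the contents of the generated nemo-megatron-download_gpt3_pile_submission.sh, so which directive sbatch rejected is not known here. A common cause of "Invalid generic resource (gres) specification" is a GPU request (for example `--gres=gpu:N` or `--gpus-per-node=N`) on a cluster whose Slurm configuration does not define GPUs as a generic resource. A minimal sketch, assuming the launcher writes its resource requests as `#SBATCH` lines in that file, that lists every GPU/GRES directive so you can see what was submitted (the script name `check_gres.py` is hypothetical):

```python
import re
import sys
from pathlib import Path
from typing import List

# Matches #SBATCH directives that request GPUs or other generic resources:
# --gres, --gpus, --gpus-per-node/-per-task/-per-socket, and the short form -G.
# The lookahead stops "--gres-flags" from being counted as a GRES request.
GPU_DIRECTIVE = re.compile(
    r"^#SBATCH\s+(--gres|--gpus(?:-per-node|-per-task|-per-socket)?|-G)(?=[=\s]|$)"
)


def gpu_requests(script_text: str) -> List[str]:
    """Return every #SBATCH line in a batch script that asks for GRES/GPUs."""
    return [
        line.strip()
        for line in script_text.splitlines()
        if GPU_DIRECTIVE.match(line.strip())
    ]


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage (hypothetical script name): python3 check_gres.py <path to *_submission.sh>
    for directive in gpu_requests(Path(sys.argv[1]).read_text()):
        print(directive)
```

Pass it the submission-file path printed in the log. Any line it prints is a request that sbatch must match against the GRES the node actually advertises.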
Thanks
Aaron
Hi NVIDIA,
Slurm should be ready:
root@user:~/container_tools/NeMo-Megatron-Launcher/launcher_scripts# sinfo
PARTITION AVAIL  TIMELIMIT  NODES  STATE NODELIST
dgx*         up   infinite      1   idle user
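The `sinfo` output above shows that partition `dgx` is up and node `user` is idle, but not whether that node advertises any GPUs as generic resources, which is what sbatch checks a GPU request against. A minimal sketch, assuming Python 3.8 as in the log, that asks `sinfo` for the per-node `%G` (GRES) field and flags nodes reporting no GPU; sinfo typically prints `(null)` for a node with no GRES:

```python
import shutil
import subprocess
from typing import Dict, List


def parse_gres(sinfo_output: str) -> Dict[str, str]:
    """Map node name -> GRES field from `sinfo -h -N -o "%N %G"` output."""
    gres: Dict[str, str] = {}
    for line in sinfo_output.splitlines():
        parts = line.split(None, 1)
        if len(parts) == 2:
            gres[parts[0]] = parts[1].strip()
    return gres


def nodes_without_gpus(gres: Dict[str, str]) -> List[str]:
    """Nodes whose GRES field mentions no GPU (e.g. "(null)")."""
    return sorted(node for node, field in gres.items() if "gpu" not in field)


if __name__ == "__main__" and shutil.which("sinfo"):
    # -h: no header, -N: one line per node, %N: node name, %G: generic resources
    out = subprocess.run(
        ["sinfo", "-h", "-N", "-o", "%N %G"],
        capture_output=True, text=True, check=True,
    ).stdout
    missing = nodes_without_gpus(parse_gres(out))
    for node in missing:
        print(f"{node}: no GPU GRES configured")
    if not missing:
        print("every node advertises a GPU GRES")
```

If node `user` reports `(null)`, a likely explanation is that slurm.conf has no `GresTypes=gpu` and no `Gres=` entry for the node (with a matching gres.conf), so any GPU request in the submission script is rejected.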
First, I downloaded the Pile dataset offline.
Then I placed only 00.jsonl.zst in the launcher_scripts/data/bpe directory.
Running "python3 main.py" then fails with the sbatch error shown above.