Hi, I am following the instructions in the tutorial, but I run into the error below when I try to run the simulation.
Is this a common issue? My docker command is:
docker run -p 0.0.0.0:8888:8888 -p 0.0.0.0:8787:8787 -it --rm \
--name slurmsim -h slurmsim \
-v /Users/johnhoffman/Documents/slurm_sim/storage:/home/slurm/work nsimakov/slurm_sim:v3.0
And I see
WARNING: The requested image's platform (linux/amd64) does not match the detected host platform (linux/arm64/v8) and no specific platform was requested
as the first line of output. I am running the micro_cluster tutorial.
This is the output of my submit command:
Logger initialization
[INFO] Note: NumExpr detected 10 cores but "NUMEXPR_MAX_THREADS" not set, so enforcing safe limit of 8.
[INFO] NumExpr defaulting to 8 threads.
[INFO] Read from /home/slurm/work/micro_cluster/workload/events.trace 20 events
[INFO] slurm.conf: /home/slurm/work/micro_cluster/etc/slurm.conf
[INFO] slurmdbd: /opt/slurm_sim/sbin/slurmdbd
[INFO] slurmd: /opt/slurm_sim/sbin/slurmd
[INFO] slurmctld: /opt/slurm_sim/sbin/slurmctld
[INFO] dropping db from previous runs
DROP DATABASE IF EXISTS slurmdb_micro
[INFO] deleting previous SlurmctldLogFile file: /home/slurm/work/micro_cluster/log/slurmctld.log
[INFO] deleting previous SlurmdLogFile file: /home/slurm/work/micro_cluster/log/slurmd.log
[INFO] deleting previous SlurmSchedLogFile file: /home/slurm/work/micro_cluster/log/sched.log
[INFO] deleting previous StateSaveLocation files from /home/slurm/work/micro_cluster/var/state
[INFO] deleting previous results dir: /home/slurm/work/micro_cluster/results/test1/dtstart_0_1
[DEBUG] Set stdout/stderr for slurmctld to /home/slurm/work/micro_cluster/log/slurmctld_stdout.log
[DEBUG] Set stdout/stderr for slurmdbd to /home/slurm/work/micro_cluster/log/slurmdbd_stdout.log
[INFO] Launching slurmdbd
[INFO] Running sacctmgr script from /home/slurm/work/micro_cluster/etc/sacctmgr.script
sacctmgr: error: slurm_persist_conn_open_without_init: failed to open persistent connection to host:localhost:6819: Connection refused
sacctmgr: error: Sending PersistInit msg: Connection refused
sacctmgr: sacctmgr: sacctmgr: sacctmgr: sacctmgr: sacctmgr: sacctmgr: sacctmgr: sacctmgr: sacctmgr: sacctmgr: sacctmgr: sacctmgr: sacctmgr: sacctmgr: sacctmgr: sacctmgr: sacctmgr: sacctmgr: sacctmgr: sacctmgr: sacctmgr: sacctmgr:
[INFO] Launching slurmctld
['/opt/slurm_sim/sbin/slurmctld', '-e', '/home/slurm/work/micro_cluster/workload/events.trace', '-dtstart', '0']
[INFO] Current time 1731598818.63887
[INFO] slurmdbd_create_time=1731598805.61
[INFO] slurmctld_create_time=1731598812.72
[INFO] slurmd_create_time=1731598811.71
[INFO] Starting job submittion
[INFO] Monitoring slurmctld until completion
[INFO] All jobs submitted wrapping up
[INFO] slurmctld took 13.52983021736145 seconds to run.
first_line [2022-01-01T05:00:07.150992] error: Unable to open pidfile `/var/run/slurmctld.pid': Permission denied
2022-01-01 05:00:07.150992 2022-01-01 05:00:07.150992
last_line [2022-01-01T05:00:07.167440] fatal: slurmdbd and/or database must be up at slurmctld start time
2022-01-01 05:00:07.167440 2022-01-01 05:00:07.167440
sacct: error: slurm_persist_conn_open_without_init: failed to open persistent connection to host:localhost:6819: Connection refused
sacct: error: Sending PersistInit msg: Connection refused
sacct: error: Problem talking to the database: Connection refused
[INFO] Copying results to :/home/slurm/work/micro_cluster/results/test1/dtstart_0_1
[INFO] copying resulting file /home/slurm/work/micro_cluster/log/jobcomp.log to /home/slurm/work/micro_cluster/results/test1/dtstart_0_1
Traceback (most recent call last):
  File "/opt/slurm_sim_tools/bin/slurmsim", line 19, in <module>
    slurmsim.cli.CLI().run()
  File "/opt/slurm_sim_tools/src/slurmsim/cli.py", line 189, in run
    return cli_args.func(cli_args)
           ^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/slurm_sim_tools/src/slurmsim/cli.py", line 71, in handler
    run_slurm(args)
  File "/opt/slurm_sim_tools/src/slurmsimtools/run_slurmsim.py", line 679, in run_slurm
    shutil.copy(slurm_conf[paraml], results_dir)
  File "/opt/conda/lib/python3.11/shutil.py", line 419, in copy
    copyfile(src, dst, follow_symlinks=follow_symlinks)
  File "/opt/conda/lib/python3.11/shutil.py", line 256, in copyfile
    with open(src, 'rb') as fsrc:
         ^^^^^^^^^^^^^^^
FileNotFoundError: [Errno 2] No such file or directory: '/home/slurm/work/micro_cluster/log/jobcomp.log'
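In case it helps narrow this down, here is a minimal sketch of a check for whether anything is accepting TCP connections on localhost:6819, the port named in the sacctmgr and sacct errors above. The helper name port_is_open is my own (it is not part of slurm_sim), and I am assuming the script is run inside the container while the simulation is starting.

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection raises OSError (e.g. ConnectionRefusedError) on failure.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    # 6819 is the port shown in the "Connection refused" errors in the log above.
    print("port 6819 reachable:", port_is_open("localhost", 6819))
```

A False result would only confirm that nothing is listening on that port at that moment; it would not say why slurmdbd is not up.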
Let me know if any additional information would be helpful.