slurm_persist_conn_open_without_init: failed to open persistent connection to host:localhost:6819: Connection refused #8

@johnh2o2

Hi, I am following the instructions in the tutorial but running into this error when I try to run the simulation.

Is this a common issue? My docker command is:

docker run -p 0.0.0.0:8888:8888 -p 0.0.0.0:8787:8787 -it --rm \
    --name slurmsim -h slurmsim \
    -v /Users/johnhoffman/Documents/slurm_sim/storage:/home/slurm/work nsimakov/slurm_sim:v3.0

And I see

WARNING: The requested image's platform (linux/amd64) does not match the detected host platform (linux/arm64/v8) and no specific platform was requested

as the first line of output. I am running the micro_cluster tutorial.

This is the output of my submit command:

Logger initialization
[INFO] Note: NumExpr detected 10 cores but "NUMEXPR_MAX_THREADS" not set, so enforcing safe limit of 8.
[INFO] NumExpr defaulting to 8 threads.
[INFO] Read from /home/slurm/work/micro_cluster/workload/events.trace 20 events
[INFO] slurm.conf: /home/slurm/work/micro_cluster/etc/slurm.conf
[INFO] slurmdbd: /opt/slurm_sim/sbin/slurmdbd
[INFO] slurmd: /opt/slurm_sim/sbin/slurmd
[INFO] slurmctld: /opt/slurm_sim/sbin/slurmctld
[INFO] dropping db from previous runs
DROP DATABASE IF EXISTS slurmdb_micro
[INFO] deleting previous SlurmctldLogFile file: /home/slurm/work/micro_cluster/log/slurmctld.log
[INFO] deleting previous SlurmdLogFile file: /home/slurm/work/micro_cluster/log/slurmd.log
[INFO] deleting previous SlurmSchedLogFile file: /home/slurm/work/micro_cluster/log/sched.log
[INFO] deleting previous StateSaveLocation files from /home/slurm/work/micro_cluster/var/state
[INFO] deleting previous results dir: /home/slurm/work/micro_cluster/results/test1/dtstart_0_1
[DEBUG] Set stdout/stderr for slurmctld to /home/slurm/work/micro_cluster/log/slurmctld_stdout.log
[DEBUG] Set stdout/stderr for slurmdbd to /home/slurm/work/micro_cluster/log/slurmdbd_stdout.log
[INFO] Launching slurmdbd
[INFO] Running sacctmgr script from /home/slurm/work/micro_cluster/etc/sacctmgr.script
sacctmgr: error: slurm_persist_conn_open_without_init: failed to open persistent connection to host:localhost:6819: Connection refused
sacctmgr: error: Sending PersistInit msg: Connection refused
sacctmgr: sacctmgr: sacctmgr: sacctmgr: sacctmgr: sacctmgr: sacctmgr: sacctmgr: sacctmgr: sacctmgr: sacctmgr: sacctmgr: sacctmgr: sacctmgr: sacctmgr: sacctmgr: sacctmgr: sacctmgr: sacctmgr: sacctmgr: sacctmgr: sacctmgr: sacctmgr: 
[INFO] Launching slurmctld
['/opt/slurm_sim/sbin/slurmctld', '-e', '/home/slurm/work/micro_cluster/workload/events.trace', '-dtstart', '0']
[INFO] Current time 1731598818.63887
[INFO] slurmdbd_create_time=1731598805.61
[INFO] slurmctld_create_time=1731598812.72
[INFO] slurmd_create_time=1731598811.71
[INFO] Starting job submittion
[INFO] Monitoring slurmctld until completion
[INFO] All jobs submitted wrapping up
[INFO] slurmctld took 13.52983021736145 seconds to run.
first_line [2022-01-01T05:00:07.150992] error: Unable to open pidfile `/var/run/slurmctld.pid': Permission denied
 2022-01-01 05:00:07.150992 2022-01-01 05:00:07.150992
last_line [2022-01-01T05:00:07.167440] fatal: slurmdbd and/or database must be up at slurmctld start time
 2022-01-01 05:00:07.167440 2022-01-01 05:00:07.167440
sacct: error: slurm_persist_conn_open_without_init: failed to open persistent connection to host:localhost:6819: Connection refused
sacct: error: Sending PersistInit msg: Connection refused
sacct: error: Problem talking to the database: Connection refused
[INFO] Copying results to :/home/slurm/work/micro_cluster/results/test1/dtstart_0_1
[INFO] copying resulting file /home/slurm/work/micro_cluster/log/jobcomp.log to /home/slurm/work/micro_cluster/results/test1/dtstart_0_1
Traceback (most recent call last):
  File "/opt/slurm_sim_tools/bin/slurmsim", line 19, in <module>
    slurmsim.cli.CLI().run()
  File "/opt/slurm_sim_tools/src/slurmsim/cli.py", line 189, in run
    return cli_args.func(cli_args)
           ^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/slurm_sim_tools/src/slurmsim/cli.py", line 71, in handler
    run_slurm(args)
  File "/opt/slurm_sim_tools/src/slurmsimtools/run_slurmsim.py", line 679, in run_slurm
    shutil.copy(slurm_conf[paraml], results_dir)
  File "/opt/conda/lib/python3.11/shutil.py", line 419, in copy
    copyfile(src, dst, follow_symlinks=follow_symlinks)
  File "/opt/conda/lib/python3.11/shutil.py", line 256, in copyfile
    with open(src, 'rb') as fsrc:
         ^^^^^^^^^^^^^^^
FileNotFoundError: [Errno 2] No such file or directory: '/home/slurm/work/micro_cluster/log/jobcomp.log'
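
The traceback above looks like a secondary failure: `slurmctld` exited with a fatal error before writing `jobcomp.log`, so the later `shutil.copy` call had nothing to copy. As an illustration only, a guard of the following shape would turn that crash into a warning. The helper name `copy_if_exists` is hypothetical and is not part of slurm_sim_tools; this is a minimal sketch, not the project's code.

```python
import shutil
from pathlib import Path


def copy_if_exists(src: str, dst_dir: str) -> bool:
    """Copy src into dst_dir if src is a regular file.

    Returns True when the file was copied and False when it was missing.
    Hypothetical helper for illustration; not taken from slurm_sim_tools.
    """
    src_path = Path(src)
    if not src_path.is_file():
        # A missing log usually means the daemon died before writing it.
        print(f"[WARNING] skipping missing file: {src_path}")
        return False
    shutil.copy(src_path, dst_dir)
    return True
```

A guard like this would not fix the run, since the root problem is still that slurmdbd was unreachable on port 6819, but it would keep the earlier error messages as the last thing in the output.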

Let me know if any additional information would be helpful.
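
For anyone trying to reproduce this, a quick way to confirm whether anything is listening on the slurmdbd port from inside the container is a plain TCP connection test. The sketch below assumes Python is available in the container (the log shows `/opt/conda/lib/python3.11`) and uses the host and port from the `sacctmgr` error, `localhost:6819`. The function name `port_is_open` is made up for this example.

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures.
        return False


if __name__ == "__main__":
    # 6819 is the slurmdbd port named in the sacctmgr error message.
    print("slurmdbd reachable:", port_is_open("localhost", 6819))
```

If this reports that the port is not reachable shortly after `[INFO] Launching slurmdbd`, the log at `/home/slurm/work/micro_cluster/log/slurmdbd_stdout.log` (the path from the `[DEBUG]` line in the output) is probably the next place to look for why slurmdbd did not start.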
