Making this to track an issue first seen in #4: some of the tests call `rmprocs()`, and after changing CI to run with `JULIA_NUM_THREADS=4` the workers can hang until `rmprocs()` times out and sends SIGQUIT.
Example backtrace:
Backtrace
From worker 21:
From worker 21: [2110] signal 3: Quit # Timeout, rmprocs() sends SIGQUIT
From worker 21: in expression starting at none:1
From worker 21: unknown function (ip: 0x7ff13a091115) at /lib/x86_64-linux-gnu/libc.so.6
From worker 21: pthread_cond_wait at /lib/x86_64-linux-gnu/libc.so.6 (unknown line)
From worker 21: uv_cond_wait at /workspace/srcdir/libuv/src/unix/thread.c:822
From worker 21: jl_parallel_gc_threadfun at /cache/build/builder-amdci4-4/julialang/julia-master/src/gc-stock.c:3550
From worker 21: unknown function (ip: 0x7ff13a094ac2) at /lib/x86_64-linux-gnu/libc.so.6
From worker 21: unknown function (ip: 0x7ff13a12684f) at /lib/x86_64-linux-gnu/libc.so.6
From worker 21: unknown function (ip: (nil)) at (unknown file)
From worker 21: unknown function (ip: 0x7ff13a091115) at /lib/x86_64-linux-gnu/libc.so.6
From worker 21: pthread_cond_wait at /lib/x86_64-linux-gnu/libc.so.6 (unknown line)
From worker 21: uv_cond_wait at /workspace/srcdir/libuv/src/unix/thread.c:822
From worker 21: jl_parallel_gc_threadfun at /cache/build/builder-amdci4-4/julialang/julia-master/src/gc-stock.c:3550
From worker 21: unknown function (ip: 0x7ff13a094ac2) at /lib/x86_64-linux-gnu/libc.so.6
From worker 21: unknown function (ip: 0x7ff13a12684f) at /lib/x86_64-linux-gnu/libc.so.6
From worker 21: unknown function (ip: (nil)) at (unknown file)
From worker 21: unknown function (ip: 0x7ff13a091115) at /lib/x86_64-linux-gnu/libc.so.6
From worker 21: pthread_cond_wait at /lib/x86_64-linux-gnu/libc.so.6 (unknown line)
From worker 21: uv_cond_wait at /workspace/srcdir/libuv/src/unix/thread.c:822
From worker 21: jl_parallel_gc_threadfun at /cache/build/builder-amdci4-4/julialang/julia-master/src/gc-stock.c:3550
From worker 21: unknown function (ip: 0x7ff13a094ac2) at /lib/x86_64-linux-gnu/libc.so.6
From worker 21: unknown function (ip: 0x7ff13a12684f) at /lib/x86_64-linux-gnu/libc.so.6
From worker 21: unknown function (ip: (nil)) at (unknown file)
From worker 21: unknown function (ip: 0x7ff13a091115) at /lib/x86_64-linux-gnu/libc.so.6
From worker 21: pthread_cond_wait at /lib/x86_64-linux-gnu/libc.so.6 (unknown line)
From worker 21: wait at /cache/build/builder-amdci4-4/julialang/julia-master/src/julia_locks.h:130 [inlined]
From worker 21: operator() at /cache/build/builder-amdci4-4/julialang/julia-master/src/engine.cpp:97 [inlined]
From worker 21: jl_engine_reserve at /cache/build/builder-amdci4-4/julialang/julia-master/src/engine.cpp:100
From worker 21: engine_reserve at ./compiler/types.jl:408 [inlined]
From worker 21: engine_reserve at ./compiler/types.jl:407 [inlined]
From worker 21: typeinf_ext at ./compiler/typeinfer.jl:1080
From worker 21: typeinf_ext_toplevel at ./compiler/typeinfer.jl:1176 [inlined]
From worker 21: typeinf_ext_toplevel at ./compiler/typeinfer.jl:1174 # Start compilation and get stuck in the GC
From worker 21: jfptr_typeinf_ext_toplevel_48134.1 at /opt/hostedtoolcache/julia/nightly/x64/lib/julia/sys.so (unknown line)
From worker 21: jl_apply at /cache/build/builder-amdci4-4/julialang/julia-master/src/julia.h:2243 [inlined]
From worker 21: jl_type_infer at /cache/build/builder-amdci4-4/julialang/julia-master/src/gf.c:394
From worker 21: jl_compile_method_internal at /cache/build/builder-amdci4-4/julialang/julia-master/src/gf.c:2820
From worker 21: _jl_invoke at /cache/build/builder-amdci4-4/julialang/julia-master/src/gf.c:3299 [inlined]
From worker 21: ijl_apply_generic at /cache/build/builder-amdci4-4/julialang/julia-master/src/gf.c:3495
From worker 21: show_exception_stack at ./errorshow.jl:1015 # Something in an errormonitor fails and we try to print the exception
From worker 21: display_error at ./client.jl:117
From worker 21: #errormonitor##0 at ./task.jl:734
From worker 21: jfptr_YY.errormonitorYY.YY.0_74460.1 at /opt/hostedtoolcache/julia/nightly/x64/lib/julia/sys.so (unknown line)
From worker 21: jl_apply at /cache/build/builder-amdci4-4/julialang/julia-master/src/julia.h:2243 [inlined]
From worker 21: start_task at /cache/build/builder-amdci4-4/julialang/julia-master/src/task.c:1263 # Switches to one of the remaining tasks
From worker 21: unknown function (ip: (nil)) at (unknown file)
From worker 21: unknown function (ip: 0x7ff13a091115) at /lib/x86_64-linux-gnu/libc.so.6
From worker 21: pthread_cond_wait at /lib/x86_64-linux-gnu/libc.so.6 (unknown line)
From worker 21: uv_cond_wait at /workspace/srcdir/libuv/src/unix/thread.c:822
From worker 21: jl_safepoint_wait_thread_resume at /cache/build/builder-amdci4-4/julialang/julia-master/src/safepoint.c:271
From worker 21: segv_handler at /cache/build/builder-amdci4-4/julialang/julia-master/src/signals-unix.c:395 [inlined]
From worker 21: segv_handler at /cache/build/builder-amdci4-4/julialang/julia-master/src/signals-unix.c:381
From worker 21: unknown function (ip: 0x7ff13a04251f) at /lib/x86_64-linux-gnu/libc.so.6
From worker 21: jl_gc_state_set at /cache/build/builder-amdci4-4/julialang/julia-master/src/julia_threads.h:275 [inlined]
From worker 21: maybe_collect at /cache/build/builder-amdci4-4/julialang/julia-master/src/julia_threads.h:268 [inlined]
From worker 21: jl_gc_small_alloc_inner at /cache/build/builder-amdci4-4/julialang/julia-master/src/gc-stock.c:737 [inlined]
From worker 21: jl_gc_small_alloc_noinline at /cache/build/builder-amdci4-4/julialang/julia-master/src/gc-stock.c:795 [inlined]
From worker 21: jl_gc_alloc_ at /cache/build/builder-amdci4-4/julialang/julia-master/src/gc-stock.c:809
From worker 21: unknown function (ip: (nil)) at (unknown file)
From worker 21: unknown function (ip: 0x7ff13a091115) at /lib/x86_64-linux-gnu/libc.so.6
From worker 21: pthread_cond_wait at /lib/x86_64-linux-gnu/libc.so.6 (unknown line)
From worker 21: uv_cond_wait at /workspace/srcdir/libuv/src/unix/thread.c:822
From worker 21: ijl_task_get_next at /cache/build/builder-amdci4-4/julialang/julia-master/src/scheduler.c:520
From worker 21: poptask at ./task.jl:1158
From worker 21: wait at ./task.jl:1167
From worker 21: task_done_hook at ./task.jl:839
From worker 21: jfptr_task_done_hook_74488.1 at /opt/hostedtoolcache/julia/nightly/x64/lib/julia/sys.so (unknown line)
From worker 21: jl_apply at /cache/build/builder-amdci4-4/julialang/julia-master/src/julia.h:2243 [inlined]
From worker 21: jl_finish_task at /cache/build/builder-amdci4-4/julialang/julia-master/src/task.c:338
From worker 21: start_task at /cache/build/builder-amdci4-4/julialang/julia-master/src/task.c:1274
From worker 21: unknown function (ip: (nil)) at (unknown file)
From worker 21: pthread_cond_destroy at /lib/x86_64-linux-gnu/libc.so.6 (unknown line)
From worker 21: __cxa_finalize at /lib/x86_64-linux-gnu/libc.so.6 (unknown line) # Running finalizers and atexit() handlers?
From worker 21: __do_global_dtors_aux at /opt/hostedtoolcache/julia/nightly/x64/bin/../lib/julia/libjulia-internal.so.1.12 (unknown line)
From worker 21: _fini at /opt/hostedtoolcache/julia/nightly/x64/bin/../lib/julia/libjulia-internal.so.1.12 (unknown line)
From worker 21: unknown function (ip: 0x7ff13a045494) at /lib/x86_64-linux-gnu/libc.so.6
From worker 21: exit at /lib/x86_64-linux-gnu/libc.so.6 (unknown line)
From worker 21: ijl_exit at /cache/build/builder-amdci4-4/julialang/julia-master/src/init.c:199
From worker 21: jlplt_ijl_exit_77448.1 at /opt/hostedtoolcache/julia/nightly/x64/lib/julia/sys.so (unknown line)
From worker 21: exit at ./initdefs.jl:28
From worker 21: exit at ./initdefs.jl:29 # exit() is called
From worker 21: jfptr_exit_77443.1 at /opt/hostedtoolcache/julia/nightly/x64/lib/julia/sys.so (unknown line)
From worker 21: jl_apply at /cache/build/builder-amdci4-4/julialang/julia-master/src/julia.h:2243 [inlined]
From worker 21: jl_f__call_latest at /cache/build/builder-amdci4-4/julialang/julia-master/src/builtins.c:883
From worker 21: #invokelatest#1 at ./essentials.jl:1049 [inlined]
From worker 21: invokelatest at ./essentials.jl:1046
From worker 21: jfptr_invokelatest_62384.1 at /opt/hostedtoolcache/julia/nightly/x64/lib/julia/sys.so (unknown line)
From worker 21: jl_apply at /cache/build/builder-amdci4-4/julialang/julia-master/src/julia.h:2243 [inlined]
From worker 21: do_apply at /cache/build/builder-amdci4-4/julialang/julia-master/src/builtins.c:839
From worker 21: #handle_msg##12 at /home/runner/work/DistributedNext.jl/DistributedNext.jl/src/process_messages.jl:312 # Worker gets call to `exit()` from the master
From worker 21: run_work_thunk at /home/runner/work/DistributedNext.jl/DistributedNext.jl/src/process_messages.jl:72
From worker 21: #handle_msg##10 at /home/runner/work/DistributedNext.jl/DistributedNext.jl/src/process_messages.jl:312
From worker 21: unknown function (ip: 0x7ff0fb7455bf) at (unknown file)
From worker 21: jl_apply at /cache/build/builder-amdci4-4/julialang/julia-master/src/julia.h:2243 [inlined]
From worker 21: start_task at /cache/build/builder-amdci4-4/julialang/julia-master/src/task.c:1263
From worker 21: unknown function (ip: (nil)) at (unknown file)
From worker 21: Allocations: 9179557 (Pool: 9179436; Big: 121); GC: 8
I've only observed this on nightly, almost always on Ubuntu/OSX, almost never on Windows. A couple of times the workers have segfaulted somewhere in LLVM, but I don't have a backtrace for that.
It doesn't happen every time `rmprocs()` is called. The most reliable trigger is the `topology.jl` tests, though once or twice I've seen other tests failing.
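
For anyone who wants to poke at this locally, here is a rough sketch of the kind of loop that exercises the same add/remove path. It is not a confirmed reproducer: the worker count, iteration count, and the `:master_worker` topology are assumptions chosen for illustration, not taken from `topology.jl`, and it assumes DistributedNext exposes the same `addprocs`/`rmprocs` keywords as the Distributed stdlib.

```julia
# Hypothetical stress loop, assuming a nightly Julia with DistributedNext installed.
using DistributedNext

for i in 1:20
    # Start workers with 4 threads each, similar in spirit to JULIA_NUM_THREADS=4 in CI.
    pids = addprocs(4; exeflags=`--threads=4`, topology=:master_worker)

    # Do a trivial round trip so each worker has compiled and run something.
    for p in pids
        @assert remotecall_fetch(myid, p) == p
    end

    # This is the call that occasionally hangs until the kill timeout fires.
    @time rmprocs(pids)
end
```

If an iteration's `rmprocs()` takes on the order of the kill timeout rather than a fraction of a second, that is probably the same hang.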