Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Using Downloads.download during precompilation of a package waits a long time for "IO to finish" #223

Closed
KristofferC opened this issue Apr 18, 2023 · 11 comments · Fixed by #234
Labels

Comments

@KristofferC
Copy link
Member

KristofferC commented Apr 18, 2023

The following MWE package takes around 30s when precompiling due to "waiting for IO to finish":

module DownloadsPrecompile

using Downloads

server = "https://pkg.julialang.org"
tmp_path = tempname()
Downloads.download("$server/registries", tmp_path, verbose=false)

end # module DownloadsPrecompile

We can see the download call finish quickly but the precompilation of the package takes ~30 seconds.

julia> @time using DownloadsPrecompile
[ Info: Precompiling DownloadsPrecompile [919f6c70-75de-4838-9889-9bb6f7664974]
  0.881326 seconds (1.41 M allocations: 95.719 MiB, 3.62% gc time, 53.29% compilation time)

[pid 77789] waiting for IO to finish:
 TYPE[FD/PID]       @UV_HANDLE_T->DATA
 timer              @0x12b022360->0x10d6cd510

[pid 77789] waiting for IO to finish:
 TYPE[FD/PID]       @UV_HANDLE_T->DATA
 timer              @0x12b022360->0x10d6cd510
 31.579715 seconds (628.12 k allocations: 43.788 MiB, 0.03% gc time, 0.48% compilation time)

There must be some timer hanging around for a long time that doesn't get stopped when the download finish?

@maleadt
Copy link
Member

maleadt commented Apr 18, 2023

SIGUSR1 doesn't reveal anything interesting:

======================================================================================
Information request received. A stacktrace will print followed by a 1.0 second profile
======================================================================================

cmd: /home/tim/Julia/src/julia/build/dev/usr/bin/julia 221194 running 0 of 1

signal (10): User defined signal 1
epoll_wait at /usr/lib/libc.so.6 (unknown line)
uv__io_poll at /workspace/srcdir/libuv/src/unix/epoll.c:236
uv_run at /workspace/srcdir/libuv/src/unix/core.c:400
ijl_task_get_next at /home/tim/Julia/src/julia/src/partr.c:411
poptask at ./task.jl:974
wait at ./task.jl:983
jfptr_wait_51533 at /home/tim/Julia/src/julia/build/dev/usr/lib/julia/sys.so (unknown line)
_jl_invoke at /home/tim/Julia/src/julia/src/gf.c:2805 [inlined]
ijl_apply_generic at /home/tim/Julia/src/julia/src/gf.c:3006
jl_task_wait_empty at /home/tim/Julia/src/julia/src/partr.c:298
ijl_write_compiler_output at /home/tim/Julia/src/julia/src/precompile.c:78
ijl_atexit_hook at /home/tim/Julia/src/julia/src/init.c:251
jl_repl_entrypoint at /home/tim/Julia/src/julia/src/jlapi.c:718
main at /home/tim/Julia/src/julia/cli/loader_exe.c:58
unknown function (ip: 0x7f187d3a578f)
__libc_start_main at /usr/lib/libc.so.6 (unknown line)
_start at /home/tim/Julia/src/julia/build/dev/usr/bin/julia (unknown line)
unknown function (ip: (nil))

==============================================================
Profile collected. A report will print at the next yield point
==============================================================

Overhead ╎ [+additional indent] Count File:Line; Function
=========================================================
Thread 1 Task 0x00007f1870bfc010 Total snapshots: 933. Utilization: 0%
   ╎933 task.jl:983; wait()
932╎ 933 task.jl:974; poptask(W::Base.IntrusiveLinkedListSynchronized{Task})

static_show of the timer:

Base.Timer(handle=0x00005600e53edf50, cond=Base.GenericCondition{Base.Threads.SpinLock}(waitq=Base.IntrusiveLinkedList{Task}(head=Task(next=nothing, queue=<circular reference @-2>, storage=nothing, donenotify=Base.GenericCondition{Base.Threads.SpinLock}(waitq=Base.IntrusiveLinkedList{Task}(head=nothing, tail=nothing), lock=Base.Threads.SpinLock(owned=0)), result=nothing, logstate=nothing, code=Base.var"#709#710"{Downloads.Curl.var"#31#35"{Downloads.Curl.Multi}, Base.Timer}(cb=Downloads.Curl.var"#31#35"{Downloads.Curl.Multi}(multi=Downloads.Curl.Multi(lock=Base.ReentrantLock(locked_by=nothing, reentrancy_cnt=0x00000000, havelock=0x00, cond_wait=Base.GenericCondition{Base.Threads.SpinLock}(waitq=Base.IntrusiveLinkedList{Task}(head=nothing, tail=nothing), lock=Base.Threads.SpinLock(owned=0)), _=(0, 0, 0)), handle=0x00005600e55cf640, timer=<circular reference @-5>, easies=Array{Downloads.Curl.Easy, (0,)}[], grace=0x0000000000007530)), timer=<circular reference @-4>), rngState0=0xdbcd839469c013a1, rngState1=0xfab5bb3a99c2487e, rngState2=0xefcf0c5cc0727879, rngState3=0x759b3bfafecf68ae, rngState4=0xcdb0480cfc10e1cb, _state=0x00, sticky=true, _isexception=false, priority=0x0000), tail=Task(next=nothing, queue=<circular reference @-2>, storage=nothing, donenotify=Base.GenericCondition{Base.Threads.SpinLock}(waitq=Base.IntrusiveLinkedList{Task}(head=nothing, tail=nothing), lock=Base.Threads.SpinLock(owned=0)), result=nothing, logstate=nothing, code=Base.var"#709#710"{Downloads.Curl.var"#31#35"{Downloads.Curl.Multi}, Base.Timer}(cb=Downloads.Curl.var"#31#35"{Downloads.Curl.Multi}(multi=Downloads.Curl.Multi(lock=Base.ReentrantLock(locked_by=nothing, reentrancy_cnt=0x00000000, havelock=0x00, cond_wait=Base.GenericCondition{Base.Threads.SpinLock}(waitq=Base.IntrusiveLinkedList{Task}(head=nothing, tail=nothing), lock=Base.Threads.SpinLock(owned=0)), _=(0, 0, 0)), handle=0x00005600e55cf640, timer=<circular reference @-5>, easies=Array{Downloads.Curl.Easy, (0,)}[], grace=0x0000000000007530)), timer=<circular reference @-4>), rngState0=0xdbcd839469c013a1, rngState1=0xfab5bb3a99c2487e, rngState2=0xefcf0c5cc0727879, rngState3=0x759b3bfafecf68ae, rngState4=0xcdb0480cfc10e1cb, _state=0x00, sticky=true, _isexception=false, priority=0x0000)), lock=Base.Threads.SpinLock(owned=0)), isopen=true, set=false)

Unsurprisingly, this looks like a timer from Downloads.jl

@KristofferC
Copy link
Member Author

KristofferC commented Apr 18, 2023

I think this is from #38. There is a timer that will reuse the download object if it is used within 30 seconds. Initial thought is that 30 seconds feels very long.

Edit: Looks like I thought so 3 years ago as well: #38 (comment) :P

I guess I can set a default downloader with a different grace period.

@StefanKarpinski
Copy link
Member

Should we reduce the default grace period? That would be fine.

@KristofferC
Copy link
Member Author

KristofferC commented Apr 18, 2023

I guess one question to ask is: how much do you approximately win (in time) by reusing an existing download object? The grace period should be somewhat similar to that number I think?

@StefanKarpinski
Copy link
Member

Ah, yeah, that's a good point. I have no idea and I think it depends a lot on whether you are able to reuse pre-negotiated TLS connections and so on.

@vtjnash
Copy link
Member

vtjnash commented Aug 24, 2023

There should be a call Base.uv_unref(timer.handle) to objects that you want to be implicitly destroyed without cleanup.

@vtjnash
Copy link
Member

vtjnash commented Aug 24, 2023

I had run into this issue when doing Pkg.add because of CUDA_Driver_jll:

  Downloaded artifact: CUDA_Driver

[pid 43002] waiting for IO to finish:
 TYPE[FD/PID]       @UV_HANDLE_T->DATA
 timer              @0x55c935bbf1d0->0x7fee3b5e8610

(gdb) p jl_(0x7fee3b5e8610)
Base.Timer(handle=0x000055c935bbf1d0, cond=Base.GenericCondition{Base.Threads.SpinLock}(waitq=Base.IntrusiveLinkedList{Task}(head=Task(next=nothing, queue=<circular reference @-2>, storage=nothing, donenotify=Base.GenericCondition{Base.Threads.SpinLock}(waitq=Base.IntrusiveLinkedList{Task}(head=nothing, tail=nothing), lock=Base.Threads.SpinLock(owned=0)), result=nothing, logstate=nothing, code=Base.var"#749#750"{Downloads.Curl.var"#31#35"{Downloads.Curl.Multi}, Base.Timer}(cb=Downloads.Curl.var"#31#35"{Downloads.Curl.Multi}(multi=Downloads.Curl.Multi(lock=Base.ReentrantLock(locked_by=nothing, reentrancy_cnt=0x00000000, havelock=0x00, cond_wait=Base.GenericCondition{Base.Threads.SpinLock}(waitq=Base.IntrusiveLinkedList{Task}(head=nothing, tail=nothing), lock=Base.Threads.SpinLock(owned=0)), _=(0, 0, 0)), handle=0x000055c935a7d140, timer=<circular reference @-5>, easies=Array{Downloads.Curl.Easy, 1}(offset=0, dims=(0,), mem=Memory{false, Downloads.Curl.Easy}(8, 0x7fee3b77fd40)[#<null>, #<null>, #<null>, #<null>, #<null>, #<null>, #<null>, #<null>]), grace=0x0000000000007530)), timer=<circular reference @-4>), rngState0=0xc1704e2ee4572007, rngState1=0x057b4284eedb0db6, rngState2=0x2c271697fc8632e6, rngState3=0x2da6f23c77880517, rngState4=0xdab8deec1c6f902c, _state=0x00, sticky=true, _isexception=false, priority=0x0000), tail=Task(next=nothing, queue=<circular reference @-2>, storage=nothing, donenotify=Base.GenericCondition{Base.Threads.SpinLock}(waitq=Base.IntrusiveLinkedList{Task}(head=nothing, tail=nothing), lock=Base.Threads.SpinLock(owned=0)), result=nothing, logstate=nothing, code=Base.var"#749#750"{Downloads.Curl.var"#31#35"{Downloads.Curl.Multi}, Base.Timer}(cb=Downloads.Curl.var"#31#35"{Downloads.Curl.Multi}(multi=Downloads.Curl.Multi(lock=Base.ReentrantLock(locked_by=nothing, reentrancy_cnt=0x00000000, havelock=0x00, cond_wait=Base.GenericCondition{Base.Threads.SpinLock}(waitq=Base.IntrusiveLinkedList{Task}(head=nothing, tail=nothing), lock=Base.Threads.SpinLock(owned=0)), _=(0, 0, 0)), handle=0x000055c935a7d140, timer=<circular reference @-5>, easies=Array{Downloads.Curl.Easy, 1}(offset=0, dims=(0,), mem=Memory{false, Downloads.Curl.Easy}(8, 0x7fee3b77fd40)[#<null>, #<null>, #<null>, #<null>, #<null>, #<null>, #<null>, #<null>]), grace=0x0000000000007530)), timer=<circular reference @-4>), rngState0=0xc1704e2ee4572007, rngState1=0x057b4284eedb0db6, rngState2=0x2c271697fc8632e6, rngState3=0x2da6f23c77880517, rngState4=0xdab8deec1c6f902c, _state=0x00, sticky=true, _isexception=false, priority=0x0000)), lock=Base.Threads.SpinLock(owned=0)), isopen=true, set=false)

(gdb) p *(uv_timer_t*)0x55c935bbf1d0
$2 = {data = 0x7fee3b5e8610, loop = 0x7fee443dd3a0 <default_loop_struct>, type = UV_TIMER, close_cb = 0x7fee43e4f390 <jl_uv_closeHandle>, handle_queue = {0x7fee4412d680 <wait_empty_worker+32>, 0x55c9358d7860}, next_closing = 0x0, flags = 12, timer_cb = 0x7fee2d03afe0 <jlcapi_uv_timercb_78091>, heap_node = {0x7fee4412d6a8 <wait_empty_worker+72>, 0x0, 0x0}, 
  timeout = 33384968562, repeat = 0, start_id = 57}
(gdb) p $2.timeout - uv_now($2.loop)
$5 = 13582 // about 13 seconds left, probably about 13+15=28 seconds total just waiting

(gdb) info proc 
process 43002
cmdline = '/home/vtjnash/julia/usr/bin/julia -Cnative -J/home/vtjnash/julia/usr/lib/julia/sys.so -O0 -g1 --color=no --startup-file=no -O0 --output-o /home/vtjnash/.julia/compiled/v1.11/CUDA_Driver_jll/jl_WvsBaB --output-ji /home/vtjnash/.julia/compiled/v1.11/CUDA_Driver_jll/jl_Ycvc7A --output-incremental=yes --startup-file=no --history-file=no --warn-overwrite=yes --color=no -'
cwd = '/home/vtjnash/.julia/packages/CUDA_Runtime_jll/kaNVq/.pkg'
exe = '/home/vtjnash/julia/usr/bin/julia'

@vtjnash vtjnash added the latency label Sep 8, 2023
@vtjnash
Copy link
Member

vtjnash commented Nov 15, 2023

Update: the expected way to deal with this now is currently to register an atexit hook that eagerly closes any outstanding downloads and stops the creation of any future background downloads (e.g. sets the timeout to zero).

@IanButterworth
Copy link
Member

There should be a call Base.uv_unref(timer.handle) to objects that you want to be implicitly destroyed without cleanup.

sounds good.

Update: the expected way to deal with this now is currently to register an atexit hook that eagerly closes any outstanding downloads and stops the creation of any future background downloads (e.g. sets the timeout to zero).

Is this guidance for the user or how to fix Downloads.jl itself?

@vtjnash
Copy link
Member

vtjnash commented Jan 11, 2024

Downloads.jl should fix this (probably with an atexit hook to call closeall)

@IanButterworth
Copy link
Member

#234

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants