Description
The discussion about enforcing (ensuring?) determinism in WASI has already been started and touched upon in a couple of issues here and there (#185, #118, bytecodealliance/wasmtime#748; if I missed any, please feel free to mention them in this thread). I'd like to gather all the knowledge, ideas, perceived issues, etc. here, essentially creating a meta-issue that we can use to track this and come up with solutions, or at least guidance as to what direction to take.
I'll try to describe all potential sources of nondeterminism below, leaving out sockets for now. Feel free to correct me, add more, etc.
Randomness and entropy
This is an obvious one, and from what I understand, the current consensus is to have it require a capability (see #185 and bytecodealliance/wasmtime#748 for more details). random_get will also get its own module in the upcoming WASI snapshot: wasi_ephemeral_random.witx.
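To make the capability idea concrete, here is a minimal sketch of a host-side random_get that refuses to produce bytes unless a capability was granted. Everything in it (Host, RandomCapability, the Errno enum) is a hypothetical name made up for illustration, not the actual WASI or wasmtime API; NotCapable is only loosely modelled on WASI's notcapable errno.

```rust
/// Hypothetical error type; `NotCapable` loosely mirrors WASI's `notcapable` errno.
#[derive(Debug, PartialEq)]
pub enum Errno {
    NotCapable,
}

/// Hypothetical capability: wraps whatever byte source the embedder chooses
/// (OS entropy, a seeded generator, a recorded trace, ...).
pub struct RandomCapability {
    fill: Box<dyn FnMut(&mut [u8])>,
}

impl RandomCapability {
    pub fn new(fill: impl FnMut(&mut [u8]) + 'static) -> Self {
        Self { fill: Box::new(fill) }
    }
}

/// Hypothetical host context holding the capabilities granted to one instance.
pub struct Host {
    random: Option<RandomCapability>,
}

impl Host {
    pub fn new(random: Option<RandomCapability>) -> Self {
        Self { random }
    }

    /// Fills `buf` only if the randomness capability was granted;
    /// otherwise the buffer is left untouched and an error is returned.
    pub fn random_get(&mut self, buf: &mut [u8]) -> Result<(), Errno> {
        match self.random.as_mut() {
            Some(cap) => {
                (cap.fill)(buf);
                Ok(())
            }
            None => Err(Errno::NotCapable),
        }
    }
}
```

With this shape, an embedder that wants deterministic runs would simply construct the host with `Host::new(None)`, or hand in a capability backed by a fixed byte source.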
Clocks
Access to system/thread/process clocks will also lead to nondeterminism, and as far as I understand, like in the randomness case, the consensus is to have it require a capability (see #118 and bytecodealliance/wasmtime#748 for more details). Also as in the randomness case, clock_time_get will get its own module in the upcoming WASI snapshot: wasi_ephemeral_clock.witx.
File access/modification/change times
This one concerns four WASI syscalls that may introduce nondeterminism into the picture, namely: fd_filestat_get, fd_filestat_set_times, path_filestat_get, and path_filestat_set_times. The nondeterminism may sneak in if the client app makes use in some way of the access (atim), modification (mtim), or change (ctim) times of a file descriptor, which can change between any two runs of the app.
I'm not sure what the best approach to handle this would be, so I'd like to start some brainstorming on this. Could we perhaps populate the filestat time values only if a clock capability was requested/provided? Or introduce a different type of capability?
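As a starting point for that brainstorming, the first option could look roughly like the sketch below: the host stats the file as usual, then zeroes the three timestamps before handing the record to the guest unless a clock capability is present. Filestat here is a trimmed-down, hypothetical stand-in for the WASI record, and ClockCapability and filter_filestat are made-up names; zero as the placeholder value is also just an assumption.

```rust
/// Hypothetical, trimmed-down stand-in for WASI's filestat record
/// (timestamps in nanoseconds).
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct Filestat {
    pub size: u64,
    pub atim: u64,
    pub mtim: u64,
    pub ctim: u64,
}

/// Hypothetical marker for a granted clock capability.
pub struct ClockCapability;

/// Returns the record the guest would see: timestamps pass through only
/// when a clock capability is present, otherwise they are zeroed.
/// Non-time fields such as `size` are always passed through.
pub fn filter_filestat(host_stat: Filestat, clock: Option<&ClockCapability>) -> Filestat {
    match clock {
        Some(_) => host_stat,
        None => Filestat {
            atim: 0,
            mtim: 0,
            ctim: 0,
            ..host_stat
        },
    }
}
```

The same filter would presumably apply to both fd_filestat_get and path_filestat_get; what the *_set_times calls should do without the capability is a separate question this sketch does not answer.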
Readdir
This one is potentially of lesser importance/impact since I assume that, on the same host, the fd_readdir syscall should return the same ordering of the entries. That said, according to the macOS man page for readdir:
"Note that the order of the directory entries vended by readdir() is not specified. Some filesystems may return entries in lexicographic sort order and others may not."
I assume something similar holds on all *nixes, so as long as the same host with the same filesystem is used, the order should be the same between app runs. The problem, however, may become more pronounced in distributed settings where we'd like to execute an app on two unknown and potentially different hosts, and expect deterministic, comparable results on both.
There was already some discussion about the ordering of results, seeking, and fd_readdir in general in #61.
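One possible way to get a host-independent order, sketched here purely as an illustration and not as something already agreed on in #61, is to have the host collect the entry names and sort them before exposing them to the guest. The function name read_dir_sorted is hypothetical, and the sketch ignores the cookie/seeking side of fd_readdir entirely.

```rust
use std::ffi::OsString;
use std::fs;
use std::io;
use std::path::Path;

/// Reads all entry names of `dir` and returns them in sorted order,
/// so the result does not depend on the order the filesystem reports.
/// (`std::fs::read_dir` does not yield `.` and `..`.)
pub fn read_dir_sorted(dir: &Path) -> io::Result<Vec<OsString>> {
    let mut names = fs::read_dir(dir)?
        .map(|entry| entry.map(|e| e.file_name()))
        .collect::<io::Result<Vec<_>>>()?;
    // `OsString` implements `Ord`; names are compared by their platform encoding.
    names.sort();
    Ok(names)
}
```

The obvious cost is that the whole directory has to be read up front, which may matter for very large directories.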
Poll
I'm adding this one since I remember having a discussion with @marmistrz about it, and he was convinced he could generate entropy with a clever use of poll_oneoff, hence bringing nondeterminism into the picture. @marmistrz, perhaps you could shed more light on this?