Abstract
Add an `isAtomic` flag to ARC counters, implemented via a bitmask. Increment/decrement operations for ARC could then check the flag to determine whether an atomic operation is necessary. The stdlib would need to provide a method for user code to set this flag before sharing an object with another thread. Ideally this could be built into threading primitives as well.
Alternatively, this idea could be morphed into a compile-time flag. Users would use similar mechanisms to annotate which objects they're sharing. It would instead signal to the compiler that any object types involved should always use atomic counters. However, this would be trickier to use across shared-library boundaries.
Motivation
Currently we have `--mm:atomicArc`, which enables sharing ref objects between threads. However, it can have a large performance impact on older machines or in certain hotspots. Using an optional flag could balance wanting atomics for thread safety with performance for non-threaded code.
I've been tossing this possible design around for a week or two, but I'm not convinced it'd be better than a pure `atomicArc`. However, I think it'd be worthwhile to check with others.
Additionally, another flag could possibly be added for ORC to "lock" an `isAtomic` object caught up in a cycle collection. I don't know the details of the ORC cycle detector aside from knowing that marking objects is common in such algorithms.
Description
There are downsides to using an `isAtomic` flag. The biggest is the potential overhead from branching in the `incr`/`decr` operations. It would require benchmarking to determine whether the cost of the extra branching outweighs the cost of atomics.
Modern processors have large branch predictors, but this check would permeate the code, potentially consuming an excessive amount of branch-predictor capacity. On older machines this might create a bigger performance impact than the atomics themselves. However, a back-of-the-napkin order-of-magnitude comparison suggests that the L1 or L2 cache traffic needed to synchronize an atomic operation (often 100+ CPU cycles) costs much more than recovering from a mispredicted branch (usually under a few tens of CPU cycles).
Though perhaps usage of `if unlikely(isAtomic(x)):` could mitigate the overhead, at some cost for threaded code.
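For instance, the decrement path could hint the branch predictor that the atomic case is rare. A sketch assuming a refcount word whose low bit is the flag and which counts in steps of two (`unlikely` is Nim's real branch hint; the rest is hypothetical):

```nim
const
  isAtomicBit = 0b1
  rcStep      = 0b10

type RefHeader = object
  rc: int

proc decRef(h: var RefHeader): bool =
  ## Returns true when the object can be destroyed.
  if unlikely((h.rc and isAtomicBit) != 0):
    # atomicDec returns the new value; below rcStep means the count hit zero
    atomicDec(h.rc, rcStep) < rcStep
  else:
    h.rc -= rcStep
    h.rc < rcStep
```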
Another possible benefit is that code which uses `isolate` to move data between threads would continue to avoid atomic operations.
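For reference, Nim's existing `std/isolation` module expresses that ownership transfer: `isolate` only compiles when the value provably has a single owner, so handing it to another thread needs no atomic refcounting. A small sketch:

```nim
import std/isolation

type Payload = ref object
  data: string

# `isolate` fails to compile if the expression could have other owners.
var iso: Isolated[Payload] = isolate(Payload(data: "hello"))

# A channel implementation could move the Isolated value to the receiving
# thread and `extract` the payload there.
let received = extract iso
echo received.data
```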
Code Examples
Some rough pseudo-code to demonstrate:

```nim
var sharedObj: SomeRefObject
var chan: Chan[SomeRefObject]

proc doProcess1() {.thread1.} =
  let obj = SomeRefObject()
  setAtomicShared(sharedObj, obj) # could bypass needing `{.cast: gcsafe.}`
  # alternatively
  chan.send(markAtomic(obj))

proc doProcess2() {.thread2.} =
  while sharedObj == nil:
    os.sleep(10)
  let obj = sharedObj
  echo "got shared object: ", obj
  ...
```
In the case that the `isAtomic` flag were a compile-time property, the same code would simply mark for the compiler that atomic `incr`/`decr` mechanisms should be used. This would be more flexible than marking individual object types as atomic at declaration, like:
```nim
type SomeObject* {.atomic.} = ref object
```
Marking individual objects as atomic at declaration would prevent users from marking objects from libraries as thread-safe. However, one could possibly imagine something like an alias type, though that feels error-prone to implement at the compiler level:
```nim
type SomeObject* {.atomic.} = LibraryObject
```
Backwards Compatibility
A compiler flag could be added to remove the `isAtomic` check and make counters either fully atomic or non-atomic at all times. This could cover cases like embedded devices that might not need atomics at all, or always enabling atomics.
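Such a flag could compile the branch out entirely with a `when` at the increment site. A rough sketch (the define names and `RefHeader` layout are hypothetical):

```nim
const
  isAtomicBit = 0b1
  rcStep      = 0b10

type RefHeader = object
  rc: int

proc incRef(h: var RefHeader) =
  when defined(forceAtomicRc):
    atomicInc(h.rc, rcStep)      # always atomic, like --mm:atomicArc today
  elif defined(forceNonAtomicRc):
    h.rc += rcStep               # never atomic, e.g. single-threaded embedded
  else:
    if unlikely((h.rc and isAtomicBit) != 0):
      atomicInc(h.rc, rcStep)    # default: per-object runtime flag
    else:
      h.rc += rcStep
```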