
Add isAtomic flag to ARC counters #535

@elcritch


Abstract

Add an isAtomic flag to ARC counters implemented via a bitmask. Then increment / decrement operations for ARC could check the flag to determine whether an atomic operation is necessary. The stdlib would need to provide a method for user code to set this flag before sharing with another thread. Ideally this could be built into threading primitives as well.
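As a rough C sketch of what such a tagged counter could look like: this assumes a hypothetical layout where bit 0 of the refcount word holds the isAtomic flag and each reference adds 2, so the flag survives increments. None of these names (RefHeader, rc_incr, rc_mark_atomic) are real Nim runtime symbols; it is only an illustration of the idea, not Nim's actual counter representation.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical counter layout: bit 0 is the isAtomic flag,
   the remaining bits hold the reference count. */
#define ATOMIC_FLAG_BIT ((uintptr_t)1)
#define RC_UNIT         ((uintptr_t)2)  /* one reference = 2, keeps bit 0 intact */

typedef struct { _Atomic uintptr_t rc; } RefHeader;

static void rc_incr(RefHeader *h) {
    uintptr_t v = atomic_load_explicit(&h->rc, memory_order_relaxed);
    if (v & ATOMIC_FLAG_BIT) {
        /* shared object: full atomic read-modify-write */
        atomic_fetch_add_explicit(&h->rc, RC_UNIT, memory_order_relaxed);
    } else {
        /* thread-local object: plain load + store, no contended bus traffic */
        atomic_store_explicit(&h->rc, v + RC_UNIT, memory_order_relaxed);
    }
}

/* Called by user code / threading primitives before sharing the object. */
static void rc_mark_atomic(RefHeader *h) {
    atomic_fetch_or_explicit(&h->rc, ATOMIC_FLAG_BIT, memory_order_relaxed);
}
```

Note the asymmetry: the flag can be set but never safely cleared, because once another thread may hold a reference there is no cheap way to prove it no longer does.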

Alternatively, this idea could be morphed to be a compile time flag. Users would use similar mechanisms to annotate what objects they're sharing. It would instead flag for the compiler that any object types involved should always use atomic counters. However, this would be trickier to use across shared library boundaries.

Motivation

Currently we have --mm:atomicArc, which enables sharing ref objects between threads. However, it can have a large performance impact on older machines or in certain hotspots. An optional per-object flag could balance wanting atomics for thread safety against performance for non-threaded code.

I've been tossing about this possible design for a week or two, but I'm not convinced it'd be better than a pure atomicArc. However, I think it'd be worthwhile to check with others.

Additionally, another flag could possibly be added for ORC to "lock" an isAtomic object caught up during a cycle collection. I don't know the details of the ORC cycle detector aside from knowing that marking objects is common in such algorithms.

Description

There are downsides to using an isAtomic flag. The biggest is the potential overhead from branching in the incr / decr operations. It would require benchmarking to determine whether the potential for extra branching outweighs the cost of atomics.

Modern processors have large branch predictors, but this check would permeate the generated code, consuming a significant share of branch-predictor capacity. On older machines this might create a bigger performance impact than the atomics themselves. However, a back-of-the-napkin order-of-magnitude comparison suggests that synchronizing an atomic via a new L1 or L2 cache request (often 100+ CPU cycles) takes much longer than recovering from a mis-predicted branch (usually less than tens of CPU cycles).

Though perhaps usage of if (unlikely(isAtomic(x))) could mitigate the overhead, at some cost for threaded code.
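For illustration, a hedged C sketch of what such a hinted decrement could look like, again assuming a hypothetical layout where bit 0 of the refcount word is the isAtomic flag and each reference adds 2. unlikely here is the usual GCC/Clang __builtin_expect wrapper; RefHeader and rc_decr are made-up names, not real Nim runtime symbols.

```c
#include <stdatomic.h>
#include <stdint.h>

/* GCC/Clang branch hint: tells the compiler the condition is rarely true. */
#define unlikely(x) __builtin_expect(!!(x), 0)

#define ATOMIC_FLAG_BIT ((uintptr_t)1)
#define RC_UNIT         ((uintptr_t)2)

typedef struct { _Atomic uintptr_t rc; } RefHeader;

/* Returns nonzero when the last reference was dropped. */
static int rc_decr(RefHeader *h) {
    uintptr_t v = atomic_load_explicit(&h->rc, memory_order_relaxed);
    if (unlikely(v & ATOMIC_FLAG_BIT)) {
        /* shared path: atomic RMW, acquire fence before running the destructor */
        uintptr_t old = atomic_fetch_sub_explicit(&h->rc, RC_UNIT,
                                                  memory_order_release);
        if ((old & ~ATOMIC_FLAG_BIT) == RC_UNIT) {
            atomic_thread_fence(memory_order_acquire);
            return 1;
        }
        return 0;
    }
    /* non-shared fast path: plain load + store, predicted by default */
    atomic_store_explicit(&h->rc, v - RC_UNIT, memory_order_relaxed);
    return (v - RC_UNIT) == 0;
}
```

The hint biases code layout so the thread-local path is the straight-line fall-through, which is the case the flag is meant to keep cheap.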

Another possible benefit is that code which uses Isolate to move data between threads would continue not to need atomic operations.

Code Examples

Some rough pseudo-code to demonstrate:

var sharedObj: SomeRefObject
var chan: Chan[SomeRefObject]

proc doProcess1() {.thread1.} =
  let obj = SomeRefObject()
  setAtomicShared(sharedObj, obj) # could bypass needing `{.cast: gcsafe.}`
  # alternatively
  chan.send(markAtomic(obj))  

proc doProcess2() {.thread2.} =
  while sharedObj == nil:
    os.sleep(10)
  
  let obj = sharedObj
  echo "got shared object: ", obj
  ...

If the isAtomic flag were instead a compile-time property, the same code would simply tell the compiler to use atomic incr / decr mechanisms for the types involved. This would be more flexible than marking individual types as atomic at declaration, like:

type SomeObject* {.atomic.} = ref object

Marking individual types as atomic at declaration would prohibit users from marking objects from libraries as thread-safe. However, one could possibly imagine something like an alias type, though that feels error-prone to implement at the compiler level:

type SomeObject* {.atomic.} = LibraryObject

Backwards Compatibility

A compiler flag could be added to remove the isAtomic check and make counters either fully atomic or not atomic at all. This could cover cases like embedded devices that might not need atomics at all, or builds that should always use atomics.
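A rough C sketch of how such a build-time switch could collapse the runtime check (the flag names FORCE_ATOMIC_RC / FORCE_PLAIN_RC are illustrative inventions, not real Nim or compiler options):

```c
#include <stdint.h>

/* Hypothetical build-time switch that collapses the per-object check. */
#if defined(FORCE_ATOMIC_RC)
#  define RC_IS_ATOMIC(v) 1                   /* always take the atomic path */
#elif defined(FORCE_PLAIN_RC)
#  define RC_IS_ATOMIC(v) 0                   /* e.g. single-core embedded   */
#else
#  define RC_IS_ATOMIC(v) (((v) & (uintptr_t)1) != 0)  /* per-object flag   */
#endif

static int counter_needs_atomic(uintptr_t rc_word) {
    return RC_IS_ATOMIC(rc_word);
}
```

With either FORCE_ macro defined, the branch becomes a constant and the compiler removes it entirely, restoring the current all-or-nothing behavior.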
