Caching potentially causes intermittent segfault #2774

@joelberkeley

I was seeing intermittent segfaults on this line:

dyn_cast<AutoDiffFunctionInterface>(newFunc.getOperation())) {

The segfaults stopped when I deleted this cache fetch:

{
  auto cachedFn = ReverseCachedFunctions.find(tup);
  if (cachedFn != ReverseCachedFunctions.end())
    return cachedFn->second;
}

This was observed when differentiating several MLIR graphs, such as the one below. Each graph was run anywhere from 2 to 400 times, each run with different constants.

module @root {
  func.func @main() -> tensor<f64> {
    %cst = stablehlo.constant dense<1.000000e+00> : tensor<f64>
    %cst_0 = stablehlo.constant dense<0.000000e+00> : tensor<f64>
    %0 = enzyme.autodiff_region(%cst_0, %cst) {
    ^bb0(%arg0: tensor<f64>):
      enzyme.yield %arg0 : tensor<f64>
    } attributes {activity = [#enzyme<activity enzyme_active>], ret_activity = [#enzyme<activity enzyme_activenoneed>]} : (tensor<f64>, tensor<f64>) -> tensor<f64>
    return %0 : tensor<f64>
  }
}

This is my pass pipeline:

{-#
  external_resources: {
    mlir_reproducer: {
      pipeline: "builtin.module(outline-enzyme-regions, enzyme{postpasses=canonicalize,remove-unnecessary-enzyme-ops verifyPostPasses=true}, arith-raise{stablehlo=true})",
      disable_threading: false,
      verify_each: true
    }
  }
#-}
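
One possible explanation, which is an unconfirmed guess on my part, is that the cache outlives the functions it points to: an entry inserted during one run may refer to a function that has since been erased, so a later hit on the same key returns a stale handle. The sketch below models that hazard with toy types only. `ToyModule`, `ToyReverseCache`, `FuncId` and the `(name, flag)` key are all hypothetical stand-ins, not Enzyme or MLIR APIs, and the real key and value types of `ReverseCachedFunctions` are not shown here. Integer ids are used in place of pointers so the stale case can be observed without undefined behaviour.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <optional>
#include <string>
#include <tuple>
#include <unordered_set>

// Hypothetical handle to a function; stands in for an operation pointer.
using FuncId = std::uint64_t;
// Hypothetical cache key; the real tuple contents are an assumption.
using CacheKey = std::tuple<std::string, bool>;

// Toy owner of functions. Erasing an id models deleting the function.
class ToyModule {
public:
  FuncId create() {
    FuncId id = next_++;
    live_.insert(id);
    return id;
  }
  void erase(FuncId id) { live_.erase(id); }
  bool isLive(FuncId id) const { return live_.count(id) != 0; }

private:
  std::unordered_set<FuncId> live_;
  FuncId next_ = 1;
};

// Toy cache with two lookup flavours.
class ToyReverseCache {
public:
  void insert(const CacheKey &key, FuncId fn) { cache_[key] = fn; }

  // Mirrors the find / return cachedFn->second pattern: no liveness check,
  // so it can hand back a handle to a function that no longer exists.
  std::optional<FuncId> lookupUnchecked(const CacheKey &key) const {
    auto it = cache_.find(key);
    if (it == cache_.end())
      return std::nullopt;
    return it->second;
  }

  // Checked variant: a stale entry is dropped and reported as a miss,
  // so the caller regenerates the function instead of using a dead handle.
  std::optional<FuncId> lookupChecked(const CacheKey &key,
                                      const ToyModule &module) {
    auto it = cache_.find(key);
    if (it == cache_.end())
      return std::nullopt;
    if (!module.isLive(it->second)) {
      cache_.erase(it);
      return std::nullopt;
    }
    return it->second;
  }

private:
  std::map<CacheKey, FuncId> cache_;
};
```

With these toy types, inserting an entry, erasing the function from the `ToyModule`, and then calling `lookupUnchecked` still returns the old id, while `lookupChecked` reports a miss. If something similar is happening in the real cache, it would be consistent with the crash being intermittent and appearing only across repeated runs, but I have not verified that.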