Potential race condition in axom::copy when Umpire is enabled. #1724

@BradWhitlock

In the new heavily_mixed MIR example, I was calling axom::copy() in an axom::for_all<axom::OMP_EXEC>() loop to copy some data into a slice of a 3D array. This was to speed up material construction on 3D meshes since I could have many 2D slices constructed in parallel.

I noticed that the heavily_mixed test would crash intermittently on dane. The first time axom::copy() is called, the Umpire resource manager is constructed as a singleton. Since axom::copy() was being called on multiple OpenMP threads at the same time, there was a race condition in creating the resource manager that would sometimes cause it to throw exceptions and terminate the program.

I worked around the problem in the heavily_mixed example by calling axom::copy() once before the OpenMP loop. That ensures the Umpire resource manager already exists when the loop starts.
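The workaround above can be sketched in standard C++ roughly as follows. This is only an illustration under stated assumptions: `FakeManager`, `fake_copy`, and `fill_slices` are hypothetical stand-ins (not Axom or Umpire APIs), the lazy singleton is a guess at the general shape of the problem rather than Umpire's actual implementation, and `std::thread` stands in for the OpenMP loop.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <thread>
#include <vector>

// Hypothetical stand-in for a lazily constructed singleton. NOT Umpire's API.
struct FakeManager
{
  static FakeManager* s_instance;
  static int s_constructions;

  // Unsynchronized lazy construction: unsafe if the first calls are concurrent.
  static FakeManager& instance()
  {
    if(s_instance == nullptr)
    {
      ++s_constructions;
      s_instance = new FakeManager();
    }
    return *s_instance;
  }
};
FakeManager* FakeManager::s_instance = nullptr;
int FakeManager::s_constructions = 0;

// Hypothetical stand-in for axom::copy(): touches the manager, then copies bytes.
void fake_copy(void* dst, const void* src, std::size_t nbytes)
{
  FakeManager::instance();
  std::memcpy(dst, src, nbytes);
}

// Builds nslices slices in parallel; slice s is filled with the value s.
std::vector<int> fill_slices(std::size_t nslices, std::size_t slice_size)
{
  assert(slice_size > 0);
  std::vector<int> volume(nslices * slice_size, 0);

  // Warm-up copy on the calling thread, so the singleton exists
  // before any worker thread is started.
  int warm_src = 0, warm_dst = 0;
  fake_copy(&warm_dst, &warm_src, sizeof(int));
  assert(FakeManager::s_instance != nullptr);

  // Workers now only read the already-constructed instance.
  std::vector<std::thread> workers;
  for(std::size_t s = 0; s < nslices; ++s)
  {
    workers.emplace_back([&volume, s, slice_size]() {
      std::vector<int> slice(slice_size, static_cast<int>(s));
      fake_copy(volume.data() + s * slice_size, slice.data(), slice_size * sizeof(int));
    });
  }
  for(auto& w : workers) w.join();
  return volume;
}
```

Calling `fill_slices(4, 3)` in this sketch constructs the stand-in manager exactly once, on the calling thread. The downside is the same as in the real workaround: every caller has to remember the warm-up call.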

OpenMP loops like this are probably rare, but is there something more general that should be done to avoid the problem?

  • Add a mutex in axom::copy() around the point where the resource manager instance is retrieved?
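A minimal sketch of the mutex idea from the bullet above, again in standard C++ with hypothetical names (`FakeManager`, `guarded_instance`, and `constructions_under_contention` are illustrative only and do not reflect how axom::copy() or Umpire is actually written):

```cpp
#include <cassert>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical stand-in for a lazily constructed singleton. NOT Umpire's API.
struct FakeManager
{
  static FakeManager* s_instance;
  static int s_constructions;

  // Unsynchronized lazy construction: unsafe on its own if first calls overlap.
  static FakeManager& instance()
  {
    if(s_instance == nullptr)
    {
      ++s_constructions;
      s_instance = new FakeManager();
    }
    return *s_instance;
  }
};
FakeManager* FakeManager::s_instance = nullptr;
int FakeManager::s_constructions = 0;

// Serializes retrieval of the instance, so only one thread can construct it.
FakeManager& guarded_instance()
{
  static std::mutex mtx;
  std::lock_guard<std::mutex> lock(mtx);
  return FakeManager::instance();
}

// Has nthreads threads request the instance at once and reports
// how many times the manager was constructed in total.
int constructions_under_contention(int nthreads)
{
  assert(nthreads > 0);
  std::vector<std::thread> workers;
  for(int i = 0; i < nthreads; ++i)
  {
    workers.emplace_back([]() { guarded_instance(); });
  }
  for(auto& w : workers) w.join();
  return FakeManager::s_constructions;
}
```

In this sketch every retrieval takes the lock, which adds a small cost to each call. If that matters, `std::call_once` with a `std::once_flag` is a standard alternative that synchronizes only the first construction; whether either option fits axom::copy() would depend on details of the real code that are not shown in this issue.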

Labels: Reviewed, bug