[OpenMP][Aarch64] Invalid frame pointers break profiling with nsight #163352

@dominik-wojt-2311446

Hi All,

When libomp is built with LIBOMP_OMPT_SUPPORT=TRUE, I cannot collect correct backtraces in sampling-based profiling with NVIDIA Nsight Systems (nsys). This profiler uses the same method as perf does, so perf is most likely also affected. The issue affects AArch64, but not x86_64. 32-bit ARM might be affected too, but I have no platform to test on. Rebuilding libomp with LIBOMP_OMPT_SUPPORT=FALSE avoids the problem, but obviously breaks collection of OMPT events.

Backtrace with LIBOMP_OMPT_SUPPORT=TRUE:

Call stack at 0.0311147s:
omp_example!std::chrono::duration<...>::count() const
omp_example!work()
omp_example!main
omp_example!main
libomp.so!__kmp_invoke_microtask
[Unknown]!0x1
[Broken backtraces]![Broken backtraces]

Backtrace with LIBOMP_OMPT_SUPPORT=FALSE:

Call stack at 0.0274804s:
omp_example!std::chrono::duration<...>::duration<...>(...)
omp_example!work()
omp_example!main
omp_example!main
libomp.so!__kmp_invoke_microtask
libomp.so!__kmp_invoke_task_func
libomp.so!__kmp_fork_call
libomp.so!__kmpc_fork_call
omp_example!main
libc.so.6!__libc_start_call_main
libc.so.6!__libc_start_main@@GLIBC_2
omp_example!_start
[Broken backtraces]![Broken backtraces]

I took a look at the __kmp_invoke_microtask code for AArch64. I am not knowledgeable about frame pointers or AArch64 assembly, but I think the issue is that, when OMPT is enabled, x19 and x20 are pushed to the stack between saving the frame pointer register (x29) to the stack and copying the stack pointer into the frame pointer register. As a result, the frame pointer register does not point to the memory where the previous frame pointer was saved. As an experiment, I changed the order of operations:

diff --git a/openmp/runtime/src/z_Linux_asm.S b/openmp/runtime/src/z_Linux_asm.S
index 0bf9f07a13f1..d641efcef2a8 100644
--- a/openmp/runtime/src/z_Linux_asm.S
+++ b/openmp/runtime/src/z_Linux_asm.S
@@ -1358,10 +1358,10 @@ __tid = 8
        PACBTI_C
 
        stp     x29, x30, [sp, #-16]!
+       mov     x29, sp
 # if OMPT_SUPPORT
        stp     x19, x20, [sp, #-16]!
 # endif
-       mov     x29, sp
 
        orr     w9, wzr, #1
        add     w9, w9, w3, lsr #1
@@ -1413,11 +1413,11 @@ KMP_LABEL(kmp_0):
 KMP_LABEL(kmp_1):
        blr     x8
        orr     w0, wzr, #1
-       mov     sp, x29
 # if OMPT_SUPPORT
        str     xzr, [x19]
        ldp     x19, x20, [sp], #16
 # endif
+       mov     sp, x29
        ldp     x29, x30, [sp], #16
        PACBTI_RET
        ret

I recompiled with LIBOMP_OMPT_SUPPORT=TRUE and got both backtraces and OMPT working for my toy example.

Backtrace with the experimental change:

Call stack at 0.0358592s:
omp_example!std::chrono::duration<...>::count() const
omp_example!bool std::chrono::operator< <long, std::ratio<1l, 1000000000l>, long, std::ratio<1l, 1l> >(...)
omp_example!work()
omp_example!main
omp_example!main
libomp.so!__kmp_invoke_microtask
libomp.so!__kmp_invoke_task_func
libomp.so!__kmp_fork_call
libomp.so!__kmpc_fork_call
omp_example!main
libc.so.6!__libc_start_call_main
libc.so.6!__libc_start_main@@GLIBC_2
omp_example!_start
[Broken backtraces]![Broken backtraces]

Toy example:

#include <chrono>
#include <iostream>

void work() {
  const auto start = std::chrono::steady_clock::now();
  auto now = std::chrono::steady_clock::now();
  int i = 0;
  while ((now - start) < std::chrono::seconds{1}) {
    i = (i + 7) * 74 % 211;
    now = std::chrono::steady_clock::now();
  }
  std::cout << "i = " << i << '\n';
}

int main() {
#pragma omp parallel for num_threads(8)
  for (int i = 0; i < 8; ++i) {
    work();
  }
  return 0;
}

Compiled and profiled with:

/home/dowj/local/llvm/ompt_fixed/bin/clang++ -g -fopenmp=libomp -o CMakeFiles/omp_example.dir/omp_example.cpp.o -c /home/dowj/sources/omp_example/omp_example.cpp
LD_LIBRARY_PATH=/home/dowj/local/llvm/ompt_fixed/lib/aarch64-unknown-linux-gnu:$LD_LIBRARY_PATH nsys profile --output=gh200_debug_ompt_fixed --backtrace=fp --trace=openmp --force-overwrite=true ./omp_example

Could someone with a better understanding of frame pointers and OMPT take a look?