
dyno gputrace only profiles CPU launcher processes, fails to capture GPU training processes #430

@shimizust

Description


When using dyno gputrace to profile PyTorch training jobs, the tool only successfully profiles CPU-based launcher processes but fails to capture the actual GPU training processes that are consuming GPU resources.

Current Behavior

  • Successfully profiled: PIDs 6, 90, 355 (PyFlyte executors and Accelerate launcher)
  • Failed to profile: PIDs 389, 390 (actual GPU training processes running training.py)

Expected Behavior

dyno gputrace should successfully profile all matched processes, including GPU-bound PyTorch training processes.

Steps to Reproduce

# Start distributed training job on H100 GPUs
accelerate launch --config_file accelerate_fsdp.conf ... training.py

# Start dynolog
sudo dynolog --flagfile=/etc/dynolog.gflags &


# Trigger an on-demand trace
dyno gputrace \
  --log-file /shared/user/profiling/dynolog/4/libkeneto.pt.json \
  --profile-memory \
  --record-shapes \
  --with-stacks \
  --with-flops \
  --with-modules \
  --duration-ms 10000 \
  --process-limit 5

ACTIVITIES_LOG_FILE=/shared/user/profiling/dynolog/4/libkeneto.pt.json
PROFILE_START_TIME=0
ACTIVITIES_DURATION_MSECS=10000
PROFILE_REPORT_INPUT_SHAPES=true
PROFILE_PROFILE_MEMORY=true
PROFILE_WITH_STACK=true
PROFILE_WITH_FLOPS=true
PROFILE_WITH_MODULES=true
response length = 165
response = {"activityProfilersBusy":0,"activityProfilersTriggered":[6,90,355,389,390],"eventProfilersBusy":0,"eventProfilersTriggered":[],"processesMatched":[6,90,355,389,390]}
Matched 5 processes
Trace output files will be written to:
    /shared/user/profiling/dynolog/4/libkeneto.pt_6.json
    /shared/user/profiling/dynolog/4/libkeneto.pt_90.json
    /shared/user/profiling/dynolog/4/libkeneto.pt_355.json
    /shared/user/profiling/dynolog/4/libkeneto.pt_389.json
    /shared/user/profiling/dynolog/4/libkeneto.pt_390.json

However, after some time, only three trace files are written, all of them for the uninteresting CPU launcher processes:

jobuser [ /shared/user/profiling/dynolog/4 ]$ ls -al
total 1810
drwxrwxr-x 2 jobuser jobuser    4096 Aug 11 05:56 .
drwxrwxr-x 6 jobuser jobuser    4096 Aug 11 05:54 ..
-rw-r--r-- 1 jobuser jobuser 1805724 Aug 11 05:56 libkeneto.pt_355.json
-rw-r--r-- 1 jobuser jobuser   12166 Aug 11 05:56 libkeneto.pt_6.json
-rw-r--r-- 1 jobuser jobuser   15462 Aug 11 05:56 libkeneto.pt_90.json
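
The gap between triggered and written traces can be checked mechanically. Below is a minimal Python sketch that takes the JSON `response` printed by `dyno gputrace` and the `--log-file` value, and reports which triggered PIDs have no trace file on disk. The function names are hypothetical helpers, and the `<name>_<pid>.json` naming is only inferred from the output above, not taken from dynolog's source.

```python
import json
from pathlib import Path


def expected_trace_paths(log_file: str, response: str) -> dict:
    """Map each triggered PID to the per-process trace path dyno printed."""
    pids = json.loads(response)["activityProfilersTriggered"]
    base = Path(log_file)
    # Naming inferred from the output above: <name>.json -> <name>_<pid>.json
    return {pid: base.with_name(f"{base.stem}_{pid}{base.suffix}") for pid in pids}


def missing_traces(log_file: str, response: str) -> list:
    """Return the triggered PIDs whose trace file does not exist on disk."""
    paths = expected_trace_paths(log_file, response)
    return [pid for pid, path in paths.items() if not path.exists()]
```

With the response and directory listing above, `missing_traces` would return the two training PIDs, 389 and 390.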

Process Details

  • PID 6: pyflyte-fast-execute (CPU launcher) ✅ Profiled
  • PID 90: pyflyte-execute (CPU launcher) ✅ Profiled
  • PID 355: accelerate launch (CPU launcher) ✅ Profiled
  • PID 389: training.py (GPU training process) ❌ Not profiled
  • PID 390: training.py (GPU training process) ❌ Not profiled
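
For anyone reproducing this, the PID-to-command mapping above can be rebuilt from `/proc` on Linux. This is a minimal sketch only: `read_cmdline` is a hypothetical helper, and its `proc_root` parameter exists purely so the function can be pointed at a test directory.

```python
from pathlib import Path


def read_cmdline(pid: int, proc_root: str = "/proc") -> str:
    """Return the command line of a PID, or "" if it cannot be read."""
    try:
        raw = Path(proc_root, str(pid), "cmdline").read_bytes()
    except OSError:
        # Process has exited, or we lack permission to read its entry.
        return ""
    # /proc/<pid>/cmdline separates arguments with NUL bytes.
    return raw.replace(b"\0", b" ").decode(errors="replace").strip()
```

Calling it for each PID in `processesMatched` shows which matched processes are launchers and which are running `training.py`.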

Any ideas why this is not profiling the actual training processes?
