@koushikbillakanti-amd
Contributor

Motivation

This PR adds support for parsing per-GPU KFD process files that use the _NNNN suffix naming convention, where NNNN is the GPU's KFD ID (e.g., vram_46775, stats_46775/).
The ROCm kernel driver exposes per-GPU process statistics in /sys/class/kfd/kfd/proc/PID/ with GPU-specific file suffixes.
Without this change, AMD-SMI cannot read process memory usage and statistics on systems using this KFD file structure.

Technical Details

Modified src/rocm_smi_kfd.cc to parse files with _NNNN GPU ID suffixes.
Added logic to read cu_occupancy and evicted_ms from stats_NNNN/ subdirectories under each process.
The implementation maintains backward compatibility with older KFD structures while supporting the new per-GPU naming convention.
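
As an illustration of the suffix parsing described above, here is a minimal sketch rather than the code from this PR. The struct KfdEntryName and the function ParseGpuSuffixedName are hypothetical names chosen for this example. It assumes the suffix is a decimal GPU ID following the last underscore, and it returns an empty optional for names without such a suffix so that a caller could fall back to whatever handling the older layout requires.

```cpp
#include <cctype>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <string>

// Hypothetical type for this sketch: a KFD entry name split into
// its base ("vram", "stats") and its numeric GPU ID suffix.
struct KfdEntryName {
  std::string base;
  std::uint64_t gpu_id;
};

// Hypothetical helper: splits "vram_46775" into {"vram", 46775}.
// Returns std::nullopt when the name has no "_<digits>" suffix.
std::optional<KfdEntryName> ParseGpuSuffixedName(const std::string& name) {
  const std::size_t pos = name.rfind('_');
  // Require a non-empty base and a non-empty suffix.
  if (pos == std::string::npos || pos == 0 || pos + 1 >= name.size()) {
    return std::nullopt;
  }
  // Reject suffixes long enough to overflow a 64-bit value.
  if (name.size() - pos - 1 > 18) {
    return std::nullopt;
  }
  std::uint64_t id = 0;
  for (std::size_t i = pos + 1; i < name.size(); ++i) {
    const unsigned char c = static_cast<unsigned char>(name[i]);
    if (!std::isdigit(c)) {
      return std::nullopt;  // Suffix is not purely numeric.
    }
    id = id * 10 + static_cast<std::uint64_t>(c - '0');
  }
  return KfdEntryName{name.substr(0, pos), id};
}
```

With this sketch, ParseGpuSuffixedName("stats_46775") yields base "stats" and gpu_id 46775, while a name such as "vram" yields no value.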

JIRA ID

Resolves [SWDEV-545128]

Test Plan

Built AMD-SMI from source and ran the amd-smi process command with active GPU workloads.
Verified KFD directory structure parsing by inspecting /sys/class/kfd/kfd/proc/PID/ contents.
Tested both text and JSON output formats to confirm correct data retrieval.

Test Result

Process detection works correctly: PID, process name, and memory usage are displayed.
Per-GPU stats (cu_occupancy=0, evicted_ms=0) were read successfully from the stats_46775/ directory.
All tests passed on an MI300A GPU with KFD_ID 46775.
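
For reference, a per-GPU stat read like the one reported above could be sketched as follows. This is not the code from this PR: ReadPerGpuStat is a hypothetical helper, and the sketch assumes each stat file holds a single unsigned decimal integer. The process directory is passed in as a parameter so the function can be pointed at /sys/class/kfd/kfd/proc/PID/ or at a test directory.

```cpp
#include <cstdint>
#include <filesystem>
#include <fstream>
#include <optional>
#include <string>

namespace fs = std::filesystem;

// Hypothetical helper: reads <proc_dir>/stats_<gpu_id>/<stat_name>
// and returns its value, or std::nullopt if the file is missing or
// does not start with an unsigned integer (assumed file format).
std::optional<std::uint64_t> ReadPerGpuStat(const fs::path& proc_dir,
                                            std::uint64_t gpu_id,
                                            const std::string& stat_name) {
  const fs::path file =
      proc_dir / ("stats_" + std::to_string(gpu_id)) / stat_name;
  std::ifstream in(file);
  if (!in) {
    return std::nullopt;  // Directory or file not present.
  }
  std::uint64_t value = 0;
  if (!(in >> value)) {
    return std::nullopt;  // Contents are not a plain integer.
  }
  return value;
}
```

A call such as ReadPerGpuStat(proc_dir, 46775, "cu_occupancy") would then return the value stored under stats_46775/, and an empty optional on a system where that directory does not exist.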
