Commit

Merge remote-tracking branch 'origin/develop' into default_to_per_kernel
feizheng10 committed Feb 27, 2025
2 parents 152b26a + bec537e commit ee842b4
Showing 29 changed files with 358 additions and 134 deletions.
2 changes: 2 additions & 0 deletions .azuredevops/rocm-ci.yml
Original file line number Diff line number Diff line change
Expand Up @@ -13,6 +13,7 @@ trigger:
batch: true
branches:
include:
- develop
- amd-staging
- amd-mainline
paths:
Expand All @@ -29,6 +30,7 @@ pr:
autoCancel: true
branches:
include:
- develop
- amd-staging
- amd-mainline
paths:
Expand Down
2 changes: 2 additions & 0 deletions .github/workflows/rhel-8.yml
Original file line number Diff line number Diff line change
Expand Up @@ -27,6 +27,8 @@ jobs:
steps:
- name: Install baseline OS dependencies
run: |
yum clean all
yum makecache
yum -y install git
yum -y install python39
yum -y install cmake3
Expand Down
2 changes: 2 additions & 0 deletions CMakeLists.txt
Original file line number Diff line number Diff line change
Expand Up @@ -381,6 +381,8 @@ add_custom_target(
COMMAND ${Python3_EXECUTABLE} -m pip list | grep -i nuitka > /dev/null 2>&1
# Check patchelf
COMMAND ${Python3_EXECUTABLE} -m pip list | grep -i patchelf > /dev/null 2>&1
# Create VERSION.sha file
COMMAND git -C ${PROJECT_SOURCE_DIR} rev-parse HEAD > VERSION.sha
# Build standalone binary
COMMAND
${Python3_EXECUTABLE} -m nuitka --mode=onefile
Expand Down
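For reference, the new `VERSION.sha` step can be read as "ask git for the current commit and write it to a file". The sketch below restates that in Python. It is an illustration only, not code from this commit: `write_version_sha` and its `run` parameter are hypothetical names, and the only git invocation used is the `git -C <dir> rev-parse HEAD` shown in the hunk above.

```python
import subprocess
from pathlib import Path


def write_version_sha(source_dir, out_file="VERSION.sha", run=subprocess.run):
    """Write the current commit hash of source_dir to out_file.

    Hypothetical helper: mirrors `git -C <dir> rev-parse HEAD > VERSION.sha`.
    `run` is injectable so the function can be exercised without git.
    """
    result = run(
        ["git", "-C", str(source_dir), "rev-parse", "HEAD"],
        check=True,           # raise if git fails (e.g. not a repository)
        capture_output=True,  # collect stdout instead of printing it
        text=True,            # decode stdout as str
    )
    sha = result.stdout.strip()
    Path(out_file).write_text(sha + "\n")
    return sha
```

The call fails if git is missing or refuses to operate on the directory, which is presumably why the Dockerfiles in this commit also install `git` and mark `/app` as a safe directory.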
5 changes: 4 additions & 1 deletion docker/Dockerfile.standalone
Original file line number Diff line number Diff line change
Expand Up @@ -2,7 +2,10 @@ FROM redhat/ubi8:8.10-1184

WORKDIR /app

RUN yum install -y curl gcc cmake
RUN yum install -y curl gcc cmake git

# Allows running git commands in /app
RUN git config --global --add safe.directory /app

RUN yum install -y python38 python38-devel && \
yum clean all && \
Expand Down
5 changes: 4 additions & 1 deletion docker/Dockerfile.test
Original file line number Diff line number Diff line change
Expand Up @@ -6,10 +6,13 @@ WORKDIR /app

# Update package list and install prerequisites
RUN apt-get update && apt-get install -y \
software-properties-common cmake locales \
software-properties-common cmake locales git \
&& add-apt-repository ppa:deadsnakes/ppa \
&& apt-get update

# Allows running git commands in /app
RUN git config --global --add safe.directory /app

# Generate the desired locale
RUN locale-gen en_US.UTF-8

Expand Down
24 changes: 23 additions & 1 deletion src/rocprof_compute_profile/profiler_base.py
Original file line number Diff line number Diff line change
Expand Up @@ -26,6 +26,7 @@
import logging
import os
import re
import shutil
import sys
import time
from abc import ABC, abstractmethod
Expand Down Expand Up @@ -77,6 +78,26 @@ def join_prof(self, out=None):
out = self.__args.path + "/pmc_perf.csv"
files = glob.glob(self.__args.path + "/" + "pmc_perf_*.csv")
files.extend(glob.glob(self.__args.path + "/" + "SQ_*.csv"))

if self.get_args().hip_trace:
# remove hip api trace outputs from this list
files = [
f
for f in files
if not re.compile(r"^.*_hip_api_trace\.csv$").match(
os.path.basename(f)
)
]

if self.get_args().kokkos_trace:
# remove marker api trace outputs from this list
files = [
f
for f in files
if not re.compile(r"^.*_marker_api_trace\.csv$").match(
os.path.basename(f)
)
]
elif type(self.__args.path) == list:
files = self.__args.path
else:
Expand Down Expand Up @@ -266,7 +287,8 @@ def pre_processing(self):
# verify correct formatting for application binary
self.__args.remaining = self.__args.remaining[1:]
if self.__args.remaining:
if not Path(self.__args.remaining[0]).is_file():
# Ensure that command points to an executable
if not shutil.which(self.__args.remaining[0]):
console_error(
"Your command %s doesn't point to an executable. Please verify."
% self.__args.remaining[0]
Expand Down
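The two `profiler_base.py` hunks above can be summarized in a standalone sketch. It is a simplified restatement under assumptions, not the project's code: `drop_trace_outputs` and `points_to_executable` are hypothetical names, and the flags are passed as plain booleans instead of being read from `self.get_args()`. The regular expressions and the `shutil.which` check are taken from the diff.

```python
import os
import re
import shutil

# Filename patterns copied from the diff; the dict keys are illustrative.
_TRACE_PATTERNS = {
    "hip_trace": re.compile(r"^.*_hip_api_trace\.csv$"),
    "kokkos_trace": re.compile(r"^.*_marker_api_trace\.csv$"),
}


def drop_trace_outputs(files, hip_trace=False, kokkos_trace=False):
    """Return files without the trace CSVs of the enabled trace modes."""
    enabled = {"hip_trace": hip_trace, "kokkos_trace": kokkos_trace}
    patterns = [p for name, p in _TRACE_PATTERNS.items() if enabled[name]]
    return [
        f
        for f in files
        if not any(p.match(os.path.basename(f)) for p in patterns)
    ]


def points_to_executable(command):
    """True if command resolves to an executable, by PATH lookup or as a path."""
    return shutil.which(command) is not None
```

Unlike `Path(...).is_file()`, `shutil.which` also accepts a bare command name that is found on `PATH`, and it rejects files that exist but are not executable.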
Original file line number Diff line number Diff line change
Expand Up @@ -32,6 +32,12 @@ Panel Config:
peak: (((($max_sclk * $cu_per_gpu) * 64) * 2) / 1000)
pop: None # No perf counter
tips:
MFMA FLOPs (F8):
value: None # No HW module
unit: GFLOP
peak: None # No HW module
pop: None # No HW module
tips:
MFMA FLOPs (BF16):
value: None # No perf counter
unit: GFLOPs
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -241,6 +241,12 @@ Panel Config:
max: None # No HW module
unit: (instr + $normUnit)
tips:
MFMA-F8:
avg: None # No HW module
min: None # No HW module
max: None # No HW module
unit: (instr + $normUnit)
tips:
MFMA-F16:
avg: None # No HW module
min: None # No HW module
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -21,16 +21,22 @@ Panel Config:
metric:
VALU FLOPs:
value: None # No perf counter
Unit: None
unit: None
peak: None
pop: None
tips:
VALU IOPs:
value: None # No perf counter
Unit: None
unit: None
peak: None
pop: None
tips:
MFMA FLOPs (F8):
value: None # No perf counter
unit: GFLOP
peak: None # No perf counter
pop: None # No perf counter
tips:
MFMA FLOPs (BF16):
value: None # No perf counter
Unit: None
Expand All @@ -39,25 +45,25 @@ Panel Config:
tips:
MFMA FLOPs (F16):
value: None # No perf counter
Unit: None
unit: None
peak: None
pop: None
tips:
MFMA FLOPs (F32):
value: None # No perf counter
Unit: None
unit: None
peak: None
pop: None
tips:
MFMA FLOPs (F64):
value: None # No perf counter
Unit: None
unit: None
peak: None
pop: None
tips:
MFMA IOPs (INT8):
value: None # No perf counter
Unit: None
unit: None
peak: None
pop: None
tips:
Expand Down Expand Up @@ -174,6 +180,12 @@ Panel Config:
max: None # No perf counter
unit: (OPs + $normUnit)
tips:
F8 OPs:
avg: None # No HW module
min: None # No HW module
max: None # No HW module
unit: (OPs + $normUnit)
tips:
F16 OPs:
avg: None # No perf counter
min: None # No perf counter
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -32,6 +32,12 @@ Panel Config:
peak: (((($max_sclk * $cu_per_gpu) * 64) * 2) / 1000)
pop: None # No perf counter
tips:
MFMA FLOPs (F8):
value: None # No HW module
unit: GFLOP
peak: None # No HW module
pop: None # No HW module
tips:
MFMA FLOPs (BF16):
value: None # No perf counter
unit: GFLOPs
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -73,13 +73,13 @@ Panel Config:
unit: Unit
tips: Tips
metric:
INT-32:
INT32:
avg: None # No perf counter
min: None # No perf counter
max: None # No perf counter
unit: (instr + $normUnit)
tips:
INT-64:
INT64:
avg: None # No perf counter
min: None # No perf counter
max: None # No perf counter
Expand Down Expand Up @@ -241,6 +241,12 @@ Panel Config:
max: None # No HW module
unit: (instr + $normUnit)
tips:
MFMA-F8:
avg: None # No HW module
min: None # No HW module
max: None # No HW module
unit: (instr + $normUnit)
tips:
MFMA-F16:
avg: None # No HW module
min: None # No HW module
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -21,16 +21,22 @@ Panel Config:
metric:
VALU FLOPs:
value: None # No perf counter
Unit: None
unit: None
peak: None
pop: None
tips:
VALU IOPs:
value: None # No perf counter
Unit: None
unit: None
peak: None
pop: None
tips:
MFMA FLOPs (F8):
value: None # No perf counter
unit: GFLOP
peak: None # No perf counter
pop: None # No perf counter
tips:
MFMA FLOPs (BF16):
value: None # No perf counter
Unit: None
Expand All @@ -39,25 +45,25 @@ Panel Config:
tips:
MFMA FLOPs (F16):
value: None # No perf counter
Unit: None
unit: None
peak: None
pop: None
tips:
MFMA FLOPs (F32):
value: None # No perf counter
Unit: None
unit: None
peak: None
pop: None
tips:
MFMA FLOPs (F64):
value: None # No perf counter
Unit: None
unit: None
peak: None
pop: None
tips:
MFMA IOPs (INT8):
value: None # No perf counter
Unit: None
unit: None
peak: None
pop: None
tips:
Expand Down Expand Up @@ -174,6 +180,12 @@ Panel Config:
max: None # No perf counter
unit: (OPs + $normUnit)
tips:
F8 OPs:
avg: None # No HW module
min: None # No HW module
max: None # No HW module
unit: (OPs + $normUnit)
tips:
F16 OPs:
avg: None # No perf counter
min: None # No perf counter
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -42,6 +42,12 @@ Panel Config:
pop: ((100 * AVG(((64 * (SQ_INSTS_VALU_INT32 + SQ_INSTS_VALU_INT64)) / (End_Timestamp
- Start_Timestamp)))) / (((($max_sclk * $cu_per_gpu) * 64) * 2) / 1000))
tips:
MFMA FLOPs (F8):
value: None
unit: GFLOP
peak: None
pop: None
tips:
MFMA FLOPs (BF16):
value: AVG(((SQ_INSTS_VALU_MFMA_MOPS_BF16 * 512) / (End_Timestamp - Start_Timestamp)))
unit: GFLOP
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -241,6 +241,12 @@ Panel Config:
max: MAX((SQ_INSTS_VALU_MFMA_I8 / $denom))
unit: (instr + $normUnit)
tips:
MFMA-F8:
avg: None
min: None
max: None
unit: (instr + $normUnit)
tips:
MFMA-F16:
avg: AVG((SQ_INSTS_VALU_MFMA_F16 / $denom))
min: MIN((SQ_INSTS_VALU_MFMA_F16 / $denom))
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -41,6 +41,12 @@ Panel Config:
pop: ((100 * AVG(((64 * (SQ_INSTS_VALU_INT32 + SQ_INSTS_VALU_INT64)) / (End_Timestamp
- Start_Timestamp)))) / (((($max_sclk * $cu_per_gpu) * 64) * 2) / 1000))
tips:
MFMA FLOPs (F8):
value: None
unit: GFLOP
peak: None
pop: None
tips:
MFMA FLOPs (BF16):
value: AVG(((SQ_INSTS_VALU_MFMA_MOPS_BF16 * 512) / (End_Timestamp - Start_Timestamp)))
unit: GFLOP
Expand Down Expand Up @@ -216,6 +222,12 @@ Panel Config:
max: MAX(((64 * (SQ_INSTS_VALU_INT32 + SQ_INSTS_VALU_INT64)) + (SQ_INSTS_VALU_MFMA_MOPS_I8 * 512)) / $denom)
unit: (OPs + $normUnit)
tips:
F8 OPs:
avg: None
min: None
max: None
unit: (OPs + $normUnit)
tips:
F16 OPs:
avg: AVG(((((((64 * SQ_INSTS_VALU_ADD_F16) + (64 * SQ_INSTS_VALU_MUL_F16)) +
(64 * SQ_INSTS_VALU_TRANS_F16)) + (128 * SQ_INSTS_VALU_FMA_F16)) + (512 *
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -42,6 +42,13 @@ Panel Config:
pop: ((100 * AVG(((64 * (SQ_INSTS_VALU_INT32 + SQ_INSTS_VALU_INT64)) / (End_Timestamp
- Start_Timestamp)))) / (((($max_sclk * $cu_per_gpu) * 64) * 2) / 1000))
tips:
MFMA FLOPs (F8):
value: AVG(((SQ_INSTS_VALU_MFMA_MOPS_F8 * 512) / (End_Timestamp - Start_Timestamp)))
unit: GFLOP
peak: ((($max_sclk * $cu_per_gpu) * 4096) / 1000)
pop: ((100 * AVG(((SQ_INSTS_VALU_MFMA_MOPS_F8 * 512) / (End_Timestamp - Start_Timestamp))))
/ ((($max_sclk * $cu_per_gpu) * 8192) / 1000))
tips:
MFMA FLOPs (BF16):
value: AVG(((SQ_INSTS_VALU_MFMA_MOPS_BF16 * 512) / (End_Timestamp - Start_Timestamp)))
unit: GFLOP
Expand Down Expand Up @@ -187,12 +194,14 @@ Panel Config:
/ ((($max_sclk / 1000) * 128) * TO_INT($total_l2_chan)))
tips:
L2-Fabric Read BW:
value: AVG((((TCC_EA0_RDREQ_32B_sum * 32) + ((TCC_EA0_RDREQ_sum - TCC_EA0_RDREQ_32B_sum)
* 64)) / (End_Timestamp - Start_Timestamp)))
value: AVG((128 * TCC_BUBBLE_sum +
64 * (TCC_EA0_RDREQ_sum - TCC_BUBBLE_sum - TCC_EA0_RDREQ_32B_sum) +
32 * TCC_EA0_RDREQ_32B_sum) / (End_Timestamp - Start_Timestamp))
unit: GB/s
peak: $hbm_bw
pop: ((100 * AVG((((TCC_EA0_RDREQ_32B_sum * 32) + ((TCC_EA0_RDREQ_sum - TCC_EA0_RDREQ_32B_sum)
* 64)) / (End_Timestamp - Start_Timestamp)))) / $hbm_bw)
pop: ((100 * (AVG((128 * TCC_BUBBLE_sum +
64 * (TCC_EA0_RDREQ_sum - TCC_BUBBLE_sum - TCC_EA0_RDREQ_32B_sum) +
32 * TCC_EA0_RDREQ_32B_sum) / (End_Timestamp - Start_Timestamp)))) / $hbm_bw)
tips:
L2-Fabric Write BW:
value: AVG((((TCC_EA0_WRREQ_64B_sum * 64) + ((TCC_EA0_WRREQ_sum - TCC_EA0_WRREQ_64B_sum)
Expand Down
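To make the arithmetic in the last hunks easier to check, here is a per-dispatch sketch of the two new expressions. Several things are assumptions: the function and parameter names are hypothetical, timestamps are taken to be in nanoseconds (so bytes per nanosecond reads as GB/s and operations per nanosecond as GFLOP/s), and the `AVG(...)` across dispatches used in the YAML is omitted. The 128/64/32 byte weights and the factor of 512 come straight from the diff.

```python
def l2_fabric_read_bw(rdreq, rdreq_32b, bubble, start_ns, end_ns):
    """L2-Fabric read bandwidth for one dispatch, in bytes per nanosecond.

    Mirrors the updated expression: requests counted by TCC_BUBBLE are
    weighted at 128 bytes, 32B requests at 32 bytes, and the remaining
    requests at 64 bytes.
    """
    bytes_read = (
        128 * bubble
        + 64 * (rdreq - bubble - rdreq_32b)
        + 32 * rdreq_32b
    )
    return bytes_read / (end_ns - start_ns)


def mfma_f8_flops(mops_f8, start_ns, end_ns):
    """MFMA F8 rate for one dispatch: 512 operations per counted MOPS unit."""
    return (mops_f8 * 512) / (end_ns - start_ns)
```

For example, with 10 read requests of which 2 are 32B and 3 are counted by `TCC_BUBBLE`, the byte total is 128*3 + 64*5 + 32*2 = 768.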