Commit 96d7923

William-An, purdue-jenkins, and JRPan authored
Spinlock detection and fastforwarding for accel-sim (#484)

* add spinlock detection tool
* use dprintf for debug msg
* integrate spinlock fastforwarding with accel-sim tracer
* add custom rundir support for spinlock tool
* add spinlock detection script to the run_hw_trace.py
* Automated Format
* track kernel histogram for every launch in every context by kernel name
* update tracer tool with per-kernel histogram
* format to pass ci
* update test app
* move test app to gpu-app-collection
* update script for spinlock handling
* update ci to include spinlock tracer run
* add spinlock test app to accel-sim yaml
* fix a bug when detecting spinlock
* fix bug
* fix filename too long issue and clean intermediate files by default
* fix path issue
* fix histogram path for merged histo and add readme
* address PR review and update top-level readme
* update CI for PR
* build spinlock
* clone recursively for building GPU ubench
* remove sim compare for spinlock since it takes too long to complete
* move spinlock tracer test to weekly and fix a bug in it

---------

Co-authored-by: purdue-jenkins <[email protected]>
Co-authored-by: JRPan <[email protected]>

1 parent 6c06571 commit 96d7923

17 files changed: +1403 -23 lines changed


.github/workflows/main.yml

Lines changed: 1 addition & 1 deletion
@@ -291,7 +291,7 @@ jobs:
           source ./env-setup/12.8_env_setup.sh
           source ./gpu-app-collection/src/setup_environment
           rm -rf ./hw_run/
-          ./util/tracer_nvbit/run_hw_trace.py -B rodinia_2.0-ft -D 7
+          ./util/tracer_nvbit/run_hw_trace.py -B rodinia_2.0-ft -D 7 --spinlock_handling none
       - name: generate-rodinia_2.0-ft-hw_stats
         run: |
           source ./env-setup/12.8_env_setup.sh

.github/workflows/weekly.yml

Lines changed: 19 additions & 1 deletion
@@ -34,7 +34,7 @@ jobs:
           source ./env-setup/12.8_env_setup.sh
           export PATH=/home/tgrogers-raid/a/common/python2:$PATH
           rm -rf ./gpu-app-collection/
-          git clone git@github.com:accel-sim/gpu-app-collection.git
+          git clone --recursive git@github.com:accel-sim/gpu-app-collection.git
           source ./gpu-app-collection/src/setup_environment
           ln -s /home/tgrogers-raid/a/common/data_dirs ./gpu-app-collection/
           make -j8 -C ./gpu-app-collection/src rodinia-3.1
@@ -53,6 +53,24 @@ jobs:
           ln -s /scratch/tgrogers-disk01/a/common/for-sharing/$USER/nightly-traces ./hw_run
           ./util/tracer_nvbit/run_hw_trace.py -B rodinia-3.1,GPU_Microbenchmark -D 7
           # ./util/tracer_nvbit/run_hw_trace.py -B rodinia-3.1,GPU_Microbenchmark,parboil,polybench,cutlass_5_trace,Deepbench_nvidia_tencore,Deepbench_nvidia_normal -D 7
+      - name: generate-spinlock-traces-spinlock_handling
+        run: |
+          source ./env-setup/12.8_env_setup.sh
+          source ./gpu-app-collection/src/setup_environment
+          rm -rf ./hw_run/
+          ./util/tracer_nvbit/run_hw_trace.py -B Spinlock -D 7 --spinlock_handling fast_forward
+          mv ./hw_run ./hw_run_fast_forward
+          ./util/tracer_nvbit/run_hw_trace.py -B Spinlock -D 7 --spinlock_handling none
+          mv ./hw_run ./hw_run_none
+      - name: test-new-traces-spinlock_handling
+        # Test only fast-forwarded traces as the none one takes too long to run (~2-3 hr)
+        run: |
+          source ./env-setup/12.8_env_setup.sh
+          source ./gpu-simulator/setup_environment.sh
+          ./util/job_launching/run_simulations.py -B Spinlock -C QV100-SASS -T ./hw_run_fast_forward/traces/device-7/ -N spinlock-microbenchmark-$$-fast_forward
+          ./util/job_launching/monitor_func_test.py -I -v -s spinlock-stats-per-app.csv -N spinlock-microbenchmark-$$-fast_forward
+          # ./util/job_launching/run_simulations.py -B Spinlock -C QV100-SASS -T ./hw_run_none/traces/device-7/ -N spinlock-microbenchmark-$$-none
+          # ./util/job_launching/monitor_func_test.py -I -v -s spinlock-stats-per-app.csv -N spinlock-microbenchmark-$$-none
   SASS-Weekly:
     needs: [Tracer-Weekly]
     if: github.repository == 'accel-sim/accel-sim-framework'

.gitignore

Lines changed: 3 additions & 1 deletion
@@ -13,4 +13,6 @@ gpu-simulator/gpgpu-sim
 extern
 gpu-simulator/accel_sim.pyi
 compile_commands.json
-.cache
+.cache
+.cursorrules
+CLAUDE.md

README.md

Lines changed: 13 additions & 0 deletions
@@ -10,6 +10,7 @@
 - [Accel-Sim Components](#accel-sim-components)
   - [Accel-Sim Tracer](#accel-sim-tracer)
     - [A simple example](#a-simple-example)
+    - [Spinlock handling](#spinlock-handling)
     - [Pre-traced applications](#pre-traced-applications)
   - [Accel-Sim SASS Frontend and Simulation Engine](#accel-sim-sass-frontend-and-simulation-engine)
   - [Accel-Sim Correlator](#accel-sim-correlator)
@@ -113,6 +114,18 @@ That's it. The traces for the short-running rodinia tests will be generated in:
 
 To extend the tracer, use other apps and understand what, exactly is going on, read [this](https://github.com/accel-sim/accel-sim-framework/blob/dev/util/tracer_nvbit/README.md).
 
+#### Spinlock handling
+
+If your application contains spinlock instructions, you can handle them with the tracer by using the following command:
+
+```bash
+./util/tracer_nvbit/run_hw_trace.py -B rodinia_2.0-ft -D <gpu-device-num-to-run-on> --spinlock_handling fast_forward
+```
+
+This will fast forward the spinlock instructions and keep the spinlock instructions for the number of iterations specified in the `--spinlock_fast_forward_iterations` arg option.
+
+The tool for spinlock detection is in `./util/tracer_nvbit/others/spinlock_tool/`.
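
Reviewer note: to make the fast-forward semantics described in the added README text concrete, here is a minimal Python sketch. It is an illustration only, not the tracer's implementation (the tracer is an NVBit tool whose source is not part of this excerpt). The `(warp_id, instruction_index)` record layout, the `fast_forward` function, and the per-warp counting are all assumptions made for the example. Only the idea of keeping flagged instructions for a fixed number of iterations and skipping the rest comes from the README text.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple

# Hypothetical trace record: (warp_id, instruction_index). Not the real trace format.
Record = Tuple[int, int]


def fast_forward(trace: Iterable[Record],
                 spin_indices: Set[int],
                 keep_iterations: int) -> List[Record]:
    """Keep each flagged instruction only for its first `keep_iterations`
    executions per warp; pass every other instruction through unchanged."""
    seen: Dict[Record, int] = defaultdict(int)
    kept: List[Record] = []
    for warp_id, idx in trace:
        if idx in spin_indices:
            seen[(warp_id, idx)] += 1
            if seen[(warp_id, idx)] > keep_iterations:
                continue  # fast-forward: drop the repeated spin iteration
        kept.append((warp_id, idx))
    return kept


# One warp that spins five times over instructions 1 and 2 before proceeding.
trace = [(0, 0)] + [(0, 1), (0, 2)] * 5 + [(0, 3)]
short = fast_forward(trace, spin_indices={1, 2}, keep_iterations=2)
print(len(trace), "->", len(short))
```

In this sketch `keep_iterations` plays the role that the README assigns to `--spinlock_fast_forward_iterations`. How the real tool counts an "iteration" may differ.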
+
 #### Pre-traced applications
 For convience, we have included a repository of pre-traced applications - to get all those traces, simply run:
 ```bash

util/job_launching/apps/define-all-apps.yml

Lines changed: 8 additions & 0 deletions
@@ -126,6 +126,14 @@ GPU_Atomic:
             - args: 16
               accel-sim-mem: 1G
 
+Spinlock:
+    exec_dir: "$GPUAPPS_ROOT/bin/$CUDA_VERSION/release/"
+    data_dirs: "$GPUAPPS_ROOT/data_dirs/"
+    execs:
+        - spinlock_simple:
+            - args:
+              accel-sim-mem: 1G
+
 Atomic_Profile:
     exec_dir: "$GPUAPPS_ROOT/bin/$CUDA_VERSION/release/"
     data_dirs: "$GPUAPPS_ROOT/data_dirs/"

util/tracer_nvbit/.gitignore

Lines changed: 2 additions & 5 deletions
@@ -1,7 +1,4 @@
 nvbit_release/
-silicon_checkpoint_tool/checkpoint/checkpoint.o
-silicon_checkpoint_tool/checkpoint/checkpoint.so
-tracer_tool/tracer_tool.o
-tracer_tool/tracer_tool.so
-tracer_tool/inject_funcs.o
+*.o
+*.so
 tracer_tool/traces-processing/post-traces-processing

util/tracer_nvbit/Makefile

Lines changed: 6 additions & 5 deletions
@@ -1,9 +1,10 @@
 
 all:
-	make -C tracer_tool
-	make -C tracer_tool/traces-processing
-	#make -C silicon_checkpoint_tool
+	$(MAKE) -C tracer_tool
+	$(MAKE) -C tracer_tool/traces-processing
+	$(MAKE) -C others/spinlock_tool
+	#$(MAKE) -C silicon_checkpoint_tool
 
 clean:
-	make clean -C tracer_tool
-	make clean -C tracer_tool/traces-processing
+	$(MAKE) clean -C tracer_tool
+	$(MAKE) clean -C tracer_tool/traces-processing

Lines changed: 79 additions & 0 deletions
@@ -0,0 +1,79 @@
+# SPDX-FileCopyrightText: Copyright (c) 2017 NVIDIA CORPORATION & AFFILIATES.
+# All rights reserved.
+# SPDX-License-Identifier: BSD-3-Clause
+#
+# Redistribution and use in source and binary forms, with or without
+# modification, are permitted provided that the following conditions are met:
+#
+# 1. Redistributions of source code must retain the above copyright notice, this
+# list of conditions and the following disclaimer.
+#
+# 2. Redistributions in binary form must reproduce the above copyright notice,
+# this list of conditions and the following disclaimer in the documentation
+# and/or other materials provided with the distribution.
+#
+# 3. Neither the name of the copyright holder nor the names of its
+# contributors may be used to endorse or promote products derived from
+# this software without specific prior written permission.
+#
+# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
+# AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
+# IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
+# DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE
+# FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
+# DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
+# SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
+# CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
+# OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+# OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+
+NVCC=nvcc -ccbin=$(CXX) -D_FORCE_INLINES
+PTXAS=ptxas
+
+NVCC_VER_REQ=10.1
+NVCC_VER=$(shell $(NVCC) --version | grep release | cut -f2 -d, | cut -f3 -d' ')
+NVCC_VER_CHECK=$(shell echo "${NVCC_VER} >= $(NVCC_VER_REQ)" | bc)
+
+ifeq ($(NVCC_VER_CHECK),0)
+$(error ERROR: nvcc version >= $(NVCC_VER_REQ) required to compile an nvbit tool! Instrumented applications can still use lower versions of nvcc.)
+endif
+
+PTXAS_VER_ADD_FLAG=12.3
+PTXAS_VER=$(shell $(PTXAS) --version | grep release | cut -f2 -d, | cut -f3 -d' ')
+PTXAS_VER_CHECK=$(shell echo "${PTXAS_VER} >= $(PTXAS_VER_ADD_FLAG)" | bc)
+
+ifeq ($(PTXAS_VER_CHECK), 0)
+MAXRREGCOUNT_FLAG=-maxrregcount=24
+else
+MAXRREGCOUNT_FLAG=
+endif
+
+NVBIT_PATH=../../nvbit_release/core
+INCLUDES=-I$(NVBIT_PATH)
+
+LIBS=-L$(NVBIT_PATH) -lnvbit
+NVCC_PATH=-L $(subst bin/nvcc,lib64,$(shell which nvcc | tr -s /))
+
+SOURCES=$(wildcard *.cu)
+
+OBJECTS=$(SOURCES:.cu=.o)
+ARCH?=all
+
+mkfile_path := $(abspath $(lastword $(MAKEFILE_LIST)))
+current_dir := $(notdir $(patsubst %/,%,$(dir $(mkfile_path))))
+
+NVBIT_TOOL=$(current_dir).so
+
+all: $(NVBIT_TOOL)
+
+$(NVBIT_TOOL): $(OBJECTS) $(NVBIT_PATH)/libnvbit.a
+	$(NVCC) -arch=$(ARCH) -O3 $(OBJECTS) $(LIBS) $(NVCC_PATH) -lcuda -lcudart_static -shared -o $@
+
+%.o: %.cu
+	$(NVCC) -dc -c -std=c++17 $(INCLUDES) -Xptxas -cloning=no -Xcompiler -Wall -arch=$(ARCH) -O3 -Xcompiler -fPIC $< -o $@
+
+inject_funcs.o: inject_funcs.cu
+	$(NVCC) $(INCLUDES) $(MAXRREGCOUNT_FLAG) -Xptxas -astoolspatch --keep-device-functions -arch=$(ARCH) -Xcompiler -Wall -Xcompiler -fPIC -c $< -o $@
+
+clean:
+	rm -f *.so *.o

Lines changed: 20 additions & 0 deletions
@@ -0,0 +1,20 @@
+# Spinlock tool
+
+## Description
+
+This tool is used to detect spinlocks in the kernel code.
+
+## Usage
+
+```bash
+# Run program first time to get the instruction histogram of the program's kernels
+SPINLOCK_PHASE=0 CUDA_INJECTION64_PATH=PATH/TO/spinlock_tool.so program
+
+# Run program second time to get another instruction histogram of the program's kernels
+# At the end of nvbit, this tool will generate a file with the name of spinlock_detection/spinlock_instructions.txt
+# containing the instruction indices of the spinlock instructions in the program's kernels
+SPINLOCK_PHASE=1 CUDA_INJECTION64_PATH=PATH/TO/spinlock_tool.so program
+
+# To fast forward the spinlock instructions with accel-sim tracer, you can use the following command
+ENABLE_SPINLOCK_FAST_FORWARD=1 CUDA_INJECTION64_PATH=PATH/TO/tracer_tool.so program
+```
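
Reviewer note: the usage text above says the tool collects a per-kernel instruction histogram in each of two runs and then writes out the indices of spinlock instructions. The detection rule itself is not shown in this excerpt. One plausible reading, and it is only an assumption, is that an instruction whose dynamic execution count differs between two otherwise identical runs sits inside a timing-dependent loop such as a spinlock. The Python sketch below illustrates that comparison. The `find_spin_candidates` helper and the dictionary layout are hypothetical and do not reflect the tool's actual file format or code.

```python
from typing import Dict, List


def find_spin_candidates(hist_a: Dict[int, int],
                         hist_b: Dict[int, int]) -> List[int]:
    """Return the sorted instruction indices whose execution counts differ
    between two runs of the same kernel (assumed detection rule)."""
    indices = sorted(set(hist_a) | set(hist_b))
    return [i for i in indices if hist_a.get(i, 0) != hist_b.get(i, 0)]


# Hypothetical histograms for one kernel: instruction index -> dynamic count.
# Indices 2 and 3 model a spin loop whose trip count varies from run to run.
run0 = {0: 128, 1: 128, 2: 9731, 3: 9731, 4: 128}
run1 = {0: 128, 1: 128, 2: 15012, 3: 15012, 4: 128}

candidates = find_spin_candidates(run0, run1)
print(candidates)
```

A rule like this would also flag any other source of run-to-run variation, so the real tool may apply additional filtering that this sketch does not attempt.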
