Open
154 commits
cc44bee
Initial commit
DebashisGanguly Feb 14, 2018
2e65599
Starting point is the dev branch of original gpgpu-sim distribution.
DebashisGanguly Feb 14, 2018
afe053a
Making GPGPU-Sim work with CUDA 8.0
DebashisGanguly Feb 14, 2018
f2b9bca
UVM implementation leveraging existing gpgpu-sim memory management
DebashisGanguly Feb 14, 2018
17e99e5
Adding benchmark to add two vectors
DebashisGanguly Feb 14, 2018
b3355c2
Restructuring benchmark
DebashisGanguly May 9, 2018
f5f77d4
Adding few mini benchmarks from rodinia suit, need to add them to Man…
DebashisGanguly May 11, 2018
7d14fcd
Adding input data files for the benchmarks
DebashisGanguly May 11, 2018
161107a
backprop, kmeans and pathfinder are missing from Managed
DebashisGanguly May 25, 2018
f708eb3
Timing Simulation: Part 1
DebashisGanguly Jun 2, 2018
84ab23e
Revert "Timing Simulation: Part 1"
DebashisGanguly Jun 3, 2018
02bb217
Syncing with dev branch from Dec8, 2017 to today - on behalf of Ziyu …
DebashisGanguly Jun 13, 2018
d03c23a
Basic methods to implement page table
DebashisGanguly Jun 13, 2018
fa4babb
Differentiate managed and unmanaged page allocations and also enforce…
DebashisGanguly Jun 22, 2018
373e5b2
Timing simulation for on-demand paging for UVM
RonianZ Jun 26, 2018
6115ba2
Page eviction with LRU implementation
RonianZ Jul 10, 2018
b09ac52
TLB size restriction and LRU replacement of TLB entries
DebashisGanguly Jul 11, 2018
1c86565
Bug Fix: Unmanaged benchmarks stalls forever
RonianZ Jul 24, 2018
8bd2a73
Bug Fix: TLB invalidation was not registered
RonianZ Jul 24, 2018
392fef0
Benchmark changes:
DebashisGanguly Jul 24, 2018
7f09ae7
Updating managed allocation to hold gpu memory pointer, allocation si…
DebashisGanguly Jul 25, 2018
0f4af65
Copy data initialized by CPU to GPU only at the first kernel launch
DebashisGanguly Jul 25, 2018
7fc5791
Writing back the data of a dirty page to the host upon eviction from …
DebashisGanguly Jul 25, 2018
65fd105
Bug fixes and enabling output print in benchmarks
DebashisGanguly Jul 25, 2018
c65b185
During device synchronization copy only dirty pages for all managed a…
RonianZ Jul 25, 2018
50ba173
Jump the simulator clock to the future when requests waiting in PCI-E…
RonianZ Jul 26, 2018
12d9ad0
Bug Fix: while copying back dirty pages form GPU to CPU on page evict…
RonianZ Jul 26, 2018
cdc3b88
Managed backprop benchmark
RonianZ Jul 26, 2018
fbbd3f9
Bug Fix: jump cycle code
DebashisGanguly Jul 26, 2018
5c256c5
Removing kmeans benchmark as it uses texture memory and cannot be tra…
DebashisGanguly Aug 1, 2018
7973c94
new stencil benchmark from Parboil both Unmanaged and Managed version
RonianZ Aug 2, 2018
f80a185
Bug Fix: stencil reading input file error
RonianZ Aug 3, 2018
3267037
Bug Fix: Allowing multiple warps to be issued per SM
DebashisGanguly Aug 8, 2018
3eef954
Reverting code for jumping scheduler clock in the future
DebashisGanguly Aug 8, 2018
6fd8d83
Bug Fix: Do not let warp to finish if it has outstanding managed memo…
RonianZ Aug 9, 2018
7ea8625
Bug Fix: Adding missing code for stall condition, access type, cache …
RonianZ Aug 9, 2018
f4b8121
Adding config file support for UVM
RonianZ Aug 10, 2018
b02d307
Bug Fix: Releasing scoreboard register dependency for load instructio…
RonianZ Aug 16, 2018
ef92f4d
Implementation of cudaMemPrefetchAsync
RonianZ Aug 30, 2018
3d0c8d9
Kernels (even when launched in stream other than zero) should block C…
DebashisGanguly Sep 1, 2018
4752dbf
Bug fixes:
RonianZ Sep 3, 2018
87e0c75
Implementing statistics for Unified Virtual Memory
RonianZ Sep 3, 2018
17baa79
Disable deadlock check for UVM as the far fetch latency is too big fo…
RonianZ Sep 3, 2018
849332b
Updating to default parameters for benchmarks suggested by the suite
RonianZ Sep 6, 2018
565171c
Fixing bugs with statistics and adding new statistics for page access…
RonianZ Sep 6, 2018
fc62107
Updating managed benchmarks to have user defined prefetches
RonianZ Sep 8, 2018
f7f02ee
Simplifying PCI-e transfer logic
RonianZ Sep 9, 2018
d71bd5b
Multiple changes related to simplification of PCI-e logic
RonianZ Sep 9, 2018
b8bdbb3
Enabling kernel launch latency in config
RonianZ Sep 9, 2018
5b3ee54
Updating benchmarks to run prefetches in multiple parallel streams
RonianZ Sep 12, 2018
e0c654f
Implementing hardware prefetcher
RonianZ Sep 13, 2018
75a52dc
Simplifying the hardware prefetch by creating hierarchical tree of 2M…
DebashisGanguly Sep 13, 2018
d6fbb8a
Updating timing calculation for cudaMemcpy and cudaDeviceSynchronize …
RonianZ Sep 14, 2018
44d6369
Adding profile entry for cudaDeviceSynchronize in UVM statistics
RonianZ Sep 14, 2018
7515200
Implementing page fault handling latency
RonianZ Sep 14, 2018
ef82f75
Misc changes:
RonianZ Sep 14, 2018
7c62de1
Misc:
RonianZ Sep 15, 2018
19dcacd
Updating random eviction policy to evict even from hardware prefetche…
RonianZ Sep 17, 2018
2c244ea
Revert "Updating random eviction policy to evict even from hardware p…
DebashisGanguly Sep 23, 2018
d68a864
Reducing problem size for backprop, srad, and stencil such that they …
RonianZ Sep 23, 2018
b59af3f
Fixing bug with ready cycle calculation for 2MB page transfer
RonianZ Sep 23, 2018
fbdb9d5
Updating run file for srad and backprop to control disabling hardware…
RonianZ Sep 25, 2018
ef644aa
Adding python scripts to run benchmarks, extract data, and plot graph…
RonianZ Sep 25, 2018
bb7de2a
Added log file, updated plot, and script for validation experiment
RonianZ Sep 26, 2018
192e696
Simulating NVLINK 2.0 protocol with remote direct memory access
RonianZ Sep 26, 2018
8397fe9
Adding plots, script, and logs for RDMA experiment
RonianZ Sep 27, 2018
aa59e7b
Removing wrong logs from RDMA logs
DebashisGanguly Sep 27, 2018
213ec0b
Adding new plot for RDMA
DebashisGanguly Sep 27, 2018
5861b9e
Adding new plot for RDMA experiment
RonianZ Sep 27, 2018
7bda863
Removing results from earlier repo
DebashisGanguly Oct 2, 2018
331a5c6
MISC Changes:
DebashisGanguly Oct 2, 2018
585f843
Restructured hardware prefetcher logic
DebashisGanguly Oct 2, 2018
9424090
MISC Bug Fixes:
DebashisGanguly Oct 5, 2018
a97346d
Fixing bug with page eviction policy
DebashisGanguly Oct 6, 2018
c986c5f
Fixing bugs with spatio-temporal eviction and spatio-temporal preftec…
DebashisGanguly Oct 30, 2018
4c86cd0
Bug Fix: Check before removing from the list of reserved pages under …
DebashisGanguly Oct 31, 2018
bb88c50
Bug Fix: For sequential-local eviction trying to process beyond the l…
DebashisGanguly Oct 31, 2018
36035ae
Multiple bug fixes:
DebashisGanguly Nov 4, 2018
3860d9b
Multiple bug fix:
DebashisGanguly Nov 5, 2018
0744370
Bug Fix: silly error with free page buffer percentage calculation
DebashisGanguly Nov 11, 2018
e19418e
- Correcting spelling errors for statistics
DebashisGanguly Nov 15, 2018
8eac474
Stalling kernel for unfinished pcie transfers in both directions
DebashisGanguly Nov 15, 2018
4419997
Rewriting prefetcher and eviction procedure
DebashisGanguly Nov 15, 2018
bdbf2f9
Checking if the prefetech blocks are already staged for read transfer…
DebashisGanguly Nov 15, 2018
3b68d85
Some spelling errors with config items
DebashisGanguly Nov 15, 2018
5fcab0d
Silly error with parentheses placement
DebashisGanguly Nov 16, 2018
e9ccd10
removing redundant check for is_basic_block_evictable from update_bas…
DebashisGanguly Nov 16, 2018
1cbcf2e
Fixing error with scheduling duplicate read transfers
DebashisGanguly Nov 16, 2018
ccc09f1
Updating valid size for random and on-demand prefetchers and random a…
DebashisGanguly Nov 16, 2018
7aab8bc
Fixing error with random prefetcher
DebashisGanguly Nov 17, 2018
031156e
Round up allocation size so that even when hardware prefetcher is dis…
DebashisGanguly Nov 18, 2018
8e3061b
Changing config items semantics for free page buffer and lru reservation
DebashisGanguly Nov 22, 2018
dbe797a
Output logs and plots from experiments for Eviction Policy and Hardwa…
DebashisGanguly Nov 25, 2018
df19662
Update README to have steps for setting up environment
DebashisGanguly Dec 13, 2018
6f84e52
Implementing 2MB page eviction
DebashisGanguly Jan 26, 2019
8b21d06
Implementing LFU page eviction using access counter
DebashisGanguly Jan 27, 2019
f4ef18a
Changes to config and removing nvlink bandwidth
DebashisGanguly Feb 4, 2019
986c856
Introducing back 4KB LRU eviction for comparison
DebashisGanguly Feb 13, 2019
ca111c8
Updating working set so that 2MB eviction works
DebashisGanguly Feb 15, 2019
eb81ac3
Adding missing input file for updated bfs working set
DebashisGanguly Feb 15, 2019
5acc232
Reorder lru list for SL and TBN eviction, first group by 2MB large pa…
DebashisGanguly Feb 17, 2019
09670a0
Fixing bug with TBN prefetcher
DebashisGanguly Feb 19, 2019
df2d1e4
Updating parameters again (hoping this is the final working set; sten…
DebashisGanguly Feb 19, 2019
18962ab
bad line remove
DebashisGanguly Feb 19, 2019
22ce4b8
fix the max addressable global mem size as 1GB instead of dynamically…
DebashisGanguly Feb 21, 2019
c919911
Updating printing method for access pattern
DebashisGanguly Feb 23, 2019
ab2e2c3
Multiple changes:
DebashisGanguly Mar 21, 2019
72bd006
Adding first cut of FDTD-2D on behalf of Ronian
DebashisGanguly Mar 21, 2019
72e1296
Fixing minor bug with should cause page migration
DebashisGanguly Mar 22, 2019
d9f6756
Misc changes:
DebashisGanguly Mar 24, 2019
c8c883f
Increasing the size of access counter and changing the DMA latency
DebashisGanguly Mar 28, 2019
a513576
Misc changes:
DebashisGanguly Mar 31, 2019
81f06f4
Updating run parameters for nw
DebashisGanguly Apr 1, 2019
6477d4a
Adding managed version of random access benchmark adopted from HPC Ch…
DebashisGanguly Apr 5, 2019
a5d5e99
Unmanaged version of RandomAccess
DebashisGanguly Apr 9, 2019
99da10b
Misc changes:
DebashisGanguly Apr 12, 2019
1c1adf0
Removing bad assert
DebashisGanguly Apr 12, 2019
a906258
Fixing bug with random 4K eviction after code refactoring
DebashisGanguly Apr 13, 2019
4ca5f42
Updating raw logs, scripts, and plots for eviction prefetcher interplay
DebashisGanguly Apr 17, 2019
93905f9
MISC Changes: 1. Removing unnecessary log folder, 2. Adding Dockerfil…
DebashisGanguly Apr 17, 2019
e1fa9be
Update COPYRIGHT
DebashisGanguly Apr 17, 2019
fc308d8
MISC CHANGES: 1. Update README, 2. Add micro-benchmzarks, 3. Clean up…
DebashisGanguly Apr 19, 2019
390683a
Merge branch 'master' of https://github.com/DebashisGanguly/gpgpu-sim…
DebashisGanguly Apr 19, 2019
da1c931
Updating scripts and plots
DebashisGanguly Apr 30, 2019
50a28b7
Yet another update to plots and scripts
DebashisGanguly May 2, 2019
56e0998
Hopefully final update for ISCA
DebashisGanguly May 4, 2019
487be67
Update README after IPDPS acceptance correcting minor errors
DebashisGanguly Dec 12, 2019
2be17e0
Adding raw logs, scripts, and plots for IPDPS
DebashisGanguly May 10, 2020
3a69c3d
Adding 3 new managed benchmarks
DebashisGanguly Jun 15, 2020
fd6b2a2
Data structures and primitives for access pattern detection
DebashisGanguly Jun 23, 2020
e0a9af5
Complete implementation of pattern detection
DebashisGanguly Jun 25, 2020
985f920
Adding working set details for new benchmarks
DebashisGanguly Jun 26, 2020
2b36e3d
Removing print statement and test code
DebashisGanguly Jun 26, 2020
967dd1b
Removing unnecessary assert
DebashisGanguly Jun 26, 2020
32a5e88
Enabling policy making and adaptive memory management
DebashisGanguly Jun 27, 2020
2a4a8d0
Fixing minor index error with srad benchmark
DebashisGanguly Jul 1, 2020
5601037
Adding results (logs and extracted spreadsheet)
DebashisGanguly Jul 1, 2020
cfd6f10
Updating paper reference
DebashisGanguly Jul 17, 2020
b80fb10
Misc changes: 1. update config, 2. removal of validation results
DebashisGanguly Sep 29, 2020
778dbff
Minor fix to run on aalp cluster
Nov 8, 2020
25561e9
Formatting
Nov 11, 2020
448ddfb
Remove content of lib
Nov 11, 2020
bafab5d
Updating README with initial DATE'21 acceptance notification
DebashisGanguly Nov 16, 2020
698973a
Fixed include sequence to run on cluster
Dec 11, 2020
5f358e1
Merge branch 'DebashisGanguly:master' into master
yechen3 Mar 4, 2025
6b30568
A big commit that works with latest gpgpu-sim 4.0
yechen3 Mar 7, 2025
76a82f0
Remove unnessary benchmark folder
yechen3 Mar 7, 2025
b77cd21
Merge remote-tracking branch 'uvmsmart/master' into merge-uvmsmart
yechen3 Mar 7, 2025
863b2d9
Fix some bugs after merge
yechen3 Mar 7, 2025
1a0c1e7
Resolve merge conflicts in configs folder
yechen3 Mar 7, 2025
70f0c29
Remove stale files
yechen3 Mar 17, 2025
81e4d21
Change to UVA mode (always hit in TLB)
yechen3 Apr 23, 2025
0500eab
Fix the cicurlar dependency issue
yechen3 Apr 23, 2025
514c71b
Merge branch 'dev' into uvmsmart
yechen3 Apr 23, 2025
152 changes: 152 additions & 0 deletions Jenkinsfile
@@ -0,0 +1,152 @@
pipeline {
agent {
label "purdue-cluster"
}

options {
disableConcurrentBuilds()
}
stages {
/*
stage('formatting-check') {
steps {
sh '''
source ./env-setup/common/export_gcc_version.sh 5.3.0
git remote add upstream https://github.com/purdue-aalp/gpgpu-sim_distribution
git fetch upstream
if git diff --name-only upstream/dev | grep -E "*.cc|*.h|*.cpp|*.hpp" ; then
git diff --name-only upstream/dev | grep -E "*.cc|*.h|*.cpp|*.hpp" | xargs ./run-clang-format.py --clang-format-executable /home/tgrogers-raid/a/common/clang-format/6.0.1/clang-format
fi
'''
}
}
*/
stage('env-setup') {
steps {
sh 'rm -rf env-setup && git clone git@github.com:purdue-aalp/env-setup.git &&\
cd env-setup && git checkout cluster-ubuntu'
}
}
stage('simulator-build') {
steps {
sh '''#!/bin/bash
source ./env-setup/11.0_env_setup.sh
source `pwd`/setup_environment
make -j 10'''
}
}
stage('simulations-build'){
steps{
sh 'rm -rf gpgpu-sim_simulations'
sh 'git clone git@github.com:purdue-aalp/gpgpu-sim_simulations.git && \
cd gpgpu-sim_simulations && \
git pull && \
ln -s /home/tgrogers-raid/a/common/data_dirs benchmarks/'
sh '''#!/bin/bash
source ./env-setup/11.0_env_setup.sh
source `pwd`/setup_environment
cd gpgpu-sim_simulations
source ./benchmarks/src/setup_environment
make -j 10 -C ./benchmarks/src/ rodinia_2.0-ft
make -C ./benchmarks/src data'''
}
}
stage('11.0 UVM Regressions'){
steps {
sh '''#!/bin/bash
source ./env-setup/11.0_env_setup.sh
source `pwd`/setup_environment
./gpgpu-sim_simulations/util/job_launching/run_simulations.py -B rodinia_2.0-ft -C GTX1080Ti_UVM -N regress-UVM-$$
PLOTDIR="jenkins/${JOB_NAME}/${BUILD_NUMBER}/11.0" && ssh [email protected] mkdir -p /home/dynamo/a/tgrogers/website/gpgpu-sim-plots/$PLOTDIR
./gpgpu-sim_simulations/util/job_launching/monitor_func_test.py -v -s stats-per-app-11.0.csv -N regress-UVM-$$'''
}
}
stage('11.0 Regular Regressions'){
steps {
sh '''#!/bin/bash
source ./env-setup/11.0_env_setup.sh
source `pwd`/setup_environment
./gpgpu-sim_simulations/util/job_launching/run_simulations.py -B rodinia_2.0-ft -C QV100 -N regress-$$
PLOTDIR="jenkins/${JOB_NAME}/${BUILD_NUMBER}/11.0" && ssh [email protected] mkdir -p /home/dynamo/a/tgrogers/website/gpgpu-sim-plots/$PLOTDIR
./gpgpu-sim_simulations/util/job_launching/monitor_func_test.py -v -s stats-per-app-11.0.csv -N regress-$$'''
}
}
stage('correlate-delta-and-archive') {
steps {
sh './gpgpu-sim_simulations/run_hw/get_hw_data.sh'
sh 'rm -rf ./gpgpu-sim_simulations/util/plotting/correl-html && rm -rf gpgpu-sim-results-repo && rm -rf ./gpgpu-sim_simulations/util/plotting/htmls'
sh 'git clone git@github.com:purdue-aalp/gpgpu-sim-results-repo.git'
sh '''#!/bin/bash
source ./env-setup/11.0_env_setup.sh
./gpgpu-sim_simulations/util/job_launching/get_stats.py -R -K -k -B rodinia_2.0-ft -C QV100 -A > stats-per-kernel-11.0.csv'''
sh 'if [ ! -d ./gpgpu-sim-results-repo/${JOB_NAME} ]; then mkdir -p ./gpgpu-sim-results-repo/${JOB_NAME}/ ; cp ./gpgpu-sim-results-repo/purdue-aalp/gpgpu-sim_distribution/dev/* ./gpgpu-sim-results-repo/${JOB_NAME}/ ; fi'
sh './gpgpu-sim_simulations/util/plotting/merge-stats.py -c ./gpgpu-sim-results-repo/${JOB_NAME}/stats-per-app-11.0.csv,./stats-per-app-11.0.csv -R > per-app-merge-11.0.csv'
sh 'PLOTDIR="jenkins/${JOB_NAME}" &&\
./gpgpu-sim_simulations/util/plotting/plot-get-stats.py -c per-app-merge-11.0.csv -P cuda-11.0 &&\
./gpgpu-sim_simulations/util/plotting/merge-stats.py -c ./gpgpu-sim-results-repo/${JOB_NAME}/stats-per-kernel-11.0.csv,./stats-per-kernel-11.0.csv -R > per-kernel-merge-11.0.csv &&\
./gpgpu-sim_simulations/util/plotting/plot-correlation.py -H ./gpgpu-sim_simulations/run_hw/QUADRO-V100/device-0/9.1/ -c per-kernel-merge-11.0.csv -p cuda-11.0 | grep -B 1 "Correl=" | tee correl.11.0.txt &&\
mkdir -p ./gpgpu-sim-results-repo/${JOB_NAME}/ && cp stats-per-*.csv ./gpgpu-sim-results-repo/${JOB_NAME}/ &&\
cd ./gpgpu-sim-results-repo &&\
git diff --quiet && git diff --staged --quiet || git commit -am "Jenkins automated checkin ${JOB_NAME} Build:${BUILD_NUMBER}" &&\
git push'

sh 'PLOTDIR="/home/dynamo/a/tgrogers/website/gpgpu-sim-plots/jenkins/${JOB_NAME}" &&\
ssh [email protected] mkdir -p $PLOTDIR/${BUILD_NUMBER} && \
scp ./gpgpu-sim_simulations/util/plotting/correl-html/* [email protected]:$PLOTDIR/${BUILD_NUMBER} &&\
scp ./gpgpu-sim_simulations/util/plotting/htmls/* [email protected]:$PLOTDIR/${BUILD_NUMBER} &&\
ssh [email protected] "cd $PLOTDIR && rm -rf latest && cp -r ${BUILD_NUMBER} latest"'
}
}
stage('sst-core-build') {
steps {
sh 'rm -rf sstcore-install'
sh 'rm -rf sst-core && git clone git@github.com:sstsimulator/sst-core.git'
sh '''#!/bin/bash
cd sst-core
./autogen.sh
./configure --prefix=`realpath ../sstcore-install` --disable-mpi --disable-mem-pools
make -j 10
make install'''
}
}
stage('sst-elements-build') {
steps {
sh 'rm -rf sstelements-install'
sh 'rm -rf sst-elements && git clone git@github.com:sstsimulator/sst-elements.git'
// First sourcing the env_setup and setup_environment script for env vars
sh '''#!/bin/bash
source ./env-setup/11.0_env_setup.sh
source `pwd`/setup_environment
cd sst-elements
./autogen.sh
./configure --prefix=`realpath ../sstelements-install` --with-sst-core=`realpath ../sstcore-install` --with-cuda=$CUDA_INSTALL_PATH --with-gpgpusim=$GPGPUSIM_ROOT
make -j 10
make install'''
}
}
stage('sst balar test') {
steps {
sh '''#!/bin/bash
source ./env-setup/11.0_env_setup.sh
source `pwd`/setup_environment sst
./sstcore-install/bin/sst-test-elements -p ./sst-elements/src/sst/elements/balar/tests'''
}
}
}
post {
success {
emailext body: "See ${BUILD_URL}.",
recipientProviders: [[$class: 'CulpritsRecipientProvider'],
[$class: 'RequesterRecipientProvider']],
subject: "[AALP Jenkins] Build ${JOB_NAME} #${BUILD_NUMBER} - Success!",
to: '[email protected]'
}
failure {
emailext body: "See ${BUILD_URL}",
recipientProviders: [[$class: 'CulpritsRecipientProvider'],
[$class: 'RequesterRecipientProvider']],
subject: "[AALP Jenkins] Build ${JOB_NAME} #${BUILD_NUMBER} - ${currentBuild.result}",
to: '[email protected]'
}
}
}
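The commented-out `formatting-check` stage above selects changed files with `grep -E "*.cc|*.h|*.cpp|*.hpp"`. In that pattern the `.` is an unescaped regex wildcard and nothing anchors the match to the end of the file name, so it can select more files than intended. The following is a minimal sketch of a stricter suffix-based filter, assuming the intent is to pick C/C++ sources and headers by extension. The function name and the sample paths are hypothetical and are not part of this PR.

```python
from pathlib import PurePosixPath

# Extensions the commented-out formatting-check stage appears to target.
CXX_SUFFIXES = {".cc", ".h", ".cpp", ".hpp"}


def select_cxx_files(changed_paths):
    """Return the paths whose final suffix is a C/C++ source or header suffix."""
    return [p for p in changed_paths if PurePosixPath(p).suffix in CXX_SUFFIXES]


if __name__ == "__main__":
    # Hypothetical output of `git diff --name-only upstream/dev`.
    changed = [
        "src/gpgpu-sim/gpu-sim.cc",
        "src/abstract_hardware_model.h",
        "README.md",
        "Jenkinsfile",
    ]
    for path in select_cxx_files(changed):
        print(path)
```

Matching on the parsed suffix rather than a substring keeps files such as `notes.ccx` or `Jenkinsfile` out of the list passed to the formatter.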
12 changes: 12 additions & 0 deletions README.md
@@ -57,6 +57,18 @@ Complex Dynamics in Many-Core Accelerator Architectures, In Proceedings of the
IEEE International Symposium on Performance Analysis of Systems and Software
(ISPASS), pp. 164-174, White Plains, NY, March 28-30, 2010.

If you use the prefetchers and page eviction policies, please cite:

Debashis Ganguly, Ziyu Zhang, Jun Yang, and Rami Melhem, Interplay between hardware prefetcher and page eviction policy in CPU-GPU unified virtual memory, In Proceedings of the 46th International Symposium on Computer Architecture (ISCA '19), New York, NY, USA, 2019.

If you use access counter-based delayed migration, LFU eviction, cold vs hot data structure classification, and page migration and pinning, please cite:

Debashis Ganguly, Ziyu Zhang, Jun Yang, and Rami Melhem, Adaptive Page Migration for Irregular Data-intensive Applications under GPU Memory Oversubscription, In Proceedings of the 34th IEEE International Parallel & Distributed Processing Symposium (IPDPS 2020), New Orleans, Louisiana, USA, 2020.

If you use the adaptive runtime to detect patterns in CPU-GPU interconnect traffic and the policy engine to choose and dynamically employ memory management policies, please cite:

Debashis Ganguly, Rami Melhem, and Jun Yang, An Adaptive Framework for Oversubscription Management in CPU-GPU Unified Memory, In 2021 Design, Automation & Test in Europe Conference & Exhibition (DATE 2021).

This file contains instructions on installing, building and running GPGPU-Sim.
Detailed documentation on what GPGPU-Sim models, how to configure it, and a
guide to the source code can be found here: <http://gpgpu-sim.org/manual/>.
15 changes: 15 additions & 0 deletions bitbucket-pipelines.yml
@@ -0,0 +1,15 @@
# This is a sample build configuration for C++ – Make.
# Check our guides at https://confluence.atlassian.com/x/5Q4SMw for more examples.
# Only use spaces to indent your .yml configuration.
# -----
# You can specify a custom docker image from Docker Hub as your build environment.
image: tgrogers/gpgpu-sim_regress:latest

pipelines:
  default:
    - step:
        script: # Modify the commands below to build your repository.
          - docker run -v `pwd`:/home/runner/gpgpu-sim_distribution:rw tgrogers/gpgpu-sim_regress:latest /bin/bash -c "./start_torque.sh; chown -R runner /home/runner/gpgpu-sim_distribution; su - runner -c 'source /home/runner/gpgpu-sim_distribution/setup_environment && make -j -C /home/runner/gpgpu-sim_distribution && cd /home/runner/gpgpu-sim_simulations/ && git pull && /home/runner/gpgpu-sim_simulations/util/job_launching/run_simulations.py -c /home/runner/gpgpu-sim_simulations/util/job_launching/regression_recipies/rodinia_2.0-ft/configs.gtx1080ti.yml -N regress && /home/runner/gpgpu-sim_simulations/util/job_launching/monitor_func_test.py -v -N regress'"
        services:
          - docker

70 changes: 70 additions & 0 deletions configs/GTX480/config_fermi_islip.icnt
@@ -0,0 +1,70 @@
// 27*1 fly with 32 flits per packet under gpgpusim injection mode
use_map = 0;
flit_size = 32;

// currently we do not use this, see subnets below
network_count = 2;

// Topology
topology = fly;
k = 27;
n = 1;

// Routing

routing_function = dest_tag;

// Flow control

num_vcs = 1;
vc_buf_size = 8;

wait_for_tail_credit = 0;

// Router architecture

vc_allocator = islip; //separable_input_first;
sw_allocator = islip; //separable_input_first;
alloc_iters = 1;

credit_delay = 0;
routing_delay = 0;
vc_alloc_delay = 1;
sw_alloc_delay = 1;

input_speedup = 2;
output_speedup = 1;
internal_speedup = 1.0;

// Traffic, GPGPU-Sim does not use this

traffic = uniform;
packet_size ={{1,2,3,4},{10,20}};
packet_size_rate={{1,1,1,1},{2,1}};

// Simulation - Don't change

sim_type = gpgpusim;
//sim_type = latency;
injection_rate = 0.1;

subnets = 2;

// Read and write subnets are always used, regardless of the following line
//use_read_write = 1;


read_request_subnet = 0;
read_reply_subnet = 1;
write_request_subnet = 0;
write_reply_subnet = 1;

read_request_begin_vc = 0;
read_request_end_vc = 0;
write_request_begin_vc = 0;
write_request_end_vc = 0;
read_reply_begin_vc = 0;
read_reply_end_vc = 0;
write_reply_begin_vc = 0;
write_reply_end_vc = 0;
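
The topology settings above (`topology = fly`, `k = 27`, `n = 1`) need to provide one network terminal per node that GPGPU-Sim attaches to the interconnect. The sketch below is a sanity check under the assumption that the node count is the number of SIMT clusters plus the number of memory sub-partitions, using the values from `configs/GTX480/gpgpusim.config` in this same change (`-gpgpu_n_clusters 15`, `-gpgpu_n_mem 6`, `-gpgpu_n_sub_partition_per_mchannel 2`). The helper names are hypothetical.

```python
def icnt_nodes(n_clusters, n_mem, n_sub_partition_per_mchannel):
    """Nodes attached to the interconnect: SIMT clusters plus memory sub-partitions.

    Assumption: each memory sub-partition is its own network node.
    """
    return n_clusters + n_mem * n_sub_partition_per_mchannel


def fly_terminals(k, n):
    """A k-ary n-fly (butterfly) network has k**n terminals."""
    return k ** n


if __name__ == "__main__":
    nodes = icnt_nodes(n_clusters=15, n_mem=6, n_sub_partition_per_mchannel=2)
    terminals = fly_terminals(k=27, n=1)
    print(f"nodes={nodes} terminals={terminals} consistent={nodes == terminals}")
```

If the cluster or memory partition counts in `gpgpusim.config` change, `k` in this file would presumably need to change with them.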

133 changes: 133 additions & 0 deletions configs/GTX480/gpgpusim.config
@@ -0,0 +1,133 @@
# functional simulator specification
-gpgpu_ptx_instruction_classification 0
-gpgpu_ptx_sim_mode 0
-gpgpu_ptx_force_max_capability 20


# SASS execution (only supported with CUDA >= 4.0)
-gpgpu_ptx_convert_to_ptxplus 0
-gpgpu_ptx_save_converted_ptxplus 0

# high level architecture configuration
-gpgpu_n_clusters 15
-gpgpu_n_cores_per_cluster 1
-gpgpu_n_mem 6
-gpgpu_n_sub_partition_per_mchannel 2

# Fermi clock domains
#-gpgpu_clock_domains <Core Clock>:<Interconnect Clock>:<L2 Clock>:<DRAM Clock>
# In Fermi, each pipeline has 16 execution units, so the Core clock needs to be divided
# by 2. (GPGPU-Sim simulates a warp (32 threads) in a single cycle). 1400/2 = 700
-gpgpu_clock_domains 700.0:700.0:700.0:924.0

# shader core pipeline config
-gpgpu_shader_registers 32768

# This implies a maximum of 48 warps/SM
-gpgpu_shader_core_pipeline 1536:32
-gpgpu_shader_cta 8
-gpgpu_simd_model 1

# Pipeline widths and number of FUs
# ID_OC_SP,ID_OC_SFU,ID_OC_MEM,OC_EX_SP,OC_EX_SFU,OC_EX_MEM,EX_WB
-gpgpu_pipeline_widths 2,1,1,2,1,1,2
-gpgpu_num_sp_units 2
-gpgpu_num_sfu_units 1

# Instruction latencies and initiation intervals
# "ADD,MAX,MUL,MAD,DIV"
-ptx_opcode_latency_int 4,13,4,5,145
-ptx_opcode_initiation_int 1,2,2,1,8
-ptx_opcode_latency_fp 4,13,4,5,39
-ptx_opcode_initiation_fp 1,2,1,1,4
-ptx_opcode_latency_dp 8,19,8,8,330
-ptx_opcode_initiation_dp 8,16,8,8,130


# In Fermi, the cache and shared memory can be configured to 16kb:48kb(default) or 48kb:16kb
# <nsets>:<bsize>:<assoc>,<rep>:<wr>:<alloc>:<wr_alloc>:<set_index_fn>,<mshr>:<N>:<merge>,<mq>:**<fifo_entry>
# ** Optional parameter - Required when mshr_type==Texture Fifo
# Note: Hashing set index function (H) only applies to a set size of 32 or 64.
-gpgpu_cache:dl1 32:128:4,L:L:m:N:H,A:32:8,8
-gpgpu_shmem_size 49152

# The alternative configuration for fermi in case cudaFuncCachePreferL1 is selected
#-gpgpu_cache:dl1 64:128:6,L:L:m:N:H,A:32:8,8
#-gpgpu_shmem_size 16384

# 64 sets, each 128 bytes 8-way for each memory sub partition. This gives 768KB L2 cache
-gpgpu_cache:dl2 64:128:8,L:B:m:W:L,A:32:4,4:0,32
-gpgpu_cache:dl2_texture_only 0

-gpgpu_cache:il1 4:128:4,L:R:f:N:L,A:2:32,4
-gpgpu_tex_cache:l1 4:128:24,L:R:m:N:L,F:128:4,128:2
-gpgpu_const_cache:l1 64:64:2,L:R:f:N:L,A:2:32,4

# enable operand collector
-gpgpu_operand_collector_num_units_sp 6
-gpgpu_operand_collector_num_units_sfu 8
-gpgpu_operand_collector_num_in_ports_sp 2
-gpgpu_operand_collector_num_out_ports_sp 2
-gpgpu_num_reg_banks 16

# shared memory bank conflict detection
-gpgpu_shmem_num_banks 32
-gpgpu_shmem_limited_broadcast 0
-gpgpu_shmem_warp_parts 1

-gpgpu_max_insn_issue_per_warp 1

# interconnection
-network_mode 1
-inter_config_file config_fermi_islip.icnt

# memory partition latency config
-rop_latency 120
-dram_latency 100

# dram model config
-gpgpu_dram_scheduler 1
# The DRAM return queue and the scheduler queue together should provide enough
# buffering to sustain the memory-level parallelism needed to tolerate DRAM latency.
# To allow 100% DRAM utilization, there should be at least enough buffer space to
# cover the minimum DRAM latency (100 core cycles), i.e.
# Total buffer space required = 100 x 924MHz / 700MHz = 132
-gpgpu_frfcfs_dram_sched_queue_size 16
-gpgpu_dram_return_queue_size 116

# for Fermi, the bus width is 384 bits, which is 8 bytes (4 bytes at each DRAM chip) per memory partition
-gpgpu_n_mem_per_ctrlr 2
-gpgpu_dram_buswidth 4
-gpgpu_dram_burst_length 8
-dram_data_command_freq_ratio 4 # GDDR5 is QDR
-gpgpu_mem_address_mask 1
-gpgpu_mem_addr_mapping dramid@8;00000000.00000000.00000000.00000000.0000RRRR.RRRRRRRR.BBBCCCCB.CCSSSSSS

# GDDR5 timing from hynix H5GQ1H24AFR
# to disable bank groups, set nbkgrp to 1 and tCCDL and tRTPL to 0
-gpgpu_dram_timing_opt "nbk=16:CCD=2:RRD=6:RCD=12:RAS=28:RP=12:RC=40:
CL=12:WL=4:CDLR=5:WR=12:nbkgrp=4:CCDL=3:RTPL=2"

# Fermi has two schedulers per core
-gpgpu_num_sched_per_core 2
# Two Level Scheduler with active and pending pools
#-gpgpu_scheduler two_level_active:6:0:1
# Loose round-robin scheduler
#-gpgpu_scheduler lrr
# Greedy then oldest scheduler
-gpgpu_scheduler gto

# stat collection
-gpgpu_memlatency_stat 14
-gpgpu_runtime_stat 500
-enable_ptx_file_line_stats 1
-visualizer_enabled 0

# power model configs
-power_simulation_enabled 1
-gpuwattch_xml_file gpuwattch_gtx480.xml

# tracing functionality
#-trace_enabled 1
#-trace_components WARP_SCHEDULER,SCOREBOARD
#-trace_sampling_core 0
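
Several comments in this config state derived quantities (warps per SM, cache capacity, DRAM buffer space). The sketch below recomputes them from the option values so they can be cross-checked. It assumes the first three colon-separated fields of a cache option are `<nsets>:<bsize>:<assoc>` (as the format comment in the file describes) and that the L2 capacity is summed over `gpgpu_n_mem * gpgpu_n_sub_partition_per_mchannel` sub-partitions. The helper names are hypothetical and are not part of GPGPU-Sim.

```python
def cache_bytes(spec):
    """Capacity of one cache instance from a '<nsets>:<bsize>:<assoc>,...' option string."""
    geometry = spec.split(",")[0]
    nsets, bsize, assoc = (int(field) for field in geometry.split(":"))
    return nsets * bsize * assoc


def max_warps_per_sm(pipeline):
    """'-gpgpu_shader_core_pipeline <threads>:<warp size>' gives threads / warp size."""
    threads, warp_size = (int(field) for field in pipeline.split(":"))
    return threads // warp_size


def dram_buffer_entries(latency_core_cycles, dram_mhz, core_mhz):
    """Buffer entries needed to cover the DRAM latency, rounded up (integer math)."""
    return -(-latency_core_cycles * dram_mhz // core_mhz)


if __name__ == "__main__":
    l1 = cache_bytes("32:128:4,L:L:m:N:H,A:32:8,8")
    l2_total = cache_bytes("64:128:8,L:B:m:W:L,A:32:4,4:0,32") * 6 * 2
    print("L1D per SM (KB):", l1 // 1024)
    print("L2 total (KB):", l2_total // 1024)
    print("max warps per SM:", max_warps_per_sm("1536:32"))
    print("DRAM buffer entries:", dram_buffer_entries(100, 924, 700))
    print("configured queue entries:", 16 + 116)
```

With the values in this file, the scheduler queue (16) and return queue (116) add up to the 132 entries that the buffer calculation calls for, and the L2 comes to 64 KB per sub-partition across 12 sub-partitions.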