Dev btb rebase #308

Open: wants to merge 14 commits into base: xs-dev
5 changes: 4 additions & 1 deletion .gitignore
@@ -48,4 +48,7 @@ llvm-pgo/
*.profraw
*nemu*
ready-to-run/
*.out
*.out
.cursorrules
.cursorignore
compile_commands.json
108 changes: 108 additions & 0 deletions README.cn.md
@@ -0,0 +1,108 @@
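# XiangShan GEM5 Simulator

This is the GEM5 simulator for the XiangShan processor. Its performance on the SPEC CPU 2006 benchmarks is currently on par with the Kunminghu processor.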

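## Project Features

XS-GEM5 is a GEM5 simulator customized for the XiangShan processor. Compared with upstream GEM5, it:
- Supports only full-system simulation
- Supports XiangShan-specific formats and features
- Includes several XiangShan-specific enhancements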

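### Major Enhancements

1. Frontend microarchitecture calibration
   - Decoupled frontend design
   - TAGESC, ITTAGE, and an optional loop predictor
   - Instruction latency calibrated to match Kunminghu

2. Backend microarchitecture calibration
   - Distributed scheduler
   - Calibrated scheduling/execution latency
   - RVV (RISC-V Vector extension) support

3. Cache hierarchy optimization
   - Multiple prefetcher algorithms: Stream + Berti/Stride + BOP + SMS + Temporal + CDP
   - Active/passive offloading framework
   - Multi-prefetcher coordination
   - VA-PA translation support

4. Other features
   - Parallel RV PTW (page table walker)
   - Cascaded FMA
   - Move elimination
   - L2 TLB and TLB prefetching
   - CSR fixes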

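## Directory Structure

```
.
├── configs/             # Configuration files
│   ├── example/         # Example configurations
│   │   └── xiangshan.py # XiangShan processor configuration
│   └── common/          # Common configuration
├── src/                 # Source code
│   ├── arch/            # Architecture-specific code
│   │   └── riscv/       # RISC-V implementation
│   ├── cpu/             # CPU code
│   │   ├── o3/          # Out-of-order execution implementation
│   │   └── pred/        # Branch predictors
│   └── mem/             # Memory system code
├── system/              # System code
├── util/                # Utility scripts
│   └── xs_scripts/      # XiangShan-specific scripts
└── ext/                 # External dependencies
    └── dramsim3/        # DRAMSim3 memory simulator
```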

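## Key Code Paths

### 1. Processor Configuration
- `configs/example/xiangshan.py`: basic configuration for the XiangShan processor
- `configs/common/XSConfig.py`: XiangShan-specific configuration options

### 2. CPU Core Implementation
- `src/cpu/o3/`: out-of-order core implementation
  - `cpu.cc`: main CPU core implementation
  - `fetch.cc`: fetch unit
  - `decode.cc`: decode unit
  - `rename.cc`: rename unit
  - `dispatch.cc`: dispatch unit
  - `issue.cc`: issue unit
  - `execute.cc`: execute unit
  - `writeback.cc`: writeback unit

### 3. Branch Predictor
- `src/cpu/pred/`: branch predictor implementation
  - BTB-based: a design built on a conventional BTB
  - For details, see the `DecoupledBPUWithBTB` module in `BranchPredictor.py`

### 4. Memory System
- `src/mem/`: memory system implementation
  - `cache.cc`: cache implementation
  - `prefetch/`: prefetcher implementations
  - `page_table.cc`: page table implementation

### 5. RISC-V Architecture Support
- `src/arch/riscv/`: RISC-V architecture implementation
  - `decoder.cc`: instruction decoder
  - `registers.cc`: register implementation
  - `isa.cc`: instruction set implementation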

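## Usage

### Requirements
- Ubuntu 20.04/22.04
- Python 3.8 (a conda environment is recommended)
- For other dependencies, see README.md

### Build Steps
1. Install dependencies
2. Clone and build DRAMSim3
3. Build GEM5
4. Set environment variables
5. Run the simulator

For detailed steps, see the instructions in README.md.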

9 changes: 9 additions & 0 deletions SConstruct
@@ -370,6 +370,15 @@ for variant_path in variant_paths:
    env = main.Clone()
    env['BUILDDIR'] = variant_path

    # Enable compilation database generation for this variant
    env.Tool('compilation_db')
    env['COMPILATIONDB_USE_ABSPATH'] = True
    # Create compilation database in the variant directory
    cdb_path = os.path.join(variant_path, 'compile_commands.json')
    print(f"Creating compilation database at: {cdb_path}")
    cdb = env.CompilationDatabase(cdb_path)
    print(f"Compilation database target created: {cdb}")

    gem5_build = os.path.join(build_root, variant_path, 'gem5.build')
    env['GEM5BUILD'] = gem5_build
    Execute(Mkdir(gem5_build))
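
The SConstruct change registers a `compile_commands.json` target for each build variant. As a rough illustration of how such a file can be consumed, the sketch below loads a compilation database and lists the translation units it covers. It assumes the entry layout of the Clang JSON Compilation Database format (`directory`, `file`, `command`); the helper name `list_translation_units` and the sample paths are hypothetical and not part of this PR.

```python
import json
import os
import tempfile


def list_translation_units(cdb_path):
    """Return the sorted source files recorded in a compile_commands.json."""
    with open(cdb_path, "r", encoding="utf-8") as f:
        entries = json.load(f)
    files = set()
    for entry in entries:
        # "file" may be relative to "directory"; os.path.join keeps it
        # unchanged when it is already absolute.
        full = os.path.join(entry["directory"], entry["file"])
        files.add(os.path.normpath(full))
    return sorted(files)


# Demo with a tiny hand-written database (all paths are made up).
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "compile_commands.json")
    demo_entries = [
        {"directory": "/work/gem5", "file": "src/cpu/o3/fetch.cc",
         "command": "g++ -c src/cpu/o3/fetch.cc"},
        {"directory": "/work/gem5", "file": "src/cpu/o3/commit.cc",
         "command": "g++ -c src/cpu/o3/commit.cc"},
    ]
    with open(demo_path, "w", encoding="utf-8") as f:
        json.dump(demo_entries, f)
    units = list_translation_units(demo_path)
    print(units)
```

Tools such as clangd read the same file to recover per-file compiler flags, which is presumably why `compile_commands.json` is also added to `.gitignore` in this PR.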
27 changes: 18 additions & 9 deletions configs/example/fs.py
@@ -318,15 +318,24 @@ def build_test_system(np):
            test_sys.cpu[i].branchPred.tage.enableSC = not args.disable_sc
            print("db_switches:", bp_db_switches)
        else:
            if enable_bp_db:
                print("bpdb not supported for this branch predictor")
            if args.enable_loop_buffer:
                print("loop buffer not supported for this branch predictor")
            if args.enable_loop_predictor:
                print("loop predictor not supported for this branch predictor")
            if args.enable_jump_ahead_predictor:
                print("jump ahead predictor not supported for this branch predictor")
            assert(False)
            bpClass = ObjectList.bp_list.get('DecoupledBPUWithBTB')
            if isinstance(test_sys.cpu[i].branchPred, bpClass):
                test_sys.cpu[i].branchPred = bpClass(
                    bpDBSwitches=bp_db_switches,
                    enableLoopBuffer=args.enable_loop_buffer,
                    enableLoopPredictor=args.enable_loop_predictor,
                    enableJumpAheadPredictor=args.enable_jump_ahead_predictor
                )
            else:
                if enable_bp_db:
                    print("bpdb not supported for this branch predictor")
                if args.enable_loop_buffer:
                    print("loop buffer not supported for this branch predictor")
                if args.enable_loop_predictor:
                    print("loop predictor not supported for this branch predictor")
                if args.enable_jump_ahead_predictor:
                    print("jump ahead predictor not supported for this branch predictor")
                # assert(False)
        test_sys.cpu[i].createThreads()
        print("Create threads for test sys cpu ({})".format(type(test_sys.cpu[i])))
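
The new `else` branch in `fs.py` rebuilds the predictor when it is a `DecoupledBPUWithBTB` and otherwise only prints warnings instead of asserting. The following is a minimal standalone sketch of that dispatch, using plain Python classes in place of gem5 SimObjects. The `Fake*` classes and `configure_predictor` are hypothetical names introduced only for illustration.

```python
class FakePredictor:
    """Stand-in for a gem5 branch predictor SimObject (hypothetical)."""

    def __init__(self, **params):
        self.params = params


class FakeDecoupledBPUWithBTB(FakePredictor):
    """Stand-in for the BTB-based decoupled predictor (hypothetical)."""


class FakeLocalBP(FakePredictor):
    """Stand-in for any predictor without the decoupled options."""


def configure_predictor(current, switches, loop_buffer, loop_predictor,
                        jump_ahead):
    """Rebuild a BTB predictor with the requested options, else warn.

    Returns (predictor, warnings). Unsupported requests are reported as
    warnings rather than raising, mirroring the commented-out assert.
    """
    if isinstance(current, FakeDecoupledBPUWithBTB):
        rebuilt = FakeDecoupledBPUWithBTB(
            bpDBSwitches=switches,
            enableLoopBuffer=loop_buffer,
            enableLoopPredictor=loop_predictor,
            enableJumpAheadPredictor=jump_ahead,
        )
        return rebuilt, []
    requested = [
        ("bpdb", bool(switches)),
        ("loop buffer", loop_buffer),
        ("loop predictor", loop_predictor),
        ("jump ahead predictor", jump_ahead),
    ]
    warnings = [
        f"{name} not supported for this branch predictor"
        for name, enabled in requested if enabled
    ]
    return current, warnings


bp, msgs = configure_predictor(FakeLocalBP(), [], True, False, True)
for msg in msgs:
    print(msg)
```

Returning the warnings instead of printing them inside the helper keeps the decision testable; the caller chooses whether to print, log, or abort.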

40 changes: 29 additions & 11 deletions configs/example/xiangshan.py
@@ -113,15 +113,25 @@ def build_test_system(np, args):
        args.enable_loop_buffer = True

    for i in range(np):
        if args.bp_type is None or args.bp_type == 'DecoupledBPUWithFTB':
        if args.bp_type is None or args.bp_type == 'DecoupledBPUWithFTB' or args.bp_type == 'DecoupledBPUWithBTB':
            enable_bp_db = len(args.enable_bp_db) > 1
            if enable_bp_db:
                bp_db_switches = args.enable_bp_db[1] + ['basic']
                print("BP db switches:", bp_db_switches)
            else:
                bp_db_switches = []

            test_sys.cpu[i].branchPred = DecoupledBPUWithFTB(
            # for DecoupledBPUWithBTB, loop predictor and jump ahead predictor are not supported
            if args.bp_type == 'DecoupledBPUWithBTB':
                if args.enable_loop_predictor or args.enable_loop_buffer:
                    print("loop predictor and loop buffer not supported for DecoupledBPUWithBTB")
                    args.enable_loop_predictor = False
                    args.enable_loop_buffer = False
                if args.enable_jump_ahead_predictor:
                    print("jump ahead predictor not supported for DecoupledBPUWithBTB")
                    args.enable_jump_ahead_predictor = False

            BPClass = DecoupledBPUWithBTB() if args.bp_type == 'DecoupledBPUWithBTB' else DecoupledBPUWithFTB()
            test_sys.cpu[i].branchPred = BPClass(
                bpDBSwitches=bp_db_switches,
                enableLoopBuffer=args.enable_loop_buffer,
                enableLoopPredictor=args.enable_loop_predictor,
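
The hunk above clears the loop predictor, loop buffer, and jump-ahead predictor flags when `DecoupledBPUWithBTB` is selected. One possible table-driven way to express the same rule is sketched below. The flag names match the `args` attributes in the diff, while `UNSUPPORTED_BY_BTB` and `sanitize_bp_args` are hypothetical names that do not appear in the PR.

```python
from argparse import Namespace

# Flags that the diff forces to False for the BTB-based predictor.
UNSUPPORTED_BY_BTB = (
    "enable_loop_predictor",
    "enable_loop_buffer",
    "enable_jump_ahead_predictor",
)


def sanitize_bp_args(args):
    """Clear flags the BTB predictor does not support; return their names."""
    cleared = []
    if args.bp_type == "DecoupledBPUWithBTB":
        for flag in UNSUPPORTED_BY_BTB:
            if getattr(args, flag):
                setattr(args, flag, False)
                cleared.append(flag)
    return cleared


demo_args = Namespace(
    bp_type="DecoupledBPUWithBTB",
    enable_loop_predictor=True,
    enable_loop_buffer=False,
    enable_jump_ahead_predictor=True,
)
for name in sanitize_bp_args(demo_args):
    print(f"{name} not supported for DecoupledBPUWithBTB")
```

Keeping the unsupported flags in one tuple means a newly supported feature only needs to be removed from the list rather than from several `if` blocks.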
@@ -345,18 +355,26 @@ def setKmhV3IdealParams(args, system):
        # use centralized load/store issue queue, for hmmer

        # ideal decoupled frontend
        if args.bp_type is None or args.bp_type == 'DecoupledBPUWithFTB':
            cpu.branchPred.enableTwoTaken = True
            cpu.branchPred.numBr = 8  # numBr must be a power of 2, see getShuffledBrIndex()
            cpu.branchPred.predictWidth = 64
        if args.bp_type == 'DecoupledBPUWithFTB' or args.bp_type == 'DecoupledBPUWithBTB':
            if args.bp_type == 'DecoupledBPUWithFTB':
                cpu.branchPred.enableTwoTaken = False
                cpu.branchPred.numBr = 8  # numBr must be a power of 2, see getShuffledBrIndex()
                cpu.branchPred.predictWidth = 64
                cpu.branchPred.uftb.numEntries = 1024
                cpu.branchPred.btb.numEntries = 16384
                cpu.branchPred.tage.baseTableSize = 16384
                cpu.branchPred.tage.tableSizes = [2048] * 14
            else:
                cpu.branchPred.blockSize = 64  # blockSize corresponds to predictWidth in DecoupledBPUWithFTB
                cpu.branchPred.alignToBlockSize = False  # TODO: ubtb is not aligned; btb is aligned to 16 bytes
                cpu.branchPred.ubtb.numEntries = 1024
                cpu.branchPred.btb.numEntries = 16384
                # TODO: BTB TAGE does not have a base table and does not support SC
                cpu.branchPred.tage.tableSizes = [4096] * 14  # BTB TAGE may need larger tables
            cpu.branchPred.tage.enableSC = False  # TODO(bug): When numBr changes, enabling SC will trigger an assert
            cpu.branchPred.ftq_size = 256
            cpu.branchPred.fsq_size = 256
            cpu.branchPred.uftb.numEntries = 1024
            cpu.branchPred.ftb.numEntries = 16384
            cpu.branchPred.tage.numPredictors = 14
            cpu.branchPred.tage.baseTableSize = 16384
            cpu.branchPred.tage.tableSizes = [2048] * 14
            cpu.branchPred.tage.TTagBitSizes = [13] * 14
            cpu.branchPred.tage.TTagPcShifts = [1] * 14
            cpu.branchPred.tage.histLengths = [4, 7, 12, 16, 21, 29, 38, 51, 68, 90, 120, 160, 283, 499]
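
The ideal-frontend parameters carry two implicit constraints: the comment requires `numBr` to be a power of 2, and the per-table TAGE lists are all written with 14 entries to match `numPredictors = 14`. A small sanity check along those lines might look like the sketch below. That every per-table list must have exactly `numPredictors` entries is an assumption inferred from the values in the diff, and both helper names are hypothetical.

```python
def is_power_of_two(n):
    """True when n is a positive power of two (1, 2, 4, 8, ...)."""
    return n > 0 and (n & (n - 1)) == 0


def check_tage_tables(num_predictors, **per_table):
    """Return the names of per-table lists whose length is not num_predictors.

    Assumption: each TAGE per-table parameter needs one entry per table.
    """
    return sorted(
        name for name, values in per_table.items()
        if len(values) != num_predictors
    )


# Values copied from the hunk above.
hist_lengths = [4, 7, 12, 16, 21, 29, 38, 51, 68, 90, 120, 160, 283, 499]
mismatched = check_tage_tables(
    14,
    tableSizes=[2048] * 14,
    TTagBitSizes=[13] * 14,
    TTagPcShifts=[1] * 14,
    histLengths=hist_lengths,
)
print("numBr ok:", is_power_of_two(8))
print("mismatched lists:", mismatched)
```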
2 changes: 1 addition & 1 deletion src/cpu/o3/BaseO3CPU.py
@@ -228,7 +228,7 @@ def support_take_over(cls):
    smtROBThreshold = Param.Int(100, "SMT ROB Threshold Sharing Parameter")
    smtCommitPolicy = Param.CommitPolicy('RoundRobin', "SMT Commit Policy")

    branchPred = Param.BranchPredictor(DecoupledBPUWithFTB(),
    branchPred = Param.BranchPredictor(DecoupledBPUWithBTB(),
                                       "Branch Predictor")
    needsTSO = Param.Bool(False, "Enable TSO Memory model")

21 changes: 20 additions & 1 deletion src/cpu/o3/commit.cc
@@ -952,6 +952,8 @@ Commit::commit()
        // Squashed sequence number must be older than youngest valid
        // instruction in the ROB. This prevents squashes from younger
        // instructions overriding squashes from older instructions.
        DPRINTF(Commit, "fromIEW->squash %d, commitStatus %d, fromIEW->squashedSeqNum %d, youngestSeqNum %d\n",
                fromIEW->squash[tid], commitStatus[tid], fromIEW->squashedSeqNum[tid], youngestSeqNum[tid]);
        if (fromIEW->squash[tid] &&
            commitStatus[tid] != TrapPending &&
            fromIEW->squashedSeqNum[tid] <= youngestSeqNum[tid]) {
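
The new `DPRINTF` logs exactly the inputs of the squash condition that follows it. As a reading aid, the condition can be modeled in a few lines of Python. This is only a sketch of the predicate shown in the hunk; `CommitStatus` and `accepts_iew_squash` are hypothetical names, and the enum lists just the two states needed here.

```python
from enum import Enum, auto


class CommitStatus(Enum):
    """Reduced stand-in for the commit stage status (hypothetical)."""
    RUNNING = auto()
    TRAP_PENDING = auto()


def accepts_iew_squash(squash, status, squashed_seq_num, youngest_seq_num):
    """Model of the condition guarding a squash request from IEW.

    A squash is honoured only if one was signalled, no trap is pending,
    and the squashing instruction is not younger than the youngest valid
    instruction in the ROB.
    """
    return (
        squash
        and status is not CommitStatus.TRAP_PENDING
        and squashed_seq_num <= youngest_seq_num
    )


print(accepts_iew_squash(True, CommitStatus.RUNNING, 90, 100))
print(accepts_iew_squash(True, CommitStatus.RUNNING, 120, 100))
```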
@@ -1202,7 +1204,7 @@ Commit::commitInsts()
                    auto dbftb = dynamic_cast<branch_prediction::ftb_pred::DecoupledBPUWithFTB*>(bp);
                    bool miss = head_inst->mispredicted();
                    if (head_inst->isReturn()) {
                        DPRINTF(FTBRAS, "commit inst PC %x miss %d real target %x pred target %x\n",
                        DPRINTF(RAS, "commit inst PC %x miss %d real target %x pred target %x\n",
                                head_inst->pcState().instAddr(), miss,
                                head_rv_pc.npc(), *(head_inst->predPC));
                    }
@@ -1215,6 +1217,23 @@ Commit::commitInsts()
                        }
                    }
                    dbftb->notifyInstCommit(head_inst);
                } else if (bp->isBTB()) {
                    auto dbbtb = dynamic_cast<branch_prediction::btb_pred::DecoupledBPUWithBTB*>(bp);
                    bool miss = head_inst->mispredicted();
                    if (head_inst->isReturn()) {
                        DPRINTF(RAS, "commit inst PC %x miss %d real target %x pred target %x\n",
                                head_inst->pcState().instAddr(), miss,
                                head_rv_pc.npc(), *(head_inst->predPC));
                    }

                    // FIXME: ignore mret/sret/uret to match the RTL
                    if (!head_inst->isNonSpeculative() && head_inst->isControl()) {
                        dbbtb->commitBranch(head_inst, miss);
                        if (!head_inst->isReturn() && head_inst->isIndirectCtrl() && miss) {
                            misPredIndirect[head_inst->pcState().instAddr()]++;
                        }
                    }
                    dbbtb->notifyInstCommit(head_inst);
                }
                if (head_inst->isUpdateVsstatusSd()) {
                    auto v = cpu->readMiscRegNoEffect(RiscvISA::MiscRegIndex::MISCREG_VIRMODE, tid);
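
The BTB branch added to `Commit::commitInsts()` only reports speculative control instructions to the predictor, and counts a mispredicted indirect branch per PC unless it is a return. The filtering can be sketched in Python as follows. `CommittedInst` and `count_indirect_mispredicts` are hypothetical stand-ins for the dynamic-instruction queries used in the C++ code, not gem5 APIs.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class CommittedInst:
    """Minimal stand-in for a committed dynamic instruction (hypothetical)."""
    pc: int
    is_control: bool
    is_non_speculative: bool
    is_return: bool
    is_indirect: bool
    mispredicted: bool


def count_indirect_mispredicts(insts):
    """Count mispredicted non-return indirect branches per PC."""
    counts = Counter()
    for inst in insts:
        # Non-speculative instructions (e.g. mret/sret) and non-control
        # instructions are skipped, as in the guarded block of the diff.
        if inst.is_non_speculative or not inst.is_control:
            continue
        if not inst.is_return and inst.is_indirect and inst.mispredicted:
            counts[inst.pc] += 1
    return counts


trace = [
    CommittedInst(0x80000010, True, False, False, True, True),
    CommittedInst(0x80000010, True, False, False, True, True),
    CommittedInst(0x80000020, True, False, True, True, True),   # return
    CommittedInst(0x80000030, True, True, False, True, True),   # non-speculative
    CommittedInst(0x80000040, False, False, False, False, False),
]
print(count_indirect_mispredicts(trace))
```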