This project implements a 5-stage pipelined scalar CPU based on a classic RISC architecture. The pipeline is structured into the following stages:
- IF (Instruction Fetch)
- ID (Instruction Decode & Register Fetch)
- EX (Execute / ALU operations)
- MEM (Memory Access)
- WB (Write Back)
The goal is to raise instruction throughput by overlapping the execution of successive instructions, while handling the hazards that this overlap introduces through forwarding, stalling, and flushing.
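The overlap can be illustrated with a small behavioral model in Python. This is a sketch under the assumption of an ideal pipeline (one instruction enters IF per cycle, no stalls or flushes). The function name `pipeline_chart` is hypothetical and not part of the project.

```python
STAGES = ["IF", "ID", "EX", "MEM", "WB"]


def pipeline_chart(num_instructions):
    """Return, for each instruction, the stage it occupies in every cycle.

    Assumes an ideal pipeline: one instruction enters IF per cycle and
    nothing stalls or is flushed.
    """
    total_cycles = num_instructions + len(STAGES) - 1
    chart = []
    for i in range(num_instructions):
        row = []
        for cycle in range(total_cycles):
            stage = cycle - i  # instruction i enters IF in cycle i
            row.append(STAGES[stage] if 0 <= stage < len(STAGES) else "")
        chart.append(row)
    return chart


if __name__ == "__main__":
    for i, row in enumerate(pipeline_chart(3)):
        print(f"I{i}: " + " ".join(f"{s:>3}" for s in row))
```

In this model, three instructions finish in 7 cycles rather than the 15 a non-pipelined design would need. The first instruction completes in cycle 5, and one more completes in each cycle after that.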
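IF (Instruction Fetch) stage:
- Fetches the next instruction from instruction memory.
- Updates the Program Counter (PC).
- Is stalled (the PC is held) when a later stage requests a stall, and is flushed and redirected when a branch or jump resolves differently than predicted.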
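The PC update can be sketched as a small priority selector. This is a minimal sketch assuming fixed 32-bit instructions, static predict-not-taken, and branches resolved in EX; the signal names `stall`, `branch_taken`, and `branch_target` are hypothetical.

```python
INSTR_BYTES = 4  # assumption: fixed-length 32-bit instructions


def next_pc(pc, stall, branch_taken, branch_target):
    """Select the value loaded into the PC at the next clock edge.

    Hypothetical signals. Under predict-not-taken with branches resolved in
    EX, a taken branch is older than any instruction asking to stall in ID,
    so the redirect wins; the stalled younger instruction gets flushed.
    """
    if branch_taken:
        return branch_target   # redirect fetch to the branch/jump target
    if stall:
        return pc              # hold the PC so the same instruction is refetched
    return pc + INSTR_BYTES    # default: sequential fetch
```

Giving the redirect priority over the stall avoids holding the PC on an instruction that is about to be squashed.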
ID (Instruction Decode & Register Fetch) stage:
- Decodes the instruction.
- Reads operands from the register file.
- Generates control signals for the rest of the pipeline.
- Detects hazards and stalls the pipeline when forwarding cannot resolve them (e.g., load-use hazards).
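Decoding and register fetch can be sketched for one concrete encoding. The README only says "classic RISC", so this assumes the RISC-V RV32I field layout, where `rd`, `rs1`, and `rs2` sit at fixed bit positions in every format that uses them. The helper names are hypothetical.

```python
def decode_fields(instr):
    """Split a 32-bit instruction word into RV32I fixed-position fields."""
    return {
        "opcode": instr & 0x7F,          # bits [6:0]
        "rd": (instr >> 7) & 0x1F,       # bits [11:7]
        "funct3": (instr >> 12) & 0x7,   # bits [14:12]
        "rs1": (instr >> 15) & 0x1F,     # bits [19:15]
        "rs2": (instr >> 20) & 0x1F,     # bits [24:20]
        "funct7": (instr >> 25) & 0x7F,  # bits [31:25]
    }


def read_operands(regfile, fields):
    """Read both source registers; x0 always reads as zero."""
    rs1, rs2 = fields["rs1"], fields["rs2"]
    return (regfile[rs1] if rs1 else 0, regfile[rs2] if rs2 else 0)


# add x3, x1, x2 encodes as 0x002081B3
fields = decode_fields(0x002081B3)
regs = [0] * 32
regs[1], regs[2] = 10, 20
print(fields["rd"], read_operands(regs, fields))  # 3 (10, 20)
```

Because the register fields do not move between formats, the register file can be read in parallel with decoding the opcode.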
EX (Execute) stage:
- Performs ALU operations.
- Calculates effective memory addresses for loads and stores.
- Evaluates branch conditions.
- Selects ALU operands through a forwarding unit to avoid data hazards.
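A 32-bit ALU with branch comparison might look like the following behavioral sketch. The operation names (`add`, `sub`, `slt`, `beq`, `blt`, ...) are assumptions modeled on common RISC instruction sets, and only a small subset is shown.

```python
MASK32 = 0xFFFFFFFF


def to_signed(x):
    """Interpret a 32-bit value as two's-complement signed."""
    x &= MASK32
    return x - (1 << 32) if x & 0x80000000 else x


def alu(op, a, b):
    """Minimal 32-bit ALU on unsigned 32-bit inputs (hypothetical op names)."""
    if op == "add":
        return (a + b) & MASK32
    if op == "sub":
        return (a - b) & MASK32
    if op == "and":
        return a & b
    if op == "or":
        return a | b
    if op == "slt":
        return 1 if to_signed(a) < to_signed(b) else 0
    raise ValueError(f"unsupported ALU op: {op}")


def branch_taken(kind, a, b):
    """Evaluate a conditional branch in EX (small subset)."""
    if kind == "beq":
        return a == b
    if kind == "bne":
        return a != b
    if kind == "blt":
        return to_signed(a) < to_signed(b)
    raise ValueError(f"unsupported branch: {kind}")


# Effective address for a load/store: base register plus sign-extended offset.
address = alu("add", 0x1000, -4 & MASK32)  # 0xFFC
```

Address generation reuses the adder: a negative offset is passed in two's-complement form and wraps correctly after masking.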
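MEM (Memory Access) stage:
- Performs reads and writes on data memory.
- Interacts with the memory subsystem and may stall the pipeline when an access takes more than one cycle (e.g., slow memory, or an unaligned access that must be split into two).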
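A multi-cycle data memory can be modeled as below. This is a sketch with several assumptions: byte addressing, little-endian 32-bit words, and a hypothetical `latency` parameter giving the number of cycles a request must be held before it completes. It simply rejects misaligned accesses; a design that supports them would split the access and stall for the extra cycle instead.

```python
class DataMemory:
    """Byte-addressed, little-endian memory with word accesses (hypothetical model)."""

    def __init__(self, size=1024, latency=1):
        self.mem = bytearray(size)
        self.latency = latency      # cycles per access; >1 means MEM must stall
        self.busy_cycles = 0

    def access(self, addr, write=False, data=0):
        """Call once per cycle while MEM holds the request; returns (ready, read_data)."""
        if addr % 4 != 0:
            raise ValueError(f"misaligned word access at {addr:#x}")
        self.busy_cycles += 1
        if self.busy_cycles < self.latency:
            return False, None      # not done yet: the pipeline stalls this cycle
        self.busy_cycles = 0
        if write:
            self.mem[addr:addr + 4] = (data & 0xFFFFFFFF).to_bytes(4, "little")
            return True, None
        return True, int.from_bytes(self.mem[addr:addr + 4], "little")
```

While `ready` is false, the stall logic holds every stage up to and including MEM, so the request is presented again on the next cycle.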
WB (Write Back) stage:
- Writes the result (ALU output or loaded value) back to the register file.
- Is the final stage in an instruction's lifecycle.
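Write-back reduces to a two-input multiplexer plus a guarded register write. The sketch below assumes textbook-style control signals (`reg_write`, `mem_to_reg`) and a RISC-V/MIPS-style `x0` that is hardwired to zero; these names are assumptions, not taken from the project.

```python
def write_back(regfile, reg_write, rd, mem_to_reg, alu_result, mem_data):
    """WB stage: pick the result source and update the register file."""
    if reg_write and rd != 0:  # writes to x0 are discarded
        regfile[rd] = mem_data if mem_to_reg else alu_result
    return regfile
```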
Stalling:
- Stalling occurs when an instruction must wait for a result that an earlier instruction has not yet produced.
- Common stalling cases:
  - Data hazards (RAW: Read After Write) that forwarding cannot cover
  - Load-use hazards (an instruction uses the value loaded by the instruction immediately before it)
  - Structural hazards (two instructions contending for the same resource)
- Implemented with control signals that keep the PC and the IF/ID pipeline register from updating, while a bubble (NOP) is inserted into ID/EX.
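Load-use detection and the resulting stall controls can be sketched as follows. The signal names follow common textbook naming and are assumptions. The check compares `rs2` even for instructions that do not read it, which is conservative: it can add an unnecessary stall but never misses a real hazard.

```python
def load_use_stall(id_ex_mem_read, id_ex_rd, if_id_rs1, if_id_rs2):
    """True when the instruction in ID needs a register that the load now in
    EX will only have at the end of MEM, too late to forward into EX next cycle."""
    return bool(id_ex_mem_read and id_ex_rd != 0 and id_ex_rd in (if_id_rs1, if_id_rs2))


def stall_controls(stall):
    """Per-cycle controls: freeze the PC and IF/ID, and bubble ID/EX."""
    return {
        "pc_write": not stall,     # hold the PC
        "if_id_write": not stall,  # hold the instruction in ID
        "id_ex_bubble": stall,     # zero ID/EX control bits so EX executes a NOP
    }


# lw x5, 0(x1) followed by add x6, x5, x7
print(stall_controls(load_use_stall(True, 5, 5, 7)))
```

After one bubble, the load has reached WB, and the dependent instruction can take the loaded value from the MEM/WB register through the forwarding unit.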
Data hazards:
- Resolved using:
  - Data forwarding (from the EX/MEM and MEM/WB pipeline registers to the ALU inputs in EX)
  - Stalling when forwarding is not possible (e.g., load-use hazards)
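The forwarding decision for one ALU operand can be sketched as below, following the classic textbook scheme. The signal names and the string return values are hypothetical. The EX/MEM check comes first because it holds the more recent write to the same register.

```python
def forward_select(rs, ex_mem_reg_write, ex_mem_rd, mem_wb_reg_write, mem_wb_rd):
    """Choose the source of one ALU operand in EX.

    Returns "EX/MEM", "MEM/WB", or "REG" (the value read in ID).
    """
    # The newer result (in EX/MEM) wins over the older one (in MEM/WB).
    if ex_mem_reg_write and ex_mem_rd != 0 and ex_mem_rd == rs:
        return "EX/MEM"
    if mem_wb_reg_write and mem_wb_rd != 0 and mem_wb_rd == rs:
        return "MEM/WB"
    return "REG"


# add x1, x2, x3 followed immediately by sub x4, x1, x5: rs1 comes from EX/MEM.
print(forward_select(1, True, 1, False, 0))  # EX/MEM
```

A load in EX/MEM has no data yet, which is why the load-use case needs the stall rather than the EX/MEM path.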
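Control hazards:
- Caused by branches and jumps, whose outcome and target are not known until after the following instructions have already been fetched.
- Solutions:
  - Branch prediction (static or dynamic)
  - Flushing the wrongly fetched instructions when a branch is mispredicted
  - Delay slots (less common in modern designs)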
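Static prediction with flush-on-mispredict can be sketched as below. The sketch assumes branches resolve in EX, which squashes the two younger instructions in IF and ID (a 2-cycle penalty). It also assumes that, when predicting taken, the target is already known at fetch, so only the direction can be wrong. All names are hypothetical.

```python
BRANCH_PENALTY = 2  # assumption: resolved in EX, so the IF and ID instructions are squashed


def resolve_branch(predict_taken, actual_taken, branch_pc, target):
    """Compare a static prediction with the outcome computed in EX.

    Returns (flush, redirect_pc).
    """
    if predict_taken == actual_taken:
        return False, None  # correct guess: keep the fetched path
    # Misprediction: squash IF/ID and ID/EX and restart on the correct path.
    return True, (target if actual_taken else branch_pc + 4)


def flush_cycles(outcomes, predict_taken=False):
    """Cycles lost to flushes over a sequence of branch outcomes."""
    return sum(BRANCH_PENALTY for taken in outcomes if taken != predict_taken)


print(resolve_branch(False, True, 0x100, 0x200))  # (True, 512)
print(flush_cycles([True, False, True]))          # 4
```

Under this model, the choice of static prediction matters only through how often it disagrees with the actual outcomes. A dynamic predictor lowers that count without changing the per-flush penalty.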
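Structural hazards:
- Occur when two instructions need the same hardware resource in the same cycle.
- Avoided by giving the register file separate read and write ports, and by using separate instruction and data memories (Harvard architecture) so that fetch and data access never compete.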
Combinational logic:
- Contains no memory elements; outputs depend solely on current inputs.
- Used for ALUs, decoders, control logic, and address generation.

Sequential logic:
- Contains memory elements (flip-flops, latches).
- Used for registers, pipeline registers, PC updates, and state machines.
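The distinction can be mimicked in software. In the sketch below, a combinational block is modeled as a pure function and a sequential element as an object that changes only on an explicit `clock()` call with an enable. The class and method names are hypothetical, and this is a behavioral analogy, not a hardware description.

```python
def adder(a, b):
    """Combinational: the output is a pure function of the current inputs."""
    return (a + b) & 0xFFFFFFFF


class Register:
    """Sequential: holds state that changes only at a clock edge when enabled."""

    def __init__(self, reset_value=0):
        self.q = reset_value

    def clock(self, d, enable=True):
        # Models a rising edge: capture d only if write-enable is asserted.
        if enable:
            self.q = d
        return self.q


pc = Register(0)
pc.clock(adder(pc.q, 4))                # PC advances to 4
pc.clock(adder(pc.q, 4), enable=False)  # stalled: PC stays at 4
```

The stall mechanism described earlier corresponds to driving `enable=False` on the PC and IF/ID registers.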
Performance metrics:
- CPI (Cycles Per Instruction): ideally 1; stalls and flushes increase it.
- IPC (Instructions Per Cycle): the reciprocal of CPI; the target is IPC ≈ 1 through efficient hazard handling.
- Throughput: instructions per second (IPS), i.e., IPC × clock frequency.
- Latency: number of cycles from instruction fetch to write-back (5 cycles for an instruction that is never stalled).
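The metrics relate as shown in the sketch below, for a 5-stage pipeline: the first instruction needs 5 cycles to fill the pipeline, each later instruction adds one cycle, and every stall or flush cycle adds one more. The function name and the example inputs are illustrative assumptions, not measurements of this design.

```python
PIPELINE_DEPTH = 5


def pipeline_metrics(num_instructions, stall_cycles, clock_hz):
    """Compute total cycles, CPI, IPC, and throughput for one program run."""
    # Fill time (DEPTH - 1 cycles) plus one cycle per instruction plus stalls.
    cycles = num_instructions + (PIPELINE_DEPTH - 1) + stall_cycles
    cpi = cycles / num_instructions
    ipc = 1 / cpi
    return {"cycles": cycles, "cpi": cpi, "ipc": ipc, "ips": ipc * clock_hz}


# Illustrative inputs: 1000 instructions, 96 stall cycles, 100 MHz clock.
m = pipeline_metrics(1000, 96, 100e6)
print(m["cycles"], round(m["cpi"], 3))  # 1100 1.1
```

As the instruction count grows, the fill cost becomes negligible, and CPI approaches 1 plus the average number of stall cycles per instruction.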
Possible extensions:
- Dynamic branch prediction (e.g., 2-bit saturating counters)
- Out-of-order execution
- Superscalar extensions
- Hazard visualizer and simulation tools
The project structure borrows heavily from the AWS EC2 FPGA HDK structure.