232 changes: 232 additions & 0 deletions examples/shrike_serv/README.md
@@ -0,0 +1,232 @@
# shrike_serv

**Difficulty:** Advanced
**Uses MCU:** Yes
**External Hardware:** None

Believed to be the first documented port of a RISC-V soft CPU to the Renesas
SLG47910 ForgeFPGA. Runs [SERV](https://github.com/olofk/serv), a bit-serial
RISC-V core, entirely on FPGA fabric.

---

## Expected Output

```
Flashing SERV bitstream to FPGA...
[shrike_flash] FPGA programming done.
SERV RISC-V computed: 1 + 2 = 3
```

![Serial output](images/output.JPG)

---

## Compatibility

| Board | MCU | Status |
|---|---|---|
| Shrike-lite | RP2040 | Tested and working |
| Shrike | RP2350 | Untested |
| Shrike-fi | ESP32-S3 | Untested |

---

## Resource Utilisation

| Resource | Used | Available | % |
|---|---|---|---|
| CLB LUT5s | 516 | 1120 | 46 |
| FFs | 230 | 1120 | 21 |
| CLBs | 109 | 140 | 78 |
| GPIOs | 6 | 19 | 32 |

---

## Directory Structure

```
shrike_serv/
├── README.md
├── shrike_serv.ffpga              # Go Configure project file
├── ffpga/
│   └── src/
│       ├── serv_shrike_top.v      # Top-level wrapper
│       ├── nuclear_rom.v          # Hardcoded instruction ROM
│       └── serv_rf_ram_shrike.v   # FF register file
├── images/
│   └── output.JPG
├── firmware/
│   └── micropython/
│       └── shrike_serv.py
└── bitstream/
    └── shrike_serv.bin            # Pre-built bitstream
```

---

## Setup

### Step 1 — Get SERV RTL files

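```bash
git clone https://github.com/olofk/serv
cp serv/rtl/*.v ffpga/src/
# The glob above also copies serv_rf_ram.v; remove it again.
rm ffpga/src/serv_rf_ram.v
```

> `serv_rf_ram.v` must **not** be part of the project. Use `serv_rf_ram_shrike.v` instead.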

### Step 2 — Edit serv_rf_top.v

```verilog
// Find this line in serv_rf_top.v and change:
serv_rf_ram #(.DEPTH(32), .RF_W(RF_W)) rf_ram (...);
// To:
serv_rf_ram_shrike #(.DEPTH(16), .RF_W(RF_W)) rf_ram (...);
```
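
According to the comments in `serv_rf_ram_shrike.v`, setting `DEPTH` to 16 makes registers x16–x31 alias onto x0–x15, so a program is only safe if it stays within x0–x15. The Python sketch below is one way to check a list of instruction words before putting them in the ROM. The helper names are illustrative and not part of SERV or this repository, and only the RV32I base opcodes are handled (no CSR or fence instructions).

```python
# Hypothetical helpers (not part of SERV): find which registers a program touches.
# RV32I base opcodes, grouped by which register fields their format carries.
HAS_RD = {0x33, 0x13, 0x03, 0x67, 0x6F, 0x37, 0x17}   # OP, OP-IMM, LOAD, JALR, JAL, LUI, AUIPC
HAS_RS1 = {0x33, 0x13, 0x03, 0x67, 0x23, 0x63}        # OP, OP-IMM, LOAD, JALR, STORE, BRANCH
HAS_RS2 = {0x33, 0x23, 0x63}                          # OP, STORE, BRANCH


def registers_used(word):
    """Return the set of register numbers referenced by one instruction word."""
    opcode = word & 0x7F
    regs = set()
    if opcode in HAS_RD:
        regs.add((word >> 7) & 0x1F)
    if opcode in HAS_RS1:
        regs.add((word >> 15) & 0x1F)
    if opcode in HAS_RS2:
        regs.add((word >> 20) & 0x1F)
    return regs


def fits_depth16(words):
    """True if every referenced register is in x0..x15."""
    return all(reg < 16 for word in words for reg in registers_used(word))


# The 1 + 2 program from nuclear_rom.v.
PROGRAM = [0x00100093, 0x00200113, 0x002081B3,
           0x40000237, 0x00322023, 0x0000006F]

print(fits_depth16(PROGRAM))
```

A word such as `0x00100893` (`addi x17, x0, 1`) would make the check fail, because x17 would land on the same storage as x1.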

### Step 3 — Open in Go Configure

Open `shrike_serv.ffpga` directly in Go Configure Software Hub.

If rebuilding from scratch, add files in this order:
```
ffpga/src/serv_state.v
ffpga/src/serv_decode.v
ffpga/src/serv_immdec.v
ffpga/src/serv_bufreg.v
ffpga/src/serv_bufreg2.v
ffpga/src/serv_alu.v
ffpga/src/serv_mem_if.v
ffpga/src/serv_csr.v
ffpga/src/serv_ctrl.v
ffpga/src/serv_rf_if.v
ffpga/src/serv_rf_ram_if.v
ffpga/src/serv_rf_ram_shrike.v
ffpga/src/serv_rf_top.v
ffpga/src/serv_top.v
ffpga/src/nuclear_rom.v
ffpga/src/serv_shrike_top.v
```

**IO Planner — assign ONLY:**

| Signal | Resource |
|---|---|
| `clk` | `OSC_CLK` |
| `clk_en` | `OSC_EN` |

Leave all `result_bit*` signals unassigned — see toolchain note 3 below.

Click **Synthesize** then **Generate Bitstream**.

### Step 4 — Flash and run

Copy `bitstream/shrike_serv.bin` to the board using the Thonny file panel, then
run `firmware/micropython/shrike_serv.py`.

---

## How to Change the Computation

To compute something other than `1 + 2 = 3`, edit `ffpga/src/nuclear_rom.v`.

### Understanding the instruction encoding

Each line in the `case()` block is one 32-bit RISC-V instruction; the hex
values are standard RV32I encodings. To get the hex for your own instruction,
use any RISC-V assembler or, for `addi`, the quick-reference table further down.
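
As a rough illustration of how those hex values break down, the Python sketch below splits a 32-bit word into the fixed-position RV32I bit fields. `decode_fields` is an illustrative name, and the immediate is extracted for the I-type layout only (as used by `addi`); other formats arrange their immediate bits differently.

```python
def decode_fields(word):
    """Split a 32-bit RV32I instruction word into its fixed-position fields."""
    word &= 0xFFFFFFFF
    imm_i = word >> 20            # I-type immediate lives in bits 31:20
    if imm_i & 0x800:             # sign-extend the 12-bit value
        imm_i -= 0x1000
    return {
        "opcode": word & 0x7F,           # bits 6:0
        "rd": (word >> 7) & 0x1F,        # bits 11:7
        "funct3": (word >> 12) & 0x7,    # bits 14:12
        "rs1": (word >> 15) & 0x1F,      # bits 19:15
        "rs2": (word >> 20) & 0x1F,      # bits 24:20 (R/S/B formats)
        "funct7": (word >> 25) & 0x7F,   # bits 31:25 (R format)
        "imm_i": imm_i,
    }


fields = decode_fields(0x00100093)   # addi x1, x0, 1
print(fields["opcode"], fields["rd"], fields["rs1"], fields["imm_i"])
```

Running it on `0x002081B3` gives opcode `0x33` with `rd = 3`, `rs1 = 1`, `rs2 = 2`, which matches `add x3, x1, x2`.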

### Example — compute 4 + 5 = 9

Open `ffpga/src/nuclear_rom.v` and change the program:

```verilog
always @(*) begin
    case (i_adr[4:2])
        3'd0    : o_dat = 32'h00400093; // addi x1, x0, 4    → x1 = 4
        3'd1    : o_dat = 32'h00500113; // addi x2, x0, 5    → x2 = 5
        3'd2    : o_dat = 32'h002081B3; // add  x3, x1, x2   → x3 = 9
        3'd3    : o_dat = 32'h40000237; // lui  x4, 0x40000  → x4 = GPIO base
        3'd4    : o_dat = 32'h00322023; // sw   x3, 0(x4)    → output result
        3'd5    : o_dat = 32'h0000006F; // jal  x0, 0        → halt
        default : o_dat = 32'h00000013; // nop
    endcase
end
```

### Encoding your own `addi` instruction

The `addi x1, x0, N` instruction puts the value `N` into register `x1`.

```
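Hex format: 0xNNN00093
  NNN = immediate N in hex (12-bit signed; largest positive value is 0x7FF = 2047)
  00  = rs1 (x0, 5 bits) followed by funct3 (000 for addi)
  093 = rd (x1, 5 bits) followed by the 7-bit opcode 0010011 (0x13)
        for x2 as destination these three digits become 113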
```

Quick reference:

| Value | addi x1 instruction | addi x2 instruction |
|---|---|---|
| 1 | `32'h00100093` | `32'h00100113` |
| 2 | `32'h00200093` | `32'h00200113` |
| 3 | `32'h00300093` | `32'h00300113` |
| 4 | `32'h00400093` | `32'h00400113` |
| 5 | `32'h00500093` | `32'h00500113` |
| 10 | `32'h00A00093` | `32'h00A00113` |
| 20 | `32'h01400093` | `32'h01400113` |
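
For values not in the table, the encoding can be computed rather than looked up. The following Python sketch packs the I-type fields for `addi` as defined by the RV32I base ISA; `encode_addi` is an illustrative helper, not something shipped with this example.

```python
def encode_addi(rd, rs1, imm):
    """Encode `addi rd, rs1, imm` as a 32-bit RV32I instruction word."""
    if not (0 <= rd < 32 and 0 <= rs1 < 32):
        raise ValueError("register numbers must be in 0..31")
    if not -2048 <= imm <= 2047:
        raise ValueError("addi immediate must fit in 12 signed bits")
    opcode = 0x13   # OP-IMM
    funct3 = 0b000  # addi
    return ((imm & 0xFFF) << 20) | (rs1 << 15) | (funct3 << 12) | (rd << 7) | opcode


# Print Verilog literals for a few values loaded into x1.
for n in (1, 10, 20):
    print(f"32'h{encode_addi(1, 0, n):08X}")
```

The printed literals match the `addi x1` column of the table above for 1, 10, and 20. Keep in mind the register-file limit described in `serv_rf_ram_shrike.v`: only x0–x15 are physically stored.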

After editing, re-synthesise and re-generate the bitstream in Go Configure,
then copy the new `FPGA_bitstream_MCU.bin` to the board as `shrike_serv.bin`.

### Result output range

The current design outputs only `dbus_dat[1:0]` (2 bits), so the readable
result range is 0–3. Larger results are truncated to their low two bits: the
`4 + 5 = 9` example above (binary `1001`) reads back as 1. For larger results,
modify `serv_shrike_top.v` to output more bits (add `result_bit2`,
`result_bit3`, etc.) and update the IO Planner and
`firmware/micropython/shrike_serv.py` to read the extra pins.
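
To make the limit concrete, the Python sketch below models what the firmware can observe when only the low bits of the stored word reach the pins. The function names are illustrative, and the model assumes the output is simply the low `width` bits of the value, as described above.

```python
def visible_result(value, width=2):
    """Value seen on the pins when only the low `width` bits are exported."""
    return value & ((1 << width) - 1)


def bits_needed(value):
    """Minimum number of output bits needed to read `value` without truncation."""
    return max(1, value.bit_length())


for total in (3, 9):
    print(total, visible_result(total), bits_needed(total))
```

With the default two bits, 3 is read correctly while 9 is seen as 1; reading 9 in full would need four result bits.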

---

## Toolchain Notes (Novel Findings for SLG47910)

### 1. BRAM initialisation crash

`$readmemh` → Yosys falls back to RAMSRL → ~820 LUTs consumed → compiler aborts.

**Fix:** `case()` combinational block in `nuclear_rom.v` — pure LUT logic,
no BRAM, no RAMSRL.
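
Since the ROM has to be written out as `case()` arms instead of being loaded from a hex file, it can help to generate those arms from a list of instruction words. The Python sketch below is one possible way to do that; `rom_case_lines` is a hypothetical helper, and the output format simply mirrors the arms in `nuclear_rom.v`.

```python
NOP = 0x00000013  # addi x0, x0, 0


def rom_case_lines(words, index_bits=3):
    """Return Verilog case arms for a combinational ROM holding `words`."""
    if len(words) > (1 << index_bits):
        raise ValueError("program does not fit in the case index width")
    lines = [
        f"{index_bits}'d{i} : o_dat = 32'h{word:08X};"
        for i, word in enumerate(words)
    ]
    # Unused slots fall through to a nop, as in nuclear_rom.v.
    lines.append(f"default : o_dat = 32'h{NOP:08X}; // nop")
    return lines


program = [0x00100093, 0x00200113, 0x002081B3,
           0x40000237, 0x00322023, 0x0000006F]
print("\n".join(rom_case_lines(program)))
```

The generated lines can be pasted between `case (i_adr[4:2])` and `endcase`. A program longer than eight words would also need a wider address slice in the `case` expression.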

### 2. Silent register file routing failure

`serv_rf_ram.v` uses a `reg` array → Yosys infers RAMSRL → Forge PNR silently
fails to route → CPU frozen at `PC=0x00`, no error reported, bitstream appears
to generate successfully.

**Fix:** `(* ram_style = "registers" *)` in `serv_rf_ram_shrike.v` → plain DFFs.

### 3. IO Planner explicit GPIO17/18 assignment breaks output

Manually assigning `result_bit*` signals in IO Planner conflicts with the
toolchain's automatic pin placement. FPGA GPIO17/18 are the only pins hardwired
to RP2040 GPIO14/15 via PCB 0-ohm resistors, and automatic placement puts the
signals there correctly.

**Fix:** Assign ONLY `clk → OSC_CLK` and `clk_en → OSC_EN`. Leave result
signals unassigned.

---

## References

- [SERV](https://github.com/olofk/serv) by Olof Kindgren (ISC licence)
- [SLG47910 Datasheet](https://www.renesas.com/en/products/slg47910)
- [Shrike documentation](https://vicharak-in.github.io/shrike/)
- [Go Configure Software Hub](https://www.renesas.com/en/software-tool/go-configure-software-hub)

---

## Licence

GPL-2.0. SERV RTL files retain their original ISC licence headers.
Binary file added examples/shrike_serv/bitstream/shrike_serv.bin
Binary file not shown.
59 changes: 59 additions & 0 deletions examples/shrike_serv/ffpga/src/nuclear_rom.v
@@ -0,0 +1,59 @@
// =============================================================================
// nuclear_rom.v
// Board : Shrike-lite (SLG47910 Forge FPGA)
// License : GPL-2.0
//
// Zero-wait-state combinational instruction ROM for SERV.
//
// WHY case() INSTEAD OF $readmemh
// The Forge FPGA toolchain cannot initialise BRAM from a hex file.
// Attempting to use $readmemh causes Yosys to fall back to RAMSRL primitives
// (shift-register LUT RAM). A 128-byte ROM implemented in RAMSRL consumes
// 800+ CLB LUTs and crashes the compiler with a resource overflow.
//
// A case() block is pure combinational logic — Yosys maps it to a small
// LUT mux tree. No BRAM, no RAMSRL, no crash.
// See docs/toolchain_failures.md for the full analysis.
//
// PROGRAM (RV32I, no CSR, no interrupts)
//
// word address hex assembly
// 0 0x00 00100093 addi x1, x0, 1
// 1 0x04 00200113 addi x2, x0, 2
// 2 0x08 002081B3 add x3, x1, x2 ; x3 = 3
// 3 0x0C 40000237 lui x4, 0x40000 ; x4 = 0x40000000
// 4 0x10 00322023 sw x3, 0(x4) ; write result to GPIO
// 5 0x14 0000006F jal x0, 0 ; halt
//
// INTERFACE (SERV ibus — Wishbone-compatible, read-only)
// i_adr [31:0] byte address from SERV program counter
// i_cyc bus cycle valid from SERV
// o_dat [31:0] instruction word returned (combinational, same cycle)
// o_ack acknowledge (tied to i_cyc — zero wait states)
// =============================================================================

module nuclear_rom (
    input  wire [31:0] i_adr,
    input  wire        i_cyc,
    output reg  [31:0] o_dat,
    output wire        o_ack
);

    // Acknowledge in the same cycle, so SERV never stalls on instruction fetch.
    assign o_ack = i_cyc;

    // Word-addressed decode: i_adr[4:2] selects one of eight word slots.
    // Slots 0-5 hold the program; slots 6-7 return a nop via the default arm.
    // i_adr[31:5] is ignored, so the ROM contents repeat every 32 bytes.
    // i_adr[1:0] is always 0 for aligned 32-bit fetches (SERV guarantee).
    always @(*) begin
        case (i_adr[4:2])
            3'd0    : o_dat = 32'h00100093; // addi x1, x0, 1
            3'd1    : o_dat = 32'h00200113; // addi x2, x0, 2
            3'd2    : o_dat = 32'h002081B3; // add  x3, x1, x2
            3'd3    : o_dat = 32'h40000237; // lui  x4, 0x40000
            3'd4    : o_dat = 32'h00322023; // sw   x3, 0(x4)
            3'd5    : o_dat = 32'h0000006F; // jal  x0, 0  (halt)
            default : o_dat = 32'h00000013; // nop  (addi x0, x0, 0)
        endcase
    end

endmodule
73 changes: 73 additions & 0 deletions examples/shrike_serv/ffpga/src/serv_rf_ram_shrike.v
@@ -0,0 +1,73 @@
// =============================================================================
// serv_rf_ram_shrike.v
// Board : Shrike-lite (SLG47910 Forge FPGA)
// License : GPL-2.0
//
// Drop-in replacement for serv_rf_ram.v — FF-based register file.
//
// WHY THIS FILE EXISTS
// serv_rf_ram.v uses a Verilog reg array which Yosys infers as a $mem cell
// and maps to RAMSRL (shift-register LUT RAM) primitives. The Forge PNR
// accepts the RAMSRL cells in the netlist but silently fails to route them
// inside serv_rf_top. The register file outputs are unconnected. The CPU
// can fetch instructions but every register read returns 0, leaving SERV
// frozen at PC=0x00000000 indefinitely. No error is reported.
//
// THE FIX
// (* ram_style = "registers" *) forces Yosys to implement the memory as
// individual CLB flip-flops instead of RAMSRL cells. The Forge PNR routes
// standard DFFs without any issues.
//
// REGISTER ALIASING
// DEPTH=16 stores x0–x15 physically. x16–x31 alias to x0–x15 (the MSB of
// any 5-bit register address is discarded). This is safe for any program
// that only uses x0–x15 — which includes the 1+2=3 program in nuclear_rom.v.
//
// FF BUDGET
// DEPTH=16, RF_W=2:
// 16 × (32÷2) = 256 entries × 2 bits = 512 FFs (register file)
// SERV CPU core state machine ≈ 164 FFs
// Reset counter + GPIO latch ≈ 20 FFs
// ───────────────────────────────────────────────
// Total ≈ 696 FFs (≤ 1120 available)
//
// USAGE
// In serv_rf_top.v, find the serv_rf_ram instantiation and change:
// serv_rf_ram #(.DEPTH(32), .RF_W(RF_W)) →
// serv_rf_ram_shrike #(.DEPTH(16), .RF_W(RF_W))
// Port names are identical — this is a true drop-in replacement.
// Do NOT include serv_rf_ram.v in the Go Configure project.
// =============================================================================

module serv_rf_ram_shrike
  #(parameter DEPTH = 16,
    parameter RF_W  = 2)
(
    input  wire                             i_clk,

    // Write port
    input  wire [RF_W-1:0]                  i_wdata,
    input  wire [$clog2(DEPTH*32/RF_W)-1:0] i_waddr,
    input  wire                             i_wen,

    // Read port 0
    output wire [RF_W-1:0]                  o_rdata0,
    input  wire [$clog2(DEPTH*32/RF_W)-1:0] i_raddr0,
    input  wire                             i_ren0,

    // Read port 1
    output wire [RF_W-1:0]                  o_rdata1,
    input  wire [$clog2(DEPTH*32/RF_W)-1:0] i_raddr1,
    input  wire                             i_ren1
);

    // Number of RF_W-bit entries: 256 for DEPTH=16, RF_W=2.
    localparam MEM_DEPTH = DEPTH * 32 / RF_W;

    // Critical attribute: forces individual flip-flops, avoids RAMSRL.
    (* ram_style = "registers" *) reg [RF_W-1:0] mem [0:MEM_DEPTH-1];

    // Synchronous write.
    always @(posedge i_clk) begin
        if (i_wen) mem[i_waddr] <= i_wdata;
    end

    // Combinational reads. i_ren0 and i_ren1 are kept for port compatibility
    // but are not used inside this module.
    assign o_rdata0 = mem[i_raddr0];
    assign o_rdata1 = mem[i_raddr1];

endmodule