
Incorrect CSR State Override Logic Causes Spike State Rollback on RMW Instructions #268

@hjw-arch

Chipyard Version and Hash

Hash: ucb-bar/chipyard@44fec76

OS Setup

Linux 6.6.114.1-microsoft-standard-WSL2 SMP PREEMPT_DYNAMIC Mon Dec 1 20:46:23 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux

Other Setup

Using MediumBoomV4CosimConfig

Current Behavior

Description

We suspect that the Cospike harness (cospike_impl.cc) incorrectly uses the wdata from the DUT's commit log to force-update Spike's CSR state. For read-modify-write (RMW) CSR instructions such as csrrc, wdata appears to be the value written to rd, i.e. the CSR's old (pre-modification) value. If so, the harness effectively "rolls back" Spike's correctly updated state to the old value, which would explain the simulation divergence.

Conditions

  • Chipyard + BOOM (MediumBoomV4CosimConfig), Cospike co-simulation enabled.
  • mstatus.FS initially set to Dirty (FP enabled).
  • Execute: csrrc gp, mstatus, mask (clears mstatus.FS bits to Off) → fcvt.lu.s (FP instruction).

Observed

  • DUT (BOOM): Traps on fcvt.lu.s (Cause 2: Illegal Instruction). ✓
  • Spike (in co-sim): Does NOT trap on fcvt.lu.s and continues execution. ✗ (The Spike log in co-sim does show that mstatus was updated to the new value when the instruction clearing the FS bits executed.)
  • Result: PC mismatch reported.

Additional Observation (with csrr inserted)

After inserting a csrr x30, mstatus instruction between csrrc and fcvt, we found that Spike returned the old value (FS still set, which contradicts the earlier Spike log), whereas the DUT returned the new value. In this run, however, both Spike and the DUT raised an exception at the fcvt instruction, so the simulation passed. If our hypothesis is correct, this is a side effect of the Cospike harness's synchronization mechanism: after the csrr read of mstatus, the harness forcibly copied the DUT's mstatus (now the new value) back into Spike, which let Spike correctly raise the exception at the subsequent fcvt instruction.

Expected Behavior

After csrrc clears mstatus.FS, both DUT and Spike should trap on fcvt.lu.s. Co-simulation should pass.

Other Information

Faulty Logic

File: testchipip/src/main/resources/testchipip/csrc/cospike_impl.cc

The harness synchronizes Spike's CSR with the DUT's wdata without distinguishing instruction types:

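// Problematic override logic (excerpt)
// s->XPR.write(rd, wdata);
// ...
// Spike's CSR value after it has executed the instruction
uint64_t read_bits = s->csrmap[csr_addr]->read();
// Copy the ignore_bits fields from wdata into Spike's CSR.
// For RMW instructions (e.g. csrrc), wdata is the rd value, i.e. the OLD CSR value.
uint64_t write_bits = (read_bits & ~ignore_bits) | (wdata & ignore_bits);
s->csrmap[csr_addr]->write(write_bits);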

Experiments

We performed three experiments to isolate the issue.

Experiment 1: Baseline Reproduction (Original Bug)

Setup: mstatus.FS is Dirty. Execute csrrc (clearing FS) followed by fcvt (FP instruction).

li s11, 0x000000000000f000
csrrc gp, mstatus, s11
fcvt.lu.s s3, fa7, dyn

Observation:

Spike: core   0: 3 0x0000000080002004 (0x0000fdb7) lui	s11,0xf              x27 0x000000000000f000
DUT:   Cosim: 1583 commit: 80002004 (0xfdb7) lui     s11, 0xf     x27 0xf000(DUT)

Spike: core   0: 3 0x0000000080002008 (0x300db1f3) csrrc	gp,mstatus,s11
                                            x3  0x8000000a000c6088 c768_mstatus 0x0000000a000c0088             <-    NEW MSTATUS
DUT:   Cosim: 1591 commit: 80002008 (0x300db1f3) csrrc   gp, mstatus, s11     x3 0x8000000a000c6088(DUT)

Cosim: CSR read 300
Cosim: CSR status override check
Cosim: 1599 exception 2
Spike: core   0: 3 0x000000008000200c (0xc038f9d3) fcvt.lu.s	s3,fa7
                                            c1_fflags 0x0000000000000010 x19 0xffffffffffffffff
Spike: core   0: 3 0x0000000080002010 (0xffffff17) auipc	t5,0xfffff
                                            x30 0x0000000080001010
DUT:    Cosim: 1661 commit: 80001000 (0x341020f3) csrr    ra, mepc     x30 0x8000200c(DUT)

Cosim: 67d PC mismatch spike 80002010 != DUT 80001000  <-- [Spike=80002010 vs DUT=80001000]
  • DUT (BOOM): Executes csrrc, so FS becomes Off. fcvt correctly traps (Cause 2).
  • Trace: DUT reports wdata (gp) = old value (FS Dirty).
  • Spike: Executes csrrc, so FS becomes Off. The harness then overrides mstatus with wdata (FS Dirty), and Spike's FS reverts to Dirty.
  • Result: Spike executes fcvt without trapping, causing a PC mismatch with the DUT.

Experiment 2: Diagnostic Check (csrr Read)

Setup: Insert csrr rd, mstatus immediately after csrrc, before fcvt.

li s11, 0x000000000000f000
csrrc gp, mstatus, s11
csrr x30, mstatus
fcvt.lu.s s3, fa7, dyn

Observation:

Spike: core   0: 3 0x0000000080002004 (0x0000fdb7) lui	s11,0xf       x27 0x000000000000f000
DUT:   Cosim: 1583 commit: 80002004 (0xfdb7) lui     s11, 0xf                 x27 0xf000(DUT)

core   0: 3 0x0000000080002008 (0x300db1f3) csrrc	gp,mstatus,s11
                                            x3  0x8000000a000c6088 c768_mstatus 0x0000000a000c0088             <- NEW MSTATUS
Cosim: 1591 commit: 80002008 (0x300db1f3) csrrc   gp, mstatus, s11     x3 0x8000000a000c6088(DUT)

Cosim: CSR read 300
Cosim: CSR status override check
core   0: 3 0x000000008000200c (0x30002f73) csrr	t5,mstatus    x30 0x8000000a000c6088    <- OLD MSTATUS
Cosim: 1605 commit: 8000200c (0x30002f73) csrr    t5, mstatus     x30 0xa000c0088(DUT)                <- NEW MSTATUS

Cosim: CSR read 300
Cosim: CSR status override check
Cosim: 1613 exception 2
core   0: 3 0x0000000080001000 (0x341020f3) csrr	ra,mepc      x1  0x0000000080002010       <- Trap on FP instruction
Cosim: 1673 commit: 80001000 (0x341020f3) csrr    ra, mepc     x1 0x80002010(DUT)                <- Trap on FP instruction
  • Spike: csrr returns Dirty (old value). This appears to confirm the harness rollback.
  • DUT (BOOM): csrr returns Off (new value) in the log.
  • Note: Because of the override mechanism, after csrr x30, mstatus commits, Spike's mstatus.FS field is overwritten again, this time with the DUT's new FS value (Off).
  • Result: Both simulators trap on the FP instruction.

Experiment 3: Fix Verification (Remove Override)

Setup: Comment out the s->csrmap[csr_addr]->write(...) logic in cospike_impl.cc.

Observation:

Spike: core   0: 3 0x0000000080002004 (0x0000fdb7) lui	s11,0xf       x27 0x000000000000f000
DUT:   Cosim: 1583 commit: 80002004 (0xfdb7) lui     s11, 0xf     x27 0xf000(DUT)

Spike: core   0: 3 0x0000000080002008 (0x300db1f3) csrrc	gp,mstatus,s11
                                            x3  0x8000000a000c6088 c768_mstatus 0x0000000a000c0088
DUT:   Cosim: 1591 commit: 80002008 (0x300db1f3) csrrc   gp, mstatus, s11     x3 0x8000000a000c6088(DUT)

Cosim: CSR read 300
Cosim: CSR status override check
Cosim: 1599 exception 2
core   0: 3 0x0000000080001000 (0x341020f3) csrr	ra,mepc     x1  0x000000008000200c      <- Trap on FP instruction
Cosim: 1661 commit: 80001000 (0x341020f3) csrr    ra, mepc     x1 0x8000200c(DUT)               <- Trap on FP instruction
  • Spike: Executes csrrc. FS remains Off (correct).
  • Result: Both Spike and DUT trap on fcvt, and csrr reads the correct (new) value. Co-simulation passes.

Attachments

ELF binary and full execution logs are attached.

cospike_nofix.zip

cospike_fix.zip
