static-inline generates incorrect guards #1442

Closed · rachitnigam opened this issue May 8, 2023 · 13 comments
Labels: Type: Bug (Bug in the implementation)

@rachitnigam (Contributor)

For the following program:

import "primitives/core.futil";

component main() -> () {
    cells {
        idx = std_reg(4);
    }
    wires {
        static group incr<1> {
            idx.in = 4'd10;
            idx.write_en = %1 ? 1'd1;
        }
    }
    control {
        static seq {
            static par {
                incr;
            }
        }
    }
}

The following command:

cargo run -- tests/correctness/static/static-mult-dot-product.futil -p dead-group-removal -p validate -p static-inline -p simplify-static-guards

Generates:

...
component main(@go go: 1, @clk clk: 1, @reset reset: 1) -> (@done done: 1) {
  cells {
    idx = std_reg(4);
  }
  wires {
    static group static_seq<1> {
      idx.write_en = !1'b1 ? 1'd1;
      idx.in = %0 ? 4'd10;
    }
  }

  control {
    static_seq;
  }
}

Note that the guard for idx.write_en is !1'b1, i.e. always false, which means the program will never write a value into idx.
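As a rough model of the bug (illustrative Python, not the actual pass code): a `%t` guard means "cycle t of this group's execution," so in a group with latency 1 the only valid guard is `%0`, and `%1` can never hold once the guard is lowered against the group's cycle counter.

```python
# Hypothetical model of static interval guards, not the Calyx implementation.
# A static group of latency L occupies counter values 0..L-1; the guard %t
# is true exactly when the counter equals t.

def interval_guard(t, latency):
    """Return the set of counter values at which %t is true for a group
    of the given latency; empty set means the guard is always false."""
    return {c for c in range(latency) if t == c}

assert interval_guard(1, 1) == set()   # %1 in a <1> group: always false
assert interval_guard(0, 1) == {0}     # %0 is the only satisfiable guard
```

This is why the inlined output above contains `!1'b1 ? 1'd1`: the out-of-range `%1` simplifies to false.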

@rachitnigam rachitnigam added the Type: Bug Bug in the implementation label May 8, 2023
@rachitnigam rachitnigam added this to the Static Groups milestone May 8, 2023
@rachitnigam (Contributor, Author)

I'm reading the static inlining code and I'm not sure if it makes sense. It doesn't seem to implement the inlining technique described in #1344. Specifically, the approach in that overview is that we should keep track of the current relative time when compiling a control operator and only update the guards once we reach an enable.

The current inlining code seems to transform each control operator separately and then & together the guards generated by each level.

@rachitnigam (Contributor, Author)

Hm, never mind, I'm wrong: update_timing_assignment seems to be the function that does that, but it relies on recursively calling inline_static_control on each of the children of a control program. This means that an assignment in a group nested n levels of seq or par deep will be updated n times. This $n^2$ behavior for rewrites is probably not a good idea.
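A small Python model (names and structure are illustrative, not the actual `inline_static_control` code) of the behavior described above: when each level of the control tree re-guards every assignment beneath it, a leaf at depth n is rewritten n times and its guard accumulates one redundant conjunct per level.

```python
# Toy model of recursive static inlining: each nesting level rewrites
# every assignment below it, so rewrite work grows with nesting depth.

class Assign:
    def __init__(self, guard):
        self.guard = guard        # list of conjoined guard terms
        self.rewrites = 0

def inline(node):
    """Recursively inline a nested control tree (a node is either an
    Assign leaf or a list of children); each level adds a guard conjunct
    to every assignment returned from below."""
    if isinstance(node, Assign):
        return [node]
    assigns = []
    for child in node:
        for a in inline(child):
            a.guard.append("shift")   # redundant extra conjunct per level
            a.rewrites += 1
            assigns.append(a)
    return assigns

# One assignment nested 4 levels deep, e.g. seq { par { seq { par { a } } } }
leaf = Assign(guard=["range(0,1)"])
inline([[[[leaf]]]])
# leaf.rewrites == 4; the guard now has 4 redundant conjuncts on top of 1
```

This also illustrates why a separate simplify-static-guards pass is needed to clean up the redundant conjuncts.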

@rachitnigam (Contributor, Author)

The problem seems to be the assignment:

idx.write_en = %1 ? 1'd1

If I remove this, the example works. This came up while I was trying to rewrite the static-mult-dot-product test; I see that the test already exists, but it doesn't define assignments in the way I wrote them here.

@rachitnigam (Contributor, Author)

Also, note that this test doesn't work with Icarus Verilog for some reason. Running the following command produces an array of all zeroes, which means something is wrong:

fud exec --to jq --through icarus-verilog --through dat -s futil.exec './target/debug/futil' -s futil.flags '-p all -d tdst -d group2invoke' -s verilog.cycle_limit 500 -s verilog.data tests/correctness/static-control/static-mult-dot-product.futil.data tests/correctness/static-control/static-mult-dot-product.futil

@rachitnigam (Contributor, Author)

A further problem is that the inlining process cannot inline the par into the seq, which means the resulting code allocates two FSMs when it should only really need one.

@calebmkim (Contributor)

I see. Just so we're on the same page:

  • As you note, the static-inline pass is inefficient right now. If an assignment is written at $n$ levels of nesting, then it gets rewritten $n$ times (this is also why the generated guards are redundant, and the motivation behind the simplify-static-guards pass). Ideally, we would only need to rewrite the assignment once (when turning the "static island" into a static group). I think this would require having some sort of data structure that maps groups to their appropriate "offset"; then, at the end, when we're compiling the static island into a single static group, we "realize" the offsets just once by rewriting the assignments. I think I can take a shot at simplifying this so that the assignments are only rewritten once.
  • I think the icarus-verilog issue might stem from #1354 (Reset ports for seq-mem are ignored in reset-insertion), in which external cells do not have their reset port inserted. I can read the related issues and try to fix this.
  • About the last comment:

A further problem is that the inlining process cannot inline the par into the seq which means the resulting code allocates two FSMs when it should only really need to allocate one.

  • I think (you can correct me if I'm wrong) you're referring to the fact that two FSMs are generated when we compile your Calyx program (i.e., the one in the original comment) using static-inline and compile-static. What's happening right now is that static-inline creates a bunch of unused groups (for the same reason that it inefficiently rewrites the assignments in nested groups) and therefore unnecessarily instantiates FSMs that we will never use. If we compile this all the way down (with the dead-group-removal and dead-cell-removal optimizations), then it will only generate one FSM. Either way, though, there are still inefficiencies in the current static-inline implementation that we should probably fix.
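The offset-map idea in the first bullet can be sketched in a few lines of Python (the names `collect_offsets` and `realize` are hypothetical, not the Calyx API): walk the control tree once to record each group's absolute start time, then rewrite every assignment's `%t` guard exactly once at the end.

```python
# Hypothetical sketch of the proposed one-pass rewrite: map each group
# to its absolute cycle offset, then shift guards a single time.

def collect_offsets(node, start=0, offsets=None):
    """Walk a seq-like tree whose leaves are (group_name, latency) pairs,
    recording each group's absolute start cycle without touching guards."""
    if offsets is None:
        offsets = {}
    if isinstance(node, tuple):              # leaf group
        name, latency = node
        offsets[name] = start
        return start + latency, offsets
    for child in node:                       # seq: children run back-to-back
        start, offsets = collect_offsets(child, start, offsets)
    return start, offsets

def realize(assigns, offsets):
    """Rewrite each (group, %t, rhs) assignment once, shifting %t by the
    group's recorded offset."""
    return [(grp, t + offsets[grp], rhs) for (grp, t, rhs) in assigns]

total, offs = collect_offsets([("a", 2), [("b", 1), ("c", 3)]])
# offs == {"a": 0, "b": 2, "c": 3}; total latency == 6
out = realize([("c", 1, "idx.write_en = 1'd1")], offs)
# the %1 guard inside group c becomes %4 in the inlined group
```

Each assignment is touched exactly once, regardless of nesting depth, and no redundant conjuncts are produced.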

@rachitnigam (Contributor, Author) commented May 8, 2023

So, for the last point, running:

cargo run -- tests/correctness/static-control/static-mult-dot-product.futil -d lower

generates:

import "primitives/core.futil";
import "primitives/pipelined.futil";
component main(@go go: 1, @clk clk: 1, @reset reset: 1) -> (@done done: 1) {
  cells {
    mul = pipelined_mult();
    @external left = std_mem_d1(32, 10, 4);
    @external right = std_mem_d1(32, 10, 4);
    @external out = std_mem_d1(32, 10, 4);
    idx = std_reg(4);
    add = std_add(4);
    sub = std_sub(4);
    lt = std_lt(4);
    lt_10 = std_reg(1);
    ge = std_ge(4);
    ge_4 = std_reg(1);
    @generated cond = std_reg(1);
    @generated cond_wire = std_wire(1);
    @generated cond0 = std_reg(1);
    @generated cond_wire0 = std_wire(1);
    @generated fsm3 = std_reg(4);
    @generated ud3 = undef(1);
    @generated adder3 = std_add(4);
    @generated fsm5 = std_reg(1);
    @generated ud5 = undef(1);
    @generated adder5 = std_add(1);
    @generated signal_reg = std_reg(1);
  }
  wires {
    group early_reset_static_seq<"NODE_ID"=0> {
      ge_4.write_en = fsm3.out == 4'd0 ? 1'd1;
      ge_4.in = fsm3.out == 4'd0 ? 1'd0;
      lt_10.write_en = fsm3.out == 4'd0 ? 1'd1;
      lt_10.in = fsm3.out == 4'd0 ? 1'd1;
      early_reset_static_par[go] = fsm3.out >= 4'd1 & fsm3.out < 4'd15 ? 1'd1;
      adder3.left = fsm3.out;
      adder3.right = 4'd1;
      fsm3.write_en = 1'd1;
      fsm3.in = fsm3.out != 4'd14 ? adder3.out;
      fsm3.in = fsm3.out == 4'd14 ? 4'd0;
      early_reset_static_seq[done] = ud3.out;
    }
    group early_reset_static_par<"NODE_ID"=3> {
      ge_4.write_en = fsm5.out == 1'd0 ? 1'd1;
      ge.right = fsm5.out == 1'd0 ? 4'd4;
      lt_10.write_en = fsm5.out == 1'd0 ? 1'd1;
      lt.right = fsm5.out == 1'd0 ? 4'd10;
      idx.write_en = fsm5.out == 1'd0 ? 1'd1;
      add.right = fsm5.out == 1'd0 ? 4'd1;
      add.left = fsm5.out == 1'd0 ? idx.out;
      ge.left = fsm5.out == 1'd0 ? add.out;
      ge_4.in = fsm5.out == 1'd0 ? ge.out;
      lt.left = fsm5.out == 1'd0 ? add.out;
      lt_10.in = fsm5.out == 1'd0 ? lt.out;
      idx.in = fsm5.out == 1'd0 ? add.out;
      cond.in = fsm5.out == 1'd0 ? lt_10.out;
      cond_wire.in = fsm5.out == 1'd0 ? lt_10.out;
      cond.write_en = fsm5.out == 1'd0 ? 1'd1;
      cond_wire.in = !1'b1 ? cond.out;
      right.addr0 = cond_wire.out & fsm5.out == 1'd0 ? idx.out;
      mul.right = cond_wire.out & fsm5.out == 1'd0 ? right.read_data;
      left.addr0 = cond_wire.out & fsm5.out == 1'd0 ? idx.out;
      mul.left = cond_wire.out & fsm5.out == 1'd0 ? left.read_data;
      cond0.in = fsm5.out == 1'd0 ? ge_4.out;
      cond_wire0.in = fsm5.out == 1'd0 ? ge_4.out;
      cond0.write_en = fsm5.out == 1'd0 ? 1'd1;
      cond_wire0.in = !1'b1 ? cond0.out;
      out.write_en = cond_wire0.out & fsm5.out == 1'd0 ? 1'd1;
      out.write_data = cond_wire0.out & fsm5.out == 1'd0 ? mul.out;
      sub.right = cond_wire0.out & fsm5.out == 1'd0 ? 4'd4;
      sub.left = cond_wire0.out & fsm5.out == 1'd0 ? idx.out;
      out.addr0 = cond_wire0.out & fsm5.out == 1'd0 ? sub.out;
      adder5.left = fsm5.out;
      adder5.right = 1'd1;
      fsm5.write_en = 1'd1;
      fsm5.in = fsm5.out != 1'd0 ? adder5.out;
      fsm5.in = fsm5.out == 1'd0 ? 1'd0;
      early_reset_static_par[done] = ud5.out;
    }
    group wrapper_early_reset_static_seq<"NODE_ID"=0> {
      early_reset_static_seq[go] = 1'd1;
      signal_reg.write_en = fsm3.out == 4'd0 & !signal_reg.out ? 1'd1;
      signal_reg.in = fsm3.out == 4'd0 & !signal_reg.out ? 1'd1;
      wrapper_early_reset_static_seq[done] = fsm3.out == 4'd0 & signal_reg.out ? 1'd1;
    }
    signal_reg.write_en = fsm3.out == 4'd0 & signal_reg.out ? 1'd1;
    signal_reg.in = fsm3.out == 4'd0 & signal_reg.out ? 1'd0;
  }

  control {
    wrapper_early_reset_static_seq;
  }
}

Note that we end up generating two FSMs because the inliner did not merge the static_seq and static_par groups together.

@rachitnigam (Contributor, Author) commented May 8, 2023

Ideally, we would only need to rewrite the assignment once (when turning the "static island" into a static group). I think this would require having some sort of data structure that maps groups to their appropriate "offset", and then, at the end when we're compiling the static island into a single static group, we should "realize" the offsets just once by rewriting the assignments

That's right! That's the difference between a bottom-up vs. a top-down approach. Currently, the pass goes down to the innermost control operator and compiles it fully before popping back up. I will note that the nice thing is that I don't think the generated code would get any better from this change, especially after running guard simplification. You could consider opening an issue about this and working on it as part of the QoR/overall compiler improvement push later, rather than blocking the static proposal on it.

@calebmkim (Contributor)

Note that we end up generating two FSMs because the inliner did not merge the static_seq and static_par groups together.

Ah, I see what you're saying. In my mind, this was actually expected, since the static_par is the body of a static repeat. Just to be safe, I decided to keep separate FSMs for the bodies of static repeats to avoid complicated guard expressions with lots of ors (e.g., a.write_en = %2 | %5 | %8). However, there are cases where we probably do not need to do this, such as this one: the body of the static repeat has a latency of one cycle, so we don't really need to worry about complicated guard expressions in it. When to reuse the same FSM to coordinate the body of a repeat vs. giving it a separate FSM may be another heuristic/QoR question.
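The guard blow-up mentioned above can be made concrete with a small model (illustrative Python, not the pass itself): if a `%t` assignment inside an n-iteration repeat body shares the parent FSM, it must fire once per iteration, giving an n-way OR of interval guards.

```python
# Model of guard growth when a static repeat body is folded into the
# parent FSM: a cycle-t assignment in a body of latency L, repeated n
# times starting at cycle `start`, fires at start + i*L + t for each i.

def repeat_guard_cycles(t, body_latency, n, start=0):
    """Cycles (relative to the parent FSM) at which the assignment fires;
    each entry becomes one %-guard in an n-way OR."""
    return [start + i * body_latency + t for i in range(n)]

cycles = repeat_guard_cycles(2, 3, 3)
# [2, 5, 8], i.e. the guard %2 | %5 | %8 from the comment above
```

With a separate FSM for the body, the same assignment keeps the single guard `%t`, which is the trade-off being described.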

@rachitnigam (Contributor, Author)

@calebmkim Got it! Looks like there are two separate issues here. One you've already summarized in #1447. The other has to do with how guards work in groups, which is a bona fide bug and needs to be fixed. Does that make sense?

@calebmkim (Contributor)

The other one is to do with how guards work in groups which is a bonafide bug and needs to be fixed. Does that make sense?

Sorry, I'm not quite clear on what you're referring to here. I think the icarus-verilog problem of producing only zeroes is solved by #1446. The "two separate FSMs" problem seems to me more of a QoR type of thing. The idx.write_en = %1 ? 1'd1 is out-of-bounds static timing, which I think is (from now on) going to be detected by #1443.

@rachitnigam (Contributor, Author)

Sorry, that PR does not fix that problem. We should have a higher-level error before we hit lowering: #1444

@rachitnigam (Contributor, Author)

@calebmkim can you check whether this no longer generates the false guard, and close this issue?
