static-inline generates incorrect guards #1442

Closed · rachitnigam opened this issue May 8, 2023 · 13 comments
Labels: Type: Bug (Bug in the implementation)

@rachitnigam (Contributor)

For the following program:

import "primitives/core.futil";

component main() -> () {
    cells {
        idx = std_reg(4);
    }
    wires {
        static group incr<1> {
            idx.in = 4'd10;
            idx.write_en = %1 ? 1'd1;
        }
    }
    control {
        static seq {
            static par {
                incr;
            }
        }
    }
}

The following command:

cargo run -- tests/correctness/static/static-mult-dot-product.futil -p dead-group-removal -p validate -p static-inline -p simplify-static-guards

Generates:

...
component main(@go go: 1, @clk clk: 1, @reset reset: 1) -> (@done done: 1) {
  cells {
    idx = std_reg(4);
  }
  wires {
    static group static_seq<1> {
      idx.write_en = !1'b1 ? 1'd1;
      idx.in = %0 ? 4'd10;
    }
  }

  control {
    static_seq;
  }
}

Note that the guard for idx.write_en is !1'b1, i.e. always false, which means the program will never write a value into idx.
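As a rough model of the bug (illustrative Python, not the actual pass code): a `%t` guard means "cycle t of this group's execution," so in a group with latency 1 the only valid guard is `%0`, and `%1` can never hold once the guard is lowered against the group's cycle counter.

```python
# Hypothetical model of static interval guards, not the Calyx implementation.
# A static group of latency L occupies counter values 0..L-1; the guard %t
# is true exactly when the counter equals t.

def interval_guard(t, latency):
    """Return the set of counter values at which %t is true for a group
    of the given latency; empty set means the guard is always false."""
    return {c for c in range(latency) if t == c}

assert interval_guard(1, 1) == set()   # %1 in a <1> group: always false
assert interval_guard(0, 1) == {0}     # %0 is the only satisfiable guard
```

This is why the inlined output above contains `!1'b1 ? 1'd1`: the out-of-range `%1` simplifies to false.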

@rachitnigam rachitnigam added the Type: Bug Bug in the implementation label May 8, 2023
@rachitnigam rachitnigam added this to the Static Groups milestone May 8, 2023
@rachitnigam (Contributor, Author)

I'm reading the static inlining code and I'm not sure if it makes sense. It doesn't seem to implement the inlining technique described in #1344. Specifically, the approach in that overview is that we should keep track of the current relative time when compiling a control operator and only update the guards once we reach an enable.

The current inlining code seems to transform each control operator separately and then & together the guards generated by each level.

@rachitnigam (Contributor, Author)

Hm, never mind, I'm wrong: update_timing_assignment seems to be the function that does that, but it relies on recursively calling inline_static_control on each of the children of a control program. This means that an assignment in a group nested n levels of seq or par deep will be updated n times. This $n^2$ behavior for rewrites is probably not a good idea.
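A small Python model (names and structure are illustrative, not the actual `inline_static_control` code) of the behavior described above: when each level of the control tree re-guards every assignment beneath it, a leaf at depth n is rewritten n times and its guard accumulates one redundant conjunct per level.

```python
# Toy model of recursive static inlining: each nesting level rewrites
# every assignment below it, so rewrite work grows with nesting depth.

class Assign:
    def __init__(self, guard):
        self.guard = guard        # list of conjoined guard terms
        self.rewrites = 0

def inline(node):
    """Recursively inline a nested control tree (a node is either an
    Assign leaf or a list of children); each level adds a guard conjunct
    to every assignment returned from below."""
    if isinstance(node, Assign):
        return [node]
    assigns = []
    for child in node:
        for a in inline(child):
            a.guard.append("shift")   # redundant extra conjunct per level
            a.rewrites += 1
            assigns.append(a)
    return assigns

# One assignment nested 4 levels deep, e.g. seq { par { seq { par { a } } } }
leaf = Assign(guard=["range(0,1)"])
inline([[[[leaf]]]])
# leaf.rewrites == 4; the guard now has 4 redundant conjuncts on top of 1
```

This also illustrates why a separate simplify-static-guards pass is needed to clean up the redundant conjuncts.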

@rachitnigam (Contributor, Author)

The problem seems to be the assignment:

idx.write_en = %1 ? 1'd1

If I remove this, the example works. This came up while I was trying to rewrite the static-mult-dot-product test; I see that the test already exists, but it doesn't define assignments in the way I wrote them here.

@rachitnigam (Contributor, Author)

Also, note that this test doesn't work with Icarus Verilog for some reason. Running the following command produces an array of all zeroes, which means something is wrong:

fud exec --to jq --through icarus-verilog --through dat -s futil.exec './target/debug/futil' -s futil.flags '-p all -d tdst -d group2invoke' -s verilog.cycle_limit 500 -s verilog.data tests/correctness/static-control/static-mult-dot-product.futil.data tests/correctness/static-control/static-mult-dot-product.futil

@rachitnigam (Contributor, Author)

A further problem is that the inlining process cannot inline the par into the seq, which means the resulting code allocates two FSMs when it should only really need one.

@calebmkim (Contributor)

I see. Just so we're on the same page:

  • As you note, the static-inline pass is inefficient right now. If an assignment is written at $n$ levels of nesting, then it gets rewritten $n$ times (this is also why the generated guards are redundant, and the motivation behind the simplify-static-guards pass). Ideally, we would only need to rewrite the assignment once (when turning the "static island" into a static group). I think this would require having some sort of data structure that maps groups to their appropriate "offset"; then, at the end, when we're compiling the static island into a single static group, we "realize" the offsets just once by rewriting the assignments. I think I can take a shot at simplifying this so that the assignments are only rewritten once.
  • I think the icarus-verilog issue might stem from #1354 (Reset ports for seq-mem are ignored in reset-insertion), in which external cells do not have their reset port inserted. I can read the related issues and try to fix this.
  • About the last comment:

A further problem is that the inlining process cannot inline the par into the seq which means the resulting code allocates two FSMs when it should only really need to allocate one.

  • I think (you can correct me if I'm wrong) you're referring to the fact that two FSMs are generated when we compile your Calyx program (i.e., the one in the original comment) using static-inline and compile-static. What's happening right now is that static-inline creates a bunch of unused groups (for the same reason that it inefficiently rewrites the assignments in nested groups) and therefore unnecessarily instantiates FSMs that we will never use. If we compile this all the way down (with the dead-group-removal and dead-cell-removal optimizations), then it will only generate one FSM. Either way, though, there are still inefficiencies in the current static-inline implementation that we should probably fix.
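The offset-map idea in the first bullet can be sketched in a few lines of Python (the names `collect_offsets` and `realize` are hypothetical, not the Calyx API): walk the control tree once to record each group's absolute start time, then rewrite every assignment's `%t` guard exactly once at the end.

```python
# Hypothetical sketch of the proposed one-pass rewrite: map each group
# to its absolute cycle offset, then shift guards a single time.

def collect_offsets(node, start=0, offsets=None):
    """Walk a seq-like tree whose leaves are (group_name, latency) pairs,
    recording each group's absolute start cycle without touching guards."""
    if offsets is None:
        offsets = {}
    if isinstance(node, tuple):              # leaf group
        name, latency = node
        offsets[name] = start
        return start + latency, offsets
    for child in node:                       # seq: children run back-to-back
        start, offsets = collect_offsets(child, start, offsets)
    return start, offsets

def realize(assigns, offsets):
    """Rewrite each (group, %t, rhs) assignment once, shifting %t by the
    group's recorded offset."""
    return [(grp, t + offsets[grp], rhs) for (grp, t, rhs) in assigns]

total, offs = collect_offsets([("a", 2), [("b", 1), ("c", 3)]])
# offs == {"a": 0, "b": 2, "c": 3}; total latency == 6
out = realize([("c", 1, "idx.write_en = 1'd1")], offs)
# the %1 guard inside group c becomes %4 in the inlined group
```

Each assignment is touched exactly once, regardless of nesting depth, and no redundant conjuncts are produced.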

@rachitnigam (Contributor, Author) commented May 8, 2023

So, for the last point, running:

cargo run -- tests/correctness/static-control/static-mult-dot-product.futil -d lower

generates:

import "primitives/core.futil";
import "primitives/pipelined.futil";
component main(@go go: 1, @clk clk: 1, @reset reset: 1) -> (@done done: 1) {
  cells {
    mul = pipelined_mult();
    @external left = std_mem_d1(32, 10, 4);
    @external right = std_mem_d1(32, 10, 4);
    @external out = std_mem_d1(32, 10, 4);
    idx = std_reg(4);
    add = std_add(4);
    sub = std_sub(4);
    lt = std_lt(4);
    lt_10 = std_reg(1);
    ge = std_ge(4);
    ge_4 = std_reg(1);
    @generated cond = std_reg(1);
    @generated cond_wire = std_wire(1);
    @generated cond0 = std_reg(1);
    @generated cond_wire0 = std_wire(1);
    @generated fsm3 = std_reg(4);
    @generated ud3 = undef(1);
    @generated adder3 = std_add(4);
    @generated fsm5 = std_reg(1);
    @generated ud5 = undef(1);
    @generated adder5 = std_add(1);
    @generated signal_reg = std_reg(1);
  }
  wires {
    group early_reset_static_seq<"NODE_ID"=0> {
      ge_4.write_en = fsm3.out == 4'd0 ? 1'd1;
      ge_4.in = fsm3.out == 4'd0 ? 1'd0;
      lt_10.write_en = fsm3.out == 4'd0 ? 1'd1;
      lt_10.in = fsm3.out == 4'd0 ? 1'd1;
      early_reset_static_par[go] = fsm3.out >= 4'd1 & fsm3.out < 4'd15 ? 1'd1;
      adder3.left = fsm3.out;
      adder3.right = 4'd1;
      fsm3.write_en = 1'd1;
      fsm3.in = fsm3.out != 4'd14 ? adder3.out;
      fsm3.in = fsm3.out == 4'd14 ? 4'd0;
      early_reset_static_seq[done] = ud3.out;
    }
    group early_reset_static_par<"NODE_ID"=3> {
      ge_4.write_en = fsm5.out == 1'd0 ? 1'd1;
      ge.right = fsm5.out == 1'd0 ? 4'd4;
      lt_10.write_en = fsm5.out == 1'd0 ? 1'd1;
      lt.right = fsm5.out == 1'd0 ? 4'd10;
      idx.write_en = fsm5.out == 1'd0 ? 1'd1;
      add.right = fsm5.out == 1'd0 ? 4'd1;
      add.left = fsm5.out == 1'd0 ? idx.out;
      ge.left = fsm5.out == 1'd0 ? add.out;
      ge_4.in = fsm5.out == 1'd0 ? ge.out;
      lt.left = fsm5.out == 1'd0 ? add.out;
      lt_10.in = fsm5.out == 1'd0 ? lt.out;
      idx.in = fsm5.out == 1'd0 ? add.out;
      cond.in = fsm5.out == 1'd0 ? lt_10.out;
      cond_wire.in = fsm5.out == 1'd0 ? lt_10.out;
      cond.write_en = fsm5.out == 1'd0 ? 1'd1;
      cond_wire.in = !1'b1 ? cond.out;
      right.addr0 = cond_wire.out & fsm5.out == 1'd0 ? idx.out;
      mul.right = cond_wire.out & fsm5.out == 1'd0 ? right.read_data;
      left.addr0 = cond_wire.out & fsm5.out == 1'd0 ? idx.out;
      mul.left = cond_wire.out & fsm5.out == 1'd0 ? left.read_data;
      cond0.in = fsm5.out == 1'd0 ? ge_4.out;
      cond_wire0.in = fsm5.out == 1'd0 ? ge_4.out;
      cond0.write_en = fsm5.out == 1'd0 ? 1'd1;
      cond_wire0.in = !1'b1 ? cond0.out;
      out.write_en = cond_wire0.out & fsm5.out == 1'd0 ? 1'd1;
      out.write_data = cond_wire0.out & fsm5.out == 1'd0 ? mul.out;
      sub.right = cond_wire0.out & fsm5.out == 1'd0 ? 4'd4;
      sub.left = cond_wire0.out & fsm5.out == 1'd0 ? idx.out;
      out.addr0 = cond_wire0.out & fsm5.out == 1'd0 ? sub.out;
      adder5.left = fsm5.out;
      adder5.right = 1'd1;
      fsm5.write_en = 1'd1;
      fsm5.in = fsm5.out != 1'd0 ? adder5.out;
      fsm5.in = fsm5.out == 1'd0 ? 1'd0;
      early_reset_static_par[done] = ud5.out;
    }
    group wrapper_early_reset_static_seq<"NODE_ID"=0> {
      early_reset_static_seq[go] = 1'd1;
      signal_reg.write_en = fsm3.out == 4'd0 & !signal_reg.out ? 1'd1;
      signal_reg.in = fsm3.out == 4'd0 & !signal_reg.out ? 1'd1;
      wrapper_early_reset_static_seq[done] = fsm3.out == 4'd0 & signal_reg.out ? 1'd1;
    }
    signal_reg.write_en = fsm3.out == 4'd0 & signal_reg.out ? 1'd1;
    signal_reg.in = fsm3.out == 4'd0 & signal_reg.out ? 1'd0;
  }

  control {
    wrapper_early_reset_static_seq;
  }
}

Note that we end up generating two FSMs because the inliner did not merge the static_seq and static_par groups together.

@rachitnigam (Contributor, Author) commented May 8, 2023

Ideally, we would only need to rewrite the assignment once (when turning the "static island" into a static group). I think this would require having some sort of data structure that maps groups to their appropriate "offset", and then, at the end when we're compiling the static island into a single static group, we should "realize" the offsets just once by rewriting the assignments

That's right! That's the difference between a bottom-up vs. a top-down approach. Currently, the pass goes down to the innermost control operator and compiles it fully before popping back up. I will note that the nice thing is that I don't think the generated code would get any better from this change, especially after running guard simplification. You could consider opening an issue about this and working on it as part of the QoR/overall compiler improvement push later, rather than blocking the static proposal on it.

@calebmkim (Contributor)

Note that we end up generating two FSMs because the inliner did not merge the static_seq and static_par groups together.

Ah, I see what you're saying. In my mind, this was actually expected, since the static_par is the body of a static repeat. Just to be safe, I decided to keep separate FSMs for the bodies of static repeats to avoid complicated guard expressions with lots of ors (e.g., a.write_en = %2 | %5 | %8). However, there are cases where we probably do not need to do this, such as this one: the body of the static repeat has a latency of one cycle, so we don't really need to worry about complicated guard expressions in it. When to reuse the same FSM to coordinate the body of a repeat vs. giving it a separate FSM may be another heuristic/QoR question.
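The guard blow-up mentioned above can be made concrete with a small model (illustrative Python, not the pass itself): if a `%t` assignment inside an n-iteration repeat body shares the parent FSM, it must fire once per iteration, giving an n-way OR of interval guards.

```python
# Model of guard growth when a static repeat body is folded into the
# parent FSM: a cycle-t assignment in a body of latency L, repeated n
# times starting at cycle `start`, fires at start + i*L + t for each i.

def repeat_guard_cycles(t, body_latency, n, start=0):
    """Cycles (relative to the parent FSM) at which the assignment fires;
    each entry becomes one %-guard in an n-way OR."""
    return [start + i * body_latency + t for i in range(n)]

cycles = repeat_guard_cycles(2, 3, 3)
# [2, 5, 8], i.e. the guard %2 | %5 | %8 from the comment above
```

With a separate FSM for the body, the same assignment keeps the single guard `%t`, which is the trade-off being described.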

@rachitnigam (Contributor, Author)

@calebmkim Got it! Looks like there are two separate issues here. One you've already summarized in #1447. The other has to do with how guards work in groups, which is a bona fide bug and needs to be fixed. Does that make sense?

@calebmkim (Contributor)

The other one is to do with how guards work in groups which is a bonafide bug and needs to be fixed. Does that make sense?

Sorry, I'm not quite clear on what you're referring to here. I think the icarus-verilog problem of producing only zeroes is solved by #1446. The "two separate FSMs" problem seems to me more of a QoR type of thing. The idx.write_en = %1 ? 1'd1 is out-of-bounds static timing, which I think is (from now on) going to be detected by #1443.

@rachitnigam (Contributor, Author)

Sorry, that PR does not fix that problem. We should have a higher-level error before we hit lowering: #1444

@rachitnigam (Contributor, Author)

@calebmkim can you check whether this no longer generates the false guard, and close this issue?
