[NTT] Parameter to reduce fan-out. #371

Closed
cgyurgyik opened this issue Jan 27, 2021 · 4 comments · Fixed by #462

Comments

@cgyurgyik
Collaborator

Yesterday we discussed the problem of too much parallelism in the NTT pipeline, which leads to high fan-out. Ideally, we should be able to control this with a parameter in the pipeline generator.

A good starting point is to look at Shunning's pymtl3-fft and see how he approached this.

@cgyurgyik
Collaborator Author

cgyurgyik commented Feb 1, 2021

Here is what runs in parallel in an NTT pipeline of input size N:

  • precursors: N register writes
  • mults: N/2 smult_pipes
  • op_mods: N {sadd, ssub} + smod_pipes
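To make these counts concrete, here is a minimal Python sketch that lists the groups running in parallel for one stage. It is not the generator's code. The function name parallel_groups is hypothetical, the group names are borrowed from the N = 4 control example in this issue, and requiring N to be a power of 2 is an assumption.

```python
from __future__ import annotations


def parallel_groups(n: int, stage: int) -> dict[str, list[str]]:
    """Hypothetical helper: names of the groups that run in parallel for one
    stage of an NTT pipeline of input size n (counts follow the list above)."""
    # Assumption: n is a power of 2 (at least 2), as is usual for NTT sizes.
    if n < 2 or n & (n - 1):
        raise ValueError("n must be a power of 2 and at least 2")
    return {
        "precursors": [f"precursor_{i}" for i in range(n)],      # N register writes
        "mults": [f"s{stage}_mul{i}" for i in range(n // 2)],    # N/2 smult_pipes
        "op_mods": [f"s{stage}_r{i}_op_mod" for i in range(n)],  # N {sadd, ssub} + smod_pipes
    }


if __name__ == "__main__":
    for kind, names in parallel_groups(4, 0).items():
        print(f"{kind}: {len(names)} -> {names}")
```

The mults list is half the length of the other two, which is the asymmetry the par_red parameter has to take into account.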

A simple way to reduce parallelism is to change only the control.

We'll call this parameter par_red (short for parallel reduction). It must be a power of 2 and at most N. It simply takes a par control of N groups and turns it into N / par_red sequential groups. This takes into account that mults already has half as many groups as the others.

Example, N = 4

Current (already sequential groups not shown):

control {
  seq {
    ...
    par { s0_mul0; s0_mul1; }
    par { s0_r0_op_mod; s0_r1_op_mod; s0_r2_op_mod; s0_r3_op_mod; }
    par { precursor_0; precursor_1; precursor_2; precursor_3; }
    par { s1_mul0; s1_mul1; }
    par { s1_r0_op_mod; s1_r1_op_mod; s1_r2_op_mod; s1_r3_op_mod; }
    ...
  }
}

par_red = 2

control {
  seq {
    ...
    par { s0_mul0; s0_mul1; }

    par { s0_r0_op_mod; s0_r1_op_mod; }
    par { s0_r2_op_mod; s0_r3_op_mod; }

    par { precursor_0; precursor_1; }
    par { precursor_2; precursor_3; }

    par { s1_mul0; s1_mul1; }

    par { s1_r0_op_mod; s1_r1_op_mod; }
    par { s1_r2_op_mod; s1_r3_op_mod; }
    ...
  }
}
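The control rewrite in this example can be sketched as a small Python helper. This is a hypothetical illustration, not the generator's implementation: split_par and render are made-up names, and it assumes the literal reading of the description, i.e. a par of N groups becomes N / par_red sequential par blocks of at most par_red groups each. That reading reproduces the N = 4, par_red = 2 output above, but the issue does not pin down the mapping for other sizes.

```python
from __future__ import annotations


def split_par(groups: list[str], max_width: int) -> list[list[str]]:
    """Split one par block into consecutive chunks of at most max_width groups."""
    # par_red is described as a power of 2, so the same constraint is enforced here.
    if max_width < 1 or max_width & (max_width - 1):
        raise ValueError("max_width must be a positive power of 2")
    return [groups[i:i + max_width] for i in range(0, len(groups), max_width)]


def render(blocks: list[list[str]]) -> list[str]:
    """Render each chunk in the `par { ...; }` syntax used in the examples above."""
    return ["par { " + " ".join(f"{g};" for g in block) + " }" for block in blocks]


if __name__ == "__main__":
    n, par_red = 4, 2
    mults = [f"s0_mul{i}" for i in range(n // 2)]
    op_mods = [f"s0_r{i}_op_mod" for i in range(n)]
    # Assumption: each par block keeps at most par_red groups (N / par_red blocks for N groups).
    for line in render(split_par(mults, par_red)) + render(split_par(op_mods, par_red)):
        print(line)
```

Running it prints the stage 0 mult and op_mod lines of the par_red = 2 control. Because the chunking only counts groups, it treats a register write and an smult_pipe group as equally expensive.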

Drawbacks

This is a very simple approach that treats all groups as equal, which they are not. For example, a register write group's critical path is much shorter than that of a group with an smult_pipe in it. However, it does reduce fan-out.

@cgyurgyik
Collaborator Author

#382 discusses a more general approach that is better at reducing fan-out.

@sgpthomas
Collaborator

@cgyurgyik, close the issue if you think #382 makes it irrelevant. FWIW, I'm not sure it does; it still seems like it would be nice to have a parameter in the generator for this.

@cgyurgyik
Collaborator Author

Noted.
