[Feature](mluOpExecFFT): add bluestein fft #1213

DanieeelLiu · 2025-02-25T07:38:55Z

Thanks for your contribution and we appreciate it a lot. 🚀🚀

1. Motivation

Please describe your motivation and the goal you want to achieve through this pull request.

2. Modification

Please briefly describe what modification is made in this pull request, and indicate where to make the modification.

Are new test cases added? If so, please post the corresponding generator-PR link here.

3. Test Report

If you want to know how to do operator testing, you can see GTest-User-Guide-zh.

3.1 Modification Details

3.1.1 Accuracy Acceptance Standard

For static threshold standard details, see: MLU-OPS™ Accuracy Acceptance Standard.

static threshold
- diff1
  - float32 mlu diff1 <= 1e-5
  - float32 mlu diff1 <= 3e-3
  - float16 mlu diff1 <= 3e-3
- diff2
  - float32 mlu diff2 <= 1e-5
  - float32 mlu diff2 <= 3e-3
  - float16 mlu diff2 <= 3e-3
- diff3
  - mlu diff3 == 0
  - mlu diff3_1 == 0
  - mlu diff3_2 == 0
dynamic threshold
- diff1: mlu diff1 <= max(baseline diff1 * 10, static threshold)
- diff2: mlu diff2 <= max(baseline diff2 * 10, static threshold)
- diff3: mlu diff3 <= max(baseline diff3 * 10, static threshold)
  - float32, threshold = 1e-5
  - float16, threshold = 1e-3
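The dynamic-threshold rule above can be read as a one-line check per metric. Below is a minimal sketch, assuming a single scalar diff per case; `passes_dynamic_threshold` is a hypothetical helper name and not part of the MLU-OPS GTest framework.

```python
def passes_dynamic_threshold(mlu_diff: float, baseline_diff: float,
                             static_threshold: float) -> bool:
    """Hypothetical helper for the rule
    mlu_diff <= max(baseline_diff * 10, static_threshold)."""
    limit = max(baseline_diff * 10, static_threshold)
    return mlu_diff <= limit


# float32 case using the 1e-5 threshold listed above (illustrative values only).
print(passes_dynamic_threshold(2e-5, 3e-6, 1e-5))  # True: limit is about 3e-5
print(passes_dynamic_threshold(2e-5, 1e-7, 1e-5))  # False: limit falls back to 1e-5
```

The static threshold acts as a floor, so a very accurate baseline does not force an unreasonably tight limit on the MLU result.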

3.1.2 Operator Scheme checklist

Supported hardware
- MLU370
- MLU590
Job types
- BLOCK
- UNION1
- UNION2
- UNION4
- The operator will dynamically select the most suitable task type, for example, UNION8

3.2 Accuracy Test

3.2.1 Accuracy Test

If you have checked the following items, please tick the relevant box.

3.2.2 Parameter Check

Test Point-1: When a new operator is submitted, the test points are given and the test results are stated. Acceptance Standard: Normal error.

Please fill in your test results (Error Message) here, ...

Test Point-2: Whether illegal parameters are passed. Acceptance Standard: Normal error.

Test results...

3.3 Performance Test

See MLU-OPS™ Performance Acceptance Standard for details.

Platform: MLU370

# The test results should contain Op name, Shape, Data type,  
#   MLU Hardware Time(us), MLU Interface Time(us), MLU IO Efficiency, 
#   MLU Compute Efficiency, and Mlu Workspace Size(Bytes)
# 
# for example:
#
# ----------- case0 -----------
# case0
# [Op name                ]: abs
# [Shape                  ]: input.shape=[1024,1024,3,4], output.shape=[1024,1024,3,4]
# [Data type              ]: float32
# [MLU Hardware Time      ]: 15728 (us)
# [MLU Interface Time     ]: 369.008 (us)
# [MLU IO Efficiency      ]: 0.23275
# [MLU Compute Efficiency ]: 0.5
# [Mlu Workspace Size     ]: -1 (Bytes)
# 
# ----------- case1 -----------
# ...

Platform: MLU590

# ----------- case0 -----------
# ----------- case1 -----------
# ...

3.4 Summary Analysis

Please give a brief overview here, if you want to note and summarize the content.

niyuming · 2025-02-26T08:44:00Z

docs/design_docs/fft/fft.md

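+#### 3.1.2 Bluestein FFT implementation for arbitrary lengths
+#### 3.1.2.1 1D Bluestein FFT
+From the algorithm steps in 1.2.1.3, the core logic is: multiply the input data by a coefficient sequence to obtain a[n], zero-pad it and take its FFT, multiply that result by the FFT of another coefficient sequence, take the inverse FFT, and finally multiply the result by a coefficient sequence again.
+The key FFT and IFFT steps can directly call the FFT and IFFT kernels described above. What needs to be implemented is: 1. x[n]*w[n], multiplying each column of a complex matrix by its corresponding complex coefficient; 2. fft(a_pad) * fft(h_pad), where fft(h_pad) is a 1-D complex vector, so this is likewise multiplying each column of a complex matrix by its corresponding complex coefficient; 3. w[k]*ifft(), which again multiplies each column of a complex matrix by its corresponding complex coefficient. These three steps are essentially the same operation, so besides calling the interfaces above, a complete Bluestein FFT implementation also needs this operation, named complex_coeff_matmul(), plus coefficient generation, generate().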
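As a host-side reference model for the flow described in this diff, here is a minimal Python sketch, not the MLU kernel. It assumes the textbook Bluestein choices: chirp w[n] = exp(-i*pi*n^2/N), h_pad holding conj(w) laid out circularly, and a pad length equal to the next power of two >= 2N-1; the design doc may use different (for example, three distinct) coefficient sets. `fft()` is a simple recursive radix-2 FFT standing in for the existing FFT/IFFT kernels, and this `complex_coeff_matmul()` is a hypothetical stand-in that multiplies column j of a batch-major matrix by coeff[j].

```python
import cmath
import math


def fft(x, inverse=False):
    """Recursive radix-2 FFT; len(x) must be a power of two.
    The inverse is unnormalized (the caller divides by the length)."""
    n = len(x)
    if n == 1:
        return list(x)
    sign = 1 if inverse else -1
    even = fft(x[0::2], inverse)
    odd = fft(x[1::2], inverse)
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(sign * 2j * math.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


def complex_coeff_matmul(rows, coeff):
    """Multiply column j of every row (one row per batch) by coeff[j]."""
    return [[v * c for v, c in zip(row, coeff)] for row in rows]


def bluestein_fft(batch):
    """Forward DFT of arbitrary length N for each row of `batch`."""
    n = len(batch[0])
    m = 1
    while m < 2 * n - 1:  # pad length: power of two >= 2N - 1
        m *= 2
    # Chirp w[k] = exp(-i*pi*k^2/N); k*k is reduced mod 2N to keep the angle small.
    w = [cmath.exp(-1j * math.pi * ((k * k) % (2 * n)) / n) for k in range(n)]
    # h_pad holds conj(w) for offsets -(N-1)..(N-1), stored circularly.
    h_pad = [0j] * m
    for k in range(n):
        h_pad[k] = w[k].conjugate()
        if k:
            h_pad[m - k] = w[k].conjugate()
    fft_h = fft(h_pad)

    a = complex_coeff_matmul(batch, w)                            # step 1: x[n] * w[n]
    a_pad = [row + [0j] * (m - n) for row in a]
    spec = complex_coeff_matmul([fft(r) for r in a_pad], fft_h)   # step 2
    conv = [[v / m for v in fft(r, inverse=True)[:n]] for r in spec]
    return complex_coeff_matmul(conv, w)                          # step 3: w[k] * ifft()


def naive_dft(x):
    """O(N^2) DFT used only to check the result."""
    n = len(x)
    return [sum(x[j] * cmath.exp(-2j * math.pi * j * k / n) for j in range(n))
            for k in range(n)]


signal = [[complex(j % 3, -j) for j in range(5)]]  # N = 5, not a power of two
result = bluestein_fft(signal)[0]
print(max(abs(u - v) for u, v in zip(result, naive_dft(signal[0]))) < 1e-9)  # True
```

All three element-wise steps go through the same column-wise multiply, which mirrors the doc's point that one shared operation plus FFT/IFFT calls is enough to assemble the whole transform.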


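1. How much space do the generate() coefficients take, and are they guaranteed to fit in SRAM? Is their length that of the FFT twiddle factors, or the full N?
2. All cores use the same coefficients, right? Is only one IPU core running generate, or does each core generate 1/4? The latter should perform better.
3. Why not keep them directly in NRAM? Is there not enough space?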

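There are three places in the computation that need coefficients, and they are all different. The plan is to compute them on each call; generation only has to run once per call, so the time cost should be small. I originally planned to have a single IPU core per cluster compute them and store them in SRAM; SRAM is fairly large (2M) and should be enough for N. Because this is a composite operator built from existing kernels, it does not involve the twiddle factors needed by the original FFT for now. Ming-ge pointed out that having each core generate 1/4 is more appropriate; I hadn't noticed the instruction function that starts from a non-zero offset before, but I've just found it.
As for NRAM, I'm worried there isn't enough space: larger N won't fit. For example, 4098 is padded to a size larger than 8192, which gets fairly big, so the coefficients go in SRAM.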
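To make the sizing and per-core split discussed above concrete, here is a minimal sketch. It assumes the textbook Bluestein pad length (next power of two >= 2N-1), float32 complex storage (8 bytes per element), and a contiguous-chunk split over 4 cores; `padded_length()`, `core_range()` and `generate_chirp()` are hypothetical names, not MLU-OPS or BANG APIs, and the chirp formula stands in for whichever of the three coefficient sets is being generated.

```python
import cmath
import math


def padded_length(n):
    """Smallest power of two >= 2N - 1 (assumed Bluestein pad length)."""
    m = 1
    while m < 2 * n - 1:
        m *= 2
    return m


def core_range(total, core_id, core_dim=4):
    """Split indices [0, total) into core_dim contiguous chunks;
    the first total % core_dim cores get one extra element."""
    base, rem = divmod(total, core_dim)
    start = core_id * base + min(core_id, rem)
    end = start + base + (1 if core_id < rem else 0)
    return start, end


def generate_chirp(n, start, end):
    """Hypothetical generate(): w[k] = exp(-i*pi*k^2/N) for k in [start, end),
    so each core can start from a non-zero index."""
    return [cmath.exp(-1j * math.pi * ((k * k) % (2 * n)) / n)
            for k in range(start, end)]


n = 4098
m = padded_length(n)
print(m, m * 8)  # 16384 131072 (bytes for one float32 complex array of length M)

# Each of the 4 cores generates its own slice; concatenated, they match a full run.
chunks = [generate_chirp(n, *core_range(n, cid)) for cid in range(4)]
full = [v for chunk in chunks for v in chunk]
print(len(full) == n and full == generate_chirp(n, 0, n))  # True
```

For N = 4098, 2N-1 = 8195 just exceeds 8192, so the padded length doubles to 16384 and one padded-length array takes 128 KiB, which is consistent with the concern that larger N quickly outgrows NRAM.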

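For generate, if N exceeds 2M, is there a fallback that stores the coefficients in GDRAM? Customer workloads may not reach that size, but we can't be sure it won't be hit when adding functional test cases.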

docs/design_docs/fft/fft.md

DanieeelLiu added 4 commits February 25, 2025 15:29

[Feature](mluOpExecFFT): add bluestein fft

af88fed

[Feature](mluOpExecFFT): revise doc

af1ca50

[Feature](mluOpExecFFT): revise doc

c2e056b

[Feature](mluOpExecFFT): revise doc

693998a

niyuming reviewed Feb 26, 2025


docs/design_docs/fft/fft.md


niyuming Feb 26, 2025

DanieeelLiu Feb 26, 2025

niyuming Feb 27, 2025
