Audio: Crossover / Multiband DRC: Change to HiFi4 and HiFi5 optimized IIR DF1 core type #9806

Merged: 2 commits merged into thesofproject:main on Feb 3, 2025

Conversation

singalsu
Collaborator

No description provided.

Direct form I (DF1) is compatible with transposed direct form II
(DF2T). The filter type is changed because DF1 has better potential
for SIMD optimization.

On a HiFi5 platform, this change saves about 0.8 MCPS with a two-band
crossover, from 10.02 MCPS to 9.26 MCPS. The saving will be larger
in higher-order filter banks, such as in the multiband DRC component.

Signed-off-by: Seppo Ingalsuo <[email protected]>
Direct form I (DF1) is compatible with transposed direct form II
(DF2T). The filter type is changed because DF1 has better potential
for SIMD optimization.

In a build for a HiFi5 platform, this patch and the previous patch
for the crossover filter bank together save 6.1 MCPS with a
three-band DRC, from 96.5 MCPS to 90.4 MCPS.

Signed-off-by: Seppo Ingalsuo <[email protected]>
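
The compatibility claim in the commit messages above can be illustrated with a minimal floating-point sketch. It is not the SOF implementation: the function names, the sign convention H(z) = (b0 + b1 z^-1 + b2 z^-2) / (1 + a1 z^-1 + a2 z^-2), and the coefficient values are assumptions chosen for illustration. With the same coefficients, both structures should produce the same output up to rounding; a fixed-point core may differ in rounding and overflow behavior.

```python
def biquad_df1(x, b, a):
    """Direct form I: keeps two past inputs and two past outputs."""
    b0, b1, b2 = b
    a1, a2 = a
    x1 = x2 = y1 = y2 = 0.0
    out = []
    for xn in x:
        yn = b0 * xn + b1 * x1 + b2 * x2 - a1 * y1 - a2 * y2
        x2, x1 = x1, xn  # shift input history
        y2, y1 = y1, yn  # shift output history
        out.append(yn)
    return out


def biquad_df2t(x, b, a):
    """Transposed direct form II: keeps two state variables."""
    b0, b1, b2 = b
    a1, a2 = a
    s1 = s2 = 0.0
    out = []
    for xn in x:
        yn = b0 * xn + s1
        s1 = b1 * xn - a1 * yn + s2
        s2 = b2 * xn - a2 * yn
        out.append(yn)
    return out


# Illustrative, stable coefficients (assumed, not taken from any SOF blob).
B = (0.2, 0.4, 0.2)
A = (-0.3, 0.1)

impulse = [1.0] + [0.0] * 31
y_df1 = biquad_df1(impulse, B, A)
y_df2t = biquad_df2t(impulse, B, A)
max_diff = max(abs(p - q) for p, q in zip(y_df1, y_df2t))
print("max |DF1 - DF2T| over the impulse response:", max_diff)
```

Because DF1 computes each output as one sum of five products, it maps naturally onto multiply-accumulate style SIMD operations, which is consistent with the motivation stated in the commit messages.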
@singalsu singalsu marked this pull request as ready for review January 31, 2025 17:01
@singalsu
Collaborator Author

singalsu commented Feb 3, 2025

Note: The optimization will continue with the IIR DF1 core. I'm thinking of adding a conversion of the coefficients blob into a layout compatible with 128-bit loads to make it more efficient. The blobs in user space would remain the same.

Contributor

@johnylin76 johnylin76 left a comment

LGTM, thanks

@singalsu
Collaborator Author

singalsu commented Feb 3, 2025

Here are the top 10 results from profiling before and after the change. The IIR remains the top function:

Pipeline MCPS:  96.50

Flat profile:
                                           self      total          
       cumulative       self             cycles     cycles          
  %        cycles     cycles    calls     /call      /call  name    
             (K)        (K)                (K)        (K)           
39.01     54861.72   54861.72  1096720      0.05       0.05  iir_df2t
10.12     69090.62   14228.90     1430      9.95      94.57  multiband_drc_s32_default
 7.51     79653.12   10562.50     6426      1.64       2.71  drc_update_detector_average
 7.42     90084.45   10431.32     6429      1.62       2.65  drc_compress_output
 7.28    100327.70   10243.26   205635      0.05       0.24  multiband_drc_s32_process_drc
 5.41    107936.20    7608.49    68545      0.11       0.91  multiband_drc_process_emp_crossover
 4.97    114927.79    6991.59   137090      0.05       0.35  crossover_generic_split_3way
 4.68    121511.08    6583.30   205728      0.03       0.03  sofm_lut_sin_fixed_16b
 3.56    126514.62    5003.54    35834      0.14       0.14  sofm_exp_int32
 2.24    129670.89    3156.27   139259      0.02       0.02  memcpy

Pipeline MCPS:  90.36

Flat profile:
                                           self      total          
       cumulative       self             cycles     cycles          
  %        cycles     cycles    calls     /call      /call  name    
             (K)        (K)                (K)        (K)           
34.95     46087.96   46087.96  1096720      0.04       0.04  iir_df1
10.79     60316.86   14228.90     1430      9.95      88.44  multiband_drc_s32_default
 8.01     70879.36   10562.50     6426      1.64       2.71  drc_update_detector_average
 7.91     81310.68   10431.32     6429      1.62       2.65  drc_compress_output
 7.77     91553.94   10243.26   205635      0.05       0.24  multiband_drc_s32_process_drc
 5.77     99162.44    7608.49    68545      0.11       0.80  multiband_drc_process_emp_crossover
 5.30    106154.03    6991.59   137090      0.05       0.30  crossover_generic_split_3way
 4.99    112737.32    6583.30   205728      0.03       0.03  sofm_lut_sin_fixed_16b
 3.79    117740.86    5003.54    35834      0.14       0.14  sofm_exp_int32
 2.39    120897.13    3156.27   139259      0.02       0.02  memcpy

This was done by running the script: scripts/sof-testbench-helper.sh -x -m drc_multiband -p profile-drc32_multiband.txt
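
As a rough reading aid for the two profiles above, the per-call cost of the IIR function can be recomputed from the self cycles and the call count, since the table rounds it to 0.05 and 0.04 K cycles. The numbers are copied from the profiles; the variable and function names are only illustrative. The result is roughly 50 cycles per call before and 42 after, about 8 cycles or 16 % less per call.

```python
# Values copied from the "before" (iir_df2t) and "after" (iir_df1) profiles.
CALLS = 1096720             # same call count in both runs
BEFORE_KCYCLES = 54861.72   # iir_df2t self cycles, in thousands
AFTER_KCYCLES = 46087.96    # iir_df1 self cycles, in thousands
PIPELINE_MCPS_BEFORE = 96.50
PIPELINE_MCPS_AFTER = 90.36


def cycles_per_call(self_kcycles, calls):
    """Convert self cycles given in thousands into cycles per call."""
    return self_kcycles * 1000.0 / calls


before = cycles_per_call(BEFORE_KCYCLES, CALLS)
after = cycles_per_call(AFTER_KCYCLES, CALLS)
saved_per_call = before - after
reduction_pct = 100.0 * (BEFORE_KCYCLES - AFTER_KCYCLES) / BEFORE_KCYCLES
pipeline_saving = PIPELINE_MCPS_BEFORE - PIPELINE_MCPS_AFTER

print(f"IIR cycles per call: {before:.2f} -> {after:.2f}")
print(f"IIR self-cycle reduction: {reduction_pct:.1f} %")
print(f"Pipeline saving: {pipeline_saving:.2f} MCPS")
```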

Edit: The next step is a simplified IIR core for the crossover_generic_process_lr4() function; the checks and the outer loop can be removed for a fixed 4th-order (2 biquads) calculation.
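
The idea in the edit above can be sketched as follows. This is a floating-point illustration under stated assumptions, not the SOF code: the `Biquad` class, `iir_generic()`, `iir_lr4_fixed()`, and `make_lr4()` are hypothetical names, and the coefficients are an illustrative 2nd-order Butterworth low-pass with cutoff at a quarter of the sample rate, used twice to form a 4th-order Linkwitz-Riley section. The generic core checks for an empty filter and loops over the sections; the fixed core assumes exactly two biquads and calls them back to back.

```python
import math


class Biquad:
    """One DF1 second-order section with its own history (hypothetical helper)."""

    def __init__(self, b0, b1, b2, a1, a2):
        self.coef = (b0, b1, b2, a1, a2)
        self.x1 = self.x2 = self.y1 = self.y2 = 0.0

    def step(self, xn):
        b0, b1, b2, a1, a2 = self.coef
        yn = b0 * xn + b1 * self.x1 + b2 * self.x2 - a1 * self.y1 - a2 * self.y2
        self.x2, self.x1 = self.x1, xn
        self.y2, self.y1 = self.y1, yn
        return yn


def iir_generic(sections, xn):
    """Generic core: guard against an empty filter, then loop over all sections."""
    if not sections:
        return xn
    for section in sections:
        xn = section.step(xn)
    return xn


def iir_lr4_fixed(s0, s1, xn):
    """Fixed 4th-order core: exactly two biquads, no check and no loop."""
    return s1.step(s0.step(xn))


def make_lr4():
    """Two identical Butterworth low-pass biquads (illustrative coefficients)."""
    a2 = 3.0 - 2.0 * math.sqrt(2.0)
    b0 = (1.0 + a2) / 4.0
    return [Biquad(b0, 2.0 * b0, b0, 0.0, a2) for _ in range(2)]


step_input = [1.0] * 64
generic_sections = make_lr4()
fixed_s0, fixed_s1 = make_lr4()
y_generic = [iir_generic(generic_sections, v) for v in step_input]
y_fixed = [iir_lr4_fixed(fixed_s0, fixed_s1, v) for v in step_input]
print("outputs identical:", y_generic == y_fixed)
```

Both cores perform the same arithmetic in the same order, so the outputs match; the fixed variant only removes per-call control overhead, which is where the saving would be expected in a compiled core.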

@kv2019i kv2019i merged commit f3ac6ed into thesofproject:main Feb 3, 2025
44 of 48 checks passed

3 participants