
Math: Inline function sofm_lut_sin_fixed_16b() for performance #9798

Draft
Wants to merge 1 commit into main.

singalsu
Collaborator

This patch inlines the function sofm_lut_sin_fixed_16b() and moves it to the header file lut_trig.h. The lookup table stays in lut_trig.c and is made global.

The DRC component uses the sine function (the fast lookup-table version) heavily. The function does not seem to benefit from a HiFi intrinsics rewrite, but making it inline improves DRC performance by 0.54 MCPS, from 12.62 MCPS to 12.08 MCPS.
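The header/source split described above can be sketched as follows. This is a minimal, hypothetical example, not the actual lut_trig.h/lut_trig.c code: the names demo_sine_lut, demo_sine_lookup() and DEMO_LUT_SIZE, and the 5-entry quarter-wave table, are made up for illustration. Both halves are shown in one listing so that it compiles on its own.

```c
#include <assert.h>
#include <stdint.h>

/* ---- Header part (hypothetical demo_lut.h) ---- */

/* Quarter wave: 0..90 degrees in 22.5 degree steps. */
#define DEMO_LUT_SIZE 5

/* The table is only declared here. Making it a global (non-static)
 * object lets the inline function below reference it from every file
 * that includes the header. */
extern const int16_t demo_sine_lut[DEMO_LUT_SIZE];

/* static inline: the compiler can expand the body at each call site,
 * removing the call/return overhead from the processing loop. */
static inline int16_t demo_sine_lookup(int idx)
{
	assert(idx >= 0 && idx < DEMO_LUT_SIZE);
	return demo_sine_lut[idx];
}

/* ---- Source part (hypothetical demo_lut.c) ---- */

/* The single definition of the table, sin(x) in Q1.15. sin(90 deg) = 1.0
 * is not representable in Q1.15, so it is stored as 32767. */
const int16_t demo_sine_lut[DEMO_LUT_SIZE] = { 0, 12540, 23170, 30274, 32767 };
```

The trade-off is code size: every call site that expands the inline function carries its own copy of the body, while the table itself exists only once.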

singalsu requested a review from lyakh on January 27, 2025 11:14
This patch inlines the function sofm_lut_sin_fixed_16b() and
moves it to the header file lut_trig.h. The lookup table stays
in lut_trig.c and is made global.

The DRC component uses the sine function (the fast lookup-table
version) heavily. The function does not seem to benefit from a
HiFi intrinsics rewrite, but making it inline improves DRC
performance on the MTL platform by 0.54 MCPS, from 12.62 MCPS to
12.08 MCPS.

In Multiband-DRC the saving scales with the number of bands, e.g.
a 1.58 MCPS saving with three bands.

Signed-off-by: Seppo Ingalsuo <[email protected]>
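As a rough consistency check on the Multiband-DRC figure: if the single-band saving scaled exactly linearly, three bands would save

```latex
3 \times (12.62 - 12.08)\ \text{MCPS} = 3 \times 0.54\ \text{MCPS} = 1.62\ \text{MCPS}
```

which is close to the measured 1.58 MCPS, so the saving appears roughly, though not exactly, proportional to the band count.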
singalsu force-pushed the inline_lut_sine_function branch from 7bc3b56 to 5412a28 on January 27, 2025 13:15
singalsu marked this pull request as ready for review on January 27, 2025 13:16
```c
/* ... beginning of sofm_lut_sin_fixed_16b() not shown in the review excerpt ... */
	delta = s1 - s0; /* Q1.16 */
	sine = s0 + q_mults_32x32(frac, delta, Q_SHIFT_BITS_64(31, 16, 16)); /* Q1.16 */
	return sat_int16((sine + 1) >> 1); /* Round to Q1.15 */
}
```
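The three lines above interpolate linearly between two neighbouring table samples and round the result. The sketch below reproduces that arithmetic with hypothetical stand-ins (demo_mults_32x32(), demo_sat_int16(), demo_interp_q15()). It assumes, based on our reading of SOF's numbers.h rather than anything shown in this PR, that q_mults_32x32(x, y, n) returns the 64-bit product x * y shifted right by n, and that Q_SHIFT_BITS_64(31, 16, 16) evaluates to 31 + 16 - 16 = 31, so frac is a Q1.31 fraction in [0, 1).

```c
#include <stdint.h>

/* Hypothetical stand-in for q_mults_32x32(): assumed to be a 64-bit
 * product shifted right by shift_bits. Right-shifting a negative value
 * is implementation-defined in C; this assumes an arithmetic shift,
 * as on GCC and Clang. */
static inline int32_t demo_mults_32x32(int32_t x, int32_t y, int shift_bits)
{
	return (int32_t)(((int64_t)x * y) >> shift_bits);
}

/* Hypothetical stand-in for sat_int16(): clamp to the int16_t range. */
static inline int16_t demo_sat_int16(int32_t x)
{
	if (x > INT16_MAX)
		return INT16_MAX;
	if (x < INT16_MIN)
		return INT16_MIN;
	return (int16_t)x;
}

/* Interpolate between neighbouring samples s0, s1 (Q1.16) using
 * frac (Q1.31, 0 <= frac < 1.0), then round the result to Q1.15. */
static inline int16_t demo_interp_q15(int32_t s0, int32_t s1, int32_t frac)
{
	int32_t delta = s1 - s0;                                      /* Q1.16 */
	int32_t sine = s0 + demo_mults_32x32(frac, delta, 31 + 16 - 16); /* Q1.16 */

	/* Adding 1 before dropping one bit rounds half up instead of truncating. */
	return demo_sat_int16((sine + 1) >> 1);                       /* Q1.15 */
}
```

For example, s0 = 0, s1 = 65536 (1.0 in Q1.16) and frac = 1 << 30 (0.5 in Q1.31) give 16384, i.e. 0.5 in Q1.15, while s0 = s1 = 65536 with frac = 0 rounds to 32768 and is saturated to 32767, because 1.0 is not representable in Q1.15.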
Collaborator

With this, every call to sofm_lut_sin_fixed_16b() inlines sofm_sine_lookup_16b() twice. The former is called from drc_sin_fixed(), which is also an inline function in drc_math.h, and that one is called 4 times from the C code of the generic C, HiFi3 and HiFi4 DRC versions. So this should make the resulting image (or the DRC module) somewhat larger. @singalsu, have you compared sizes? You could also try inlining only one of them and see how much performance improvement that gives. Also, you could convert lines 55-56 to a 2-iteration loop, which would reduce the size a bit, unless the compiler decides to unroll that loop.
In general, I'd guess that we could get similar performance improvements by identifying all the functions called when processing data and moving them to headers.
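The 2-iteration-loop suggestion can be illustrated with a hypothetical sketch. It assumes, as the comment implies but the excerpt does not show, that lines 55-56 fetch two adjacent samples s0 and s1 with two calls to the lookup function; demo_lookup(), demo_lut and the table values are made up for illustration. Both variants compute the same values; the loop version has a single call site, so an inline lookup is expanded once instead of twice.

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_LUT_SIZE 5

/* Hypothetical quarter-wave sine table in Q1.16 (0..90 deg, 22.5 deg steps). */
static const int32_t demo_lut[DEMO_LUT_SIZE] = { 0, 25080, 46341, 60547, 65536 };

/* Hypothetical stand-in for an inlined table lookup. */
static inline int32_t demo_lookup(int idx)
{
	assert(idx >= 0 && idx < DEMO_LUT_SIZE);
	return demo_lut[idx];
}

/* Two explicit calls: the inline body is expanded at both call sites. */
static inline void demo_pair_unrolled(int idx, int32_t s[2])
{
	s[0] = demo_lookup(idx);
	s[1] = demo_lookup(idx + 1);
}

/* One call site inside a 2-iteration loop: the body is expanded once,
 * unless the compiler chooses to unroll the loop. */
static inline void demo_pair_loop(int idx, int32_t s[2])
{
	int i;

	for (i = 0; i < 2; i++)
		s[i] = demo_lookup(idx + i);
}
```

Whether the loop actually shrinks the image depends on the compiler's unrolling heuristics, so comparing module sizes, as the reviewer asks, is the way to confirm it.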

singalsu marked this pull request as draft on January 28, 2025 13:57
2 participants