-
Notifications
You must be signed in to change notification settings - Fork 10
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Check if sin/cos approximation via integers has sufficient accuracy and speed #31
Comments
This is interesting, but I'm wary because any variation in the math could slightly change the physics of the game. Maybe it can be added as a compile-time option if it provides a meaningful performance boost. |
From what I understand sin/cos are used in other functions besides physics, so that could be looked at as well. Though I think the current implementation is already different from what the N64 does, since that supposedly uses a lookup table. as long as cos^2 +sin^2=1 is kept it can probably be used for graphical functions and small things like rotations. It's also helpful that you could compute sin/cos at the same time and swap them as needed, with the other being done using a square root. That being said, doing it via the preprocessor as you suggested seems like the best of both worlds. It's technically also possible to increase the order of the polynomial, but I think it'll hit the limits of what integers can hold very fast, and I think in that case using a sine may be better as it reduces the degree of the polynomial approximation (4th degree vs third degree). Basically as long as you can find a good rational approximation for the coefficients of the polynomial, you can then multiply by the denominator. |
I wrote some code for finding the best rational approximation for the second order cosine polynomial with the condition that the best possible accuracy should be at pi/4 (you can get something like an order of magnitude better overall accuracy if you allow
output:
Also played around with doing a curve fit using chebyshev nodes, you can get about half the worst case error everywhere at the cost of pi/4 being the worst case. I used the form
The price you pay for 1 less multiply is that the first endpoint at x=0 is not an exact match anymore and is instead Edit: I just realized, but in this form the value of the denominator actually doesnt matter at all. I can probably go 1 order higher and see if I can fit an additional term in there. |
sm64/include/trig_tables.inc.c Line 261 in a4b5e94
Found the table, I'll test against this for accuracy. As for speed it probably depends on how fast loading a value from the table is vs computing the value, and that speed probably depends on whether the table is in the cache or in the main ram. Probably cant test that without compiling the project on an actual machine. Can you tell me if the table is in the main ram or in the smaller cache? Is the cosine function using floats that I linked in my first post used for the actual game or do you mostly use the interpolation table? I'm not 100 % sure why there are two implementations. |
Ok so the table size is 4096 times 4 bytes for a float, so like 16 KB , and the cache is like 32 KB. The table is not going to be cached unless you dont need cache for anything else. It could be checked if doing the hardware squareroot instruction using an already read cosine (or sine) to get the counterpart sine/cosine via cos^2+sin^=1 is faster than loading them from both tables. |
I didn't write the game code, so I don't really know what's going on with it. I did the DS-specific stuff, and everything else was reverse-engineered from the N64 version to produce a 1:1 matching ROM. Sometimes the code ends up not using things or being weird in order to achieve that perfect match. As for the cache, if you're referring to TCM, I don't think it's used for anything at the moment. There's still the regular cache, which applies to everything but has the potential for cache misses. |
Seems like it might be useful https://www.nullhardware.com/blog/fixed-point-sine-and-cosine-for-embedded-systems/ |
http://www.coranac.com/2009/07/sines/ This has a section with assembly for nds/arm9 and includes benchmarks run on the nds. |
It's possibly to compute a good approximation to the cosine of a number using a single float multiplication and 1 integer multiplication.
The implementation from this video for example
https://www.youtube.com/watch?v=hffgNRfL1XY
(see e.g. 8:02 )
can be modified to use only integers except for the very last step.
Example:
the compiler optimizes this a bit to turn 2 integer multiplies +1 float into 1 integer multiplications and 1 float multiplication.
There are other possible values for a that can be chosen, potentially with less error than the one I used for this example, especially if the range of x is changed. The float multiply could also be replaced with a hardware divide if the angle doesnt have to be a float.
The current implementation seems to use a 8th order polynomial with doubles.
sm64/lib/src/math/cosf.c
Line 36 in a4b5e94
I'm also a bit confused as to why p[0] seems to be unused here.
The text was updated successfully, but these errors were encountered: