Optimize @, fixes #25063 #25064


Open · yglukhov wants to merge 1 commit into devel

Conversation

yglukhov (Member)

No description provided.

@yglukhov yglukhov marked this pull request as draft July 22, 2025 13:15
@yglukhov yglukhov marked this pull request as ready for review July 22, 2025 17:44
@yglukhov (Member, Author)

The toml_serialization failure is unrelated; I can't reproduce it locally and can't tell what the problem is. @arnetheduck, maybe you could have a look?

@arnetheduck (Contributor)

#25066
status-im/nim-faststreams#76
should work now

@yglukhov (Member, Author)

The rest of the failures are unrelated too, I think.

@Araq (Member) commented Jul 23, 2025

I hate this shit. Just make the compiler emit copyMem for for loops instead of patching every single copy loop ever written.
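The kind of loop being discussed, and the hand-optimized form this style of patching produces, might look roughly like this (a sketch, not the actual PR code; copyMem is the real system proc, the proc names are made up for illustration):

```nim
# Sketch only, not the actual PR code.
# The naive element-wise copy loop:
proc copySlow[T](dst: var openArray[T], src: openArray[T]) =
  doAssert dst.len >= src.len
  for i in 0 ..< src.len:
    dst[i] = src[i]

# The hand-optimized form for trivially copyable element types,
# calling system.copyMem directly:
proc copyViaMem[T: SomeNumber](dst: var openArray[T], src: openArray[T]) =
  doAssert dst.len >= src.len
  if src.len > 0:
    copyMem(addr dst[0], unsafeAddr src[0], src.len * sizeof(T))
```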

@arnetheduck (Contributor)

> Just make the compiler emit copyMem

ugh please no..

This is actually entirely solvable in library code. The main problem is that too much already happens in the compiler that is of poor quality and hard to address, and taking this shortcut means you can no longer build custom data structures that benefit from the same optimizations. The right, extensible way is to have primitives in the language and library that can be reused, and to remove many of the compiler magics (like ArrPut and so on).

See also https://github.com/status-im/nim-stew/blob/35f935d0997724d3c61f57ab9fd0992e6f114f7f/stew/assign2.nim - this little snippet gets rid of most of the bad performance in Nim codegen. The biggest problem with it is that the compiler sneaks in calls to the compiler-generated `=` in lots of places, which remains terribly slow and hard to fix.

The ideal scenario would be that assignment becomes pluggable the way assign2 works, and then we can implement library functions that call = for the right types.

This is also why we want setLenUninit and friends to work consistently and sanely - so that a Table and a Deque can be written that are as efficient as seq and the other magic compiler-based types.
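The shape of the assign2 idea, as a rough sketch (a simplification for illustration, not the actual stew/assign2 code): one library entry point that uses copyMem when it is safe and falls back to element-wise assignment otherwise.

```nim
# Rough sketch of the assign2 approach (not the actual stew code):
# a single library assignment primitive that dispatches on the
# element type at compile time.
func assign[T](dst: var openArray[T], src: openArray[T]) =
  doAssert dst.len == src.len
  when T is SomeNumber or T is char:
    # Trivially copyable: one memcpy-style block copy.
    if src.len > 0:
      copyMem(addr dst[0], unsafeAddr src[0], src.len * sizeof(T))
  else:
    # Everything else: element-wise, going through the regular `=`.
    for i in 0 ..< src.len:
      dst[i] = src[i]
```

The point being that library code can later swap in a smarter dispatch without touching any caller.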

@yglukhov (Member, Author)

Library code is a lot easier to reason about than compiler code. If anything, I'd prefer to reduce the number of compiler magics/builtins, by moving them out to lib.

@Araq if you're concerned about this copying pattern recurring too often, we could introduce a library function to encapsulate it (e.g. template arrayCopy[T](dstArray, srcArray, dstIdx, srcIdx, len)), but I'm not aware of that many places where it would matter.
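The proposed template, fleshed out as a hypothetical sketch (the name and signature come from the comment above; this is not an existing stdlib API):

```nim
# Hypothetical sketch of the proposed arrayCopy template; not an
# existing stdlib API, just an illustration of the suggested signature.
template arrayCopy[T](dstArray: var openArray[T], srcArray: openArray[T],
                      dstIdx, srcIdx, len: int) =
  doAssert dstIdx + len <= dstArray.len
  doAssert srcIdx + len <= srcArray.len
  when T is SomeNumber:
    # Trivially copyable: block copy via system.copyMem.
    if len > 0:
      copyMem(addr dstArray[dstIdx], unsafeAddr srcArray[srcIdx],
              len * sizeof(T))
  else:
    # Fallback: element-wise assignment.
    for i in 0 ..< len:
      dstArray[dstIdx + i] = srcArray[srcIdx + i]
```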

@arnetheduck (Contributor) commented Jul 24, 2025

> I'm not aware of that many places

=, []=, add(var seq, openarray), etc. - there are plenty if you go looking for them. With the right primitives (setLenUninit, something like arrayCopy, etc.) they all boil down to the same reusable pattern, which is why this is so amenable to library solutions: @ is setLenUninit + arrayCopy, []= is arrayCopy, add is setLenUninit + arrayCopy, and so on; they mainly differ in whether they overwrite or extend.
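For instance, add expressed in terms of those primitives might look like this (a sketch: setLenUninit is a real proc in recent Nim, while the copy loop stands in for the hypothetical arrayCopy; the proc name is made up):

```nim
# Sketch: add(var seq, openArray) reduced to the two primitives.
# setLenUninit grows a seq without zero-initializing the new slots;
# the copy loop below stands in for a hypothetical arrayCopy.
proc addAll[T](s: var seq[T], src: openArray[T]) =
  let oldLen = s.len
  s.setLenUninit(oldLen + src.len)  # extend, uninitialized
  for i in 0 ..< src.len:           # the "arrayCopy" step
    s[oldLen + i] = src[i]
```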

The other big slowdown factor is that Nim inserts range checks even in cases where they are not needed. For example, for i in 0..<x.len: x[i] = y[i] cannot be optimized by the C compiler, because it has to raise the defect at exactly the right iteration, after having copied the parts that fit. Semantically this doesn't matter for an array copy, but it is not something the C optimizer can reason about. Sometimes it can be avoided by performing the length check explicitly before the loop, which allows the compiler to make assumptions that remove later range checks - but this only works if aliasing information is available, which it often is not: the range check rereads the lengths of x and y on every iteration, and if aliasing is possible the compiler cannot hoist those values out of the loop, and therefore cannot remove the in-loop check either.
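The two variants described above, side by side (illustration only; the proc names are made up):

```nim
# As written: the range check on y[i] must fire at exactly the right
# iteration, so the C optimizer cannot turn this into a memcpy.
proc copyChecked(x: var seq[int], y: seq[int]) =
  for i in 0 ..< x.len:
    x[i] = y[i]

# With the length check hoisted before the loop, the invariant is
# stated once up front, which sometimes lets later range checks be
# removed - but only when the optimizer can also rule out aliasing.
proc copyHoisted(x: var seq[int], y: seq[int]) =
  doAssert y.len >= x.len
  for i in 0 ..< x.len:
    x[i] = y[i]
```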

There are plenty of these little nuggets that could be solved easily if we moved implementations out of the compiler and into the library, as they are much easier to reason about there.

@Araq (Member) commented Jul 24, 2025

Range check elimination is also a complex compiler analysis; you don't get simplicity out of moving these things to libraries (how would that even work for range check elimination, anyway?).

@yglukhov (Member, Author)

> Range check elimination is also a complex compiler analysis

Maybe something like {.push rangeChecks: off.}/{.pop.} in strategic places could help?
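That is, validate once up front and then switch the checks off inside the primitive (a sketch; the proc name is made up, and boundChecks is included alongside rangeChecks since array indexing is governed by it):

```nim
# Sketch of the push/pop idea: one explicit bounds check, then the
# per-iteration checks disabled inside a tightly controlled primitive.
proc copyNoChecks[T](dst: var openArray[T], src: openArray[T]) =
  doAssert dst.len >= src.len   # single up-front check
  {.push rangeChecks: off, boundChecks: off.}
  for i in 0 ..< src.len:
    dst[i] = src[i]             # no per-iteration checks emitted here
  {.pop.}
```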

@arnetheduck (Contributor) commented Jul 24, 2025

> don't get simplicity out of moving these things

Of course you do: you just disable it explicitly in your arrayCopy etc. primitives, without waiting for the complex compiler analysis to be written. When the compiler finally understands these fine points (I doubt it will), all you need to do is change the library function, and everything that uses the smart primitives remains unchanged. This makes the implementations of add etc. simple, and the complexity stays concentrated in a single, tightly controlled spot.

3 participants