Optimized `join-tuples` by using `aget` and `acopy` where appropriate. #203
You’re relying on the fact that if the length of a tuple is equal to the length of the index, then we’re copying the full tuple, keeping its layout. That is accidentally true for small tuples (those that use <8 attributes, and thus an array-map for :attrs, which automatically keeps order). If we hit 9+ attributes, it no longer holds. In theory, it’d be preferable to keep the layout, and if the code worked that way your assumption would be true. If you figure out a way to get there, it’ll be great. If you can work around it, not relying on that fact, that might be great too.
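For readers following along, the ordering behaviour mentioned above can be observed at a REPL. This is only a minimal sketch for JVM Clojure: `attrs-map` is a hypothetical helper (not part of DataScript), and the 8-entry threshold is an implementation detail of current Clojure releases, not a documented guarantee.

```clojure
;; Hypothetical helper: build a map of n attributes by repeated assoc,
;; starting from the empty map literal.
(defn attrs-map [n]
  (reduce (fn [m i] (assoc m (keyword (str "a" i)) i))
          {}
          (range n)))

(def small (attrs-map 8)) ; stays an array-map on current JVM Clojure
(def large (attrs-map 9)) ; gets promoted to a hash-map

(println (class small))
(println (class large))

;; Insertion order is only preserved in the array-map case;
;; nothing should be assumed about (keys large).
(println (keys small))
```

Because the promotion point is not part of any public contract, code that depends on key order needs an explicitly ordered structure rather than a plain map.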
@tonsky Interesting point, I hadn't realized that. What is the impact of layout on a tuple join? I hadn't realized that there was an implicit requirement, which I will try to state clearly:
- When joining two tuples, the output tuple ordering must match the given `:attrs`.

Do we have a test case for this scenario? Let me think on how to solve this. I've actually never run into this scenario on my particularly heavy use case. :)
> When joining two tuples, the output tuple ordering must match the given `:attrs`

This is correct to some extent: we don’t _need_ that, but it would be nice and sane to have that. Prob. more efficient too (my guess)
On Mon, Mar 6, 2017 at 5:01 PM Wes Brown wrote:
> @tonsky Interesting point, I hadn't realized that. What is the impact of layout on a tuple join? I hadn't realized that there was an implicit requirement, which I will try to state clearly:
> - When joining two tuples, the output tuple ordering must match the given `:attrs`.
>
> Do we have a test case for this scenario? Let me think on how to solve this. I've actually never run into this scenario on my particularly heavy use case. :)
@tonsky So, I was thinking about it. We could put a conditional that measures the number of attributes given. If This gives us:
But this relies on a certain guarantee:
Are we comfortable with that?
No, this is too unreliable. CLJ/CLJS versions might change that parameter without warning. It also wouldn’t be obvious from the code what we are relying on and why. We should use correct data structures.
@tonsky Fair points about not relying on implementation details of the platform. I've been examining and thinking about the logic here:
So, we only do an In what situations would I think that if we clarified that use case and functional description, we can probably solve this. Alternatively, I can remove the
@tonsky I expanded the scope of my thinking and examined the Couple suggestions:
@tonsky Would you be OK with introducing
@wbrown-lg Seems like overkill to pull in a whole library of fully implemented persistent data structures for such a small task. Can’t we build it using vectors, for example? (If I remember correctly what this PR was about.)
@tonsky We could. Rather than use a hash map to contain the relation to key index, we could use a vector instead in
Right. I mean, it will even have the same efficiency, since array-map lookups are the same linear scans. Maybe we should use native arrays instead; that might be even faster.
Summary
This commit optimizes `join-tuples`, which is where Datascript spends a lot of its time, especially with larger queries. It is Clojure-focused, but may impact Clojurescript. It nearly triples the performance on Clojure, especially on large queries and tuples.

Merry Christmas!
Changes
The following optimizations were made:
- If the tuple is an `array`, use `aget` rather than `get`, and we use `typed-aget` to avoid reflection.
- If the tuple is an `array`, and is the same `alength` as the array of indices `idx1`/`idx2`, we do an `arraycopy` rather than iterating over the loop with `aget` and `aset`.

Performance
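As a rough illustration of the two optimizations listed above, here is a minimal sketch in JVM Clojure. It is not the code from this PR: `copy-indexed!` and `join-sketch` are hypothetical names, and the fast path assumes that an index array as long as the tuple selects the elements in their original order, which is exactly the assumption questioned in the conversation.

```clojure
;; Hypothetical helper: copy the elements of src selected by idx into dst,
;; starting at offset. Type hints avoid reflection on aget/aset/alength.
(defn- copy-indexed!
  [^objects src ^ints idx ^objects dst ^long offset]
  (let [n (alength idx)]
    (if (== n (alength src))
      ;; Fast path: same length, so copy the whole tuple in one call.
      ;; ASSUMPTION: idx is the identity permutation 0..n-1.
      (System/arraycopy src 0 dst offset n)
      ;; General path: pick elements one by one through the index array.
      (dotimes [i n]
        (aset dst (+ offset i) (aget src (aget idx i)))))
    dst))

;; Hypothetical stand-in for the join: concatenate the selected
;; elements of t1 and t2 into a fresh object array.
(defn join-sketch
  [^objects t1 ^ints idx1 ^objects t2 ^ints idx2]
  (let [l1  (alength idx1)
        res (object-array (+ l1 (alength idx2)))]
    (copy-indexed! t1 idx1 res 0)
    (copy-indexed! t2 idx2 res l1)
    res))

;; Usage: the first tuple takes the fast path, the second the general path.
(println (vec (join-sketch (object-array [:a :b]) (int-array [0 1])
                           (object-array [:c :d :e]) (int-array [2 0]))))
```

If the index arrays can carry an arbitrary order, the fast path would need an extra check (or a flag set when the index is built) before it is safe to use.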
Merging one relation of 299K tuples into 10K tuples, and then merging another 299K tuples into the result, to produce 4 million tuples. Measured using criterium.
Before
After
Notes
I had various other optimizations, but they sacrificed code simplicity and clarity for relatively marginal improvement in performance.
One implementation used a pair of closured functions, on the theory that there was some overhead in setting up `join-tuples` each time. It actually slowed things down in some cases, and complicated things in that there was not an atomic `join-tuples` call. However, I learned that there was potential optimization in binding input values into a more tightly scoped `let` block.