Optimized `join-tuples` by using `aget` and `acopy` where appropriate. #203

wbrown · 2016-12-25T19:41:24Z

Summary

This commit optimizes join-tuples which is where Datascript spends a lot of its time, especially with larger queries. It is Clojure-focused, but may impact Clojurescript. It nearly triples the performance on Clojure, especially on large queries and tuples.

Merry Christmas!

Changes

The following optimizations were made:

If a tuple is an array, use aget rather than get, and we use typed-aget to avoid reflection.
If a tuple is an array, and is the same alength as the array of indices idx1/idx2, we do an arraycopy rather than iterating over the loop with aget and aset.

Performance

Merge one relation of 299K tuples into 10K tuples, and then merging another 299K tuples into it to produce 4mm tuples. Measured using criterium.

Before

Evaluation count : 6 in 6 samples of 1 calls.
             Execution time mean : 8.385470 sec
    Execution time std-deviation : 262.201023 ms
   Execution time lower quantile : 8.123374 sec ( 2.5%)
   Execution time upper quantile : 8.685724 sec (97.5%)
                   Overhead used : 1.773439 ns

Evaluation count : 6 in 6 samples of 1 calls.
             Execution time mean : 8.689327 sec
    Execution time std-deviation : 164.748363 ms
   Execution time lower quantile : 8.533485 sec ( 2.5%)
   Execution time upper quantile : 8.959782 sec (97.5%)
                   Overhead used : 1.773439 ns

Evaluation count : 6 in 6 samples of 1 calls.
             Execution time mean : 8.619953 sec
    Execution time std-deviation : 216.090338 ms
   Execution time lower quantile : 8.377609 sec ( 2.5%)
   Execution time upper quantile : 8.904253 sec (97.5%)
                   Overhead used : 1.773439 ns

After

Evaluation count : 6 in 6 samples of 1 calls.
             Execution time mean : 2.936135 sec
    Execution time std-deviation : 110.688676 ms
   Execution time lower quantile : 2.836019 sec ( 2.5%)
   Execution time upper quantile : 3.107188 sec (97.5%)
                   Overhead used : 1.737522 ns

Evaluation count : 6 in 6 samples of 1 calls.
             Execution time mean : 2.999609 sec
    Execution time std-deviation : 121.434106 ms
   Execution time lower quantile : 2.850762 sec ( 2.5%)
   Execution time upper quantile : 3.154034 sec (97.5%)
                   Overhead used : 1.737522 ns

Evaluation count : 6 in 6 samples of 1 calls.
             Execution time mean : 2.960817 sec
    Execution time std-deviation : 61.025659 ms
   Execution time lower quantile : 2.900257 sec ( 2.5%)
   Execution time upper quantile : 3.028761 sec (97.5%)
                   Overhead used : 1.737522 ns

Notes

I had various other optimizations, but they sacrificed code simplicity and clarity for relatively marginal improvement in performance.

One implementation used a pair of closured functions, in the theory that there was some overhead in setting up join-tuples each time. It actually slowed things down in some cases, and complicated things in that there was not an atomic join-tuples call. However, I learned that there was potential optimization in binding input values into a more tightly scoped let block.

(defn tuple-copy-fn
  ([keep-idxs tuples]
    (tuple-copy-fn keep-idxs tuples 0))
  ([#?(:cljs keep-idxs
       :clj ^{:tag "[[Ljava.lang.Object;"} keep-idxs) tuples offset]
   (let [offset    (int offset)
         keep-idxs (da/aclone keep-idxs)
         li        (da/alength keep-idxs)
         t         (first tuples)
         t-array?  (da/array? t)
         t-acopy?  (and t-array?
                        (= li (alength #?(:cljs t
                                          :clj  ^{:tag "[[Ljava.lang.Object;"}
                                                t))))
         getter    (if t-array? typed-agetter get)]
     (if t-acopy?
       (fn [tuple target-array]
         (da/acopy tuple 0 li
                   target-array offset))
       (fn [tuple target-array]
         (dotimes [i li]
           (da/aset target-array (+ offset i)
                 (getter tuple (da/aget keep-idxs i)))))))))

(defn apply-tuple-copy-fn
  [f1 t1 f2 t2 arr-size]
  (let [res (da/make-array arr-size)]
    (f1 t1 res)
    (f2 t2 res)
    res))

This change is

tonsky · 2017-03-01T15:54:06Z

You’re relying on a fact that if length of a tuple is equal to the length of the index then we’re copying full tuple, keeping its layout. It is accidentally true for small tuples (those who use <8 attributes, thus array-map for :attrs, it automatically keeps order). If we hit 9+ attributes, (keys (:attrs rel1)) would return arbitrary ordering and layout of :attrs and actual layout of resulting tuples would be different.

In theory, it’d be preferable to keep layout, and if the code would work that way your assumption would be true. If you figure out a way to get there, it’ll be great. If you can work around it, not relying on that fact, it might be great too.

wbrown-lg · 2017-03-06T15:01:24Z

@tonsky Interesting point, I hadn't realized that. What is the impact of layout on a tuple join? I hadn't realized that there was an implicit requirement, which I will try to state clearly:

When joining two tuples, the output tuple ordering must match the given :attrs.

Do we have a test case for this scenario?

Let me think on how to solve this. I've actually never run into this scenario on my particularly heavy use case. :)

tonsky · 2017-03-06T15:22:34Z

When joining two tuples, the output tuple ordering must match the given

:attrs This is correct to some extent: we don’t _need_ that, but it would be nice and sane to have that. Prob. more efficient too (my guess) On Mon, Mar 6, 2017 at 5:01 PM Wes Brown <[email protected]> wrote: @tonsky <https://github.com/tonsky> Interesting point, I hadn't realized that. What is the impact of layout on a tuple join? I hadn't realized that there was an implicit requirement, which I will try to state clearly: - When joining two tuples, the output tuple ordering must match the given :attrs. Do we have a test case for this scenario? Let me think on how to solve this. I've actually never run into this scenario on my particularly heavy use case. :) — You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub <#203 (comment)>, or mute the thread <https://github.com/notifications/unsubscribe-auth/AARabLgiWWdFgANcSrakKq0kEQHwNLmeks5rjB_IgaJpZM4LVdXe> .

wbrown-lg · 2017-03-10T21:55:26Z

@tonsky So, I was thinking about it. We could put a conditional that measures the number of attributes given. If l1 or l2 exceeds 8, then we set acopy? to false.

This gives us:

The performance advantage of an acopy for a lot of scenarios.
Predictable attribute ordering.

But this relies on a certain guarantee:

That the tuples in question are always array-maps produced by Clojure when <8 attributes.

Are we comfortable with that?

tonsky · 2017-03-11T10:25:40Z

We could put a conditional that measures the number of attributes given. If l1 or l2 exceeds 8

No, this is too unreliable. CLJ/CLJS versions might change that parameter without a warning. It’s also wouldn’t be obvious from the code what are we relying on and why. We should use correct data structures

wbrown-lg · 2017-03-11T15:59:28Z

@tonsky Fair points about not relying on implementation details of the platform.

I've been examining and thinking about the logic here:

We do an acopy if and only if both following conditions match:
- Tuple t is an array.
- The length of the array of indexes, idx matches the length of tuple t.

So, we only do an acopy if and only if the tuple is an array, so at this point, the ordering of a hash-map isn't a concern.

In what situations would join-tuples receive an array tuple t where the array of indexes, idx, would differ in ordering from tuple t when the array of indexes, idx matches the length of the array tuple?

I think that if we clarified that use case and functional description, we can probably solve this.

Alternatively, I can remove the acopy entirely, but it would reduce the performance gains tremendously.

wbrown-lg · 2017-03-11T16:18:18Z

@tonsky I expanded the scope of my thinking and examined the Relation records. Your comment about using 'proper data structures' nudged me along that path.

Couple suggestions:

array-map ordering is guaranteed at creation time, regardless of number of key-values. It is when you assoc new key-values onto the array-map that the ordering is not guaranteed.
- Could we alter the Relation record to require an array-map?
Use a different data-structure to represent the attribute to index associations, perhaps a deftype?

wbrown-lg · 2017-06-28T20:53:33Z

@tonsky Would you be OK with introducing frankiesardo/linked? It's a fairly nice ordered map implementation for both CLJ and CLJS.

https://github.com/frankiesardo/linked

tonsky · 2017-06-29T18:01:05Z

@wbrown-lg seems like an overkill to pull in a whole lib of fully implemented persistent data structure for such a small task. Can’t we build it using vectors for example? (if I remember what this PR was about correctly)

wbrown-lg · 2017-06-29T19:05:59Z

@tonsky We could. Rather than use a hash map to contain the relation to key index, we could use a vector instead in :attrs. Unless we're joining very large numbers of attributes in tuples, it should still perform pretty well. If we need efficiency, we could build a lookup hash map inside the function itself.

tonsky · 2017-06-29T19:33:08Z

Right. I mean, it will have the same efficiency even, since array map lookups are same linear scans. Maybe we should rather use native arrays insted, might be even faster

Optimized join-tuples by using aget and acopy where appropriate.

d668a01

tonsky force-pushed the master branch 3 times, most recently from ada9497 to 130e06f Compare April 9, 2017 18:21

tonsky force-pushed the master branch from 4c11a56 to 0a40283 Compare May 31, 2021 23:29

tonsky force-pushed the master branch from 755a00d to 8e203f7 Compare June 18, 2024 23:33

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Optimized `join-tuples` by using `aget` and `acopy` where appropriate. #203

Optimized `join-tuples` by using `aget` and `acopy` where appropriate. #203

wbrown commented Dec 25, 2016 •

edited by tonsky

Loading

tonsky commented Mar 1, 2017

wbrown-lg commented Mar 6, 2017

tonsky commented Mar 6, 2017 via email

wbrown-lg commented Mar 10, 2017

tonsky commented Mar 11, 2017

wbrown-lg commented Mar 11, 2017

wbrown-lg commented Mar 11, 2017

wbrown-lg commented Jun 28, 2017

tonsky commented Jun 29, 2017

wbrown-lg commented Jun 29, 2017

tonsky commented Jun 29, 2017

Optimized join-tuples by using aget and acopy where appropriate. #203

Are you sure you want to change the base?

Optimized join-tuples by using aget and acopy where appropriate. #203

Conversation

wbrown commented Dec 25, 2016 • edited by tonsky Loading

Summary

Changes

Performance

Notes

tonsky commented Mar 1, 2017

wbrown-lg commented Mar 6, 2017

tonsky commented Mar 6, 2017 via email

wbrown-lg commented Mar 10, 2017

tonsky commented Mar 11, 2017

wbrown-lg commented Mar 11, 2017

wbrown-lg commented Mar 11, 2017

wbrown-lg commented Jun 28, 2017

tonsky commented Jun 29, 2017

wbrown-lg commented Jun 29, 2017

tonsky commented Jun 29, 2017

Optimized `join-tuples` by using `aget` and `acopy` where appropriate. #203

Optimized `join-tuples` by using `aget` and `acopy` where appropriate. #203

wbrown commented Dec 25, 2016 •

edited by tonsky

Loading