Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Optimized join-tuples by using aget and acopy where appropriate. #203

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

wbrown
Copy link

@wbrown wbrown commented Dec 25, 2016

Summary

This commit optimizes join-tuples which is where Datascript spends a lot of its time, especially with larger queries. It is Clojure-focused, but may impact Clojurescript. It nearly triples the performance on Clojure, especially on large queries and tuples.

Merry Christmas!

Changes

The following optimizations were made:

  • If a tuple is an array, use aget rather than get, and we use typed-aget to avoid reflection.
  • If a tuple is an array, and is the same alength as the array of indices idx1/idx2, we do an arraycopy rather than iterating over the loop with aget and aset.

Performance

Merge one relation of 299K tuples into 10K tuples, and then merging another 299K tuples into it to produce 4mm tuples. Measured using criterium.

Before

Evaluation count : 6 in 6 samples of 1 calls.
             Execution time mean : 8.385470 sec
    Execution time std-deviation : 262.201023 ms
   Execution time lower quantile : 8.123374 sec ( 2.5%)
   Execution time upper quantile : 8.685724 sec (97.5%)
                   Overhead used : 1.773439 ns

Evaluation count : 6 in 6 samples of 1 calls.
             Execution time mean : 8.689327 sec
    Execution time std-deviation : 164.748363 ms
   Execution time lower quantile : 8.533485 sec ( 2.5%)
   Execution time upper quantile : 8.959782 sec (97.5%)
                   Overhead used : 1.773439 ns

Evaluation count : 6 in 6 samples of 1 calls.
             Execution time mean : 8.619953 sec
    Execution time std-deviation : 216.090338 ms
   Execution time lower quantile : 8.377609 sec ( 2.5%)
   Execution time upper quantile : 8.904253 sec (97.5%)
                   Overhead used : 1.773439 ns

After

Evaluation count : 6 in 6 samples of 1 calls.
             Execution time mean : 2.936135 sec
    Execution time std-deviation : 110.688676 ms
   Execution time lower quantile : 2.836019 sec ( 2.5%)
   Execution time upper quantile : 3.107188 sec (97.5%)
                   Overhead used : 1.737522 ns

Evaluation count : 6 in 6 samples of 1 calls.
             Execution time mean : 2.999609 sec
    Execution time std-deviation : 121.434106 ms
   Execution time lower quantile : 2.850762 sec ( 2.5%)
   Execution time upper quantile : 3.154034 sec (97.5%)
                   Overhead used : 1.737522 ns

Evaluation count : 6 in 6 samples of 1 calls.
             Execution time mean : 2.960817 sec
    Execution time std-deviation : 61.025659 ms
   Execution time lower quantile : 2.900257 sec ( 2.5%)
   Execution time upper quantile : 3.028761 sec (97.5%)
                   Overhead used : 1.737522 ns

Notes

I had various other optimizations, but they sacrificed code simplicity and clarity for relatively marginal improvement in performance.

One implementation used a pair of closured functions, in the theory that there was some overhead in setting up join-tuples each time. It actually slowed things down in some cases, and complicated things in that there was not an atomic join-tuples call. However, I learned that there was potential optimization in binding input values into a more tightly scoped let block.

(defn tuple-copy-fn
  ([keep-idxs tuples]
    (tuple-copy-fn keep-idxs tuples 0))
  ([#?(:cljs keep-idxs
       :clj ^{:tag "[[Ljava.lang.Object;"} keep-idxs) tuples offset]
   (let [offset    (int offset)
         keep-idxs (da/aclone keep-idxs)
         li        (da/alength keep-idxs)
         t         (first tuples)
         t-array?  (da/array? t)
         t-acopy?  (and t-array?
                        (= li (alength #?(:cljs t
                                          :clj  ^{:tag "[[Ljava.lang.Object;"}
                                                t))))
         getter    (if t-array? typed-agetter get)]
     (if t-acopy?
       (fn [tuple target-array]
         (da/acopy tuple 0 li
                   target-array offset))
       (fn [tuple target-array]
         (dotimes [i li]
           (da/aset target-array (+ offset i)
                 (getter tuple (da/aget keep-idxs i)))))))))

(defn apply-tuple-copy-fn
  [f1 t1 f2 t2 arr-size]
  (let [res (da/make-array arr-size)]
    (f1 t1 res)
    (f2 t2 res)
    res))

This change is Reviewable

@tonsky
Copy link
Owner

tonsky commented Mar 1, 2017

You’re relying on a fact that if length of a tuple is equal to the length of the index then we’re copying full tuple, keeping its layout. It is accidentally true for small tuples (those who use <8 attributes, thus array-map for :attrs, it automatically keeps order). If we hit 9+ attributes, (keys (:attrs rel1)) would return arbitrary ordering and layout of :attrs and actual layout of resulting tuples would be different.

In theory, it’d be preferable to keep layout, and if the code would work that way your assumption would be true. If you figure out a way to get there, it’ll be great. If you can work around it, not relying on that fact, it might be great too.

@wbrown-lg
Copy link

@tonsky Interesting point, I hadn't realized that. What is the impact of layout on a tuple join? I hadn't realized that there was an implicit requirement, which I will try to state clearly:

  • When joining two tuples, the output tuple ordering must match the given :attrs.

Do we have a test case for this scenario?

Let me think on how to solve this. I've actually never run into this scenario on my particularly heavy use case. :)

@tonsky
Copy link
Owner

tonsky commented Mar 6, 2017 via email

@wbrown-lg
Copy link

@tonsky So, I was thinking about it. We could put a conditional that measures the number of attributes given. If l1 or l2 exceeds 8, then we set acopy? to false.

This gives us:

  • The performance advantage of an acopy for a lot of scenarios.
  • Predictable attribute ordering.

But this relies on a certain guarantee:

  • That the tuples in question are always array-maps produced by Clojure when <8 attributes.

Are we comfortable with that?

@tonsky
Copy link
Owner

tonsky commented Mar 11, 2017

We could put a conditional that measures the number of attributes given. If l1 or l2 exceeds 8

No, this is too unreliable. CLJ/CLJS versions might change that parameter without a warning. It’s also wouldn’t be obvious from the code what are we relying on and why. We should use correct data structures

@wbrown-lg
Copy link

@tonsky Fair points about not relying on implementation details of the platform.

I've been examining and thinking about the logic here:

  • We do an acopy if and only if both following conditions match:
    • Tuple t is an array.
    • The length of the array of indexes, idx matches the length of tuple t.

So, we only do an acopy if and only if the tuple is an array, so at this point, the ordering of a hash-map isn't a concern.

In what situations would join-tuples receive an array tuple t where the array of indexes, idx, would differ in ordering from tuple t when the array of indexes, idx matches the length of the array tuple?

I think that if we clarified that use case and functional description, we can probably solve this.

Alternatively, I can remove the acopy entirely, but it would reduce the performance gains tremendously.

@wbrown-lg
Copy link

@tonsky I expanded the scope of my thinking and examined the Relation records. Your comment about using 'proper data structures' nudged me along that path.

Couple suggestions:

  • array-map ordering is guaranteed at creation time, regardless of number of key-values. It is when you assoc new key-values onto the array-map that the ordering is not guaranteed.
    • Could we alter the Relation record to require an array-map?
  • Use a different data-structure to represent the attribute to index associations, perhaps a deftype?

@tonsky tonsky force-pushed the master branch 3 times, most recently from ada9497 to 130e06f Compare April 9, 2017 18:21
@wbrown-lg
Copy link

@tonsky Would you be OK with introducing frankiesardo/linked? It's a fairly nice ordered map implementation for both CLJ and CLJS.

https://github.com/frankiesardo/linked

@tonsky
Copy link
Owner

tonsky commented Jun 29, 2017

@wbrown-lg seems like an overkill to pull in a whole lib of fully implemented persistent data structure for such a small task. Can’t we build it using vectors for example? (if I remember what this PR was about correctly)

@wbrown-lg
Copy link

@tonsky We could. Rather than use a hash map to contain the relation to key index, we could use a vector instead in :attrs. Unless we're joining very large numbers of attributes in tuples, it should still perform pretty well. If we need efficiency, we could build a lookup hash map inside the function itself.

@tonsky
Copy link
Owner

tonsky commented Jun 29, 2017

Right. I mean, it will have the same efficiency even, since array map lookups are same linear scans. Maybe we should rather use native arrays insted, might be even faster

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants