Skip to content

Gh 5286 leftjoin skip rhs execution #5318

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 7 commits into
base: main
Choose a base branch
from

Conversation

jasperiq
Copy link

@jasperiq jasperiq commented Apr 24, 2025

GitHub issue resolved: #5286

Briefly describe the changes proposed in this PR:
This PR skips the right hand side execution when an optional (left join) contains a filter that can be evaluated only with the left hand side and this evaluation returns false.

Benchmark results are added as separate comments


PR Author Checklist (see the contributor guidelines for more details):

  • my pull request is self-contained
  • I've added tests for the changes I made
  • I've applied code formatting (you can use mvn process-resources to format from the command line)
  • I've squashed my commits where necessary
  • every commit message starts with the issue number (GH-xxxx) followed by a meaningful description of the change

@jasperiq
Copy link
Author

jasperiq commented Apr 24, 2025

This benchmark shows a significant improvement in query performance when the optional contains a filter that can be evaluated with only the left hand side and leads to false in most, but not all cases.

Before

# JMH version: 1.37
# VM version: JDK 21.0.2, OpenJDK 64-Bit Server VM, 21.0.2+13-58
# VM invoker: ...
# VM options: -Xms1G -Xmx1G
# Blackhole mode: compiler (auto-detected, use -Djmh.blackhole.autoDetect=false to disable)
# Warmup: 5 iterations, 10 s each
# Measurement: 5 iterations, 10 s each
# Timeout: 10 min per iteration
# Threads: 1 thread, will synchronize iterations
# Benchmark mode: Average time, time/op
# Benchmark: org.eclipse.rdf4j.sail.memory.benchmark.QueryBenchmark.optional_lhs_filter

# Run progress: 0.00% complete, ETA 00:01:40
# Fork: 1 of 1
# Warmup Iteration   1: 44.299 ms/op
# Warmup Iteration   2: 49.413 ms/op
# Warmup Iteration   3: 40.414 ms/op
# Warmup Iteration   4: 39.858 ms/op
# Warmup Iteration   5: 40.088 ms/op
Iteration   1: 40.111 ms/op
Iteration   2: 39.964 ms/op
Iteration   3: 39.975 ms/op
Iteration   4: 40.017 ms/op
Iteration   5: 41.459 ms/op

Result "org.eclipse.rdf4j.sail.memory.benchmark.QueryBenchmark.optional_lhs_filter":
  40.305 ±(99.9%) 2.494 ms/op [Average]
  (min, avg, max) = (39.964, 40.305, 41.459), stdev = 0.648
  CI (99.9%): [37.811, 42.799] (assumes normal distribution)

# Run complete. Total time: 00:01:53

After

# JMH version: 1.37
# VM version: JDK 21.0.2, OpenJDK 64-Bit Server VM, 21.0.2+13-58
# VM invoker: ...
# VM options: -Xms1G -Xmx1G
# Blackhole mode: compiler (auto-detected, use -Djmh.blackhole.autoDetect=false to disable)
# Warmup: 5 iterations, 10 s each
# Measurement: 5 iterations, 10 s each
# Timeout: 10 min per iteration
# Threads: 1 thread, will synchronize iterations
# Benchmark mode: Average time, time/op
# Benchmark: org.eclipse.rdf4j.sail.memory.benchmark.QueryBenchmark.optional_lhs_filter

# Run progress: 0.00% complete, ETA 00:01:40
# Fork: 1 of 1
# Warmup Iteration   1: 30.430 ms/op
# Warmup Iteration   2: 25.719 ms/op
# Warmup Iteration   3: 25.768 ms/op
# Warmup Iteration   4: 25.268 ms/op
# Warmup Iteration   5: 25.091 ms/op
Iteration   1: 26.675 ms/op
Iteration   2: 25.093 ms/op
Iteration   3: 25.191 ms/op
Iteration   4: 25.103 ms/op
Iteration   5: 25.188 ms/op

Result "org.eclipse.rdf4j.sail.memory.benchmark.QueryBenchmark.optional_lhs_filter":
  25.450 ±(99.9%) 2.643 ms/op [Average]
  (min, avg, max) = (25.093, 25.450, 26.675), stdev = 0.686
  CI (99.9%): [22.807, 28.093] (assumes normal distribution)

# Run complete. Total time: 00:01:53

@jasperiq
Copy link
Author

This benchmark shows there is no significant increase in query performance when the optional contains a filter that needs to be evaluated with the right hand side.

Before

# JMH version: 1.37
# VM version: JDK 21.0.2, OpenJDK 64-Bit Server VM, 21.0.2+13-58
# VM invoker: ...
# VM options: -Xms1G -Xmx1G
# Blackhole mode: compiler (auto-detected, use -Djmh.blackhole.autoDetect=false to disable)
# Warmup: 5 iterations, 10 s each
# Measurement: 5 iterations, 10 s each
# Timeout: 10 min per iteration
# Threads: 1 thread, will synchronize iterations
# Benchmark mode: Average time, time/op
# Benchmark: org.eclipse.rdf4j.sail.memory.benchmark.QueryBenchmark.optional_rhs_filter

# Run progress: 0.00% complete, ETA 00:01:40
# Fork: 1 of 1
# Warmup Iteration   1: 37.231 ms/op
# Warmup Iteration   2: 33.445 ms/op
# Warmup Iteration   3: 32.654 ms/op
# Warmup Iteration   4: 32.740 ms/op
# Warmup Iteration   5: 35.782 ms/op
Iteration   1: 38.260 ms/op
Iteration   2: 32.231 ms/op
Iteration   3: 32.378 ms/op
Iteration   4: 32.350 ms/op
Iteration   5: 32.524 ms/op


Result "org.eclipse.rdf4j.sail.memory.benchmark.QueryBenchmark.optional_rhs_filter":
  33.549 ±(99.9%) 10.150 ms/op [Average]
  (min, avg, max) = (32.231, 33.549, 38.260), stdev = 2.636

After

# JMH version: 1.37
# VM version: JDK 21.0.2, OpenJDK 64-Bit Server VM, 21.0.2+13-58
# VM invoker: /Users/jaspervanmaanen/.sdkman/candidates/java/21.0.2-open/bin/java
# VM options: -Xms1G -Xmx1G
# Blackhole mode: compiler (auto-detected, use -Djmh.blackhole.autoDetect=false to disable)
# Warmup: 5 iterations, 10 s each
# Measurement: 5 iterations, 10 s each
# Timeout: 10 min per iteration
# Threads: 1 thread, will synchronize iterations
# Benchmark mode: Average time, time/op
# Benchmark: org.eclipse.rdf4j.sail.memory.benchmark.QueryBenchmark.optional_rhs_filter

# Run progress: 0.00% complete, ETA 00:01:40
# Fork: 1 of 1
# Warmup Iteration   1: 38.455 ms/op
# Warmup Iteration   2: 43.332 ms/op
# Warmup Iteration   3: 35.756 ms/op
# Warmup Iteration   4: 33.169 ms/op
# Warmup Iteration   5: 33.242 ms/op
Iteration   1: 33.215 ms/op
Iteration   2: 33.178 ms/op
Iteration   3: 33.191 ms/op
Iteration   4: 39.994 ms/op
Iteration   5: 34.681 ms/op


Result "org.eclipse.rdf4j.sail.memory.benchmark.QueryBenchmark.optional_rhs_filter":
  34.852 ±(99.9%) 11.344 ms/op [Average]
  (min, avg, max) = (33.178, 34.852, 39.994), stdev = 2.946
  CI (99.9%): [23.508, 46.195] (assumes normal distribution)

@jasperiq jasperiq marked this pull request as ready for review May 6, 2025 12:44
@frensjan
Copy link
Contributor

@hmottestad, @JervenBolleman, would this be something we could incorporate in RDF4J? We have some queries that benefit from this optimisation.

@JervenBolleman JervenBolleman self-requested a review May 12, 2025 09:20
@jasperiq jasperiq force-pushed the GH-5286-leftjoin-skip-rhs-execution branch from 6cd5e45 to 273ed8e Compare May 12, 2025 11:37
Copy link
Contributor

@JervenBolleman JervenBolleman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The copyright year needs to be 2025


@Override
public CloseableIteration<BindingSet> evaluate(BindingSet leftBindings) {
var rightIteration = wrapped.evaluate(leftBindings);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note for myself: did we start using var in the code base?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm fine with var. I've started using it where it makes sense. Best use case for me so far has been for connections or iterators that are wrapped in a try-with-resource, much easier to read.

@jasperiq jasperiq force-pushed the GH-5286-leftjoin-skip-rhs-execution branch from 273ed8e to db12fd1 Compare May 12, 2025 12:05
@jasperiq jasperiq requested a review from JervenBolleman May 12, 2025 12:05
@JervenBolleman
Copy link
Contributor

@jasperiq just waiting for the one comment, then I think it LGTM :) Thank you very much for your contribution and patience.

@jasperiq
Copy link
Author

@jasperiq just waiting for the one comment, then I think it LGTM :) Thank you very much for your contribution and patience.

That's fine! I'm under the assumption that I have completed the requested changes. If my assumption is incorrect and there is anything I need to do, please let me know.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

LeftJoin RHS evaluation optimization
4 participants