Skip to content

Implement Single Join #16425

@jonathanc-n

Description

@jonathanc-n

Is your feature request related to a problem or challenge?

This is part of #13181 for looking into different joins.

What is a Single Join

Single joins are similar to a regular left outer join however the build side can only have zero or one match with the probe side. If there are two or matches for one build side row, it will throw an error during runtime. The reason we need a separate join type for this is because other join types do not keep track of these number of matches.

What is it used for

Single joins are used to optimize scalar subqueries. In Neumann's paper for unnesting subqueries (here), he writes on dependent joins, which is just a join where every iteration of the left side will incur a search on the right side (right side dependent on left).

Here is an example of a scalar subquery:

SELECT
  e.name,
  (
    SELECT d.dept_name
    FROM departments d
    WHERE d.dept_id = e.dept_id
  ) AS department
FROM employees e;

Scalar subqueries act as a type of dependent join due to the subquery needing to be calculated for every row on the left. We can eliminate this by creating a hash table out of the subquery side and probing with the build side. This cuts the O(n^2) runtime to O(n). If the build side matches on more than two rows we call an error as it would be an invalid scalar subquery.

Describe the solution you'd like

  • Optimize scalar subqueries into single joins
  • Using the existing hash join execution plan we can create a path for single join (duckdb does something similar to this)
  • Do checks on each probe to see if it matched multiple rows; call execution error if it did.

Describe alternatives you've considered

No response

Additional context

No response

Metadata

Metadata

Assignees

Labels

enhancementNew feature or request

Type

No type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions