Join Operator (from old wiki) #3974
Closed
chenlica started this conversation in archived-wiki
From page https://github.com/apache/texera/wiki/Join-Operator (may be dangling)
====
Author: Sripad Kowshik Subramanyam
Synopsis
Implement an operator that takes the outputs of two operators as input and joins their tuples based on constraints specified in a predicate.
Status
As of 9/25/2016: COMPLETED
Modules
Related Issues
#111
Description
The Join operator joins a given field of the results of two other operators, subject to constraints specified in a join predicate. The field to join on and the constraints to be satisfied are specified using JoinPredicate. The getNextTuple() method returns the next result of the operator. Currently supported predicates are:
JoinDistancePredicate: takes an attribute that specifies the ID, the attribute of the field to join on, and a distance threshold. If the distance between two spans in that field of the results being joined is within the threshold, the join is performed.
Example
Given below is a setting and corresponding examples of using JoinDistancePredicate (consider the two tuples to come from two different operators).
"us":<19, 22>
Where <spanStartIndex, spanEndIndex> represents a span. If we want to join over the review attribute with the condition "within a distance of 10 characters", we can write:
JoinDistancePredicate joinPredicate = new JoinDistancePredicate(idAttr, reviewAttr, 10);
Since both tuples have the same ID, we can perform the join on the two span lists.
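The span-list join in this example can be sketched in Java as follows. This is a minimal, self-contained illustration under stated assumptions, not Texera's actual JoinDistancePredicate code: the Span and Tuple records, the integer ID, and the join and withinDistance helpers are all hypothetical. The distance check follows the OR rule given in this section, and a combined span takes the smaller start index and the larger end index, matching the "book_gives":<6, 18> result.

```java
import java.util.ArrayList;
import java.util.List;

public class SpanJoinSketch {

    // Hypothetical minimal span: the matched text plus its start and end indices.
    record Span(String key, int start, int end) {}

    // Hypothetical minimal tuple: an ID value plus the spans found in the field being joined.
    record Tuple(int id, List<Span> spans) {}

    // Distance rule as stated on this page: start indices OR end indices within the threshold.
    static boolean withinDistance(Span a, Span b, int threshold) {
        return Math.abs(a.start() - b.start()) <= threshold
                || Math.abs(a.end() - b.end()) <= threshold;
    }

    // Joins the span lists of two tuples; returns an empty list when the IDs differ.
    static List<Span> join(Tuple t1, Tuple t2, int threshold) {
        List<Span> result = new ArrayList<>();
        if (t1.id() != t2.id()) {
            return result; // only tuples with the same ID are joined
        }
        // Compare every span of the first tuple with every span of the second tuple.
        for (Span a : t1.spans()) {
            for (Span b : t2.spans()) {
                if (withinDistance(a, b, threshold)) {
                    // Combined span: keys joined with "_", covering both original spans.
                    result.add(new Span(a.key() + "_" + b.key(),
                            Math.min(a.start(), b.start()),
                            Math.max(a.end(), b.end())));
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Spans taken from this page's example; the shared ID value 1 is hypothetical.
        Tuple t1 = new Tuple(1, List.of(new Span("book", 6, 11)));
        Tuple t2 = new Tuple(1, List.of(new Span("gives", 12, 18), new Span("us", 19, 22)));
        for (Span s : join(t1, t2, 10)) {
            System.out.println(s.key() + ":<" + s.start() + ", " + s.end() + ">");
        }
    }
}
```

Running main prints book_gives:<6, 18>. The "us" span is not joined because both of its index differences from "book" exceed 10. The nested loop compares every pair of spans, which keeps the sketch simple at the cost of quadratic work in the number of spans per tuple.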
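The span distance is computed as:
$$\text{distance} = |\text{span}_1.\text{spanStartIndex} - \text{span}_2.\text{spanStartIndex}| \;\text{ OR }\; |\text{span}_1.\text{spanEndIndex} - \text{span}_2.\text{spanEndIndex}|$$
That is, the condition distance <= threshold holds if either difference is at most the threshold. Upon performing the join on the above two tuples, we get: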
The span "book":<6, 11> from tuple1 and the span "gives":<12, 18> from tuple2 satisfy the condition distance <= threshold (|6 - 12| = 6 and |11 - 18| = 7, both within 10). Therefore, the join combines the two spans into a new span "book_gives":<6, 18>.
The span "book":<6, 11> from tuple1 and the span "us":<19, 22> from tuple2 do not satisfy the condition (|6 - 19| = 13 and |11 - 22| = 11, both greater than 10), so they are not joined.
TODOs