Skip to content

Support joins on prefixes of arbitrary length #48

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Draft
wants to merge 18 commits into
base: master
Choose a base branch
from

Conversation

ecstatic-morse
Copy link
Contributor

@ecstatic-morse ecstatic-morse commented Jan 6, 2022

Currently, datafrog requires that all joins occur on tuples of the form (K, V). This becomes unpleasant for relations with more than two elements, as well as for joins on several elements. I found this limitation annoying when first trying to write datafrog programs.

Because datafrog is implemented on top of sorted data structures, we should be able to join on arbitrarily many tuple elements, so long as they are in order at the front of the tuple. This PR implements joins on arbitrary prefixes. As a result, you can specify all your variables and relations as a flat tuple, and wait to select what prefix you want for each individual join. For example, (A, B, C) could be joined with both (A, B, X) using (A, B) as the shared prefix and with (A, Y) using A as the shared prefix without cloning it.

Unfortunately, this is a breaking change, since we must copy prefixes and suffixes now instead of passing references to K and V.

@ecstatic-morse ecstatic-morse marked this pull request as draft January 6, 2022 08:44
@@ -69,41 +69,41 @@ fn main() {
let requires = iteration2.variable::<(Region, Borrow, Point)>("requires");
requires.insert(Vec::new().into());

let requires_rp = iteration2.variable::<((Region, Point), Borrow)>("requires_rp");
let requires_bp = iteration2.variable::<((Borrow, Point), Region)>("requires_bp");
let requires_rp = iteration2.variable::<(Region, Point, Borrow)>("requires_rp");
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some of these are redundant now, although this is partly because it was written without leapjoin.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think Frank initially had these more like documentation of how to use the API than really optimal, runnable examples. There are docs and doctests for leapjoins IIRC so it's not terrible that this example doesn't use them. Don't feel obligated to update them.

We could have that as a "good first issue" "help wanted" issue if some contributor wanted to try and clean that up. It would teach them a bit about the API, contrast the example with the polonius analyses, and so on.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A better example might switch requires to have the same layout as requires_rp or requires_bp and reorder it at the end. Without that and without leapjoin (which I agree doesn't really belong here), I think the current version is optimal.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess maybe having a single iteration instead of two, or actually feeding input facts into the computation, and so on, could turn it into a slightly more realistic example if someone wanted to do so.

As for the other example, graspan1, I think there were some input facts in one of Franks's blog posts a few years back.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants