support for joins? #46

sarukas · 2015-03-09T19:10:51Z

Sorry for asking this here. Does Splout SQL support joins? The intended use case is joining a large batch-generated table with a small dimension table on the fly.

pereferrera · 2015-03-09T19:19:28Z

Hello sarukas,
Yes, you can do that by indexing the dimension table as "replicate to all". In this way the dimension table will be written in every partition of the tablespace. Check the user guide, sections "Partitioning" for the conceptual part and "Splout-Hadoop API" for the hands-on part.
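To make the "replicate to all" idea concrete, here is a minimal sketch that simulates two partitions with Python's built-in `sqlite3` module. It does not use the Splout-Hadoop API; the table names (`visits`, `countries`), the sample rows, and the helper functions are all hypothetical. The point it illustrates is that once every partition holds a full copy of the dimension table, a join can be answered entirely inside a single partition.

```python
import sqlite3

# Hypothetical sample data: a small dimension table and a fact table keyed by country.
DIM_COUNTRIES = [("ES", "Spain"), ("LT", "Lithuania")]
FACT_VISITS = [("ES", "/home"), ("ES", "/docs"), ("LT", "/home")]


def build_partition(country_code):
    """Build one in-memory partition: its slice of the facts plus the full dimension table."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE visits (country TEXT, page TEXT)")
    db.execute("CREATE TABLE countries (code TEXT PRIMARY KEY, name TEXT)")
    # Only the fact rows for this partition's country are stored here.
    db.executemany(
        "INSERT INTO visits VALUES (?, ?)",
        [row for row in FACT_VISITS if row[0] == country_code],
    )
    # "Replicate to all": every partition receives every dimension row.
    db.executemany("INSERT INTO countries VALUES (?, ?)", DIM_COUNTRIES)
    return db


# One simulated partition per country code.
partitions = {code: build_partition(code) for code, _ in DIM_COUNTRIES}


def visits_with_country_name(country_code):
    """Run the join inside the single partition that owns this country."""
    query = (
        "SELECT c.name, v.page "
        "FROM visits v JOIN countries c ON c.code = v.country "
        "ORDER BY v.page"
    )
    return partitions[country_code].execute(query).fetchall()
```

Calling `visits_with_country_name("ES")` touches only the "ES" partition, yet the join still resolves the country name because the dimension rows were copied there.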

sarukas · 2015-03-09T20:05:19Z

Hi,

Thanks for the quick reply. One more question: how are totals handled? Are they possible across partitions? E.g. a count grouped by country, where country is the partition key?

Our use case is OLAP queries, where the lowest level of aggregation is done on several dimension paths, but higher levels would be calculated on the fly.

Thanks, Sarunas

pereferrera · 2015-03-10T10:01:32Z

Hello sarukas,
If you partition by country, then each SQL query can essentially only run over one country's data. That's the main restriction of Splout SQL. When you want to do cross-partition queries, you can always send the same query to all partitions and merge the results manually. A better solution would be to integrate Splout with a higher-level querying system like Apache Drill. We have done that internally, but we still need to test it further. We haven't released the integration with Drill, but tell us if you are interested.
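As a rough illustration of merging the results manually, the sketch below sums the rows that the same `GROUP BY` count query would return from each partition. The function name, the query in the comment, and the sample rows are hypothetical, and the step that actually sends the query to each partition is left out. Note that adding partial results like this is only valid for additive aggregates such as `COUNT` and `SUM`; an average would need the per-partition sum and count carried separately.

```python
from collections import Counter


def merge_partial_counts(per_partition_rows):
    """Sum (key, count) rows produced by running one GROUP BY query on every partition."""
    totals = Counter()
    for rows in per_partition_rows:
        for key, count in rows:
            totals[key] += count
    return dict(totals)


# Hypothetical results of "SELECT page, COUNT(*) FROM visits GROUP BY page",
# as returned by two different partitions.
es_rows = [("/docs", 1), ("/home", 1)]
lt_rows = [("/home", 1)]

merged = merge_partial_counts([es_rows, lt_rows])
```

When the grouping column is the partition key itself (count by country, partitioned by country), each key appears in only one partition, so the merge reduces to concatenating the per-partition rows.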

thbeh · 2015-04-15T04:45:42Z

Could you provide some details on how Splout SQL integrates with Apache Drill?

pereferrera · 2015-04-15T07:52:07Z

We wrote a plugin for Drill that integrates Splout as another data store Drill can query. Because Splout is partitioned and indexed, we tell Drill which partition(s) to scan and how to execute the query so that it uses the appropriate indexes. If the SQL query has an equality condition on the partition key, Drill does the same thing you would do with the normal Splout SQL API: it queries a single partition. Otherwise, as many scans as needed are produced, and Drill takes care of the rest (grouping, aggregating, etc.). Although we haven't fully tested the performance of this system, we expect it to be quite fast for queries that don't touch massive portions of the data (a full scan of the data would be much more efficient with another underlying store, such as plain Parquet files).
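The routing decision described above can be sketched as follows. This is not the plugin's code: the function, its parameters, and the key-to-partition mapping are hypothetical and only illustrate the rule "equality on the partition key means one partition, anything else means scanning all of them".

```python
def partitions_to_scan(key_equals, partition_of_key, all_partitions):
    """Return the partitions a query must touch.

    key_equals: the value from an equality condition on the partition key,
                or None when the query has no such condition.
    partition_of_key: hypothetical mapping from partition-key value to partition id.
    all_partitions: ids of every partition in the tablespace.
    """
    if key_equals is not None:
        # Equality on the partition key: a single partition holds all matching rows.
        return [partition_of_key[key_equals]]
    # No usable condition: scan every partition and let the upper layer aggregate.
    return list(all_partitions)


# Hypothetical layout: two countries, one partition each.
partition_of_key = {"ES": 0, "LT": 1}

single = partitions_to_scan("ES", partition_of_key, [0, 1])
full = partitions_to_scan(None, partition_of_key, [0, 1])
```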

Would you be interested in trying this for your use case?

thbeh · 2015-04-15T08:04:35Z

I would be interested, as I am looking at Splout SQL as a way to avoid the complexity of Spark SQL. The main advantage of Splout SQL here is that it has a REST API. Could you share more info? I am still not familiar with Drill's concepts.

pereferrera · 2015-04-15T09:02:04Z

Hi,
In this case I think the first step would be to try Splout SQL for your particular use case. Splout solves a particular problem (web-latency SQL over Hadoop data) and might not be the best fit for other problems (arbitrary, full-scan queries over huge datasets).

If you adopt Splout SQL for your use case and are happy with it, but need to support cross-partition queries, then you would move to Drill over Splout.

I think it would be better to follow up on the user list. Feel free to write about your use case there and we can help you set up Splout to try it: https://groups.google.com/forum/?fromgroups#!forum/sploutdb-users
