
How to make use of parallelization with cstore_fdw? #232

Open
njanakiev opened this issue Jun 26, 2020 · 2 comments

@njanakiev

psql (PostgreSQL) 12.3 (Ubuntu 12.3-1.pgdg20.04+1)
cstore_fdw 1.7.0

Thank you for this project! While working through the example in PostgreSQL Columnar Store for Analytic Workloads, I wanted to compare the same query with and without cstore_fdw, and I noticed that the following query was not parallelized:

EXPLAIN SELECT
    width_bucket(length(product_title), 1, 50, 5) title_length_bucket,
    round(avg(review_rating), 2) AS review_average,
    count(*)
FROM
    customer_reviews_cstore
WHERE
    product_group = 'Book'
GROUP BY
    title_length_bucket
ORDER BY
    title_length_bucket;

Here is the output:

Sort  (cost=25379.78..25379.78 rows=1 width=44)
  Sort Key: (width_bucket((length(product_title))::double precision, '1'::double precision, '50'::double precision, 5))
  ->  HashAggregate  (cost=25379.75..25379.77 rows=1 width=44)
        Group Key: width_bucket((length(product_title))::double precision, '1'::double precision, '50'::double precision, 5)
        ->  Foreign Scan on customer_reviews_cstore  (cost=0.00..25313.65 rows=8813 width=8)
              Filter: (product_group = 'Book'::text)
              CStore File: /var/lib/postgresql/12/main/cstore_fdw/16387/74837
              CStore File Size: 105383392

For the same table without cstore_fdw I get the following output:

Finalize GroupAggregate  (cost=72554.51..79387.78 rows=25229 width=44)
  Group Key: (width_bucket((length(product_title))::double precision, '1'::double precision, '50'::double precision, 5))
  ->  Gather Merge  (cost=72554.51..78441.69 rows=50458 width=44)
        Workers Planned: 2
        ->  Sort  (cost=71554.49..71617.56 rows=25229 width=44)
              Sort Key: (width_bucket((length(product_title))::double precision, '1'::double precision, '50'::double precision, 5))
              ->  Partial HashAggregate  (cost=69268.39..69709.90 rows=25229 width=44)
                    Group Key: width_bucket((length(product_title))::double precision, '1'::double precision, '50'::double precision, 5)
                    ->  Parallel Seq Scan on customer_reviews  (cost=0.00..65552.06 rows=495510 width=8)
                          Filter: (product_group = 'Book'::text)

Is there some setting that I have missed? I saw that foreign scans are marked as parallel-safe in #198; is this supported in PostgreSQL 12?

@njanakiev
Author

The data used is from http://examples.citusdata.com/customer_reviews_1998.csv.gz and http://examples.citusdata.com/customer_reviews_1999.csv.gz. The cstore table was created as follows:

CREATE EXTENSION cstore_fdw;
CREATE SERVER cstore_server FOREIGN DATA WRAPPER cstore_fdw;

CREATE FOREIGN TABLE customer_reviews_cstore ( 
    customer_id TEXT, 
    review_date DATE, 
    review_rating INTEGER, 
    review_votes INTEGER, 
    review_helpful_votes INTEGER, 
    product_id CHAR(10), 
    product_title TEXT, 
    product_sales_rank BIGINT, 
    product_group TEXT, 
    product_category TEXT, 
    product_subcategory TEXT, 
    similar_product_ids CHAR(10)[] )
SERVER cstore_server OPTIONS(compression 'pglz');

COPY customer_reviews_cstore FROM '/home/user/customer_reviews_1998.csv' WITH CSV;
COPY customer_reviews_cstore FROM '/home/user/customer_reviews_1999.csv' WITH CSV;

The table without cstore was created as follows:

CREATE TABLE customer_reviews ( 
    customer_id TEXT, 
    review_date DATE, 
    review_rating INTEGER, 
    review_votes INTEGER, 
    review_helpful_votes INTEGER, 
    product_id CHAR(10), 
    product_title TEXT, 
    product_sales_rank BIGINT, 
    product_group TEXT, 
    product_category TEXT, 
    product_subcategory TEXT, 
    similar_product_ids CHAR(10)[]
);
    
COPY customer_reviews FROM '/home/user/customer_reviews_1998.csv' WITH CSV;
COPY customer_reviews FROM '/home/user/customer_reviews_1999.csv' WITH CSV;

@sdressler

AFAICT, you will have to make use of partitions to see parallelism.
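A minimal sketch of that approach, assuming PostgreSQL 10+ declarative partitioning (the parent table name, partition names, and date ranges below are illustrative, not from this thread): range-partition a parent table by review_date and attach one cstore foreign table per year. The planner can then consider a Parallel Append across the partitions, though whether the child foreign scans actually run in parallel still depends on cstore_fdw marking its scan paths parallel-safe.

```sql
-- Illustrative only: partition by review_date, one cstore partition per year.
CREATE TABLE customer_reviews_part (
    customer_id TEXT,
    review_date DATE,
    review_rating INTEGER,
    product_title TEXT,
    product_group TEXT
    -- remaining columns as in the original table definition
) PARTITION BY RANGE (review_date);

CREATE FOREIGN TABLE customer_reviews_1998
    PARTITION OF customer_reviews_part
    FOR VALUES FROM ('1998-01-01') TO ('1999-01-01')
    SERVER cstore_server OPTIONS (compression 'pglz');

CREATE FOREIGN TABLE customer_reviews_1999
    PARTITION OF customer_reviews_part
    FOR VALUES FROM ('1999-01-01') TO ('2000-01-01')
    SERVER cstore_server OPTIONS (compression 'pglz');

-- Load each year into its own partition, then query the parent table.
-- Parallel Append is on by default since PostgreSQL 11:
SET enable_parallel_append = on;
```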
