
How to make use of parallelization with cstore_fdw? #232

Open
njanakiev opened this issue Jun 26, 2020 · 2 comments

@njanakiev

psql (PostgreSQL) 12.3 (Ubuntu 12.3-1.pgdg20.04+1)
cstore_fdw 1.7.0

Thank you for this project! While working through the example in PostgreSQL Columnar Store for Analytic Workloads, I wanted to compare the same query with and without cstore_fdw, and I noticed that the following query was not parallelized:

EXPLAIN SELECT
    width_bucket(length(product_title), 1, 50, 5) title_length_bucket,
    round(avg(review_rating), 2) AS review_average,
    count(*)
FROM
    customer_reviews_cstore
WHERE
    product_group = 'Book'
GROUP BY
    title_length_bucket
ORDER BY
    title_length_bucket;

Here is the output:

Sort  (cost=25379.78..25379.78 rows=1 width=44)
  Sort Key: (width_bucket((length(product_title))::double precision, '1'::double precision, '50'::double precision, 5))
  ->  HashAggregate  (cost=25379.75..25379.77 rows=1 width=44)
        Group Key: width_bucket((length(product_title))::double precision, '1'::double precision, '50'::double precision, 5)
        ->  Foreign Scan on customer_reviews_cstore  (cost=0.00..25313.65 rows=8813 width=8)
              Filter: (product_group = 'Book'::text)
              CStore File: /var/lib/postgresql/12/main/cstore_fdw/16387/74837
              CStore File Size: 105383392

For the same table without cstore_fdw I get the following output:

Finalize GroupAggregate  (cost=72554.51..79387.78 rows=25229 width=44)
  Group Key: (width_bucket((length(product_title))::double precision, '1'::double precision, '50'::double precision, 5))
  ->  Gather Merge  (cost=72554.51..78441.69 rows=50458 width=44)
        Workers Planned: 2
        ->  Sort  (cost=71554.49..71617.56 rows=25229 width=44)
              Sort Key: (width_bucket((length(product_title))::double precision, '1'::double precision, '50'::double precision, 5))
              ->  Partial HashAggregate  (cost=69268.39..69709.90 rows=25229 width=44)
                    Group Key: width_bucket((length(product_title))::double precision, '1'::double precision, '50'::double precision, 5)
                    ->  Parallel Seq Scan on customer_reviews  (cost=0.00..65552.06 rows=495510 width=8)
                          Filter: (product_group = 'Book'::text)

Is there some setting that I have missed? I saw that foreign scans are marked as parallel-safe in #198; is this supported in PostgreSQL 12?

@njanakiev
Author

The data used is from http://examples.citusdata.com/customer_reviews_1998.csv.gz and http://examples.citusdata.com/customer_reviews_1999.csv.gz. The cstore table was created as follows:

CREATE EXTENSION cstore_fdw;
CREATE SERVER cstore_server FOREIGN DATA WRAPPER cstore_fdw;

CREATE FOREIGN TABLE customer_reviews_cstore ( 
    customer_id TEXT, 
    review_date DATE, 
    review_rating INTEGER, 
    review_votes INTEGER, 
    review_helpful_votes INTEGER, 
    product_id CHAR(10), 
    product_title TEXT, 
    product_sales_rank BIGINT, 
    product_group TEXT, 
    product_category TEXT, 
    product_subcategory TEXT, 
    similar_product_ids CHAR(10)[] )
SERVER cstore_server OPTIONS(compression 'pglz');

COPY customer_reviews_cstore FROM '/home/user/customer_reviews_1998.csv' WITH CSV;
COPY customer_reviews_cstore FROM '/home/user/customer_reviews_1999.csv' WITH CSV;

The table without cstore was created as follows:

CREATE TABLE customer_reviews ( 
    customer_id TEXT, 
    review_date DATE, 
    review_rating INTEGER, 
    review_votes INTEGER, 
    review_helpful_votes INTEGER, 
    product_id CHAR(10), 
    product_title TEXT, 
    product_sales_rank BIGINT, 
    product_group TEXT, 
    product_category TEXT, 
    product_subcategory TEXT, 
    similar_product_ids CHAR(10)[]
);
    
COPY customer_reviews FROM '/home/user/customer_reviews_1998.csv' WITH CSV;
COPY customer_reviews FROM '/home/user/customer_reviews_1999.csv' WITH CSV;

@sdressler

AFAICT, you will have to make use of partitions to see parallelism.
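A minimal sketch of that approach, assuming PostgreSQL 10+ declarative partitioning (the parent table name, partition names, and date ranges below are illustrative, not from this thread): range-partition a parent table by review_date and attach one cstore foreign table per year. The planner can then consider a Parallel Append across the partitions, though whether the child foreign scans actually run in parallel still depends on cstore_fdw marking its scan paths parallel-safe.

```sql
-- Illustrative only: partition by review_date, one cstore partition per year.
CREATE TABLE customer_reviews_part (
    customer_id TEXT,
    review_date DATE,
    review_rating INTEGER,
    product_title TEXT,
    product_group TEXT
    -- remaining columns as in the original table definition
) PARTITION BY RANGE (review_date);

CREATE FOREIGN TABLE customer_reviews_1998
    PARTITION OF customer_reviews_part
    FOR VALUES FROM ('1998-01-01') TO ('1999-01-01')
    SERVER cstore_server OPTIONS (compression 'pglz');

CREATE FOREIGN TABLE customer_reviews_1999
    PARTITION OF customer_reviews_part
    FOR VALUES FROM ('1999-01-01') TO ('2000-01-01')
    SERVER cstore_server OPTIONS (compression 'pglz');

-- Load each year into its own partition, then query the parent table.
-- Parallel Append is on by default since PostgreSQL 11:
SET enable_parallel_append = on;
```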
