I have a metrics table that our application uses to aggregate items such as impressions and clicks by day, month, etc.
The table DDL is below:
CREATE TABLE public.metrics (
region varchar NOT NULL,
account_id bigserial NOT NULL,
report_id varchar NOT NULL,
report_date date NOT NULL,
entity_id int8 NOT NULL,
impressions int8 NULL,
clicks int8 NULL,
"sales14d" numeric(12,2) NULL,
"conversions14d" int8 NULL,
"cost" numeric(12,2) NULL
);
This table has grown really large and has become tough to manage. As part of an optimization exercise, we want to try partitioning and use cstore_fdw to help with faster calculation of aggregated metrics. For partitioning, I am using RANGE partitioning on the report_date column. I am creating the foreign tables as below:
CREATE EXTENSION cstore_fdw;
CREATE SERVER cstore_server FOREIGN DATA WRAPPER cstore_fdw;
CREATE FOREIGN TABLE tbl_metrics_fdw_2019_q1 PARTITION OF metrics FOR VALUES FROM ('2019-01-01') TO ('2019-04-01') SERVER cstore_server OPTIONS(compression 'pglz');
--Other partitions for other quarters are done in similar way
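For completeness, this assumes the parent table was recreated as a partitioned table before attaching any PARTITION OF children; a minimal sketch, reusing the column list from the DDL above (the real application table may of course differ):

```sql
-- Hypothetical parent declaration for RANGE partitioning on report_date.
CREATE TABLE public.metrics (
    region varchar NOT NULL,
    account_id bigserial NOT NULL,
    report_id varchar NOT NULL,
    report_date date NOT NULL,
    entity_id int8 NOT NULL,
    impressions int8 NULL,
    clicks int8 NULL,
    "sales14d" numeric(12,2) NULL,
    "conversions14d" int8 NULL,
    "cost" numeric(12,2) NULL
) PARTITION BY RANGE (report_date);
```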
For data in recent months, the size per quarter was too large (~15-17 GB), so I decided to partition that data by month instead. After the partitioning was complete, I ran a simple select (no joins) on the partitioned table over a date range covered by the monthly partitions, and the cost came out expensive.
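As a side note, the monthly partition bounds follow the usual half-open convention (FROM inclusive, TO exclusive). A small throwaway helper, not part of the application, that generates the bound pairs for one year's monthly partitions:

```python
from datetime import date

def month_bounds(year):
    """Return (from, to) ISO date strings for each monthly RANGE partition of a year.

    The upper bound is exclusive, matching PostgreSQL's FOR VALUES FROM ... TO ...
    semantics, so December's upper bound is January 1 of the next year.
    """
    bounds = []
    for m in range(1, 13):
        start = date(year, m, 1)
        end = date(year + 1, 1, 1) if m == 12 else date(year, m + 1, 1)
        bounds.append((start.isoformat(), end.isoformat()))
    return bounds

for lo, hi in month_bounds(2021):
    print(f"FOR VALUES FROM ('{lo}') TO ('{hi}')")
```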
select query
EXPLAIN ANALYSE SELECT date_trunc('day', report_date), sum(cost)
FROM tbl_amzn_seller_report_partition_master tasrpm
WHERE profile_id = 495164040385661
  AND report_date BETWEEN '2021-01-01' AND '2021-03-30'
GROUP BY 1 ORDER BY 1 DESC;
Below is the explain plan; it took ~4.3 s. When I ran the same query against another period where the partitioning was by quarter, the cost was (cost=311568.50..311745.74 rows=1266 width=40) (actual time=2388.600..2388.669 rows=3 loops=1), which is significantly less.
explain analyse plan
Finalize GroupAggregate  (cost=844043.72..844380.29 rows=1266 width=40) (actual time=4368.766..4368.852 rows=3 loops=1)
  Group Key: (date_trunc('month'::text, (tasrpm_2.report_date)::timestamp with time zone))
  ->  Gather Merge  (cost=844043.72..844339.14 rows=2532 width=40) (actual time=4368.760..4368.844 rows=3 loops=1)
        Workers Planned: 2
        Workers Launched: 2
        ->  Sort  (cost=843043.70..843046.86 rows=1266 width=40) (actual time=4284.954..4284.956 rows=1 loops=3)
              Sort Key: (date_trunc('month'::text, (tasrpm_2.report_date)::timestamp with time zone)) DESC
              Sort Method: quicksort  Memory: 25kB
              Worker 0:  Sort Method: quicksort  Memory: 25kB
              Worker 1:  Sort Method: quicksort  Memory: 25kB
              ->  Partial HashAggregate  (cost=842956.31..842978.46 rows=1266 width=40) (actual time=4284.923..4284.932 rows=1 loops=3)
                    Group Key: (date_trunc('month'::text, (tasrpm_2.report_date)::timestamp with time zone))
                    ->  Parallel Append  (cost=0.00..840944.78 rows=402306 width=11) (actual time=2.466..4150.730 rows=323148 loops=3)
                          ->  Foreign Scan on tbl_metrics_partition_fdw_2021_m3 tasrpm_2  (cost=0.00..461043.79 rows=359574 width=11) (actual time=2.571..4060.444 rows=363021 loops=1)
                                Filter: ((report_date >= '2021-01-01'::date) AND (report_date <= '2021-03-30'::date) AND (profile_id = '495164040385661'::bigint))
                                Rows Removed by Filter: 14146979
                                CStore File: /postgresql_data/data/cstore_fdw/16393/17708
                                CStore File Size: 475126005
                          ->  Foreign Scan on tbl_metrics_partition_fdw_2021_m1 tasrpm  (cost=0.00..437256.57 rows=275958 width=11) (actual time=2.453..4036.797 rows=287911 loops=1)
                                Filter: ((report_date >= '2021-01-01'::date) AND (report_date <= '2021-03-30'::date) AND (profile_id = '495164040385661'::bigint))
                                Rows Removed by Filter: 14882089
                                CStore File: /postgresql_data/data/cstore_fdw/16393/17697
                                CStore File Size: 453330868
                          ->  Foreign Scan on tbl_metrics_partition_fdw_2021_m2 tasrpm_1  (cost=0.00..401676.68 rows=330005 width=11) (actual time=2.373..4223.470 rows=318513 loops=1)
                                Filter: ((report_date >= '2021-01-01'::date) AND (report_date <= '2021-03-30'::date) AND (profile_id = '495164040385661'::bigint))
                                Rows Removed by Filter: 13891487
                                CStore File: /postgresql_data/data/cstore_fdw/16393/17703
                                CStore File Size: 414883929
Planning Time: 1.633 ms
Execution Time: 4370.064 ms
I have the following questions based on the above context:
Is the creation of more partitions the cause of the increased cost? Our use case allows looking at these metrics by day, week, month(s), even up to a year. Are there any guidelines around partitioning tables with cstore_fdw, i.e. is it based on size or other metrics?
How many partitions are considered bad for performance?
Considering the large number of rows in the metrics table, are there suggested values for stripe_row_count & block_row_count?
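For reference on that last question, both settings are per-table cstore_fdw options set at table-creation time; a hedged sketch of the syntax for one hypothetical future partition (the values shown are placeholders to illustrate where the knobs go, not recommendations; 150000 and 10000 are the documented defaults for stripe_row_count and block_row_count respectively):

```sql
-- Illustrative only; tune against your own workload.
CREATE FOREIGN TABLE tbl_metrics_fdw_2021_m4
    PARTITION OF metrics FOR VALUES FROM ('2021-04-01') TO ('2021-05-01')
    SERVER cstore_server
    OPTIONS (compression 'pglz', stripe_row_count '300000', block_row_count '10000');
```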