Parquet File Metadata caching implementation #541
base: project-antalya
Changes from 9 commits
d63fd14
415b351
b928250
f8a2ad9
3a992b0
a78f188
9add7d8
465e96e
861bdf5
5a7a8ad
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,3 @@ | ||
10 | ||
10 | ||
10 |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,28 @@ | ||
-- Tags: no-parallel, no-fasttest | ||
can you also add a few tests for
That would have to be an integration test, maybe with tens or hundreds of Parquet files. I can add it in another PR.
If local files also benefited from the metadata cache, an integration test wouldn't be needed, I suppose. But it doesn't look like we want to do that.
For local Parquet files, the OS file cache will be in effect.
||
|
||
DROP TABLE IF EXISTS t_parquet_03262; | ||
|
||
CREATE TABLE t_parquet_03262 (a UInt64) | ||
ENGINE = S3(s3_conn, filename = 'test_03262_{_partition_id}', format = Parquet) | ||
PARTITION BY a; | ||
|
||
INSERT INTO t_parquet_03262 SELECT number FROM numbers(10) SETTINGS s3_truncate_on_insert=1; | ||
|
||
SELECT COUNT(*) | ||
FROM s3(s3_conn, filename = 'test_03262_*', format = Parquet) | ||
SETTINGS input_format_parquet_use_metadata_cache=1; | ||
|
||
SELECT COUNT(*) | ||
FROM s3(s3_conn, filename = 'test_03262_*', format = Parquet) | ||
SETTINGS input_format_parquet_use_metadata_cache=1, log_comment='test_03262_parquet_metadata_cache'; | ||
|
||
SYSTEM FLUSH LOGS; | ||
|
||
SELECT ProfileEvents['ParquetMetaDataCacheHits'] | ||
FROM system.query_log | ||
WHERE log_comment = 'test_03262_parquet_metadata_cache' | ||
AND type = 'QueryFinish' | ||
ORDER BY event_time DESC | ||
LIMIT 1; | ||
|
||
DROP TABLE t_parquet_03262; |
Turned it into a server setting, which makes more sense since it can't be changed at runtime.
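For readers wondering why the reference file in this PR expects `10` three times, here is a toy model of the test's logic. It is only a sketch and not the ClickHouse implementation: the `MetadataCache` class, its `max_entries` parameter, and the `count_rows` helper are hypothetical names, and the LRU eviction policy is an assumption. The reading it illustrates is that `PARTITION BY a` over `numbers(10)` writes ten single-row files, so each `COUNT(*)` returns 10, and the second query finds all ten footers already cached.

```python
from collections import OrderedDict


class MetadataCache:
    """Toy LRU cache keyed by object path (hypothetical, not ClickHouse code)."""

    def __init__(self, max_entries):
        self.max_entries = max_entries
        self.entries = OrderedDict()
        self.hits = 0
        self.misses = 0

    def get_or_load(self, key, load):
        if key in self.entries:
            # Cache hit: refresh recency and reuse the stored metadata.
            self.hits += 1
            self.entries.move_to_end(key)
            return self.entries[key]
        # Cache miss: load the metadata and evict the oldest entry if full.
        self.misses += 1
        value = load(key)
        self.entries[key] = value
        if len(self.entries) > self.max_entries:
            self.entries.popitem(last=False)
        return value


def count_rows(cache, paths):
    """Return (total rows, cache hits recorded during this call)."""
    hits_before = cache.hits
    # Stand-in for reading a Parquet footer: each test file holds one row.
    total = sum(
        cache.get_or_load(p, lambda _path: {"num_rows": 1})["num_rows"]
        for p in paths
    )
    return total, cache.hits - hits_before


# Ten partitions, one file each, mirroring test_03262_{_partition_id}.
paths = [f"test_03262_{i}" for i in range(10)]
cache = MetadataCache(max_entries=100)
first = count_rows(cache, paths)   # populates the cache
second = count_rows(cache, paths)  # served entirely from the cache
print(first, second)  # (10, 0) (10, 10)
```

In this model the first pass records ten misses and the second ten hits, which matches the three reference values: two counts of 10 and a hit counter of 10 for the query tagged with `log_comment`.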