

BlakeOrth
Contributor

Which issue does this PR close?

This does not fully close the issue, but is an incremental building block for:

The full context of how this code is likely to progress can be seen in the POC for this effort:

Rationale for this change

For queries that make many calls to an instrumented object store, printing every call along with the summary of those calls can produce thousands of lines of output. Letting users opt into a summary-only view in these cases keeps the instrumented object store output from dominating the output of a query.

What changes are included in this PR?

  • Adds the ability for users to choose summary-only output for an instrumented object store when using the CLI
  • The existing "enabled" setting, which displays both a summary and a detailed entry for each object store call, has been renamed to Trace for clarity
  • Adds test cases for summary-only output and updates existing tests to use trace
  • Updates the user guide docs to reflect the CLI flag and command changes
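The bullets above describe three profiling settings: off, summary-only, and full trace. A minimal sketch of how such a mode could be modeled and parsed from the CLI flag or the `\object_store_profiling` argument follows. It assumes the accepted strings are `disabled`, `summary`, and `trace` (only `summary` and `trace` appear in the examples here); `ProfilingMode`, `prints_requests`, and `prints_summary` are hypothetical names, not the PR's actual types.

```rust
use std::fmt;
use std::str::FromStr;

/// Hypothetical profiling mode mirroring the three settings described in
/// this PR. Not DataFusion's actual type.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Default)]
pub enum ProfilingMode {
    /// No instrumentation output (assumed default).
    #[default]
    Disabled,
    /// Print only the aggregated per-operation summaries.
    Summary,
    /// Print every recorded request followed by the summaries.
    Trace,
}

impl FromStr for ProfilingMode {
    type Err = String;

    /// Case-insensitive parse; the "disabled" spelling is an assumption.
    fn from_str(s: &str) -> Result<Self, Self::Err> {
        match s.trim().to_ascii_lowercase().as_str() {
            "disabled" => Ok(Self::Disabled),
            "summary" => Ok(Self::Summary),
            "trace" => Ok(Self::Trace),
            other => Err(format!("unknown profiling mode: {other}")),
        }
    }
}

impl fmt::Display for ProfilingMode {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        let name = match self {
            Self::Disabled => "Disabled",
            Self::Summary => "Summary",
            Self::Trace => "Trace",
        };
        write!(f, "{name}")
    }
}

impl ProfilingMode {
    /// Only Trace prints one line per object store request.
    pub fn prints_requests(self) -> bool {
        matches!(self, Self::Trace)
    }

    /// Both Summary and Trace print the per-operation summary block.
    pub fn prints_summary(self) -> bool {
        !matches!(self, Self::Disabled)
    }
}
```

With a `Display` impl like this, a command handler could echo the new setting as `ObjectStore Profile mode set to {mode}`, matching the message shown in the example output; whether the real implementation does so is not confirmed here.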

Are these changes tested?

Yes. Additional unit tests have been added, and the existing integration test has been augmented to exercise the new option(s).

Example functional output:

./datafusion-cli --object-store-profiling trace
DataFusion CLI v50.2.0
> CREATE EXTERNAL TABLE hits
STORED AS PARQUET
LOCATION 'https://datasets.clickhouse.com/hits_compatible/athena_partitioned/hits_1.parquet';
0 row(s) fetched.
Elapsed 0.532 seconds.

Object Store Profiling
Instrumented Object Store: instrument_mode: Trace, inner: HttpStore
2025-10-14T22:26:13.185625701+00:00 operation=Get duration=0.035335s size=8 range: bytes=174965036-174965043 path=hits_compatible/athena_partitioned/hits_1.parquet
2025-10-14T22:26:13.221015783+00:00 operation=Get duration=0.045423s size=34322 range: bytes=174930714-174965035 path=hits_compatible/athena_partitioned/hits_1.parquet

Summaries:
Get
count: 2
duration min: 0.035335s
duration max: 0.045423s
duration avg: 0.040379s
size min: 8 B
size max: 34322 B
size avg: 17165 B
size sum: 34330 B

> \object_store_profiling summary
ObjectStore Profile mode set to Summary
> CREATE EXTERNAL TABLE hits2
STORED AS PARQUET
LOCATION 'https://datasets.clickhouse.com/hits_compatible/athena_partitioned/hits_2.parquet';
0 row(s) fetched.
Elapsed 0.179 seconds.

Object Store Profiling
Instrumented Object Store: instrument_mode: Summary, inner: HttpStore
Summaries:
Get
count: 2
duration min: 0.021558s
duration max: 0.022129s
duration avg: 0.021843s
size min: 8 B
size max: 55508 B
size avg: 27758 B
size sum: 55516 B

>
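Each summary block above reports, per operation, the request count, the min/max/average duration, and the min/max/average/sum of bytes transferred. A minimal sketch of such an aggregation follows, assuming each request is recorded with a duration and an optional byte size. `RequestRecord`, `SizeStats`, `OpSummary`, `summarize`, and `render` are hypothetical names for illustration, not DataFusion's implementation. Feeding it the two `Get` requests from the `trace` example reproduces the printed figures, including the truncated integer size average (34330 B / 2 = 17165 B).

```rust
use std::time::Duration;

/// Hypothetical record of one instrumented request (only summary fields).
#[derive(Debug, Clone, Copy)]
pub struct RequestRecord {
    pub duration: Duration,
    /// Bytes transferred; None for operations without a payload size.
    pub size: Option<u64>,
}

#[derive(Debug, Clone, PartialEq)]
pub struct SizeStats {
    pub min: u64,
    pub max: u64,
    pub avg: u64,
    pub sum: u64,
}

#[derive(Debug, Clone, PartialEq)]
pub struct OpSummary {
    pub count: usize,
    pub duration_min: Duration,
    pub duration_max: Duration,
    pub duration_avg: Duration,
    pub sizes: Option<SizeStats>,
}

/// Aggregates the records of one operation; returns None when empty.
pub fn summarize(records: &[RequestRecord]) -> Option<OpSummary> {
    let count = records.len();
    // `?` exits early for an empty slice, so the division below is safe.
    let duration_min = records.iter().map(|r| r.duration).min()?;
    let duration_max = records.iter().map(|r| r.duration).max()?;
    let total: Duration = records.iter().map(|r| r.duration).sum();
    let duration_avg = total / count as u32;

    // Size statistics only cover records that carry a size.
    let sizes: Vec<u64> = records.iter().filter_map(|r| r.size).collect();
    let size_stats = if sizes.is_empty() {
        None
    } else {
        let sum: u64 = sizes.iter().sum();
        Some(SizeStats {
            min: *sizes.iter().min().unwrap(),
            max: *sizes.iter().max().unwrap(),
            avg: sum / sizes.len() as u64, // integer average, truncated
            sum,
        })
    };

    Some(OpSummary { count, duration_min, duration_max, duration_avg, sizes: size_stats })
}

/// Formats a summary in the same line layout as the CLI output above.
pub fn render(op: &str, s: &OpSummary) -> Vec<String> {
    let mut lines = vec![
        op.to_string(),
        format!("count: {}", s.count),
        format!("duration min: {:.6}s", s.duration_min.as_secs_f64()),
        format!("duration max: {:.6}s", s.duration_max.as_secs_f64()),
        format!("duration avg: {:.6}s", s.duration_avg.as_secs_f64()),
    ];
    if let Some(z) = &s.sizes {
        lines.push(format!("size min: {} B", z.min));
        lines.push(format!("size max: {} B", z.max));
        lines.push(format!("size avg: {} B", z.avg));
        lines.push(format!("size sum: {} B", z.sum));
    }
    lines
}
```

In a summary-only mode, only `render` output would be printed per operation; a trace mode would additionally print one line per `RequestRecord` before the summaries.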

Are there any user-facing changes?

Yes. An existing user option, in the form of a CLI flag and its associated command, was changed. The user documentation has been updated to reflect these changes.

cc @alamb
(I believe the previous PR that was merged for this effort was the last major set of core functionality! 🎉 The remaining PRs should all be pretty concise and just fill out the small bits of missing implementation.)

github-actions bot added the documentation label (Improvements or additions to documentation) Oct 14, 2025
BlakeOrth force-pushed the feature/cli_instrument_trace branch from 46460fa to 0ba562f October 15, 2025 16:49
Contributor

alamb left a comment


Thanks @BlakeOrth -- I tried it out, and it works great!

ObjectStore Profile mode set to Summary
> select count(*) from 'https://datasets.clickhouse.com/hits_compatible/athena_partitioned/hits_1.parquet';
+----------+
| count(*) |
+----------+
| 1000000  |
+----------+
1 row(s) fetched.
Elapsed 0.595 seconds.

Object Store Profiling
Instrumented Object Store: instrument_mode: Summary, inner: HttpStore
Summaries:
Get
count: 2
duration min: 0.053315s
duration max: 0.056176s
duration avg: 0.054746s
size min: 8 B
size max: 34322 B
size avg: 17165 B
size sum: 34330 B

> \object_store_profiling trace
ObjectStore Profile mode set to Trace
> select count(*) from 'https://datasets.clickhouse.com/hits_compatible/athena_partitioned/hits_1.parquet';
+----------+
| count(*) |
+----------+
| 1000000  |
+----------+
1 row(s) fetched.
Elapsed 0.199 seconds.

Object Store Profiling

alamb added this pull request to the merge queue Oct 16, 2025
Merged via the queue into apache:main with commit 3bca1bb Oct 16, 2025
29 checks passed
