feat(aggregator): support profile/system counter aggregation and custom metric Arrow output #63
Merged
rayandrew merged 1 commit into llnl:develop on Apr 4, 2026
Pull request overview
This PR extends the DFTracer aggregation pipeline to treat counter events (ph="C") as first-class aggregated outputs (separating profile vs system counters), adds dynamic custom-metric aggregation from args, and updates Arrow/Python surfaces and TraceReader byte-range behavior accordingly.
Changes:
- Add profile/system counter aggregation paths and merge behavior across chunk/event aggregators.
- Extend Arrow output schema with batch_type, ts, te, and dynamic custom metric columns, and update Python bindings/tests/docs.
- Add plain-file byte-range line reading (sync/async parity, line completion past end_byte) and test coverage.
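The byte-range behavior named in the last item (skip a partial first line, keep empty lines, finish a line that is in flight at end_byte) can be sketched in a few lines of Python. This is an illustration only: `read_lines_in_range` is a hypothetical helper, not part of DFTracer, and the exact boundary rule (a line belongs to the range in which it starts) is an assumption rather than something the PR text confirms.

```python
import io


def read_lines_in_range(data: bytes, start_byte: int, end_byte: int) -> list[bytes]:
    """Return the lines that START inside [start_byte, end_byte).

    Hypothetical sketch; the real generator works on files, not bytes.
    """
    stream = io.BytesIO(data)
    if start_byte > 0:
        # Step back one byte and read to the next newline. If start_byte is
        # already a line start this consumes only the preceding newline;
        # otherwise it discards the partial line owned by the previous range.
        stream.seek(start_byte - 1)
        stream.readline()
    lines = []
    while stream.tell() < end_byte:
        line = stream.readline()  # may run past end_byte to finish the line
        if not line:
            break  # end of data
        lines.append(line.rstrip(b"\n"))  # empty lines are kept as b""
    return lines
```

Under this rule, adjacent ranges such as (0, 5) and (5, 13) return every line exactly once, which is the property that makes chunked reads safe to run in parallel.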
Reviewed changes
Copilot reviewed 27 out of 27 changed files in this pull request and generated 14 comments.
| File | Description |
|---|---|
| tests/utilities/reader/test_trace_reader.cpp | Adds plain-file byte-range read_lines tests (line completion, skipping partial first line, query filter). |
| tests/utilities/fileio/lines/sources/test_async_plain_file_bytes_generator.cpp | Updates sync reference implementation and adds async/sync parity tests (empty lines, line completion past end). |
| tests/utilities/composites/dft/aggregators/test_event_aggregator_utility.cpp | New C++ test for merging event/profile/system aggregation maps. |
| tests/utilities/composites/dft/aggregators/test_chunk_aggregator_utility.cpp | New C++ tests for chunk aggregation separation and query/byte-range semantics for counters. |
| tests/utilities/composites/dft/aggregators/test_aggregator_utility.cpp | New end-to-end C++ test ensuring event/profile/system batches and custom metrics are emitted. |
| tests/utilities/CMakeLists.txt | Adds the new aggregator utility test sources to the utilities test target. |
| tests/python/test_arrow_ipc.py | Updates Arrow IPC tests to assert the new base schema columns. |
| tests/python/test_aggregator.py | Adds Python-level tests for profile/system/custom-metric aggregation and batch_type separation. |
| tests/binaries/test_dftracer_server.cpp | Skips server integration tests when TCP sockets are unavailable. |
| src/dftracer/utils/utilities/reader/trace_reader.cpp | Adds plain-file byte-range path using async_plain_file_bytes with query filtering. |
| src/dftracer/utils/utilities/composites/dft/aggregators/event_aggregator_utility.cpp | Extends merge to include profile and system aggregation maps. |
| src/dftracer/utils/utilities/composites/dft/aggregators/chunk_aggregator_utility.cpp | Switches chunk reading to TraceReader, adds counter handling and custom-metric aggregation for counters, splits outputs into event/profile/system maps. |
| src/dftracer/utils/utilities/composites/dft/aggregators/aggregator_utility.cpp | Adds batch_type and dynamic schema generation for extra keys/custom metrics, emits separate batches for event/profile/system, and avoids index building for uncompressed inputs. |
| src/dftracer/utils/utilities/composites/dft/aggregators/aggregation_metrics.cpp | Uses CustomMetricsMap alias for custom metric storage. |
| src/dftracer/utils/python/utilities/aggregator.cpp | Exposes custom_metric_fields and compute_percentiles to Python, adds shared list parsing helper. |
| src/dftracer/utils/python/trace_reader.cpp | Adds Arrow normalization support, string arena, and new iter_arrow params (flatten_objects, normalize). |
| python/dftracer/utils/utilities/_aggregator.pyi | Updates Python stub signatures for new aggregator parameters. |
| python/dftracer/utils/dftracer_utils_ext.pyi | Updates extension stub signatures for new aggregator parameters. |
| include/dftracer/utils/utilities/fileio/lines/sources/async_plain_file_bytes_generator.h | Adjusts byte-range generator to preserve empty lines and complete in-flight line past end_byte. |
| include/dftracer/utils/utilities/composites/dft/event.h | Adds convenience predicates for counter/profile/system/event classification. |
| include/dftracer/utils/utilities/composites/dft/aggregators/aggregator_utility.h | Introduces AggregationBatchType and stores it in AggregationBatch. |
| include/dftracer/utils/utilities/composites/dft/aggregators/aggregation_output.h | Extends aggregation outputs with profile/system maps. |
| include/dftracer/utils/utilities/composites/dft/aggregators/aggregation_metrics.h | Introduces CustomMetricsMap with transparent hashing and updates AggregationMetrics to use it. |
| docs/source/quickstart.rst | Documents custom metric aggregation + percentile option in Python quickstart. |
| docs/source/index.rst | Adds a short example using custom_metric_fields in the index page. |
| docs/source/cli.rst | Documents the three logical row types and Arrow base/custom-metric columns; updates CLI examples. |
| docs/source/api/utilities.rst | Documents updated Arrow schema and new Python API options. |
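Several rows above mention merging aggregation maps across chunks. One common way to make per-key statistics mergeable is to carry count, total, sum of squares, min, and max, so that two partial results combine without revisiting the raw values. The sketch below assumes that approach; `MetricStats` and its fields are hypothetical names, and the PR does not state which accumulator or which standard-deviation formula the C++ code uses.

```python
import math
from dataclasses import dataclass


@dataclass
class MetricStats:
    """Hypothetical mergeable accumulator for one metric."""
    count: int = 0
    total: float = 0.0
    sum_sq: float = 0.0
    minimum: float = math.inf
    maximum: float = -math.inf

    def add(self, value: float) -> None:
        self.count += 1
        self.total += value
        self.sum_sq += value * value
        self.minimum = min(self.minimum, value)
        self.maximum = max(self.maximum, value)

    def merge(self, other: "MetricStats") -> None:
        # Every field combines associatively, so chunk order does not matter.
        self.count += other.count
        self.total += other.total
        self.sum_sq += other.sum_sq
        self.minimum = min(self.minimum, other.minimum)
        self.maximum = max(self.maximum, other.maximum)

    @property
    def mean(self) -> float:
        return self.total / self.count if self.count else 0.0

    @property
    def std(self) -> float:
        # Population standard deviation (an assumption, see lead-in).
        if not self.count:
            return 0.0
        variance = self.sum_sq / self.count - self.mean ** 2
        return math.sqrt(max(variance, 0.0))  # clamp rounding noise
```

Merging two chunk-level accumulators gives the same result as a single pass over all values, up to floating-point rounding.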
a8f78da to 39b9b58
…n with custom metrics
hariharan-devarajan approved these changes on Apr 4, 2026
Summary
This PR extends the aggregation pipeline to handle DFTracer counter events (ph="C") as first-class aggregated outputs, including:
- custom metric aggregation from args
- TraceReader byte-range handling
Aggregation behavior
- Adds profile_aggregations and system_aggregations alongside regular event aggregations.
- Profile counters: ph="C" with cat != "sys".
- System counters: ph="C" with cat == "sys".
- Aggregates custom metrics from args via custom_metric_fields.
- … utility collection.
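The profile/system split described above reduces to a two-field check on each trace row. A minimal sketch, assuming rows are already parsed into dictionaries: `classify_row` is a hypothetical helper, and the returned labels are illustrative, since the PR text does not spell out the literal batch_type values.

```python
def classify_row(event: dict) -> str:
    """Map one parsed trace row to a logical row type (labels are assumed)."""
    if event.get("ph") == "C":
        # Counter rows split on category: "sys" is system, anything else profile.
        return "system" if event.get("cat") == "sys" else "profile"
    # Every non-counter row stays on the regular event path.
    return "event"
```

A counter row with no cat field falls into the profile bucket here, because only an exact cat == "sys" match selects the system path.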
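Arrow / Python API
- New Arrow columns: batch_type, ts, te.
- <field>_total|min|max|mean|std columns for custom metrics.
- New Python options: custom_metric_fields, compute_percentiles.
TraceReader / byte-range handling
- Plain-file byte-range reads go through TraceReader.
- A line in flight at the range boundary is completed past end_byte.
Tests
- AggregatorUtility::process()
- TraceReader byte-range behavior
- AggregatorUtility iter_arrow() batch separation by batch_type
Testing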
C++
Ran from build/build-tests:
Result: 134/134 tests passed
Python
Targeted Python verification:
Result: 14 passed
Notes
- Arrow output now includes batch_type, ts, te, and optional custom metric columns.
- ph="C" rows are intentionally split into profile vs system outputs based on cat == "sys".
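To show how the optional custom metric columns relate to custom_metric_fields, the sketch below derives the <field>_total|min|max|mean|std names from the args of a group of rows. It is not the DFTracer implementation: `custom_metric_columns` is a hypothetical function, and it assumes args values are numeric and that std is the population standard deviation.

```python
import statistics


def custom_metric_columns(rows: list[dict], custom_metric_fields: list[str]) -> dict:
    """Build <field>_<stat> columns for one aggregation group (hypothetical)."""
    columns = {}
    for field in custom_metric_fields:
        # Collect the field from each row's args; rows without it are skipped.
        values = [row["args"][field] for row in rows if field in row.get("args", {})]
        if not values:
            continue  # no column is emitted for a field that never appears
        columns[f"{field}_total"] = sum(values)
        columns[f"{field}_min"] = min(values)
        columns[f"{field}_max"] = max(values)
        columns[f"{field}_mean"] = statistics.fmean(values)
        columns[f"{field}_std"] = statistics.pstdev(values)
    return columns
```

Because the column set depends on which fields are requested and present, a consumer should read the schema from each batch rather than assume a fixed layout.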