Draft
67 commits
b951c09
Add support for extra columns in trace reading functions
izzet Jul 10, 2025
34aff5e
Add extra_columns and extra_columns_fn parameters to analyze_trace me…
izzet Jul 10, 2025
37cdce6
Fix typo
izzet Jul 11, 2025
0399cda
Merge branch 'main' into feat/support-extra-columns
hariharan-devarajan Jul 12, 2025
18c4359
Update utilization calculations and add DLIO AI logging configuration
izzet Jul 24, 2025
443540b
Fix: replace np.isnan with pd.isna for better compatibility in set_me…
izzet Jul 24, 2025
8987acc
Add streaming optional dependency for enhanced functionality
izzet Jul 24, 2025
2f89451
feat: Introduce input and output configurations for file and ZMQ
izzet Aug 12, 2025
4e35698
feat: Add streamz-zmq to streaming optional dependencies for enhanced…
izzet Aug 12, 2025
b80bbd5
feat: Add ZMQ analysis tests for streaming functionality
izzet Aug 12, 2025
9d01326
Merge branch 'main' into feat/streaming
izzet Aug 31, 2025
3edaed0
Merge branch 'main' into feat/streaming
izzet Sep 2, 2025
adeac9c
Enhance streaming functionality and testing
izzet Sep 3, 2025
4d547ff
Refactor set_main_metrics to use pd.NA for better handling of missing…
izzet Sep 3, 2025
750f6c0
Fix split_duration_records_vectorized to handle NaN values in duratio…
izzet Sep 3, 2025
f35731b
Refactor test data extraction logic and ensure extracted files are av…
izzet Sep 3, 2025
5f4c1a8
Update dftracer-posix test data
izzet Sep 4, 2025
4a9c26c
Add structlog for improved logging and replace logging with structlog…
izzet Sep 4, 2025
250add4
Refactor tests for DFTracerAnalyzer and metrics
izzet Sep 6, 2025
9880ce8
Enhance metrics calculations in tests to include fractional totals fo…
izzet Sep 6, 2025
948a48b
Fix aggregation method in Analyzer class to ensure bin columns use st…
izzet Sep 6, 2025
9d92cb4
Add quantile metrics processing and related tests in Analyzer and met…
izzet Sep 6, 2025
9111192
Remove debug print statements in DFTracerAnalyzer's _handle_metadata …
izzet Sep 6, 2025
eaf574b
Add handling for empty stats in set_quantile_metrics function
izzet Sep 6, 2025
5ea0a8c
Add tests for quantile statistics handling in Dask aggregation
izzet Sep 6, 2025
bea9e76
Add check for empty duration records in split_duration_records_vector…
izzet Sep 6, 2025
a135915
Refactor DFTracerAnalyzer to persist hashes and metadata unconditiona…
izzet Sep 6, 2025
6ea2807
Add methods to count unique files and hosts in DFTracerAnalyzer
izzet Sep 6, 2025
f145e7c
Refactor set_cross_layer_metrics to improve handling of total time fr…
izzet Sep 6, 2025
f0872bb
Add unique_set and unique_set_flatten tests to validate Dask aggregat…
izzet Sep 6, 2025
9e9d6d9
Add pytest marks for smoke and full testing in test_streaming.py
izzet Sep 6, 2025
f872486
Merge branch 'main' into feat/streaming
izzet Jan 7, 2026
0e07891
fix: Update dftracer AI logging fixture to handle epoch.block events
izzet Jan 9, 2026
7e791c0
feat: Add time_boundary_layer attribute to AnalyzerPresetConfig and i…
izzet Jan 10, 2026
6d00e50
Integrate Mofka, complete streaming tests
izzet Jan 17, 2026
86462fb
fix: Update type checks for DataFrame columns in Analyzer class
izzet Jan 26, 2026
1ea2f7c
fix: Restrict pandas version to <3 for compatibility
izzet Jan 26, 2026
b9fd485
feat: add auto-detection of fabric protocol for bedrock/Mofka
izzet Feb 19, 2026
3dab4bf
Rename ZMQ test files
izzet Feb 20, 2026
48e537e
refactor(streaming): replace streamz with callback-based ZMQ handling
izzet Feb 20, 2026
fb0ac21
feat(analyzer): add analysis facts engine for generating structured p…
izzet Mar 1, 2026
f79d2b9
feat(streaming): multi-rank epoch boundary analysis and fact engine i…
izzet Mar 6, 2026
19e79e8
feat(analyzer): replace epoch-based buffering with window-based strea…
izzet Mar 9, 2026
4c8b1f4
feat(analyzer): step-level control windows and trace drain grace period
izzet Mar 15, 2026
6888a3f
Add TF reader_posix_pressure rules, fix Int16 overflow, fix fillna0 i…
izzet Mar 18, 2026
3361e19
Simplify analyzer for window-based analysis, fix event normalization …
izzet Mar 19, 2026
6afa5c8
feat(analyzer): add POSIX fact rules based on WisIO heuristics
izzet Mar 21, 2026
e8222b2
feat(analyzer): update derived metrics for step view to align with pa…
izzet Mar 21, 2026
91e9902
Merge origin/main into feat/streaming: hybrid profiles, two-track sta…
izzet Mar 21, 2026
794a2b0
feat(analyzer): update _handle_metadata to return additional profile …
izzet Mar 21, 2026
135dc3c
feat(analyzer): update fact rules to use two-track metrics
izzet Mar 21, 2026
e59940b
feat(analyzer): rename WindowBoundaryTracker to WindowTracker for cla…
izzet Mar 29, 2026
809ab22
feat(analyzer): update open_consumer to use progress thread by default
izzet Mar 29, 2026
98af400
feat(analyzer): add support for 64-bit POSIX I/O functions in POSIX_I…
izzet Mar 29, 2026
28da2da
feat(analyzer): derived metrics handle missing layers
izzet Mar 29, 2026
a71e08d
feat(analyzer): rename WindowBoundaryTracker to WindowTracker
izzet Mar 29, 2026
4f3f654
feat(analyzer): refactor event normalization and enhance system metri…
izzet Apr 4, 2026
3de8ea3
fix(analyzer): fix trange integer overflow and silence hot-path debug…
izzet Apr 25, 2026
d8f4166
feat(analyzer): add suppresses_tags, per-node scope, and window_index
izzet Apr 25, 2026
1109977
feat(analyzer): overhaul DLIO rules and add StorMer/variant rule configs
izzet Apr 25, 2026
aeefcd7
feat(analyzer): add global HLM analyzer and StorMer preset
izzet Apr 25, 2026
21ec911
test(analyzer): update tests for fetch_data to fetch_iter rename
izzet Apr 25, 2026
9fd1549
build(analyzer): install stormer.yaml rule config
izzet Apr 25, 2026
34514e7
Add global fetch_outlier and reader_open_tail_outlier rules (R10, R11)
izzet Apr 25, 2026
3e9c65b
Rename pread_contention to read_contention and add bw_saturation rule
izzet Apr 25, 2026
0bf963c
Rename reader_open_pressure to open_pressure across all rule files
izzet Apr 25, 2026
0c4f9d9
Rename fact_types to match paper rule catalog
izzet Apr 25, 2026
12 changes: 6 additions & 6 deletions .github/workflows/ci.yml
@@ -73,7 +73,7 @@ jobs:

 - name: Install DFAnalyzer
 run: |
-pip install .[darshan] \
+pip install .[darshan,streaming] \
 -Csetup-args="--prefix=$HOME/.local" \
 -Csetup-args="-Denable_tests=true" \
 -Csetup-args="-Denable_tools=true"
@@ -116,14 +116,14 @@ jobs:

 # Run analysis commands using the external cluster
 if [[ "${{ steps.test_type.outputs.test_type }}" == "full" ]]; then
-dfanalyzer analyzer=darshan trace_path=tests/data/extracted/darshan-posix-dxt \
-cluster=external cluster.restart_on_connect=True cluster.scheduler_address=$scheduler_address
-dfanalyzer analyzer=recorder trace_path=tests/data/extracted/recorder-posix-parquet \
+dfanalyzer analyzer=darshan input.path=tests/data/extracted/darshan-posix-dxt \
 cluster=external cluster.restart_on_connect=True cluster.scheduler_address=$scheduler_address
-dfanalyzer analyzer=dftracer analyzer/preset=dlio trace_path=tests/data/extracted/dftracer-dlio \
+dfanalyzer analyzer=recorder input.path=tests/data/extracted/recorder-posix-parquet \
 cluster=external cluster.restart_on_connect=True cluster.scheduler_address=$scheduler_address
+dfanalyzer analyzer=dftracer analyzer/preset=dlio input.path=tests/data/extracted/dftracer-dlio \
+cluster=external cluster.restart_on_connect=True cluster.scheduler_address=$scheduler_address
 else
-dfanalyzer analyzer=dftracer analyzer/preset=dlio trace_path=tests/data/extracted/dftracer-dlio \
+dfanalyzer analyzer=dftracer analyzer/preset=dlio input.path=tests/data/extracted/dftracer-dlio \
 cluster=external cluster.restart_on_connect=False cluster.scheduler_address=$scheduler_address
 fi

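Review note: the CI hunk above renames the `trace_path=` command-line override to `input.path=`. For scripts that still pass the old key, a minimal migration sketch follows. The helper name `migrate_overrides` is hypothetical and is not part of DFAnalyzer; it only rewrites the key prefix and assumes the value format is unchanged.

```python
def migrate_overrides(args):
    """Rewrite legacy `trace_path=` overrides to `input.path=`.

    Hypothetical helper for illustration only; not part of DFAnalyzer.
    """
    old_prefix = "trace_path="
    new_prefix = "input.path="
    migrated = []
    for arg in args:
        if arg.startswith(old_prefix):
            # Keep the value, swap only the key.
            migrated.append(new_prefix + arg[len(old_prefix):])
        else:
            migrated.append(arg)
    return migrated


legacy = [
    "analyzer=darshan",
    "trace_path=tests/data/extracted/darshan-posix-dxt",
    "cluster=external",
]
print(" ".join(["dfanalyzer"] + migrate_overrides(legacy)))
```

Overrides that do not start with `trace_path=` pass through untouched, so the helper can be applied to an argument list that has already been migrated.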
6 changes: 6 additions & 0 deletions .gitignore
@@ -198,3 +198,9 @@ cython_debug/
outputs/
tmp/
VERSION.txt
+
+# Spack
+/spack.lock
+/._view
+/view
+/.spack-env
89 changes: 47 additions & 42 deletions README.md
@@ -62,53 +62,58 @@ This command analyzes the traces and prints a high-level summary of the applicat

```bash
Time Period Summary
-┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━┓
-┃ Metric ┃ Unit ┃ Value ┃
-┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━┩
-│ Job Time │ seconds │ 56.695 │
-│ Total Count │ count │ 15,901 │
-│ Total Files │ count │ 87 │
-│ Total Nodes │ count │ 0 │
-│ Total Processes │ count │ 23 │
-│ App Count │ count │ 8 │
-│ Training Count │ count │ 40 │
-│ Compute Count │ count │ 200 │
-│ Fetch Data Count │ count │ 160 │
-│ Data Loader Count │ count │ 808 │
-│ Data Loader Fork Count │ count │ 96 │
-│ Reader Count │ count │ 4,008 │
-│ Reader POSIX (Lustre) Count │ count │ 10,432 │
-│ Reader POSIX (Lustre) Size │ MB │ 111833.161 │
-│ Reader POSIX (Lustre) Bandwidth │ MB/s │ 874.982 │
-│ Reader POSIX (Lustre) Avg Transfer Size │ MB │ 10.720 │
-│ Checkpoint Count │ count │ 8 │
-│ Checkpoint POSIX (Lustre) Count │ count │ 45 │
-│ Checkpoint POSIX (Lustre) Size │ MB │ 0.011 │
-│ Checkpoint POSIX (Lustre) Bandwidth │ MB/s │ 0.791 │
-│ Checkpoint POSIX (Lustre) Avg Transfer Size │ MB │ 0.000 │
-│ Other POSIX Count │ count │ 96 │
-└───────────────────────────────────────────────────────────────────────────────┴────────────────┴────────────────────┘
+┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━┓
+┃ Metric ┃ Unit ┃ Value ┃
+┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━┩
+│ Job Time │ seconds │ 56.695 │
+│ Total Count │ count │ 18,039 │
+│ Total Files │ count │ 166 │
+│ Total Nodes │ count │ 1 │
+│ Total Processes │ count │ 8 │
+│ App Count │ count │ 8 │
+│ Training Count │ count │ 8 │
+│ Epoch Count │ count │ 40 │
+│ Compute Count │ count │ 200 │
+│ Fetch Data Count │ count │ 160 │
+│ Checkpoint Count │ count │ 8 │
+│ Data Loader Count │ count │ 816 │
+│ Data Loader Fork Count │ count │ 96 │
+│ Reader Count │ count │ 3,200 │
+│ POSIX - All Count │ count │ 10,581 │
+│ POSIX - All Size │ MB │ 111833.172 │
+│ POSIX - All Bandwidth │ MB/s │ 6048.367 │
+│ POSIX - All Avg Transfer Size │ MB │ 10.569 │
+│ POSIX - Reader Count │ count │ 10,432 │
+│ POSIX - Reader Size │ MB │ 111833.161 │
+│ POSIX - Reader Bandwidth │ MB/s │ 6095.909 │
+│ POSIX - Reader Avg Transfer Size │ MB │ 10.720 │
+│ POSIX - Checkpoint Count │ count │ 45 │
+│ POSIX - Checkpoint Size │ MB │ 0.011 │
+│ POSIX - Checkpoint Bandwidth │ MB/s │ 2.525 │
+│ POSIX - Checkpoint Avg Transfer Size │ MB │ 0.000 │
+└───────────────────────────────────────────────────────────────────────────┴──────────────────┴───────────────────────┘
```

DFAnalyzer also provides a detailed breakdown of performance metrics for each layer of the application. Here is a snippet of the "Layer Breakdown" section from the same run, which includes the percentage of time each layer overlaps with its parent layer:

```bash
-Layer Breakdown (w/ overlap %)
-┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━┓
-┃ Layer ┃ Time (s) ┃ Ops ┃ Ops/sec ┃ Size (MB) ┃ Bandwidth (MB/s) ┃
-┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━┩
-│ App │ 441.967 (----) │ 8 (----) │ 0.018 │ - │ - │
-│ Training │ 439.442 (----) │ 40 (----) │ 0.091 │ - │ - │
-│ Compute │ 272.356 (----) │ 200 (----) │ 0.734 │ - │ - │
-│ Fetch Data │ 126.179 ( 16%) │ 160 ( 25%) │ 1.268 │ - │ - │
-│ Data Loader │ 151.471 ( 45%) │ 808 ( 46%) │ 5.334 │ - │ - │
-│ Data Loader Fork │ 2.392 ( 0%) │ 96 ( 0%) │ 40.135 │ - │ - │
-│ Reader │ 299.992 ( 40%) │ 4,008 ( 51%) │ 13.360 │ - │ - │
-│ Reader POSIX (Lustre) │ 127.812 ( 45%) │ 10,432 ( 48%) │ 81.620 │ 111833.161 ( 46%) │ 874.982 │
-│ Checkpoint │ 0.014 ( 0%) │ 8 ( 0%) │ 571.551 │ - │ - │
-│ Checkpoint POSIX (Lustre) │ 0.014 ( 0%) │ 45 ( 0%) │ 3268.686 │ 0.011 ( 0%) │ 0.791 │
-│ Other POSIX │ 2.392 ( 0%) │ 96 ( 0%) │ 40.135 │ 0.000 (----) │ - │
-└─────────────────────────────┴──────────────────┴────────────────┴───────────┴────────────────────┴──────────────────┘
+Layer Breakdown (w/ overlap %)
+┏━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━┓
+┃ Layer ┃ Time (s) ┃ Ops ┃ Ops/sec ┃ Size (MB) ┃ Bandwidth (MB/s) ┃
+┡━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━┩
+│ App │ 55.246 (----) │ 8 (----) │ 0.145 │ - │ - │
+│ Training │ 55.246 (----) │ 8 (----) │ 0.145 │ - │ - │
+│ Epoch │ 54.937 (----) │ 40 (----) │ 0.728 │ - │ - │
+│ Compute │ 40.854 (----) │ 200 (----) │ 4.895 │ - │ - │
+│ Fetch Data │ 16.889 (----) │ 160 (----) │ 9.474 │ - │ - │
+│ Checkpoint │ 0.005 (----) │ 8 (----) │ 1762.503 │ - │ - │
+│ Data Loader │ 21.871 ( 54%) │ 816 ( 57%) │ 37.310 │ - │ - │
+│ Data Loader Fork │ 0.181 ( 0%) │ 96 ( 0%) │ 530.903 │ - │ - │
+│ Reader │ 21.480 ( 55%) │ 3,200 ( 67%) │ 148.979 │ - │ - │
+│ POSIX - All │ 18.490 ( 54%) │ 10,581 ( 59%) │ 572.261 │ 111833.172 ( 59%) │ 6048.367 │
+│ POSIX - Reader │ 18.346 ( 55%) │ 10,432 ( 60%) │ 568.637 │ 111833.161 ( 59%) │ 6095.909 │
+│ POSIX - Checkpoint │ 0.004 (----) │ 45 (----) │ 10433.573 │ 0.011 (----) │ 2.525 │
+└────────────────────────┴──────────────────┴─────────────────┴─────────────┴──────────────────────┴───────────────────┘
```

## Further Information
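Review note: the rate columns in the updated Layer Breakdown table in the README hunk above appear consistent with `Ops/sec = Ops / Time` and `Bandwidth = Size / Time`. This is an inference from the printed numbers, not something confirmed from the DFAnalyzer source, and the function name `derive_rates` is made up for this sketch. Small differences from the printed values are expected because the table shows time rounded to three decimals.

```python
import math

# (layer, time_s, ops, size_mb) copied from the updated Layer Breakdown table.
rows = [
    ("POSIX - All", 18.490, 10_581, 111833.172),
    ("POSIX - Reader", 18.346, 10_432, 111833.161),
]

# Rates printed in the same table, used here only for comparison.
printed = {
    "POSIX - All": (572.261, 6048.367),
    "POSIX - Reader": (568.637, 6095.909),
}


def derive_rates(time_s, ops, size_mb):
    """Assumed relation: rate = quantity / layer time (hypothetical helper)."""
    return ops / time_s, size_mb / time_s


for layer, time_s, ops, size_mb in rows:
    ops_per_sec, bandwidth = derive_rates(time_s, ops, size_mb)
    want_ops, want_bw = printed[layer]
    # Compare with a loose tolerance because the inputs are rounded.
    ok = math.isclose(ops_per_sec, want_ops, rel_tol=1e-3) and math.isclose(
        bandwidth, want_bw, rel_tol=1e-3
    )
    print(f"{layer}: {ops_per_sec:.1f} ops/s, {bandwidth:.1f} MB/s, matches table: {ok}")
```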
19 changes: 19 additions & 0 deletions docs/configuration.rst
@@ -223,6 +223,25 @@ Console Output (``output=console``)

Prints the analysis summary directly to the console. This is the **default** output.

+JSON Output (``output=json``)
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+Saves a JSON summary report to disk. This output includes raw stats, per-view
+summary metrics, per-layer metrics, and additional metrics statistics.
+
+.. list-table::
+   :widths: 25 15 30 30
+   :header-rows: 1
+
+   * - Parameter
+     - Type
+     - Default
+     - Description
+   * - ``output.file_path``
+     - string
+     - ""
+     - JSON output file path. If empty, writes to ``<hydra.runtime.output_dir>/dfanalyzer_output.json``.
+
CSV Output (``output=csv``)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~

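Review note: the new `output=json` documentation says an empty `output.file_path` falls back to `<hydra.runtime.output_dir>/dfanalyzer_output.json`. A minimal sketch of that fallback rule follows. The function names `resolve_json_path` and `write_summary` are hypothetical, the output directory is passed in as a plain string rather than read from Hydra, and the actual implementation in the PR may differ.

```python
import json
from pathlib import Path

# Default file name taken from the documentation hunk above.
DEFAULT_NAME = "dfanalyzer_output.json"


def resolve_json_path(file_path, output_dir):
    """Return the explicit path if given, else the documented default location."""
    if file_path:
        return Path(file_path)
    return Path(output_dir) / DEFAULT_NAME


def write_summary(summary, file_path, output_dir):
    """Write `summary` (any JSON-serializable dict) to the resolved path."""
    target = resolve_json_path(file_path, output_dir)
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_text(json.dumps(summary, indent=2))
    return target


# Resolution only; nothing is written here.
print(resolve_json_path("", "outputs/run1"))
print(resolve_json_path("reports/summary.json", "outputs/run1"))
```

Keeping path resolution separate from writing makes the fallback rule easy to check without touching the filesystem.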