#7: Fix iteration breakdown, add frequency analysis, and plot dropped nodes #9

Merged
lifflander merged 16 commits into master from 7-fix-iteration-breakdown-and-analysis on Jun 5, 2025

Conversation

@cwschilly
Contributor

@cwschilly cwschilly commented Mar 26, 2025

Fixes #7
Fixes #8

Also refactors the Python code to be more modular and adds both function timers and a new plot for dropped nodes.

To plot the dropped nodes for previous runs, run the following from the project directory:

python detection/plot_dropped_nodes.py -s slownode.log -a slownodeanalysis.log -o /path/to/output_dir

Where:

-s <slownode.log>         - The driver output (from slow_node.cc)
-a <slownodeanalysis.log> - The analysis output (from detect_slow_nodes.py)

@cwschilly cwschilly linked an issue Mar 26, 2025 that may be closed by this pull request
@cwschilly
Contributor Author

@nlslatt @lifflander Should we ignore the first iteration during analysis?

@nlslatt

nlslatt commented Mar 28, 2025

> @nlslatt @lifflander Should we ignore the first iteration during analysis?

We should probably do one more than the requested number of iterations and completely throw out the first one. I think the first iteration should not be printed or added to the total time.
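A minimal sketch of the suggestion above, assuming a hypothetical Python helper named `run_timed_iterations` and a caller-supplied `work` function (the real benchmark loop lives in the driver and may be structured differently). It runs one more iteration than requested and drops the first from both the per-iteration list and the total:

```python
import time


def run_timed_iterations(work, n_iters):
    """Time n_iters runs of work(), preceded by one discarded warm-up run.

    `work` and this helper are illustrative names, not part of the project.
    Returns (per_iteration_times, total_time), both excluding the warm-up.
    """
    times = []
    for i in range(n_iters + 1):  # one extra iteration for warm-up
        start = time.perf_counter()
        work()
        elapsed = time.perf_counter() - start
        if i == 0:
            continue  # warm-up: neither reported nor added to the total
        times.append(elapsed)
    return times, sum(times)
```

With this shape, the reported iteration count still matches what the user requested, and the total time is unaffected by first-iteration startup costs.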

@nlslatt

nlslatt commented Mar 28, 2025

@cwschilly Are you planning to fix the sensors node name problem as a separate PR or part of this? There are a lot of things that need to be addressed before I can start collecting new data and I'd really like to start collecting it now.

@nlslatt

nlslatt commented Mar 28, 2025

It would be helpful if the list of slowest iterations included the iteration number and not just the time.
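One way this request could look, as a sketch only: the helper name `slowest_iterations` and its input format (a plain list of per-iteration times, indexed from 0) are assumptions, not the analysis script's actual interface. The point is to keep each time paired with its iteration number while sorting:

```python
def slowest_iterations(iter_times, k=5):
    """Return the k slowest iterations as (iteration_number, time) pairs.

    `iter_times` is assumed to be a list of times where the list index
    is the iteration number; this layout is hypothetical.
    """
    indexed = list(enumerate(iter_times))  # pair each time with its index
    indexed.sort(key=lambda pair: pair[1], reverse=True)  # slowest first
    return indexed[:k]
```

For example, `slowest_iterations([0.5, 2.0, 1.0, 3.0], k=2)` returns `[(3, 3.0), (1, 2.0)]`.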

@cwschilly
Contributor Author

> @cwschilly Are you planning to fix the sensors node name problem as a separate PR or part of this? There are a lot of things that need to be addressed before I can start collecting new data and I'd really like to start collecting it now.

@nlslatt I can put everything in this PR. I'm going to try to get everything fixed up by the end of the day. I'll ping you when it's ready.

@nlslatt

nlslatt commented Mar 28, 2025

@cwschilly Jonathan had asked me for the barrier after the random initialization (i.e., before the for loop within runBenchmark), which is where I've been using one

@cwschilly cwschilly requested a review from nlslatt March 31, 2025 15:29
@nlslatt

nlslatt commented Mar 31, 2025

@cwschilly The node names in the sensors files are still incorrect. All nodes have been given the name of the first node in the allocation.

@cwschilly cwschilly changed the title from "#7: Fix iteration breakdown and analysis" to "#7: Fix iteration breakdown, add frequency analysis, and plot dropped nodes" Apr 9, 2025
@cwschilly cwschilly requested a review from nlslatt April 9, 2025 17:11
@cwschilly cwschilly self-assigned this Apr 15, 2025
@cwschilly cwschilly requested a review from nlslatt April 15, 2025 18:51
@nlslatt

nlslatt commented Jun 3, 2025

I think the node names are working correctly now.

@lifflander lifflander merged commit 9b20b30 into master Jun 5, 2025
9 checks passed

Development

Successfully merging this pull request may close these issues.

Read/write CPU freq and sensors before and after benchmark
Fix iteration breakdown and analysis

3 participants