Analysis report block based filtering for profiling #566
base: develop
@skyreflectedinmirrors , @gsitaram, please help to review the new profiling option: |
Since there are quite a lot of changes, we must test the tool thoroughly before deployment. Submitted some comments from our team discussion earlier for now.
src/utils/parser.py
Outdated
"SOL": {"id": 200, "source": "0200_system-speed-of-light.yaml"},
"MEMCHART": {"id": 300, "source": "0300_mem_chart.yaml"},
"WAVEFRONT": {"id": 700, "source": "0700_wavefront-launch.yaml"},
"INSTMIX": {"id": 1000, "source": "1000_compute-unit-instruction-mix.yaml"},
As discussed today, it would be better to include FLOPs throughput metrics with this option, and maybe change the name to COMPUTE?
src/utils/parser.py
Outdated
section_config = {
"SOL": {"id": 200, "source": "0200_system-speed-of-light.yaml"},
"MEMCHART": {"id": 300, "source": "0300_mem_chart.yaml"},
"WAVEFRONT": {"id": 700, "source": "0700_wavefront-launch.yaml"},
Should we rename this "LAUNCH_STATS" to better describe what we are going to show in this section?
I feel that we should keep |
@feizheng10 and I had a discussion about this feature today. Currently, the way the '-b' or '--block' option works in 'analyze' and 'profile' is different, as shown below
The former filters the analysis report based on 'report blocks' such as 'System Speed of Light', 'Memory Chart', 'Wavefront Launch statistics', etc. The latter filters the profiling operation based on hardware IP blocks such as TA, TCP, SQ, etc. This behavior is inconsistent, and we would like to remove 'hardware IP block' based filtering in 'profile' mode in favor of 'report block' based filtering. Hardware IP block filtering is less useful for kernel developers/profilers, as there is no one-to-one correspondence between hardware IP blocks and analysis report blocks. For example, filtering by only TCP (L1 cache) or TCC (L2 cache) will affect the 'System Speed of Light', 'Memory Chart', and 'Instruction Cache' report blocks. Both methods of filtering save profiling time, so we are not losing anything there.

We are thinking of supporting all 19 yaml files for report blocks using the '-b' option in 'profile' mode (instead of specifying hardware IP blocks). Users can filter on multiple report blocks and sub-blocks using block numbers (instead of ambiguous acronyms), for example, 'rocprof-compute profile -b 4, 4.5, 5, 5.6'.

To get the report block numbers corresponding to report block titles, you can use the '--list-metrics' option in 'analyze' mode. We want to replicate this in 'profile' mode, so that users can grep for the report block title and obtain the report block numbers to use for filtering. For example:
--list-metrics will take an optional argument for the GPU GFX architecture, since report blocks may be different per architecture. If no argument is provided, the architecture will be automatically detected using the 'rocm-smi' tool. To summarize:
NOTE that this will break backward compatibility in the sense that the '-b' option in profile mode will work differently. @gsitaram, @skyreflectedinmirrors, could you please provide your comments on the above implementation suggestions?
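For reference, the proposed command-line surface could be sketched with `argparse` roughly as follows. This is a minimal sketch, not the code in this PR: the `-b`/`--block` and `--list-metrics` flag names come from the discussion above, while `parse_block_ids`, `build_parser`, the `profile-sketch` program name, and the `"auto"` sentinel (standing in for architecture auto-detection) are hypothetical.

```python
import argparse


def parse_block_ids(tokens):
    """Normalize -b values such as ['4,', '4.5,', '5,', '5.6'] into clean IDs."""
    ids = []
    for token in tokens:
        # A user may type "4, 4.5" (separate argv tokens) or "4,4.5" (one token).
        for part in token.split(","):
            part = part.strip()
            if part:
                ids.append(part)
    return ids


def build_parser():
    parser = argparse.ArgumentParser(prog="profile-sketch")
    parser.add_argument("-b", "--block", nargs="+", default=[], metavar="BLOCK")
    # nargs="?" makes the architecture optional; const is used when the flag
    # is given without a value. "auto" is a hypothetical sentinel meaning
    # "detect the architecture" (detection itself is not shown here).
    parser.add_argument(
        "--list-metrics", nargs="?", const="auto", default=None, metavar="ARCH"
    )
    return parser


args = build_parser().parse_args(["-b", "4,", "4.5,", "5,", "5.6", "--list-metrics"])
block_ids = parse_block_ids(args.block)
explicit_arch = build_parser().parse_args(["--list-metrics", "gfx90a"])
```

Stripping commas after parsing keeps both `-b 4, 4.5` and `-b 4 4.5` usable without a custom argparse action.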
I like the idea of unifying what |
Sure, we will add a warning upon usage of One thing to note, in this PR, I have updated profile mode to dump the profiling filters in the workload folder so that when analyze mode is run it will only show the report blocks that have been filtered during profiling. If you want to see other report blocks you will explicitly have to mention them in the |
I like the idea, but I think this whole concept will need accompanying docs updates. A few specific comments:
One question there: does 4.5 match 14.5 and 4.5 (e.g.)? I.e., is this an exact match, or a regex search, etc.?
I would like to see what that looks like, but I like the general concept.
Instead of changing the default, I'd suggest you simply expand the list of choices ( rocprofiler-compute/src/argparser.py Line 187 in 649660d
That way you don't break anyone's existing workflow, while also ensuring anyone using this option will see the warning. |
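One way to picture accepting both kinds of values while warning on the old ones is the sketch below. It is an illustration under assumptions, not the PR's implementation: `classify_filters`, the `IP_BLOCKS` set (limited to the IP blocks named in this thread), and the warning text are all hypothetical.

```python
import re
import warnings

# Only the IP blocks named in this thread; the real list of choices is longer.
IP_BLOCKS = {"TA", "TCP", "TCC", "SQ", "CPC"}
# Report block IDs look like "4" or "4.5".
BLOCK_ID = re.compile(r"^\d+(\.\d+)?$")


def classify_filters(values):
    """Split -b values into report block IDs and hardware IP block names."""
    report_blocks, ip_blocks = [], []
    for value in values:
        if BLOCK_ID.match(value):
            report_blocks.append(value)
        elif value in IP_BLOCKS:
            ip_blocks.append(value)
        else:
            raise ValueError(f"Unknown block filter: {value!r}")
    if ip_blocks:
        # Existing workflows keep working, but every use surfaces the warning.
        warnings.warn(
            "Hardware IP block filtering is deprecated; use report block IDs.",
            DeprecationWarning,
            stacklevel=2,
        )
    return report_blocks, ip_blocks


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    report_blocks, ip_blocks = classify_filters(["4.5", "TCP"])
```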
It is going to be an exact match, not regex. The default is to filter for all report blocks and IP blocks. If you want to filter for all report blocks but one, you need to specify all but one on the command line. I think adding regex would be confusing for developers and users, even though it is more flexible. Analyze mode also does exact matching.
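To make the exact-match behavior concrete, here is a small sketch. The function name `block_selected` is hypothetical, and the rule that requesting a top-level block such as `5` also selects its sub-blocks such as `5.6` is an assumption, not something confirmed in this thread.

```python
def block_selected(block_id, requested):
    """Return True if block_id is selected by exact match against requested IDs.

    Assumption: requesting a top-level block ("5") also selects its
    sub-blocks ("5.6"). IDs are compared as whole strings, never as
    substrings or patterns.
    """
    if block_id in requested:
        return True
    parent = block_id.split(".", 1)[0]
    return "." in block_id and parent in requested


available = ["4", "4.5", "5.6", "14", "14.5"]
requested = {"4.5", "5"}
selected = [b for b in available if block_selected(b, requested)]
```

With these inputs, `4.5` is selected directly and `5.6` through its parent, while `14.5` is not, since `4.5` is never treated as a substring or pattern.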
It would look like what I mentioned above :)
I like the idea of phased deprecation. In first phase (ROCm 6.5)
Thanks for your feedback, I will add a checklist item to update the rocprof-compute public docs and also update the changelog.
* Add --section option to profile stage for report sections based filtering
* Only counters associated with provided report sections will be collected
* Add section based filtering to SOC base class
* Add parsing logic to identify hardware counters from report configuration files
* Add filtering logic to write only filtered counters in perfmon files
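The counter identification and perfmon filtering steps in this commit could look roughly like the sketch below. It only scans the text of a report configuration for counter-like names. The regex prefixes, the sample YAML content, the counter names in it, and both helper names are illustrative assumptions; the actual parsing in this PR may differ.

```python
import re

# Hypothetical pattern: counter names are assumed to start with an IP block
# prefix and contain only upper-case letters, digits and underscores.
COUNTER_PATTERN = re.compile(r"\b(?:SQ|TCP|TCC|TA|CPC|GRBM)_[A-Z0-9_]+\b")

# Made-up report configuration text, for illustration only.
SAMPLE_CONFIG = """
metric:
  Wave Occupancy:
    value: AVG(SQ_WAVES / GRBM_GUI_ACTIVE)
  L1 Requests:
    value: SUM(TCP_REQ)
"""


def counters_in_text(text):
    """Collect every counter-like name that appears in the config text."""
    return set(COUNTER_PATTERN.findall(text))


def filter_perfmon(all_counters, needed):
    """Keep only the counters a selected report section needs, in order."""
    return [counter for counter in all_counters if counter in needed]


needed = counters_in_text(SAMPLE_CONFIG)
perfmon = ["SQ_WAVES", "SQ_INSTS_VALU", "TCP_REQ", "TA_BUSY"]
kept = filter_perfmon(perfmon, needed)
```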
* Log not collected counters in one line
* Write arguments provided during profiling in output workload folder
* Only show sections of report during analysis which provided in section filtering during profiling
* Do not show sections of the report during analysis which have empty columns
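The round trip between profile and analyze mode can be sketched as below. This is an assumption-laden illustration: the file name `profiling_config.json`, the `filter_blocks` key, and both function names are hypothetical, and the rule that an explicit analyze-time filter wins over the dumped one follows the TODO item in this PR.

```python
import json
import tempfile
from pathlib import Path

CONFIG_NAME = "profiling_config.json"  # hypothetical file name


def dump_profiling_filters(workload_dir, block_filters):
    """Profile mode: record which report blocks were profiled."""
    path = Path(workload_dir) / CONFIG_NAME
    path.write_text(json.dumps({"filter_blocks": list(block_filters)}, indent=2))
    return path


def blocks_to_show(workload_dir, analyze_filters=None):
    """Analyze mode: pick the blocks to display.

    An explicit analyze-time filter overrides the dumped one. Returns None
    when neither exists, meaning "show everything".
    """
    if analyze_filters:
        return list(analyze_filters)
    path = Path(workload_dir) / CONFIG_NAME
    if not path.exists():
        return None
    return json.loads(path.read_text()).get("filter_blocks") or None


with tempfile.TemporaryDirectory() as workload:
    dump_profiling_filters(workload, ["4", "5.6"])
    from_profile = blocks_to_show(workload)
    overridden = blocks_to_show(workload, ["2"])

with tempfile.TemporaryDirectory() as empty_workload:
    unfiltered = blocks_to_show(empty_workload)
```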
* Instruction mix section filter
* Instruction mix and Memory chart section filters
* Instruction mix section filter and CPC IP block filter
* Instruction mix section filter and global_write kernel filter
* TA IP block filter
* Fix formatting issues
6e71fa9
to
9818cb7
* -b now takes both hardware IP blocks as well as metric ids
* Add --list-metrics option to profile mode with architecture auto detect
* Update counter detection method to look at text of yaml config file or subsections of yaml config file
TODO
* -b option to filter by report blocks and IP blocks, upon IP block detection emit deprecation warning
* -b filter overrides the corresponding filters from dumped profiling config
* Implement selective counter collection
* Write profiling configuration
* Report sections-based profiling filters test cases