Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Analysis report block based filtering for profiling #566

Open
wants to merge 8 commits into
base: develop
Choose a base branch
from

Conversation

vedithal-amd
Copy link
Contributor

@vedithal-amd vedithal-amd commented Feb 14, 2025

TODO

  • Update the functionality of -b option to filter by report blocks and IP blocks, upon IP block detection emit deprecation warning
  • Implement --list-metrics for profile mode
  • In analyze mode make sure -b filter overrides the corresponding filters from dumped profiling config
  • Update public facing documentation
  • Update changelog

Implement selective counter collection

* Add --section option to profile stage for report sections-based filtering
* Only counters associated with provided report sections will be collected
* Add section-based filtering to SOC base class
* Add parsing logic to identify hardware counters from report configuration files
* Add filtering logic to write only filtered counters in perfmon files

Write profiling configuration

* Log not collected counters in one line
* Write arguments provided during profiling in output workload folder
* Only show sections of report during analysis which provided in section filtering during profiling
* Do not show sections of the report during analysis which have empty columns

Report sections-based profiling filters test cases

* Instruction mix section filter
* Instruction mix and Memory chart section filters
* Instruction mix section filter and CPC IP block filter
* Instruction mix section filter and global_write kernel filter
* TA IP block filter

@vedithal-amd vedithal-amd self-assigned this Feb 14, 2025
@vedithal-amd vedithal-amd changed the title Selective counter collection Analysis report-based filters for profiling Feb 14, 2025
@vedithal-amd vedithal-amd changed the title Analysis report-based filters for profiling Analysis section-based filters for profiling Feb 14, 2025
@vedithal-amd vedithal-amd changed the title Analysis section-based filters for profiling Analysis section based filters for profiling Feb 14, 2025
@feizheng10
Copy link
Contributor

@skyreflectedinmirrors , @gsitaram, please help to review the new profiling option:
--section ["SOL", "MEMCHART", "WAVEFRONT", "INSTMIX"]
This was a ask to match NV having. You know we have similar option -b, which is more HW oriented.
My question would be: should we keep --section, -b separate or not?

Copy link

@gsitaram gsitaram left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since there are quite a lot of changes, we must test the tool thoroughly before deployment. Submitted some comments from our team discussion earlier for now.

"SOL": {"id": 200, "source": "0200_system-speed-of-light.yaml"},
"MEMCHART": {"id": 300, "source": "0300_mem_chart.yaml"},
"WAVEFRONT": {"id": 700, "source": "0700_wavefront-launch.yaml"},
"INSTMIX": {"id": 1000, "source": "1000_compute-unit-instruction-mix.yaml"},

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As discussed today, would be better to include FLOPs throughput metrics with this option, and maybe change name to COMPUTE?

section_config = {
"SOL": {"id": 200, "source": "0200_system-speed-of-light.yaml"},
"MEMCHART": {"id": 300, "source": "0300_mem_chart.yaml"},
"WAVEFRONT": {"id": 700, "source": "0700_wavefront-launch.yaml"},

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we rename this "LAUNCH_STATS" to better describe what we are going to show in this section?

@gsitaram
Copy link

@skyreflectedinmirrors , @gsitaram, please help to review the new profiling option: --section ["SOL", "MEMCHART", "WAVEFRONT", "INSTMIX"] This was a ask to match NV having. You know we have similar option -b, which is more HW oriented. My question would be: should we keep --section, -b separate or not?

I feel that we should keep -b because it has been recommended to our customers as a helpful tool for faster analysis, and the sections do not cover the entire gamut of metrics that the tool collects.

@vedithal-amd
Copy link
Contributor Author

vedithal-amd commented Feb 21, 2025

Me and @feizheng10 had a discussion about this feature today...

Currently the way '-b' or '--block' option works in 'analyze' and 'profile' is different as shown below

$ build/rocprof-compute.bin analyze --help | grep -i "\--block"
  -b  [ ...], --block  [ ...]                   Specify hardware block/metric id(s) from --list-metrics for filtering.
$ build/rocprof-compute.bin profile --help | grep -i -A5 "\--block"
  -b  [ ...], --block  [ ...]                           Hardware block filtering:
                                                           SQ
                                                           SQC
                                                           TA
                                                           TD
                                                           TCP

The former filters the analysis report based on 'report block' such as 'System Speed of Light', 'Memory Chart', 'Wavefront Launch statistics' etc..

The latter filters the profiling operation based on hardware IP blocks such as TA, TCP, SQ etc...

This behavior is inconsistent, and we would like to remove 'hardware IP block' based filtering in 'profile' mode in favor of 'report block' based filtering. The former is less useful for kernel developers/profilers as there is no one to one correspondence between hardware IP block and analysis report blocks. For example, filtering by only TCP (L1 cache) or TCC (L2 cache) will affect 'System Speed of Light', 'Memory Chart', 'Instruction Cache' report blocks.

Both methods of filtering will save up on profiling time, so we are not losing up on that here.

We are thinking of supporting all 19 yaml files for report blocks using the '-b' option during 'profile' mode (instead of specifying hardware IP block). Users can filter based on multiple report blocks and sub-blocks using block numbers (instead of ambiguous acronyms), for example, 'rocprof-compute profile -b 4, 4.5, 5, 5.6'

To get the report block numbers corresponding to report block titles, you can use the '--list-metrics' options during 'analyze' mode. We want to replicate this in 'profile' mode, such that, users can grep for the report block title name and obtain the report block numbers to be used for filtering.

For example:

$ build/rocprof-compute.bin analyze -p tests/workloads/device_filter/MI200 --list-metrics gfx90a | grep -i "instruction mix"
10 -> Compute Units - Instruction Mix
        10.1 -> Overall Instruction Mix

--list-metrics will take an optional argument for GPU GFX architecture since report blocks maybe different per architecture. If no argument is provided, it will be automatically detected using 'rocm-smi' tool.

To summarize:

  • Update '-b' option in profile mode to take as argument, report block/sub-block numbers instead of hardware IP block names
  • Add '--list-metrics' option in profile mode to list all available report blocks/sub-blocks for the given GFX architecture, mention both block numbers and block title names

NOTE that this will break backward compatibility in the sense that '-b' option in profile mode will work differently.

@gsitaram, @skyreflectedinmirrors, could you please provide your comments on the above implementation suggestions.

@gsitaram
Copy link

I like the idea of unifying what -b means in profile and analyze modes. But this change would break backward compatibility for the profile mode as -b will now mean something different. We have to think of providing appropriate warnings for a few releases for those users who may have hardcoded their commands into their scripts, etc.

@vedithal-amd
Copy link
Contributor Author

I like the idea of unifying what -b means in profile and analyze modes. But this change would break backward compatibility for the profile mode as -b will now mean something different. We have to think of providing appropriate warnings for a few releases for those users who may have hardcoded their commands into their scripts, etc.

Sure, we will add a warning upon usage of -b options for a few releases to warn people of change in functionality.

One thing to note, in this PR, I have updated profile mode to dump the profiling filters in the workload folder so that when analyze mode is run it will only show the report blocks that have been filtered during profiling. If you want to see other report blocks you will explicitly have to mention them in the -b filter during analyze mode. Hope that is OK. This makes logical sense to me as other report blocks might have empty values due to required counters not being collected.

@skyreflectedinmirrors
Copy link
Contributor

I like the idea, but I think this would whole concept will need accompanying docs updates. A few specific comments:

Users can filter based on multiple report blocks and sub-blocks using block numbers (instead of ambiguous acronyms), for example, 'rocprof-compute profile -b 4, 4.5, 5, 5.6'

One question there: does 4.5 match 14.5 and 4.5 (e.g.)? I.e., is this an exact match, or a regex search, etc.?

To get the report block numbers corresponding to report block titles, you can use the '--list-metrics' options during 'analyze' mode. We want to replicate this in 'profile' mode, such that, users can grep for the report block title name and obtain the report block numbers to be used for filtering.

I would like to see what that looks like, but I like the general concept.

Sure, we will add a warning upon usage of -b options for a few releases to warn people of change in functionality.

Instead of changing the default, I'd suggest you simply expand the list of choices (

choices=["SQ", "SQC", "TA", "TD", "TCP", "TCC", "SPI", "CPC", "CPF"],
) to include the table numbers, and after parsing, warn users if they are still passing in 'SQ' / 'TCC', etc. If they are, maintain the old code-path for filtering metrics for 2-3 releases, and if not, go down the new code path (and probably disallow mixes?)

That way you don't break anyone's existing workflow, while also ensuring anyone using this option will see the warning.

@vedithal-amd
Copy link
Contributor Author

vedithal-amd commented Feb 25, 2025

One question there: does 4.5 match 14.5 and 4.5 (e.g.)? I.e., is this an exact match, or a regex search, etc.?
In profile mode:

It is going to be an exact match not regex. Default is filter for all report blocks and IP blocks. If you want to filter for all report blocks but one, you need to specify all but one on cmdline. I think adding regex will be confusing for developer and user even though it is more flexible. Analyze mode also does exact match.

I would like to see what that looks like, but I like the general concept.

It would look like this like I mentioned above :)

$ build/rocprof-compute.bin analyze -p tests/workloads/device_filter/MI200 --list-metrics gfx90a | grep -i "instruction mix"
10 -> Compute Units - Instruction Mix
        10.1 -> Overall Instruction Mix

That way you don't break anyone's existing workflow, while also ensuring anyone using this option will see the warning.

I like the idea of phased deprecation. In first phase (ROCm 6.5) -b will take both report blocks and IP blocks (mixing will be allowed) and warning will be emitted when IP block is detected in filter. In second phase (ROCm 6.6 or 6.7 or 6.x ?) -b will not accept IP blocks and error will be emitted.

I like the idea, but I think this would whole concept will need accompanying docs update

Thanks for your feedback, I will add checklist item to update rocprof-compute public docs and also update changelog

@vedithal-amd vedithal-amd changed the title Analysis section based filters for profiling Analysis report block based filtering for profiling Feb 25, 2025
* Add --section option to profile stage for report sections based filtering
    * Only counters associated with provided report sections will be
      collected

* Add section based filtering to SOC base class

* Add parsing logic to identify hardware counters from report
  configuration files

* Add filtering logic to write only filtered counters in perfmon files
* Log not collected counters in one line
* Write arguments provided during profiling in output workload folder
* Only show sections of report during analysis which provided in section filtering during profiling
* Do not show sections of the report during analysis which have empty
  columns
* Instruction mix section filter
* Instruction mix and Memory chart section filters
* Instruction mix section filter and CPC IP block filter
* Instruction mix section filter and global_write kernel filter
* TA IP block filter
@vedithal-amd vedithal-amd force-pushed the vedithal/selective-counter branch from 6e71fa9 to 9818cb7 Compare February 27, 2025 20:15
* -b now take both hardware IP blocks as well as metric ids
* Add --list-metrics option to profile mode with architecture auto
  detect
* Update counter detection method to look at text of yaml config file or
  subsections of yaml config file
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants