Cherry-pick Missing Content to v6.11 #39

jamieNguyenNVIDIA · 2025-01-13T17:53:56Z

Upon comparing our v6.8 and v6.11 trees, we found a handful of series missing:

8b788f36c4c2 cppc_cpufreq: Remove HiSilicon CPPC workaround
152774bd6b96 cppc_cpufreq: Use desired perf if feedback ctrs are 0 or unchanged


5ac091ff96a8 NVIDIA: SAUCE: ACPI/HMAT: Move HMAT messages to pr_debug()


b5d5407ecbf3 perf: cs-etm: Print queue number in raw trace dump
03881c380635 perf: cs-etm: Support version 0.1 of HW_ID packets
f2a3dc6baacd perf: cs-etm: Only save valid trace IDs into files
1fa740c52837 perf: cs-etm: Create decoders based on the trace ID mappings
5bbf91f4bc89 perf: cs-etm: Move traceid_list to each queue
a60f3af9d4de perf: cs-etm: Allocate queues for all CPUs
c2326823b006 perf cs-etm: Create decoders after both AUX and HW_ID search passes


8465f5cd0613 coresight: Make trace ID map spinlock local to the map
0ca7032c5f1e coresight: Emit sink ID in the HW_ID packets
d9a69f1cd5ca coresight: Remove pending trace ID release mechanism
664531368e80 coresight: Use per-sink trace ID maps for Perf sessions
bbf446e8edf2 coresight: Make CPU id map a property of a trace ID map
d32196e54b4d coresight: Expose map arguments in trace ID API
1f2293aa201a coresight: Move struct coresight_trace_id_map to common header
97224b3cc586 coresight: Clarify comments around the PID of the sink owner
3abb5a76effb coresight: Remove unused ETM Perf stubs

This PR adds all of these.

JIRA: [DGX-11157]
Testing:

cpufreq: Run reboot cycles with the cppc_cpufreq series applied and confirm that "cppc_cpufreq ->get() failed" is not logged.
acpi/hmat: Reboot and confirm that the "ACPI/HMAT: Initiator-Target" messages are not logged.
coresight/cs-etm: Run ./perf record -e cs_etm//u ls and confirm that it succeeds.
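
The two log checks in the testing notes could be scripted roughly as follows. This is a minimal sketch, assuming the kernel log has already been captured as text (for example from dmesg). The function name `unexpected_log_lines` and the sample logs are hypothetical, and the match strings are taken from the testing notes rather than from the kernel source.

```python
def unexpected_log_lines(log_text, needles=("->get() failed", "Initiator-Target")):
    """Return the log lines containing any string we expect to be absent."""
    return [
        line
        for line in log_text.splitlines()
        if any(needle in line for needle in needles)
    ]


# Hypothetical captured logs, for illustration only.
clean_log = "boot: step one done\nboot: step two done\n"
noisy_log = "cpufreq: ->get() failed\nInitiator-Target sample line\n"
```

An empty list from `unexpected_log_lines(clean_log)` would count as a pass for the cpufreq and acpi/hmat checks; any returned line points at the message that should have been silenced.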

The CPPC performance feedback counters could be 0 or unchanged when the target cpu is in a low-power idle state, e.g. power-gated or clock-gated. When the counters are 0, cppc_cpufreq_get_rate() returns 0 KHz, which makes cpufreq_online() get a false error and fail to generate a cpufreq policy. When the counters are unchanged, the existing cppc_perf_from_fbctrs() returns a cached desired perf, but some platforms may update the real frequency back to the desired perf reg. For the above cases in cppc_cpufreq_get_rate(), get the latest desired perf from the CPPC reg to reflect the frequency because some platforms may update the actual frequency back there; if failed, use the cached desired perf. Fixes: 6a4fec4 ("cpufreq: cppc: cppc_cpufreq_get_rate() returns zero in all error cases.") Signed-off-by: Jie Zhan <[email protected]> Reviewed-by: Zeng Heng <[email protected]> Reviewed-by: Ionela Voinescu <[email protected]> Reviewed-by: Huisong Li <[email protected]> Signed-off-by: Viresh Kumar <[email protected]> (cherry picked from commit c471956 linux-next) Signed-off-by: Jamie Nguyen <[email protected]> Tested-by: Carol Soto <[email protected]>
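
The fallback order described in this commit message can be modelled as below. This is a Python sketch of the logic only, not the driver code: `delivered_perf`, `current_perf`, the tuple layout of the counter samples, and the use of `OSError` to represent a failed register read are all assumptions made for illustration, and counter wraparound is ignored.

```python
def delivered_perf(reference_perf, t0, t1):
    """Compute delivered perf from two (delivered, reference) counter samples.

    Returns None when the counters are zero or did not advance, which is
    the case the commit message associates with a low-power idle state.
    """
    d_delivered = t1[0] - t0[0]
    d_reference = t1[1] - t0[1]
    if 0 in (t1[0], t1[1]) or d_delivered == 0 or d_reference == 0:
        return None
    return reference_perf * d_delivered // d_reference


def current_perf(reference_perf, t0, t1, read_desired_perf, cached_desired_perf):
    """Prefer the counters, then the desired perf register, then the cached value."""
    perf = delivered_perf(reference_perf, t0, t1)
    if perf is not None:
        return perf
    try:
        # Assumed callable standing in for a read of the CPPC desired perf register.
        return read_desired_perf()
    except OSError:
        return cached_desired_perf
```

The point of the ordering is that a zero rate is never reported just because the CPU happened to be idle while the counters were sampled.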

Commit 6c8d750 ("cpufreq / cppc: Work around for Hisilicon CPPC cpufreq") introduced a workaround for HiSilicon platforms that do not support performance feedback counters but can get the actual frequency from the desired perf register. Later on, FIE was disabled in that workaround as well. Now the workaround can be handled by the common code. Desired perf would be read and converted to frequency if feedback counters don't change. FIE would be disabled if the CPPC regs are in PCC region. Hence, the workaround is no longer needed and can be safely removed, in an effort to consolidate the driver procedure. Signed-off-by: Jie Zhan <[email protected]> Reviewed-by: Xiongfeng Wang <[email protected]> Reviewed-by: Huisong Li <[email protected]> [ Viresh: Move fie_disabled within CONFIG option to fix warning ] Signed-off-by: Viresh Kumar <[email protected]> (cherry picked from commit ea1829d linux-next) Signed-off-by: Jamie Nguyen <[email protected]> Tested-by: Carol Soto <[email protected]>

The HMAT messages printed at boot, beyond being noisy, can also print details for nodes that are not yet enabled. The primary method to consume HMAT details is via sysfs, and the sysfs interface gates what is emitted by whether the node is online or not. Hide the messages by default by moving them from "info" to "debug" log level. Otherwise, these prints are just a pretty-print way to dump the ACPI HMAT table. It has always been the case that post-analysis was required for these messages to map proximity-domains to Linux NUMA nodes, and as Priya points out that analysis also needs to consider whether the proximity domain is marked "enabled" in the SRAT. Reported-by: Priya Autee <[email protected]> Signed-off-by: Dan Williams <[email protected]> Acked-by: Rafael J. Wysocki <[email protected]> Link: https://patch.msgid.link/170668982094.318782.2963631284830500182.stgit@dwillia2-xfh.jf.intel.com Signed-off-by: Dave Jiang <[email protected]> (cherry picked from commit e2b952ffafced49fa6bd5cdc90f472b8bd932b5d cxl-next) Signed-off-by: Carol L Soto <[email protected]>

Both of these passes gather information about how to create the decoders. AUX records determine formatted/unformatted, and the HW_IDs determine the traceID/metadata mappings. Therefore it makes sense to cache the information and wait until both passes are over before creating the decoders, rather than creating them at the first HW_ID found. This will allow a simplification of the creation process where cs_etm_queue->traceid_list will exclusively be used to create the decoders, rather than the current two methods depending on whether the trace is formatted or not. Previously the sample CPU from the AUX record was used to initialize the decoder CPU, but actually sample CPU == AUX queue index in per-CPU mode, so saving the sample CPU isn't required. Similarly formatted/unformatted was used upfront to create the decoders, but now it's cached until later. Reviewed-by: Anshuman Khandual <[email protected]> Reviewed-by: Mike Leach <[email protected]> Signed-off-by: James Clark <[email protected]> Signed-off-by: James Clark <[email protected]> Tested-by: Ganapatrao Kulkarni <[email protected]> Tested-by: Leo Yan <[email protected]> Acked-by: Suzuki Poulouse <[email protected]> Cc: Adrian Hunter <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Alexandre Torgue <[email protected]> Cc: Ian Rogers <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: John Garry <[email protected]> Cc: Kan Liang <[email protected]> Cc: Leo Yan <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Maxime Coquelin <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Will Deacon <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]> (cherry picked from commit b6aa0de) Signed-off-by: Carol L Soto <[email protected]>

Make cs_etm__setup_queue() setup a queue even if it's empty, and pre-allocate queues based on the max CPU that was recorded. In per-CPU mode aux queues are indexed based on CPU ID even if all CPUs aren't recorded, sparse queue arrays aren't used. This will allow HW_IDs to be saved even if no aux data was received in that queue without having to call cs_etm__setup_queue() from two different places. Reviewed-by: Mike Leach <[email protected]> Signed-off-by: James Clark <[email protected]> Cc: Adrian Hunter <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Alexandre Torgue <[email protected]> Cc: Anshuman Khandual <[email protected]> Cc: Ganapatrao Kulkarni <[email protected]> Cc: Ian Rogers <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: John Garry <[email protected]> Cc: Kan Liang <[email protected]> Cc: Leo Yan <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Maxime Coquelin <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Suzuki Poulouse <[email protected]> Cc: Will Deacon <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: James Clark <[email protected]> Signed-off-by: Arnaldo Carvalho de Melo <[email protected]> (cherry picked from commit 57880a7) Signed-off-by: Carol L Soto <[email protected]>
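
The pre-allocation idea in this commit message can be pictured with a small model. This is a sketch under assumptions, not the perf implementation: `allocate_queues` and the dictionary fields are hypothetical names, and a queue is reduced to two lists.

```python
def allocate_queues(recorded_cpus):
    """Return one initially empty queue per CPU index up to the highest recorded CPU.

    The list is dense: index N is always the queue for CPU N, even when
    CPU N produced no AUX data.
    """
    if not recorded_cpus:
        return []
    return [
        {"cpu": cpu, "aux_buffers": [], "hw_ids": []}
        for cpu in range(max(recorded_cpus) + 1)
    ]


# CPUs 1 and 5 were recorded, yet CPU 3 still has a queue that can hold HW_IDs.
queues = allocate_queues({1, 5})
queues[3]["hw_ids"].append(0x10)
```

Because every index exists up front, a HW_ID can be stored for a CPU without first checking whether its queue was ever set up.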

The global list won't work for per-sink trace ID allocations, so put a list in each queue where the IDs will be unique to that queue. To keep the same behavior as before, for version 0 of the HW_ID packets, copy all the HW_ID mappings into all queues. This change doesn't affect the decoders, only trace ID lookups on the Perf side. The decoders are still created with global mappings which will be fixed in a later commit. Reviewed-by: Mike Leach <[email protected]> Signed-off-by: James Clark <[email protected]> Cc: Adrian Hunter <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Alexandre Torgue <[email protected]> Cc: Anshuman Khandual <[email protected]> Cc: Ganapatrao Kulkarni <[email protected]> Cc: Ian Rogers <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: John Garry <[email protected]> Cc: Kan Liang <[email protected]> Cc: Leo Yan <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Maxime Coquelin <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Suzuki Poulouse <[email protected]> Cc: Will Deacon <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: James Clark <[email protected]> Signed-off-by: Arnaldo Carvalho de Melo <[email protected]> (cherry picked from commit 77c123f) Signed-off-by: Carol L Soto <[email protected]>
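
A rough model of the per-queue lists and the version 0 compatibility behavior is sketched below. The names `new_queues`, `record_hw_id_v0` and `lookup_cpu` are hypothetical, and a plain dictionary stands in for the traceid_list that the commit message mentions.

```python
def new_queues(count):
    """Create queues that each own a separate trace ID -> CPU mapping."""
    return [{"traceid_list": {}} for _ in range(count)]


def record_hw_id_v0(queues, trace_id, cpu):
    """Version 0 HW_ID packets carry no sink information, so the mapping
    is copied into every queue to preserve the old global behavior."""
    for queue in queues:
        queue["traceid_list"][trace_id] = cpu


def lookup_cpu(queue, trace_id):
    """Look a trace ID up in one queue only; returns None if unmapped."""
    return queue["traceid_list"].get(trace_id)
```

Lookups always go through a single queue, so later changes can make the contents differ per queue without touching the callers.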

Now that each queue has a unique set of trace ID mappings, use this list to create the decoders. In unformatted mode just add a single mapping so only one decoder is made. Previously each queue would have a decoder created for each traced CPU on the system but this won't work anymore because CPUs can have overlapping trace IDs. This also means that the CORESIGHT_TRACE_ID_UNUSED_FLAG isn't needed any more. If mappings aren't added then decoders aren't created, rather than needing a flag to suppress creation. Reviewed-by: Mike Leach <[email protected]> Signed-off-by: James Clark <[email protected]> Cc: Adrian Hunter <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Alexandre Torgue <[email protected]> Cc: Anshuman Khandual <[email protected]> Cc: Ganapatrao Kulkarni <[email protected]> Cc: Ian Rogers <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: John Garry <[email protected]> Cc: Kan Liang <[email protected]> Cc: Leo Yan <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Maxime Coquelin <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Suzuki Poulouse <[email protected]> Cc: Will Deacon <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: James Clark <[email protected]> Signed-off-by: Arnaldo Carvalho de Melo <[email protected]> (cherry picked from commit 19c3e4d) Signed-off-by: Carol L Soto <[email protected]>

This isn't a bug because Perf always masks with CORESIGHT_TRACE_ID_VAL_MASK before using these values, but to avoid it looking like it could be, make an effort to not save bad values. Reviewed-by: Mike Leach <[email protected]> Signed-off-by: James Clark <[email protected]> Signed-off-by: James Clark <[email protected]> Cc: Adrian Hunter <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Alexandre Torgue <[email protected]> Cc: Anshuman Khandual <[email protected]> Cc: Ganapatrao Kulkarni <[email protected]> Cc: Ian Rogers <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: John Garry <[email protected]> Cc: Kan Liang <[email protected]> Cc: Leo Yan <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Maxime Coquelin <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Suzuki Poulouse <[email protected]> Cc: Will Deacon <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]> (cherry picked from commit 940007c) Signed-off-by: Carol L Soto <[email protected]>
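
The masking that this commit message relies on amounts to the following. This is a sketch only: the mask value 0x7F (a 7-bit trace ID) and the flag position used in the example are assumptions, and `valid_trace_id` is a hypothetical helper name.

```python
# Assumed value: the low 7 bits hold the trace ID.
CORESIGHT_TRACE_ID_VAL_MASK = 0x7F


def valid_trace_id(raw_value):
    """Drop any flag bits so that only the trace ID value is kept."""
    return raw_value & CORESIGHT_TRACE_ID_VAL_MASK


# A raw value with a hypothetical flag in bit 31 still yields trace ID 0x12.
saved = valid_trace_id((1 << 31) | 0x12)
```

Masking before saving means the stored value and the value a reader later masks out are the same number.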

v0.1 HW_ID packets have a new field that describes which sink each CPU writes to. Use the sink ID to link trace ID maps to each other so that mappings are shared wherever the sink is shared. Also update the error message to show that overlapping IDs aren't an error in per-thread mode, just not supported. In the future we can use the CPU ID from the AUX records, or watch for changing sink IDs on HW_ID packets to use the correct decoders. Reviewed-by: Mike Leach <[email protected]> Signed-off-by: James Clark <[email protected]> Cc: Adrian Hunter <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Alexandre Torgue <[email protected]> Cc: Anshuman Khandual <[email protected]> Cc: Ganapatrao Kulkarni <[email protected]> Cc: Ian Rogers <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: John Garry <[email protected]> Cc: Kan Liang <[email protected]> Cc: Leo Yan <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Maxime Coquelin <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Suzuki Poulouse <[email protected]> Cc: Will Deacon <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: James Clark <[email protected]> Signed-off-by: Arnaldo Carvalho de Melo <[email protected]> (cherry picked from commit 1506af6) Signed-off-by: Carol L Soto <[email protected]>
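
Linking maps by sink can be sketched as follows. This models the idea only: `link_maps_by_sink` is a hypothetical name, queues are identified by CPU number, and the sink IDs in the example are made up.

```python
def link_maps_by_sink(cpu_to_sink):
    """Give each per-CPU queue a trace ID map that is shared with every
    other queue writing to the same sink."""
    maps_by_sink = {}
    queue_maps = {}
    for cpu, sink_id in cpu_to_sink.items():
        queue_maps[cpu] = maps_by_sink.setdefault(sink_id, {})
    return queue_maps


# CPUs 0 and 1 share a sink; CPU 2 writes to a different one.
queue_maps = link_maps_by_sink({0: 0xA, 1: 0xA, 2: 0xB})
queue_maps[0][0x10] = 0  # trace ID 0x10 -> CPU 0, visible to CPU 1's queue too
queue_maps[2][0x10] = 2  # the same trace ID can be reused on the other sink
```

Sharing the dictionary object, rather than copying entries, keeps queues on one sink consistent while allowing the same ID to mean different CPUs on different sinks.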

Now that we have overlapping trace IDs it's also useful to know what the queue number is to be able to distinguish the source of the trace so print it inline. Hide it behind the -v option because it might not be obvious to users what the queue number is. Reviewed-by: Mike Leach <[email protected]> Signed-off-by: James Clark <[email protected]> Cc: Adrian Hunter <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Alexandre Torgue <[email protected]> Cc: Anshuman Khandual <[email protected]> Cc: Ganapatrao Kulkarni <[email protected]> Cc: Ian Rogers <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: John Garry <[email protected]> Cc: Kan Liang <[email protected]> Cc: Leo Yan <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Maxime Coquelin <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Suzuki Poulouse <[email protected]> Cc: Will Deacon <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: James Clark <[email protected]> Signed-off-by: Arnaldo Carvalho de Melo <[email protected]> (cherry picked from commit 022aa67) Signed-off-by: Carol L Soto <[email protected]>

This file is never included anywhere if CONFIG_CORESIGHT is not set so they are unused and aren't currently compile tested with any config so remove them. Reviewed-by: Anshuman Khandual <[email protected]> Reviewed-by: Mike Leach <[email protected]> Signed-off-by: James Clark <[email protected]> Tested-by: Leo Yan <[email protected]> Tested-by: Ganapatrao Kulkarni <[email protected]> Signed-off-by: James Clark <[email protected]> Signed-off-by: Suzuki K Poulose <[email protected]> Link: https://lore.kernel.org/r/[email protected] (cherry picked from commit 3417200) Signed-off-by: Carol L Soto <[email protected]>

"Process being monitored" and "pid of the process to monitor" imply that this would be the same PID if there were two sessions targeting the same process. But this is actually the PID of the process that did the Perf event open call, rather than the target of the session. So update the comments to make this clearer. Reviewed-by: Anshuman Khandual <[email protected]> Reviewed-by: Mike Leach <[email protected]> Signed-off-by: James Clark <[email protected]> Tested-by: Leo Yan <[email protected]> Tested-by: Ganapatrao Kulkarni <[email protected]> Signed-off-by: James Clark <[email protected]> Signed-off-by: Suzuki K Poulose <[email protected]> Link: https://lore.kernel.org/r/[email protected] (cherry picked from commit eda1d11) Signed-off-by: Carol L Soto <[email protected]>

The trace ID maps will need to be created and stored by the core and Perf code so move the definition up to the common header. Reviewed-by: Anshuman Khandual <[email protected]> Reviewed-by: Mike Leach <[email protected]> Signed-off-by: James Clark <[email protected]> Tested-by: Leo Yan <[email protected]> Tested-by: Ganapatrao Kulkarni <[email protected]> Signed-off-by: James Clark <[email protected]> Signed-off-by: Suzuki K Poulose <[email protected]> Link: https://lore.kernel.org/r/[email protected] (cherry picked from commit acb0184) Signed-off-by: Carol L Soto <[email protected]>

The trace ID API is currently hard coded to always use the global map. Add public versions that allow the map to be passed in so that Perf mode can use per-sink maps. Keep the non-map versions so that sysfs mode can continue to use the default global map. System ID functions are unchanged because they will always use the default map. Signed-off-by: James Clark <[email protected]> Reviewed-by: Mike Leach <[email protected]> Tested-by: Leo Yan <[email protected]> Signed-off-by: James Clark <[email protected]> Signed-off-by: Suzuki K Poulose <[email protected]> Link: https://lore.kernel.org/r/[email protected] (cherry picked from commit 7e52877) Signed-off-by: Carol L Soto <[email protected]>

The global CPU ID mappings won't work for per-sink ID maps so move it to the ID map struct. coresight_trace_id_release_all_pending() is hard coded to operate on the default map, but once Perf sessions use their own maps the pending release mechanism will be deleted. So it doesn't need to be extended to accept a trace ID map argument at this point. Signed-off-by: James Clark <[email protected]> Reviewed-by: Mike Leach <[email protected]> Tested-by: Leo Yan <[email protected]> Signed-off-by: James Clark <[email protected]> Signed-off-by: Suzuki K Poulose <[email protected]> Link: https://lore.kernel.org/r/[email protected] (cherry picked from commit d53c825) Signed-off-by: Carol L Soto <[email protected]>

This will allow sessions with more than CORESIGHT_TRACE_IDS_MAX ETMs as long as there are fewer than that many ETMs connected to each sink. Each sink owns its own trace ID map, and any Perf session connecting to that sink will allocate from it, even if the sink is currently in use by other users. This is similar to the existing behavior where the dynamic trace IDs are constant as long as there is any concurrent Perf session active. It's not completely optimal because slightly more IDs will be used than necessary, but the optimal solution involves tracking the PIDs of each session and allocating ID maps based on the session owner. This is difficult to do with the combination of per-thread and per-cpu modes and some scheduling issues. The complexity of this isn't likely to be worth it because even with multiple users they'd just see a difference in the ordering of ID allocations rather than hitting any limits (unless the hardware does have too many ETMs connected to one sink). Signed-off-by: James Clark <[email protected]> Reviewed-by: Mike Leach <[email protected]> Signed-off-by: James Clark <[email protected]> Signed-off-by: Suzuki K Poulose <[email protected]> Link: https://lore.kernel.org/r/[email protected] (cherry picked from commit 5ad628a) Signed-off-by: Carol L Soto <[email protected]>
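
The effect of per-sink maps on the ID limit can be illustrated with a toy allocator. This is a sketch under assumptions, not the coresight code: `TraceIdMap`, `allocate_per_sink` and the `ids_per_map` parameter are hypothetical, and the tiny ID range stands in for the real CORESIGHT_TRACE_IDS_MAX limit.

```python
class TraceIdMap:
    """Toy ID map: hands out IDs from 1..last_id, one stable ID per CPU."""

    def __init__(self, last_id):
        self.free_ids = list(range(1, last_id + 1))
        self.cpu_ids = {}

    def get_cpu_id(self, cpu):
        if cpu not in self.cpu_ids:
            if not self.free_ids:
                raise RuntimeError("no free trace IDs in this map")
            self.cpu_ids[cpu] = self.free_ids.pop(0)
        return self.cpu_ids[cpu]


def allocate_per_sink(cpu_to_sink, ids_per_map):
    """Allocate a trace ID for each CPU from the map owned by its sink."""
    maps = {}
    allocated = {}
    for cpu, sink in cpu_to_sink.items():
        if sink not in maps:
            maps[sink] = TraceIdMap(ids_per_map)
        allocated[cpu] = maps[sink].get_cpu_id(cpu)
    return allocated
```

With two IDs per map, four CPUs spread over two sinks all get an ID (with overlap between sinks), whereas three CPUs on a single sink exhaust that sink's map. That is the same constraint the commit message states: the limit applies per sink, not per session.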

Pending the release of IDs was a way of managing concurrent sysfs and Perf sessions in a single global ID map. Perf may have finished while sysfs hadn't, and Perf shouldn't release the IDs in use by sysfs and vice versa. Now that Perf uses its own exclusive ID maps, pending release doesn't result in any different behavior than just releasing all IDs when the last Perf session finishes. As part of the per-sink trace ID change, we would have still had to make the pending mechanism work on a per-sink basis, due to the overlapping ID allocations, so instead of making that more complicated, just remove it. Signed-off-by: James Clark <[email protected]> Reviewed-by: Mike Leach <[email protected]> Signed-off-by: James Clark <[email protected]> Signed-off-by: Suzuki K Poulose <[email protected]> Link: https://lore.kernel.org/r/[email protected] (cherry picked from commit de0029f) Signed-off-by: Carol L Soto <[email protected]>

For Perf to be able to decode when per-sink trace IDs are used, emit the sink that's being written to for each ETM. Perf currently errors out if it sees a newer packet version so instead of bumping it, add a new minor version field. This can be used to signify new versions that have backwards compatible fields. Considering this change is only for high core count machines, it doesn't make sense to make a breaking change for everyone. Signed-off-by: James Clark <[email protected]> Tested-by: Leo Yan <[email protected]> Reviewed-by: Mike Leach <[email protected]> Signed-off-by: James Clark <[email protected]> Signed-off-by: Suzuki K Poulose <[email protected]> Link: https://lore.kernel.org/r/[email protected] (cherry picked from commit 487eec8) Signed-off-by: Carol L Soto <[email protected]>
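
The compatibility argument for a minor version field can be sketched as below. The bit positions are chosen purely for illustration and are not claimed to match the kernel's HW_ID layout; `pack_hw_id` and `decode_hw_id` are hypothetical helpers.

```python
# Hypothetical bit layout, for illustration only.
MAJOR_SHIFT = 60      # 4-bit major version
MINOR_SHIFT = 56      # 4-bit minor version
SINK_ID_SHIFT = 8     # 32-bit sink ID
TRACE_ID_MASK = 0xFF  # trace ID in the low byte


def pack_hw_id(major, minor, sink_id, trace_id):
    """Build a 64-bit HW_ID value from its fields."""
    return (
        (major << MAJOR_SHIFT)
        | (minor << MINOR_SHIFT)
        | (sink_id << SINK_ID_SHIFT)
        | (trace_id & TRACE_ID_MASK)
    )


def decode_hw_id(value, max_major=0):
    """Reject unknown major versions; treat minor versions as additive."""
    major = (value >> MAJOR_SHIFT) & 0xF
    minor = (value >> MINOR_SHIFT) & 0xF
    if major > max_major:
        raise ValueError("unsupported HW_ID major version")
    decoded = {"trace_id": value & TRACE_ID_MASK}
    if minor >= 1:
        # Fields added by a minor version are only read when present.
        decoded["sink_id"] = (value >> SINK_ID_SHIFT) & 0xFFFFFFFF
    return decoded
```

A reader that checks only the major version keeps accepting packets after the minor version is bumped, which is why adding the sink ID this way does not break existing tools.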

Reduce contention on the lock by replacing the global lock with one for each map. Signed-off-by: James Clark <[email protected]> Reviewed-by: Mike Leach <[email protected]> Signed-off-by: James Clark <[email protected]> Signed-off-by: Suzuki K Poulose <[email protected]> Link: https://lore.kernel.org/r/[email protected] (cherry picked from commit 988d40a) Signed-off-by: Carol L Soto <[email protected]>

Jie Zhan and others added 19 commits January 13, 2025 09:57

jamieNguyenNVIDIA force-pushed the jamien/6.11_backports_all branch from 9418582 to 4c24dc2 Compare January 13, 2025 17:58
