diff --git a/.github/CODEOWNERS b/.github/CODEOWNERS index 9d41360d2..d9601f509 100644 --- a/.github/CODEOWNERS +++ b/.github/CODEOWNERS @@ -4,3 +4,4 @@ docs/* @ROCm/rocm-documentation *.md @ROCm/rocm-documentation *.rst @ROCm/rocm-documentation +.readthedocs.yaml @ROCm/rocm-documentation diff --git a/.github/dependabot.yml b/.github/dependabot.yml index ac6621f19..fe22a4c3d 100644 --- a/.github/dependabot.yml +++ b/.github/dependabot.yml @@ -9,3 +9,14 @@ updates: directory: "/" # Location of package manifests schedule: interval: "weekly" + + - package-ecosystem: "pip" # See documentation for possible values + directory: "/docs/sphinx" # Location of package manifests + open-pull-requests-limit: 10 + schedule: + interval: "daily" + labels: + - "documentation" + - "dependencies" + reviewers: + - "samjwu" diff --git a/.gitignore b/.gitignore index 5f701c32d..03f124eb9 100644 --- a/.gitignore +++ b/.gitignore @@ -37,6 +37,10 @@ # Python cache files *.pyc +# Documentation artifacts +/_build +_toc.yml + /build* /.vscode /.cache diff --git a/.readthedocs.yaml b/.readthedocs.yaml new file mode 100644 index 000000000..3c310707b --- /dev/null +++ b/.readthedocs.yaml @@ -0,0 +1,18 @@ +# Read the Docs configuration file +# See https://docs.readthedocs.io/en/stable/config-file/v2.html for details + +version: 2 + +build: + os: ubuntu-22.04 + tools: + python: "3.10" + +python: + install: + - requirements: docs/sphinx/requirements.txt + +sphinx: + configuration: docs/conf.py + +formats: [] diff --git a/README.md b/README.md index a938f0af1..17fa741ee 100755 --- a/README.md +++ b/README.md @@ -8,8 +8,6 @@ [![Installer Packaging (CPack)](https://github.com/ROCm/omnitrace/actions/workflows/cpack.yml/badge.svg)](https://github.com/ROCm/omnitrace/actions/workflows/cpack.yml) [![Documentation](https://github.com/ROCm/omnitrace/actions/workflows/docs.yml/badge.svg)](https://github.com/ROCm/omnitrace/actions/workflows/docs.yml) -> ***[Omnitrace](https://github.com/ROCm/omnitrace) is an 
AMD open source research project and is not supported as part of the ROCm software stack.*** - ## Overview AMD Research is seeking to improve observability and performance analysis for software running on AMD heterogeneous systems. @@ -87,8 +85,8 @@ such as the memory usage, page-faults, and context-switches, and thread-level me ## Documentation -The full documentation for [omnitrace](https://github.com/ROCm/omnitrace) is available at [rocm.github.io/omnitrace](https://rocm.github.io/omnitrace/). -See the [Getting Started documentation](https://rocm.github.io/omnitrace/getting_started) for general tips and a detailed discussion about sampling vs. binary instrumentation. +The full documentation for [omnitrace](https://github.com/ROCm/omnitrace) is available at [the ROCm Omnitrace documentation repository](https://rocm.docs.amd.com/projects/omnitrace/en/latest/index.html). +See the [Getting Started documentation](https://rocm.docs.amd.com/projects/omnitrace/en/conceptual/how-omnitrace-works.html) for general tips and a detailed discussion about sampling vs. binary instrumentation. ## Quick Start @@ -109,7 +107,7 @@ wget https://github.com/ROCm/omnitrace/releases/latest/download/omnitrace-instal python3 ./omnitrace-install.py --prefix /opt/omnitrace/rocm-5.4 --rocm 5.4 ``` -See the [Installation Documentation](https://rocm.github.io/omnitrace/installation) for detailed information. +See the [Installation Documentation](https://rocm.docs.amd.com/projects/omnitrace/en/install/install.html) for detailed information. ### Setup @@ -298,13 +296,13 @@ for `foo` via the direct call within `spam`. 
There will be no entries for `bar` - Select "Open trace file" from panel on the left - Locate the omnitrace perfetto output (extension: `.proto`) -![omnitrace-perfetto](source/docs/images/omnitrace-perfetto.png) +![omnitrace-perfetto](docs/data/omnitrace-perfetto.png) -![omnitrace-rocm](source/docs/images/omnitrace-rocm.png) +![omnitrace-rocm](docs/data/omnitrace-rocm.png) -![omnitrace-rocm-flow](source/docs/images/omnitrace-rocm-flow.png) +![omnitrace-rocm-flow](docs/data/omnitrace-rocm-flow.png) -![omnitrace-user-api](source/docs/images/omnitrace-user-api.png) +![omnitrace-user-api](docs/data/omnitrace-user-api.png) ## Using Perfetto tracing with System Backend diff --git a/docs/.gitignore b/docs/.gitignore new file mode 100644 index 000000000..8fca1b797 --- /dev/null +++ b/docs/.gitignore @@ -0,0 +1,2 @@ +_build/ +_doxygen/ \ No newline at end of file diff --git a/docs/conceptual/data-collection-modes.rst b/docs/conceptual/data-collection-modes.rst new file mode 100644 index 000000000..88032387f --- /dev/null +++ b/docs/conceptual/data-collection-modes.rst @@ -0,0 +1,146 @@ +.. meta:: + :description: Omnitrace documentation and reference + :keywords: Omnitrace, ROCm, profiler, tracking, visualization, tool, Instinct, accelerator, AMD + +********************** +Data collection modes +********************** + +Omnitrace supports several modes of recording trace and profiling data for your application. + +.. note:: + + For an explanation of the terms used in this topic, see + the :doc:`Omnitrace glossary <../reference/omnitrace-glossary>`. 
+ ++-----------------------------+---------------------------------------------------------+ +| Mode | Description | ++=============================+=========================================================+ +| Binary Instrumentation | Locates functions (and loops, if desired) in the binary | +| | and inserts snippets at the entry and exit | ++-----------------------------+---------------------------------------------------------+ +| Statistical Sampling | Periodically pauses application at specified intervals | +| | and records various metrics for the given call stack | ++-----------------------------+---------------------------------------------------------+ +| Callback APIs | Parallelism frameworks such as ROCm, OpenMP, and Kokkos | +| | make callbacks into Omnitrace to provide information | +| | about the work the API is performing | ++-----------------------------+---------------------------------------------------------+ +| Dynamic Symbol Interception | Wrap function symbols defined in a position independent | +| | dynamic library/executable, like ``pthread_mutex_lock`` | +| | in ``libpthread.so`` or ``MPI_Init`` in the MPI library | ++-----------------------------+---------------------------------------------------------+ +| User API | User-defined regions and controls for Omnitrace | ++-----------------------------+---------------------------------------------------------+ + +The two most generic and important modes are binary instrumentation and statistical sampling. +It is important to understand their advantages and disadvantages. +Binary instrumentation and statistical sampling can be performed with the ``omnitrace-instrument`` +executable. For statistical sampling, it's highly recommended to use the +``omnitrace-sample`` executable instead if binary instrumentation isn't required or needed. +Callback APIs and dynamic symbol interception can be utilized with either tool. 
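As a rough illustration of the paragraph above, the following sketch shows how the two executables might be invoked. `./fib` is a placeholder for your own application. The `-o` option for binary rewriting and the `omnitrace-run` launcher are assumptions based on common Omnitrace usage rather than anything stated in this change, so confirm them with `--help` for your installed version.

```shell
#!/bin/sh
# Hypothetical application binary; substitute your own executable.
APP="./fib"

# Statistical sampling only: no instrumentation is inserted into the binary.
if command -v omnitrace-sample > /dev/null 2>&1; then
    omnitrace-sample -- "${APP}" 30
fi

# Binary instrumentation via binary rewriting: write an instrumented copy,
# then launch that copy (the -o option and omnitrace-run are assumed here).
if command -v omnitrace-instrument > /dev/null 2>&1; then
    omnitrace-instrument -o "${APP}.inst" -- "${APP}"
    omnitrace-run -- "${APP}.inst" 30
fi
```

The `command -v` guards only keep the sketch harmless on a machine without Omnitrace installed; they are not part of the tools themselves.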
+ +Binary instrumentation +----------------------------------- + +Binary instrumentation lets you record deterministic measurements for +every single invocation of a given function. +Binary instrumentation effectively adds instructions to the target application to +collect the required information. It therefore has the potential to cause performance +changes which might, in some cases, lead to inaccurate results. The effect depends on +the information being collected and which features are activated in Omnitrace. +For example, collecting only the wall-clock timing data +has less of an effect than collecting the wall-clock timing, CPU-clock timing, +memory usage, cache-misses, and number of instructions that were run. Similarly, +collecting a flat profile has less overhead than a hierarchical profile +and collecting a trace OR a profile has less overhead than collecting a +trace AND a profile. + +In Omnitrace, the primary heuristic for controlling the overhead with binary +instrumentation is the minimum number of instructions for selecting functions +for instrumentation. + +Statistical sampling +----------------------------------- + +Statistical call-stack sampling periodically interrupts the application at +regular intervals using operating system interrupts. +Sampling is typically less numerically accurate and specific, but the +target program runs at nearly full speed. +In contrast to the data derived from binary instrumentation, the resulting +data is not exact but is instead a statistical approximation. +However, sampling often provides a more accurate picture of the application +execution because it is less intrusive to the target application and has fewer +side effects on memory caches or instruction decoding pipelines. Furthermore, +because sampling does not affect the execution speed as much, it is +relatively immune to over-evaluating the cost of small, frequently called +functions or "tight" loops.
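How often the application is interrupted is a runtime setting. A minimal sketch, assuming the `OMNITRACE_USE_SAMPLING` and `OMNITRACE_SAMPLING_FREQ` settings can also be supplied as environment variables of the same name, and that the frequency is read as interrupts per second:

```shell
# Enable call-stack sampling and choose how often the application is interrupted.
# The value 50 is only an example.
export OMNITRACE_USE_SAMPLING=ON
export OMNITRACE_SAMPLING_FREQ=50
```

A lower frequency generally reduces the sampling overhead at the cost of a coarser statistical picture.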
+ +In Omnitrace, the overhead for statistical sampling depends on the +sampling rate and whether the samples are taken with respect to the CPU time +and/or real time. + +Binary instrumentation vs. statistical sampling example +------------------------------------------------------- + +Consider the following code: + +.. code-block:: c++ + + long fib(long n) + { + if(n < 2) return n; + return fib(n - 1) + fib(n - 2); + } + + void run(long i, long n) + { + long result = fib(n); + printf("[%li] fibonacci(%li) = %li\n", i, n, result); + } + + int main(int argc, char** argv) + { + long nfib = 30; + long nitr = 10; + if(argc > 1) nfib = atol(argv[1]); + if(argc > 2) nitr = atol(argv[2]); + + for(long i = 0; i < nitr; ++i) + run(i, nfib); + + return 0; + } + +Binary instrumentation of the ``fib`` function will record **every single invocation** +of the function. For a very small function +such as ``fib``, this results in **significant** overhead since this simple function +takes about 20 instructions, whereas the entry and +exit snippets are ~1024 instructions. Therefore, you generally want to avoid +instrumenting functions where the instrumented function has significantly fewer +instructions than entry and exit instrumentation. (Note that many of the +instructions in entry and exit functions are either logging functions or +depend on the runtime settings and thus might never run). However, +due to the number of potential instructions in the entry and exit snippets, +the default behavior of ``omnitrace-instrument`` is to only instrument functions +which contain at least 1024 instructions. + +However, recording every single invocation of the function can be extremely +useful for detecting anomalies, such as profiles that show minimum or maximum values much smaller or larger +than the average or a high standard deviation. In this case, the traces help you +identify exactly when and where those instances deviated from the norm. +Compare the level of detail in the following traces.
In the top image, +every instance of the ``fib`` function is instrumented, while in the bottom image, +the ``fib`` call-stack is derived via sampling. + +Binary instrumentation of the Fibonacci function +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +.. image:: ../data/fibonacci-instrumented.png + :alt: Visualization of the output of a binary instrumentation of the Fibonacci function + +Statistical sampling of the Fibonacci function +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +.. image:: ../data/fibonacci-sampling.png + :alt: Visualization of the output of a statistical sample of the Fibonacci function \ No newline at end of file diff --git a/docs/conceptual/omnitrace-feature-set.rst b/docs/conceptual/omnitrace-feature-set.rst new file mode 100644 index 000000000..4a8aceafb --- /dev/null +++ b/docs/conceptual/omnitrace-feature-set.rst @@ -0,0 +1,137 @@ +.. meta:: + :description: Omnitrace documentation and reference + :keywords: Omnitrace, ROCm, profiler, tracking, visualization, tool, Instinct, accelerator, AMD + +*************************************** +The Omnitrace feature set and use cases +*************************************** + +`Omnitrace `_ is designed to be highly extensible. +Internally, it leverages the `Timemory performance analysis toolkit `_ +to manage extensions, resources, data, and other items. It supports the following features, +modes, metrics, and APIs. + +Data collection modes +======================================== + +* Dynamic instrumentation + + * Runtime instrumentation: Instrument executables and shared libraries at runtime + * Binary rewriting: Generate a new executable and/or library with instrumentation built-in + +* Statistical sampling: Periodic software interrupts per-thread +* Process-level sampling: A background thread records process-, system- and device-level metrics while the application runs +* Causal profiling: Quantifies the potential impact of optimizations in parallel code + +.. 
note:: + + Critical trace support was removed in Omnitrace v1.11.0. + It was replaced by the causal profiling feature. + +Data analysis +======================================== + +* High-level summary profiles with mean, min, max, and standard deviation statistics + + * Low overhead and memory efficient + * Ideal for running at scale + +* Comprehensive traces for every individual event and measurement +* Application speed-up predictions resulting from potential optimizations in functions and lines of code based on causal profiling + +Parallelism API support +======================================== + +* HIP +* HSA +* Pthreads +* MPI +* Kokkos-Tools (KokkosP) +* OpenMP-Tools (OMPT) + +GPU metrics +======================================== + +* GPU hardware counters +* HIP API tracing +* HIP kernel tracing +* HSA API tracing +* HSA operation tracing +* System-level sampling (via rocm-smi) + + * Memory usage + * Power usage + * Temperature + * Utilization + +CPU metrics +======================================== + +* CPU hardware counters sampling and profiles +* CPU frequency sampling +* Various timing metrics + + * Wall time + * CPU time (process and thread) + * CPU utilization (process and thread) + * User CPU time + * Kernel CPU time + +* Various memory metrics + + * High-water mark (sampling and profiles) + * Memory page allocation + * Virtual memory usage + +* Network statistics +* I/O metrics +* Many others + +Third-party API support +======================================== + +* TAU +* LIKWID +* Caliper +* CrayPAT +* VTune +* NVTX +* ROCTX + +Omnitrace use cases +======================================== + +When analyzing the performance of an application, do NOT +assume you know where the performance bottlenecks are +and why they are happening. Omnitrace is a tool for analyzing the entire +application and its performance. 
It is +ideal for characterizing where optimization would have the greatest impact +on an end-to-end run of the application and for +viewing what else is happening on the system during a performance bottleneck. + +When GPUs are involved, there is a tendency to assume that +the quickest path to performance improvement is minimizing +the runtime of the GPU kernels. This is a highly flawed assumption. +If you optimize the runtime of a kernel from 1 millisecond +to 1 microsecond (1000x speed-up) but the original application never +spent time waiting for kernels to complete, +there would be no statistically significant reduction in the end-to-end +runtime of your application. In other words, it does not matter +how fast or slow the code on the GPU is if the application is not +bottlenecked waiting on the GPU. + +Use Omnitrace to obtain a high-level view of the entire application. Use it +to determine where the performance bottlenecks are and +obtain clues to why these bottlenecks are happening. Rather than worrying about kernel +performance, start your investigation with Omnitrace, which characterizes the +broad picture. + +.. note:: + + For insight into the execution of individual kernels on the GPU, + use `Omniperf `_. + +In terms of CPU analysis, Omnitrace does not target any specific vendor. +It works just as well on AMD and non-AMD CPUs. +With regard to the GPU, Omnitrace is currently restricted to HIP and HSA APIs +and kernels running on AMD GPUs. \ No newline at end of file diff --git a/docs/conf.py b/docs/conf.py new file mode 100644 index 000000000..718797ac1 --- /dev/null +++ b/docs/conf.py @@ -0,0 +1,56 @@ +# MIT License + +# Copyright (c) 2023 - 2024 Advanced Micro Devices, Inc. All rights reserved.
+ +# Permission is hereby granted, free of charge, to any person obtaining a copy +# of this software and associated documentation files (the "Software"), to deal +# in the Software without restriction, including without limitation the rights +# to use, copy, modify, merge, publish, distribute, sublicense, and/or sell +# copies of the Software, and to permit persons to whom the Software is +# furnished to do so, subject to the following conditions: + +# The above copyright notice and this permission notice shall be included in all +# copies or substantial portions of the Software. + +# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR +# IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, +# FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE +# AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER +# LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, +# OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE +# SOFTWARE. + +# Configuration file for the Sphinx documentation builder. +# +# This file only contains a selection of the most common options. For a full +# list see the documentation: +# https://www.sphinx-doc.org/en/master/usage/configuration.html + +import re + +from rocm_docs import ROCmDocs + +with open("../VERSION", encoding="utf-8") as f: + match = re.search(r"([0-9.]+)[^0-9.]+", f.read()) + if not match: + raise ValueError("VERSION not found!") + version_number = match[1] + +external_projects_current_project = "omnitrace" + +project = "omnitrace" +author = "Advanced Micro Devices, Inc." +copyright = "Copyright (c) 2024 Advanced Micro Devices, Inc. All rights reserved." 
+version = version_number +release = version_number +html_title = f"Omnitrace {version} documentation" + +external_toc_path = "./sphinx/_toc.yml" + +docs_core = ROCmDocs(html_title) +docs_core.setup() +docs_core.run_doxygen(doxygen_root="doxygen", doxygen_path="doxygen/xml") +docs_core.enable_api_reference() + +for sphinx_var in ROCmDocs.SPHINX_VARS: + globals()[sphinx_var] = getattr(docs_core, sphinx_var) diff --git a/docs/data/causal-foobar.png b/docs/data/causal-foobar.png new file mode 100644 index 000000000..a887b126a Binary files /dev/null and b/docs/data/causal-foobar.png differ diff --git a/docs/data/fibonacci-instrumented.png b/docs/data/fibonacci-instrumented.png new file mode 100644 index 000000000..95502062b Binary files /dev/null and b/docs/data/fibonacci-instrumented.png differ diff --git a/docs/data/fibonacci-sampling.png b/docs/data/fibonacci-sampling.png new file mode 100644 index 000000000..d6da81138 Binary files /dev/null and b/docs/data/fibonacci-sampling.png differ diff --git a/docs/data/omnitrace-perfetto.png b/docs/data/omnitrace-perfetto.png new file mode 100644 index 000000000..5bd8da727 Binary files /dev/null and b/docs/data/omnitrace-perfetto.png differ diff --git a/docs/data/omnitrace-rocm-flow.png b/docs/data/omnitrace-rocm-flow.png new file mode 100644 index 000000000..ee188b455 Binary files /dev/null and b/docs/data/omnitrace-rocm-flow.png differ diff --git a/docs/data/omnitrace-rocm.png b/docs/data/omnitrace-rocm.png new file mode 100644 index 000000000..8f80ae6a8 Binary files /dev/null and b/docs/data/omnitrace-rocm.png differ diff --git a/docs/data/omnitrace-user-api.png b/docs/data/omnitrace-user-api.png new file mode 100644 index 000000000..e1d748a5f Binary files /dev/null and b/docs/data/omnitrace-user-api.png differ diff --git a/docs/doxygen/.gitignore b/docs/doxygen/.gitignore new file mode 100644 index 000000000..0719ebc38 --- /dev/null +++ b/docs/doxygen/.gitignore @@ -0,0 +1,3 @@ +html/ +latex/ +xml/ \ No newline at end of 
file diff --git a/docs/doxygen/Doxyfile b/docs/doxygen/Doxyfile new file mode 100644 index 000000000..d02eb3d99 --- /dev/null +++ b/docs/doxygen/Doxyfile @@ -0,0 +1,373 @@ +# Doxyfile 1.8.20 + +#--------------------------------------------------------------------------- +# Project related configuration options +#--------------------------------------------------------------------------- +DOXYFILE_ENCODING = UTF-8 +PROJECT_NAME = omnitrace +PROJECT_NUMBER = 1.11.3 +PROJECT_BRIEF = "High-level and comprehensive application tracing and profiling on both the CPU and GPU" +PROJECT_LOGO = +OUTPUT_DIRECTORY = . +CREATE_SUBDIRS = NO +ALLOW_UNICODE_NAMES = YES +OUTPUT_LANGUAGE = English +OUTPUT_TEXT_DIRECTION = None +BRIEF_MEMBER_DESC = YES +REPEAT_BRIEF = YES +ABBREVIATE_BRIEF = +ALWAYS_DETAILED_SEC = YES +INLINE_INHERITED_MEMB = YES +FULL_PATH_NAMES = YES +STRIP_FROM_PATH = /home/docs/checkouts/readthedocs.org/user_builds/advanced-micro-devices-omnitrace/checkouts/ +STRIP_FROM_INC_PATH = /home/docs/checkouts/readthedocs.org/user_builds/advanced-micro-devices-omnitrace/checkouts/ +SHORT_NAMES = NO +JAVADOC_AUTOBRIEF = NO +JAVADOC_BANNER = NO +QT_AUTOBRIEF = NO +MULTILINE_CPP_IS_BRIEF = YES +PYTHON_DOCSTRING = YES +INHERIT_DOCS = YES +SEPARATE_MEMBER_PAGES = NO +TAB_SIZE = 4 +ALIASES = +OPTIMIZE_OUTPUT_FOR_C = NO +OPTIMIZE_OUTPUT_JAVA = NO +OPTIMIZE_FOR_FORTRAN = NO +OPTIMIZE_OUTPUT_VHDL = NO +OPTIMIZE_OUTPUT_SLICE = NO +EXTENSION_MAPPING = hpp=C++ \ + cpp=C++ \ + hh=C++ \ + cc=C++ \ + h=C \ + c=C \ + py=Python +MARKDOWN_SUPPORT = YES +TOC_INCLUDE_HEADINGS = 2 +AUTOLINK_SUPPORT = YES +BUILTIN_STL_SUPPORT = YES +CPP_CLI_SUPPORT = NO +SIP_SUPPORT = NO +IDL_PROPERTY_SUPPORT = YES +DISTRIBUTE_GROUP_DOC = NO +GROUP_NESTED_COMPOUNDS = YES +SUBGROUPING = YES +INLINE_GROUPED_CLASSES = NO +INLINE_SIMPLE_STRUCTS = YES +TYPEDEF_HIDES_STRUCT = NO +LOOKUP_CACHE_SIZE = 5 +NUM_PROC_THREADS = 0 +#--------------------------------------------------------------------------- +# Build related 
configuration options +#--------------------------------------------------------------------------- +EXTRACT_ALL = YES +EXTRACT_PRIVATE = NO +EXTRACT_PRIV_VIRTUAL = NO +EXTRACT_PACKAGE = NO +EXTRACT_STATIC = NO +EXTRACT_LOCAL_CLASSES = YES +EXTRACT_LOCAL_METHODS = NO +EXTRACT_ANON_NSPACES = NO +HIDE_UNDOC_MEMBERS = NO +HIDE_UNDOC_CLASSES = YES +HIDE_FRIEND_COMPOUNDS = NO +HIDE_IN_BODY_DOCS = NO +INTERNAL_DOCS = NO +CASE_SENSE_NAMES = NO +HIDE_SCOPE_NAMES = NO +HIDE_COMPOUND_REFERENCE= NO +SHOW_INCLUDE_FILES = YES +SHOW_GROUPED_MEMB_INC = NO +FORCE_LOCAL_INCLUDES = YES +INLINE_INFO = YES +SORT_MEMBER_DOCS = YES +SORT_BRIEF_DOCS = NO +SORT_MEMBERS_CTORS_1ST = YES +SORT_GROUP_NAMES = NO +SORT_BY_SCOPE_NAME = NO +STRICT_PROTO_MATCHING = NO +GENERATE_TODOLIST = NO +GENERATE_TESTLIST = NO +GENERATE_BUGLIST = NO +GENERATE_DEPRECATEDLIST= NO +ENABLED_SECTIONS = +MAX_INITIALIZER_LINES = 30 +SHOW_USED_FILES = YES +SHOW_FILES = YES +SHOW_NAMESPACES = YES +FILE_VERSION_FILTER = +LAYOUT_FILE = +CITE_BIB_FILES = +#--------------------------------------------------------------------------- +# Configuration options related to warning and progress messages +#--------------------------------------------------------------------------- +QUIET = NO +WARNINGS = YES +WARN_IF_UNDOCUMENTED = YES +WARN_IF_DOC_ERROR = YES +WARN_NO_PARAMDOC = YES +WARN_AS_ERROR = YES +WARN_FORMAT = "---> WARNING! 
$file:$line: $text" +WARN_LOGFILE = doc/warnings.log +#--------------------------------------------------------------------------- +# Configuration options related to the input files +#--------------------------------------------------------------------------- +INPUT = ../../README.md \ + ../../source/lib/omnitrace-user/omnitrace/types.h \ + ../../source/lib/omnitrace-user/omnitrace/categories.h \ + ../../source/lib/omnitrace-user/omnitrace/user.h \ + ../../source/lib/omnitrace-user/omnitrace/causal.h +INPUT_ENCODING = UTF-8 +FILE_PATTERNS = *.h \ + *.hh \ + *.hpp \ + *.c \ + *.cc \ + *.cxx \ + *.cpp \ + *.c++ \ + *.icc \ + *.tcc \ + *.py +RECURSIVE = YES +EXCLUDE = +EXCLUDE_SYMLINKS = YES +EXCLUDE_PATTERNS = */.git/* \ + ../../external/* \ + ../../examples/* \ + ../../tests/* +EXCLUDE_SYMBOLS = "std::*" \ + "OMNITRACE_ATTRIBUTE" \ + "OMNITRACE_VISIBILITY" \ + "OMNITRACE_PUBLIC_API" \ + "OMNITRACE_HIDDEN_API" \ + "SpaceHandle" \ + "KokkosPDevice*" +EXAMPLE_PATH = ../../examples +EXAMPLE_PATTERNS = *.h \ + *.hh \ + *.hpp \ + *.c \ + *.cc \ + *.cpp \ + *.py \ + *.txt +EXAMPLE_RECURSIVE = YES +IMAGE_PATH = +INPUT_FILTER = +FILTER_PATTERNS = +FILTER_SOURCE_FILES = NO +FILTER_SOURCE_PATTERNS = +USE_MDFILE_AS_MAINPAGE = ../../README.md +#--------------------------------------------------------------------------- +# Configuration options related to source browsing +#--------------------------------------------------------------------------- +SOURCE_BROWSER = YES +INLINE_SOURCES = YES +STRIP_CODE_COMMENTS = NO +REFERENCED_BY_RELATION = YES +REFERENCES_RELATION = YES +REFERENCES_LINK_SOURCE = YES +SOURCE_TOOLTIPS = YES +USE_HTAGS = NO +VERBATIM_HEADERS = YES +#--------------------------------------------------------------------------- +# Configuration options related to the alphabetical class index +#--------------------------------------------------------------------------- +ALPHABETICAL_INDEX = YES +COLS_IN_ALPHA_INDEX = 5 +IGNORE_PREFIX = 
+#--------------------------------------------------------------------------- +# Configuration options related to the HTML output +#--------------------------------------------------------------------------- +GENERATE_HTML = YES +HTML_OUTPUT = html +HTML_FILE_EXTENSION = .html +HTML_HEADER = ../_doxygen/header.html +HTML_FOOTER = ../_doxygen/footer.html +HTML_STYLESHEET = ../_doxygen/stylesheet.css +HTML_EXTRA_STYLESHEET = ../_doxygen/extra_stylesheet.css +HTML_EXTRA_FILES = +HTML_COLORSTYLE_HUE = 220 +HTML_COLORSTYLE_SAT = 100 +HTML_COLORSTYLE_GAMMA = 80 +HTML_TIMESTAMP = YES +HTML_DYNAMIC_MENUS = YES +HTML_DYNAMIC_SECTIONS = YES +HTML_INDEX_NUM_ENTRIES = 1000 +GENERATE_DOCSET = NO +DOCSET_FEEDNAME = "Doxygen generated docs" +DOCSET_BUNDLE_ID = org.doxygen.omnitrace +DOCSET_PUBLISHER_ID = org.doxygen.amdresearch +DOCSET_PUBLISHER_NAME = "Audacious Software Group" +GENERATE_HTMLHELP = NO +CHM_FILE = +HHC_LOCATION = +GENERATE_CHI = NO +CHM_INDEX_ENCODING = +BINARY_TOC = NO +TOC_EXPAND = YES +GENERATE_QHP = NO +QCH_FILE = +QHP_NAMESPACE = +QHP_VIRTUAL_FOLDER = doc +QHP_CUST_FILTER_NAME = +QHP_CUST_FILTER_ATTRS = +QHP_SECT_FILTER_ATTRS = +QHG_LOCATION = +GENERATE_ECLIPSEHELP = NO +ECLIPSE_DOC_ID = org.doxygen.omnitrace +DISABLE_INDEX = NO +GENERATE_TREEVIEW = NO +ENUM_VALUES_PER_LINE = 1 +TREEVIEW_WIDTH = 300 +EXT_LINKS_IN_WINDOW = YES +HTML_FORMULA_FORMAT = png +FORMULA_FONTSIZE = 12 +FORMULA_TRANSPARENT = YES +FORMULA_MACROFILE = +USE_MATHJAX = NO +MATHJAX_FORMAT = HTML-CSS +MATHJAX_RELPATH = http://cdn.mathjax.org/mathjax/latest +MATHJAX_EXTENSIONS = +MATHJAX_CODEFILE = +SEARCHENGINE = NO +SERVER_BASED_SEARCH = NO +EXTERNAL_SEARCH = NO +SEARCHENGINE_URL = +SEARCHDATA_FILE = searchdata.xml +EXTERNAL_SEARCH_ID = +EXTRA_SEARCH_MAPPINGS = +#--------------------------------------------------------------------------- +# Configuration options related to the LaTeX output +#--------------------------------------------------------------------------- +GENERATE_LATEX = NO 
+LATEX_OUTPUT = latex +LATEX_CMD_NAME = latex +MAKEINDEX_CMD_NAME = makeindex +LATEX_MAKEINDEX_CMD = makeindex +COMPACT_LATEX = NO +PAPER_TYPE = a4wide +EXTRA_PACKAGES = float +LATEX_HEADER = +LATEX_FOOTER = +LATEX_EXTRA_STYLESHEET = +LATEX_EXTRA_FILES = +PDF_HYPERLINKS = YES +USE_PDFLATEX = YES +LATEX_BATCHMODE = YES +LATEX_HIDE_INDICES = NO +LATEX_SOURCE_CODE = YES +LATEX_BIB_STYLE = plain +LATEX_TIMESTAMP = NO +LATEX_EMOJI_DIRECTORY = +#--------------------------------------------------------------------------- +# Configuration options related to the RTF output +#--------------------------------------------------------------------------- +GENERATE_RTF = NO +RTF_OUTPUT = rtf +COMPACT_RTF = NO +RTF_HYPERLINKS = NO +RTF_STYLESHEET_FILE = +RTF_EXTENSIONS_FILE = +RTF_SOURCE_CODE = NO +#--------------------------------------------------------------------------- +# Configuration options related to the man page output +#--------------------------------------------------------------------------- +GENERATE_MAN = NO +MAN_OUTPUT = man +MAN_EXTENSION = .3 +MAN_SUBDIR = +MAN_LINKS = YES +#--------------------------------------------------------------------------- +# Configuration options related to the XML output +#--------------------------------------------------------------------------- +GENERATE_XML = YES +XML_OUTPUT = xml +XML_PROGRAMLISTING = YES +XML_NS_MEMB_FILE_SCOPE = YES +#--------------------------------------------------------------------------- +# Configuration options related to the DOCBOOK output +#--------------------------------------------------------------------------- +GENERATE_DOCBOOK = NO +DOCBOOK_OUTPUT = docbook +DOCBOOK_PROGRAMLISTING = NO +#--------------------------------------------------------------------------- +# Configuration options for the AutoGen Definitions output +#--------------------------------------------------------------------------- +GENERATE_AUTOGEN_DEF = NO 
+#--------------------------------------------------------------------------- +# Configuration options related to the Perl module output +#--------------------------------------------------------------------------- +GENERATE_PERLMOD = NO +PERLMOD_LATEX = NO +PERLMOD_PRETTY = YES +PERLMOD_MAKEVAR_PREFIX = +#--------------------------------------------------------------------------- +# Configuration options related to the preprocessor +#--------------------------------------------------------------------------- +ENABLE_PREPROCESSING = YES +MACRO_EXPANSION = YES +EXPAND_ONLY_PREDEF = NO +SEARCH_INCLUDES = YES +INCLUDE_PATH = ../../source/lib/omnitrace-user +INCLUDE_FILE_PATTERNS = *.h \ + *.hpp +PREDEFINED = OMNITRACE_PUBLIC_API= \ + OMNITRACE_HIDDEN_API= \ + "OMNITRACE_ATTRIBUTE(...)=" \ + "OMNITRACE_VISIBILITY(...)=" \ + "__attribute__(x)=" \ + "__declspec(x)=" \ + "size_t=unsigned long" \ + "uintptr_t=unsigned long" \ + DOXYGEN_SHOULD_SKIP_THIS +EXPAND_AS_DEFINED = +SKIP_FUNCTION_MACROS = NO +#--------------------------------------------------------------------------- +# Configuration options related to external references +#--------------------------------------------------------------------------- +TAGFILES = +GENERATE_TAGFILE = html/tagfile.xml +ALLEXTERNALS = NO +EXTERNAL_GROUPS = YES +EXTERNAL_PAGES = YES +#--------------------------------------------------------------------------- +# Configuration options related to the dot tool +#--------------------------------------------------------------------------- +CLASS_DIAGRAMS = YES +DIA_PATH = +HIDE_UNDOC_RELATIONS = NO +HAVE_DOT = NO +DOT_NUM_THREADS = 0 +DOT_FONTNAME = Helvetica +DOT_FONTSIZE = 12 +DOT_FONTPATH = +CLASS_GRAPH = NO +COLLABORATION_GRAPH = YES +GROUP_GRAPHS = YES +UML_LOOK = YES +UML_LIMIT_NUM_FIELDS = 10 +TEMPLATE_RELATIONS = YES +INCLUDE_GRAPH = YES +INCLUDED_BY_GRAPH = YES +CALL_GRAPH = NO +CALLER_GRAPH = NO +GRAPHICAL_HIERARCHY = YES +DIRECTORY_GRAPH = YES +DOT_IMAGE_FORMAT = svg 
+INTERACTIVE_SVG = YES +DOT_PATH = /usr/bin/dot +DOTFILE_DIRS = +MSCFILE_DIRS = +DIAFILE_DIRS = +PLANTUML_JAR_PATH = +PLANTUML_CFG_FILE = +PLANTUML_INCLUDE_PATH = +DOT_GRAPH_MAX_NODES = 50 +MAX_DOT_GRAPH_DEPTH = 0 +DOT_TRANSPARENT = NO +DOT_MULTI_TARGETS = YES +GENERATE_LEGEND = YES +DOT_CLEANUP = YES diff --git a/docs/how-to/configuring-runtime-options.rst b/docs/how-to/configuring-runtime-options.rst new file mode 100644 index 000000000..16767087b --- /dev/null +++ b/docs/how-to/configuring-runtime-options.rst @@ -0,0 +1,1363 @@ +.. meta:: + :description: Omnitrace documentation and reference + :keywords: Omnitrace, ROCm, profiler, tracking, visualization, tool, Instinct, accelerator, AMD + +**************************************************** +Configuring runtime options +**************************************************** + +The ``omnitrace.cfg`` file maintains a list of the `Omnitrace `_ runtime options. To create this configuration +file and view the current runtime options, use the ``omnitrace-avail`` executable. + +The omnitrace-avail executable +======================================== + +The ``omnitrace-avail`` executable provides information about the runtime settings, +data collection capabilities, and, when built with PAPI support, the +available hardware counters. The executable is effectively +self-updating. As new capabilities and settings are added to the Omnitrace source code, they are +propagated to ``omnitrace-avail``. ``omnitrace-avail`` should be viewed as the ultimate authority +in the event of any conflicts with this documentation. + +It is recommended that you create a default configuration file in +``${HOME}/.omnitrace.cfg``. This can be done by +running the command ``omnitrace-avail -G ~/.omnitrace.cfg``. Alternatively, +use the ``omnitrace-avail -G ~/.omnitrace.cfg --all`` option +for a verbose configuration file with descriptions, categories, and additional information. + +Modify ``${HOME}/.omnitrace.cfg`` as required. 
For example, enable `Perfetto `_, +`Timemory `_, sampling, and process-level sampling by default +and tweak the default sampling values. + +.. code-block:: shell + + # ... + OMNITRACE_TRACE = true + OMNITRACE_PROFILE = true + OMNITRACE_USE_SAMPLING = true + OMNITRACE_USE_PROCESS_SAMPLING = true + # ... + OMNITRACE_SAMPLING_FREQ = 50 + OMNITRACE_SAMPLING_CPUS = all + OMNITRACE_SAMPLING_GPUS = $env:HIP_VISIBLE_DEVICES + +Exploring runtime settings +----------------------------------- + +Use the following command to view the list of available runtime settings with their current values and +descriptions: + +.. code-block:: shell + + omnitrace-avail --description + +.. note:: + + Use ``--brief`` to suppress printing the current value and/or ``-c 0`` to suppress truncation of the descriptions. + +Any Boolean setting (``omnitrace-avail --settings --value --brief --filter bool``) +accepts a case-insensitive match for nearly all common Boolean logic expressions: +``ON``, ``OFF``, ``YES``, ``NO``, ``TRUE``, ``FALSE``, ``0``, ``1``, etc. + +Exploring components +----------------------------------- + +Omnitrace uses `Timemory `_ extensively to provide +various capabilities and manage +data and resources. By default, with ``OMNITRACE_PROFILE=ON``, Omnitrace only collects wall-clock +timing values. However, by modifying the ``OMNITRACE_TIMEMORY_COMPONENTS`` setting, +Omnitrace can be configured to +collect hardware counters, CPU-clock timers, memory usage, context switches, page faults, network statistics, +and much more. Omnitrace can even be used as a dynamic instrumentation vehicle +for other third-party profiling +APIs such as `Caliper `_ and `LIKWID `_. +To leverage this capability, build Omnitrace from source with the CMake +options ``TIMEMORY_USE_CALIPER=ON`` or ``TIMEMORY_USE_LIKWID=ON`` and then add +``caliper_marker``, ``likwid_marker``, or both to ``OMNITRACE_TIMEMORY_COMPONENTS``. + +To view all possible components and their descriptions: + +..
code-block:: shell + + omnitrace-avail --components --description + +To restrict the output to available components and view the string identifiers for ``OMNITRACE_TIMEMORY_COMPONENTS``: + +.. code-block:: shell + + omnitrace-avail --components --available --string --brief + +Exploring hardware counters +----------------------------------- + +Omnitrace supports hardware counter collection via PAPI and ROCm. +Generally, PAPI is used to collect CPU-based hardware counters and ROCm is used to collect GPU-based hardware +counters. Although it is possible to install PAPI with ROCm support and use it to +collect GPU-based hardware counters, this is not recommended because PAPI +cannot simultaneously collect CPU and GPU hardware counters. + +To view all possible hardware counters and their descriptions, use the following command: + +.. code-block:: shell + + omnitrace-avail --hw-counters --description + +Appending the ``-c CPU`` option restricts the list of hardware counters to +those available through PAPI, while ``-c GPU`` limits the list to those available from ROCm. + +Enabling hardware counters +----------------------------------- + +PAPI hardware counters are configured with the ``OMNITRACE_PAPI_EVENTS`` configuration variable. +ROCm hardware counters are configured with the ``OMNITRACE_ROCM_EVENTS`` configuration variable. +ROCm hardware counters also require the ``OMNITRACE_USE_ROCPROFILER`` configuration +variable to be enabled using ``OMNITRACE_USE_ROCPROFILER=ON``. + +Here is a sample configuration for hardware counters: + +.. code-block:: shell + + # using papi identifiers + OMNITRACE_PAPI_EVENTS = PAPI_TOT_CYC PAPI_TOT_INS + + # using perf identifiers + OMNITRACE_PAPI_EVENTS = perf::INSTRUCTIONS perf::CACHE-REFERENCES perf::CACHE-MISSES + +..
_omnitrace_papi_events: + +OMNITRACE_PAPI_EVENTS +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +In order to collect the majority of hardware counters via PAPI, ensure that ``/proc/sys/kernel/perf_event_paranoid`` +has a value <= 2. If you have ``sudo`` access, use the following command to modify the value: + +.. code-block:: shell + + echo 0 | sudo tee /proc/sys/kernel/perf_event_paranoid + +However, this value is not retained upon reboot. +Use the following command to preserve this setting after a reboot: + +.. code-block:: shell + + echo 'kernel.perf_event_paranoid=0' | sudo tee -a /etc/sysctl.conf + +PAPI events use a concept similar to a namespace. All specified hardware +counters must be from the same namespace. +Hardware counters starting with the ``PAPI_`` prefix are high-level +aggregates of multiple hardware counters. +Otherwise, most events use two or three colons (``::`` or ``:::``) between the +component name and the counter name, for example, +``amd64_rapl::RAPL_ENERGY_PKG`` and ``perf::PERF_COUNT_HW_CPU_CYCLES``. + +For example, the following is a valid configuration: + +.. code-block:: shell + + OMNITRACE_PAPI_EVENTS = perf::INSTRUCTIONS perf::CACHE-REFERENCES perf::CACHE-MISSES + +However, the following specification of a roughly equivalent set of hardware counters is an incorrect configuration because it mixes +PAPI components from different namespaces: + +.. code-block:: shell + + OMNITRACE_PAPI_EVENTS = PAPI_TOT_INS perf::CACHE-REFERENCES perf::CACHE-MISSES + +.. note:: + + If Omnitrace was configured with the default ``OMNITRACE_BUILD_PAPI=ON`` setting, + standard PAPI command-line tools such as + ``papi_avail`` and ``papi_event_chooser`` are not able to provide information + about the PAPI library used by Omnitrace + (because Omnitrace statically links to ``libpapi``).
However, all of these tools are + installed with the prefix ``omnitrace-`` and with + underscores replaced by hyphens, for example, ``papi_avail`` becomes ``omnitrace-papi-avail``. + +OMNITRACE_ROCM_EVENTS +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +Omnitrace reads the ROCm events from the ``${ROCM_PATH}/lib/rocprofiler/metrics.xml`` +file. Use the ``ROCP_METRICS`` environment +variable to point Omnitrace to a different XML metrics file, for example, +``export ROCP_METRICS=${PWD}/custom_metrics.xml``. +``omnitrace-avail -H -c GPU`` shows event names with a suffix of ``:device=N`` +where ``N`` is the device number. +For example, if you have two devices, the output is: + +.. code-block:: shell + + | Wavefronts:device=0 | Derived counter: SQ_WAVES | + ... + | Wavefronts:device=1 | Derived counter: SQ_WAVES | + +To collect the event on all devices, specify the event, +such as ``Wavefronts``, without the ``:device=`` suffix. +To collect the event only on specific devices, use the ``:device=`` suffix. + +The following example: + +* Records the percentage of time the GPU was busy on all devices +* Counts the number of waves sent to SQs on device 0 +* Counts the number of VALU instructions issued on device 1 + +.. code-block:: shell + + OMNITRACE_ROCM_EVENTS = GPUBusy SQ_WAVES:device=0 SQ_INSTS_VALU:device=1 + +omnitrace-avail examples +----------------------------------- + +The following examples demonstrate how to use ``omnitrace-avail`` to perform several common +configuration tasks. + +Generating a default configuration file +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +.. code-block:: shell + + $ omnitrace-avail -G ~/.omnitrace.cfg + [omnitrace-avail] Outputting text configuration file '/home/user/.omnitrace.cfg'...
+ $ cat ~/.omnitrace.cfg + # auto-generated by omnitrace-avail (version 1.2.0) on 2022-06-27 @ 19:15 + + OMNITRACE_CONFIG_FILE = + OMNITRACE_MODE = trace + OMNITRACE_TRACE = true + OMNITRACE_PROFILE = false + OMNITRACE_USE_SAMPLING = false + OMNITRACE_USE_PROCESS_SAMPLING = true + OMNITRACE_USE_ROCTRACER = true + OMNITRACE_USE_ROCM_SMI = true + OMNITRACE_USE_KOKKOSP = false + OMNITRACE_USE_CODE_COVERAGE = false + OMNITRACE_USE_PID = true + OMNITRACE_OUTPUT_PATH = omnitrace-%tag%-output + OMNITRACE_OUTPUT_PREFIX = + OMNITRACE_CI = false + OMNITRACE_THREAD_POOL_SIZE = 8 + OMNITRACE_DEBUG = false + OMNITRACE_DL_VERBOSE = 0 + OMNITRACE_INSTRUMENTATION_INTERVAL = 1 + OMNITRACE_KOKKOSP_KERNEL_LOGGER = false + OMNITRACE_PAPI_EVENTS = PAPI_TOT_CYC + OMNITRACE_PERFETTO_BACKEND = inprocess + OMNITRACE_PERFETTO_BUFFER_SIZE_KB = 1024000 + OMNITRACE_PERFETTO_COMBINE_TRACES = false + OMNITRACE_PERFETTO_FILE = perfetto-trace.proto + OMNITRACE_PERFETTO_FILL_POLICY = discard + OMNITRACE_PERFETTO_SHMEM_SIZE_HINT_KB = 4096 + OMNITRACE_ROCTRACER_HSA_ACTIVITY = false + OMNITRACE_ROCTRACER_HSA_API = false + OMNITRACE_ROCTRACER_HSA_API_TYPES = + OMNITRACE_SAMPLING_CPUS = + OMNITRACE_SAMPLING_DELAY = 0.5 + OMNITRACE_SAMPLING_FREQ = 10 + OMNITRACE_SAMPLING_GPUS = all + OMNITRACE_TIME_OUTPUT = true + OMNITRACE_TIMEMORY_COMPONENTS = wall_clock + OMNITRACE_TRACE_THREAD_LOCKS = false + OMNITRACE_VERBOSE = 0 + OMNITRACE_COLLAPSE_PROCESSES = false + OMNITRACE_COLLAPSE_THREADS = false + OMNITRACE_COUT_OUTPUT = false + OMNITRACE_CPU_AFFINITY = false + OMNITRACE_DIFF_OUTPUT = false + OMNITRACE_ENABLE_SIGNAL_HANDLER = true + OMNITRACE_ENABLED = true + OMNITRACE_FILE_OUTPUT = true + OMNITRACE_FLAT_PROFILE = false + OMNITRACE_INPUT_EXTENSIONS = json,xml + OMNITRACE_INPUT_PATH = + OMNITRACE_INPUT_PREFIX = + OMNITRACE_JSON_OUTPUT = true + OMNITRACE_MAX_DEPTH = 65535 + OMNITRACE_MAX_WIDTH = 120 + OMNITRACE_MEMORY_PRECISION = -1 + OMNITRACE_MEMORY_SCIENTIFIC = false + OMNITRACE_MEMORY_UNITS = MB + 
OMNITRACE_MEMORY_WIDTH = -1 + OMNITRACE_NETWORK_INTERFACE = + OMNITRACE_NODE_COUNT = 0 + OMNITRACE_PAPI_FAIL_ON_ERROR = false + OMNITRACE_PAPI_MULTIPLEXING = false + OMNITRACE_PAPI_OVERFLOW = 0 + OMNITRACE_PAPI_QUIET = false + OMNITRACE_PAPI_THREADING = true + OMNITRACE_PRECISION = -1 + OMNITRACE_SCIENTIFIC = false + OMNITRACE_STRICT_CONFIG = true + OMNITRACE_SUPPRESS_CONFIG = true + OMNITRACE_SUPPRESS_PARSING = true + OMNITRACE_TEXT_OUTPUT = true + OMNITRACE_TIME_FORMAT = %F_%H.%M + OMNITRACE_TIMELINE_PROFILE = false + OMNITRACE_TIMING_PRECISION = 6 + OMNITRACE_TIMING_SCIENTIFIC = false + OMNITRACE_TIMING_UNITS = sec + OMNITRACE_TIMING_WIDTH = -1 + OMNITRACE_TREE_OUTPUT = true + OMNITRACE_WIDTH = -1 + +When creating a new configuration file, the following recommendations apply: + +* Use the ``--all`` option to view all descriptions, choices, and other information in the configuration file. +* To create a new configuration without inheriting from an existing ``${HOME}/.omnitrace.cfg`` file, + set ``OMNITRACE_SUPPRESS_CONFIG=ON`` in the environment beforehand. +* To create a new configuration that makes minor changes to an existing configuration, + set ``OMNITRACE_CONFIG_FILE=/path/to/existing/file`` and define the changes as environment + variables before generating it. + +Viewing the setting descriptions +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +.. code-block:: shell + + $ omnitrace-avail -S -bd + |-----------------------------------------|-----------------------------------------| + | ENVIRONMENT VARIABLE | DESCRIPTION | + |-----------------------------------------|-----------------------------------------| + | OMNITRACE_CI | Enable some runtime validation check... | + | OMNITRACE_ADD_SECONDARY | Enable/disable components adding sec... | + | OMNITRACE_COLLAPSE_PROCESSES | Enable/disable combining process-spe... | + | OMNITRACE_COLLAPSE_THREADS | Enable/disable combining thread-spec... 
| + | OMNITRACE_CONFIG_FILE | Configuration file for omnitrace | + | OMNITRACE_COUT_OUTPUT | Write output to stdout | + | OMNITRACE_CPU_AFFINITY | Enable pinning threads to CPUs (Linu... | + | OMNITRACE_THREAD_POOL_SIZE | Number of threads to use when genera... | + | OMNITRACE_DEBUG | Enable debug output | + | OMNITRACE_DIFF_OUTPUT | Generate a difference output vs. a p... | + | OMNITRACE_DL_VERBOSE | Verbosity within the omnitrace-dl li... | + | OMNITRACE_ENABLED | Activation state of timemory | + | OMNITRACE_ENABLE_SIGNAL_HANDLER | Enable signals in timemory_init | + | OMNITRACE_FILE_OUTPUT | Write output to files | + | OMNITRACE_FLAT_PROFILE | Set the label hierarchy mode to defa... | + | OMNITRACE_INPUT_EXTENSIONS | File extensions used when searching ... | + | OMNITRACE_INPUT_PATH | Explicitly specify the input folder ... | + | OMNITRACE_INPUT_PREFIX | Explicitly specify the prefix for in... | + | OMNITRACE_INSTRUMENTATION_INTERVAL | Instrumentation only takes measureme... | + | OMNITRACE_JSON_OUTPUT | Write json output files | + | OMNITRACE_KOKKOSP_KERNEL_LOGGER | Enables kernel logging | + | OMNITRACE_MAX_DEPTH | Set the maximum depth of label hiera... | + | OMNITRACE_MAX_THREAD_BOOKMARKS | Maximum number of times a worker thr... | + | OMNITRACE_MAX_WIDTH | Set the maximum width for component ... | + | OMNITRACE_MEMORY_PRECISION | Set the precision for components wit... | + | OMNITRACE_MEMORY_SCIENTIFIC | Set the numerical reporting format f... | + | OMNITRACE_MEMORY_UNITS | Set the units for components with u... | + | OMNITRACE_MEMORY_WIDTH | Set the output width for components ... | + | OMNITRACE_NETWORK_INTERFACE | Default network interface | + | OMNITRACE_NODE_COUNT | Total number of nodes used in applic... | + | OMNITRACE_OUTPUT_FILE | Perfetto filename | + | OMNITRACE_OUTPUT_PATH | Explicitly specify the output folder... | + | OMNITRACE_OUTPUT_PREFIX | Explicitly specify a prefix for all ... 
| + | OMNITRACE_PAPI_EVENTS | PAPI presets and events to collect (... | + | OMNITRACE_PAPI_FAIL_ON_ERROR | Configure PAPI errors to trigger a r... | + | OMNITRACE_PAPI_MULTIPLEXING | Enable multiplexing when using PAPI | + | OMNITRACE_PAPI_OVERFLOW | Value at which PAPI hw counters trig... | + | OMNITRACE_PAPI_QUIET | Configure suppression of reporting P... | + | OMNITRACE_PAPI_THREADING | Enable multithreading support when u... | + | OMNITRACE_PERFETTO_BACKEND | Specify the perfetto backend to acti... | + | OMNITRACE_PERFETTO_BUFFER_SIZE_KB | Size of perfetto buffer (in KB) | + | OMNITRACE_PERFETTO_COMBINE_TRACES | Combine Perfetto traces. If not expl... | + | OMNITRACE_PERFETTO_FILL_POLICY | Behavior when perfetto buffer is ful... | + | OMNITRACE_PERFETTO_SHMEM_SIZE_HINT_KB | Hint for shared-memory buffer size i... | + | OMNITRACE_PRECISION | Set the global output precision for ... | + | OMNITRACE_ROCTRACER_HSA_ACTIVITY | Enable HSA activity tracing support | + | OMNITRACE_ROCTRACER_HSA_API | Enable HSA API tracing support | + | OMNITRACE_ROCTRACER_HSA_API_TYPES | HSA API type to collect | + | OMNITRACE_SAMPLING_CPUS | CPUs to collect frequency informatio... | + | OMNITRACE_SAMPLING_DELAY | Number of seconds to wait before the... | + | OMNITRACE_SAMPLING_FREQ | Number of software interrupts per se... | + | OMNITRACE_SAMPLING_GPUS | Devices to query when OMNITRACE_USE_... | + | OMNITRACE_SCIENTIFIC | Set the global numerical reporting t... | + | OMNITRACE_STRICT_CONFIG | Throw errors for unknown setting nam... | + | OMNITRACE_SUPPRESS_CONFIG | Disable processing of setting config... | + | OMNITRACE_SUPPRESS_PARSING | Disable parsing environment | + | OMNITRACE_TEXT_OUTPUT | Write text output files | + | OMNITRACE_TIMELINE_PROFILE | Set the label hierarchy mode to defa... | + | OMNITRACE_TIMEMORY_COMPONENTS | List of components to collect via ti... | + | OMNITRACE_TIME_FORMAT | Customize the folder generation when... 
| + | OMNITRACE_TIME_OUTPUT | Output data to subfolder w/ a timest... | + | OMNITRACE_TIMING_PRECISION | Set the precision for components wit... | + | OMNITRACE_TIMING_SCIENTIFIC | Set the numerical reporting format f... | + | OMNITRACE_TIMING_UNITS | Set the units for components with u... | + | OMNITRACE_TIMING_WIDTH | Set the output width for components ... | + | OMNITRACE_TRACE_THREAD_LOCKS | Enable tracking calls to pthread_mut... | + | OMNITRACE_TREE_OUTPUT | Write hierarchical json output files | + | OMNITRACE_USE_CODE_COVERAGE | Enable support for code coverage | + | OMNITRACE_USE_KOKKOSP | Enable support for Kokkos Tools | + | OMNITRACE_USE_OMPT | Enable support for OpenMP-Tools | + | OMNITRACE_TRACE | Enable perfetto backend | + | OMNITRACE_USE_PID | Enable tagging filenames with proces... | + | OMNITRACE_USE_ROCM_SMI | Enable sampling GPU power, temp, uti... | + | OMNITRACE_USE_ROCTRACER | Enable ROCM tracing | + | OMNITRACE_USE_SAMPLING | Enable statistical sampling of call-... | + | OMNITRACE_USE_PROCESS_SAMPLING | Enable a background thread which sam... | + | OMNITRACE_PROFILE | Enable timemory backend | + | OMNITRACE_VERBOSE | Verbosity level | + | OMNITRACE_WIDTH | Set the global output width for comp... | + |-----------------------------------------|-----------------------------------------| + +Viewing components +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +.. code-block:: shell + + $ omnitrace-avail -C -bd + |-----------------------------------|----------------------------------------------| + | COMPONENT | DESCRIPTION | + |-----------------------------------|----------------------------------------------| + | allinea_map | Controls the AllineaMAP sampler. | + | caliper_marker | Generic forwarding of markers to Caliper ... | + | caliper_config | Caliper configuration manager. | + | caliper_loop_marker | Variant of caliper_marker with support fo... | + | cpu_clock | Total CPU time spent in both user- and ke... 
| + | cpu_util | Percentage of CPU-clock time divided by w... | + | craypat_counters | Names and value of any counter events tha... | + | craypat_flush_buffer | Writes all the recorded contents in the d... | + | craypat_heap_stats | Undocumented by 'pat_api.h'. | + | craypat_record | Toggles CrayPAT recording on calling thread. | + | craypat_region | Adds region labels to CrayPAT output. | + | current_peak_rss | Absolute value of high-water mark of memo... | + | gperftools_cpu_profiler | Control switch for gperftools CPU profiler. | + | gperftools_heap_profiler | Control switch for the gperftools heap pr... | + | hip_event | Records the time interval between two poi... | + | kernel_mode_time | CPU time spent executing in kernel mode (... | + | likwid_marker | LIKWID perfmon (CPU) marker forwarding. | + | likwid_nvmarker | LIKWID nvmon (GPU) marker forwarding. | + | malloc_gotcha | GOTCHA wrapper for memory allocation func... | + | memory_allocations | Number of bytes allocated/freed instead o... | + | monotonic_clock | Wall-clock timer which will continue to i... | + | monotonic_raw_clock | Wall-clock timer unaffected by frequency ... | + | network_stats | Reports network bytes, packets, errors, d... | + | num_io_in | Number of times the filesystem had to per... | + | num_io_out | Number of times the filesystem had to per... | + | num_major_page_faults | Number of page faults serviced that requi... | + | num_minor_page_faults | Number of page faults serviced without an... | + | page_rss | Amount of memory allocated in pages of me... | + | papi_array<8ul> | Fixed-size array of PAPI HW counters. | + | papi_vector | Dynamically allocated array of PAPI HW co... | + | peak_rss | Measures changes in the high-water mark f... | + | perfetto_trace | Provides Perfetto Tracing SDK: system pro... | + | priority_context_switch | Number of context switch due to higher pr... | + | process_cpu_clock | CPU-clock timer for the calling process (... 
| + | process_cpu_util | Percentage of CPU-clock time divided by w... | + | read_bytes | Number of bytes which this process really... | + | read_char | Number of bytes which this task has cause... | + | roctx_marker | Generates high-level region markers for H... | + | system_clock | CPU time spent in kernel-mode. | + | tau_marker | Forwards markers to TAU instrumentation (... | + | thread_cpu_clock | CPU-clock timer for the calling thread. | + | thread_cpu_util | Percentage of CPU-clock time divided by w... | + | timestamp | Provides a timestamp for every sample and... | + | trip_count | Counts number of invocations. | + | user_clock | CPU time spent in user-mode. | + | user_mode_time | CPU time spent executing in user mode (vi... | + | virtual_memory | Records the change in virtual memory. | + | voluntary_context_switch | Number of context switches due to a proce... | + | vtune_event | Creates events for Intel profiler running... | + | vtune_frame | Creates frames for Intel profiler running... | + | vtune_profiler | Control switch for Intel profiler running... | + | wall_clock | Real-clock timer (i.e. wall-clock timer). | + | written_bytes | Number of bytes sent to the storage layer. | + | written_char | Number of bytes which this task has cause... | + | omnitrace | Invokes instrumentation functions omnitr... | + | roctracer | High-precision ROCm API and kernel tracing. | + | sampling_wall_clock | Wall-clock timing. Derived from statistic... | + | sampling_cpu_clock | CPU-clock timing. Derived from statistica... | + | sampling_percent | Fraction of wall-clock time spent in func... | + | sampling_gpu_power | GPU Power Usage via ROCm-SMI. Derived fro... | + | sampling_gpu_temp | GPU Temperature via ROCm-SMI. Derived fro... | + | sampling_gpu_busy | GPU Utilization (% busy) via ROCm-SMI. De... | + | sampling_gpu_memory_usage | GPU Memory Usage via ROCm-SMI. Derived fr... 
| + |-----------------------------------|----------------------------------------------| + +Viewing hardware counters +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +.. code-block:: shell + + $ omnitrace-avail -H -bd + |---------------------------------------|---------------------------------------| + | HARDWARE COUNTER | DESCRIPTION | + |---------------------------------------|---------------------------------------| + | CPU | | + |---------------------------------------|---------------------------------------| + | PAPI_L1_DCM | Level 1 data cache misses | + | PAPI_L1_ICM | Level 1 instruction cache misses | + | PAPI_L2_DCM | Level 2 data cache misses | + | PAPI_L2_ICM | Level 2 instruction cache misses | + | PAPI_L3_DCM | Level 3 data cache misses | + | PAPI_L3_ICM | Level 3 instruction cache misses | + | PAPI_L1_TCM | Level 1 cache misses | + | PAPI_L2_TCM | Level 2 cache misses | + | PAPI_L3_TCM | Level 3 cache misses | + | PAPI_CA_SNP | Requests for a snoop | + | PAPI_CA_SHR | Requests for exclusive access to s... | + | PAPI_CA_CLN | Requests for exclusive access to c... | + | PAPI_CA_INV | Requests for cache line invalidation | + | PAPI_CA_ITV | Requests for cache line intervention | + | PAPI_L3_LDM | Level 3 load misses | + | PAPI_L3_STM | Level 3 store misses | + | PAPI_BRU_IDL | Cycles branch units are idle | + | PAPI_FXU_IDL | Cycles integer units are idle | + | PAPI_FPU_IDL | Cycles floating point units are idle | + | PAPI_LSU_IDL | Cycles load/store units are idle | + | PAPI_TLB_DM | Data translation lookaside buffer ... | + | PAPI_TLB_IM | Instruction translation lookaside ... | + | PAPI_TLB_TL | Total translation lookaside buffer... 
| + | PAPI_L1_LDM | Level 1 load misses | + | PAPI_L1_STM | Level 1 store misses | + | PAPI_L2_LDM | Level 2 load misses | + | PAPI_L2_STM | Level 2 store misses | + | PAPI_BTAC_M | Branch target address cache misses | + | PAPI_PRF_DM | Data prefetch cache misses | + | PAPI_L3_DCH | Level 3 data cache hits | + | PAPI_TLB_SD | Translation lookaside buffer shoot... | + | PAPI_CSR_FAL | Failed store conditional instructions | + | PAPI_CSR_SUC | Successful store conditional instr... | + | PAPI_CSR_TOT | Total store conditional instructions | + | PAPI_MEM_SCY | Cycles Stalled Waiting for memory ... | + | PAPI_MEM_RCY | Cycles Stalled Waiting for memory ... | + | PAPI_MEM_WCY | Cycles Stalled Waiting for memory ... | + | PAPI_STL_ICY | Cycles with no instruction issue | + | PAPI_FUL_ICY | Cycles with maximum instruction issue | + | PAPI_STL_CCY | Cycles with no instructions completed | + | PAPI_FUL_CCY | Cycles with maximum instructions c... | + | PAPI_HW_INT | Hardware interrupts | + | PAPI_BR_UCN | Unconditional branch instructions | + | PAPI_BR_CN | Conditional branch instructions | + | PAPI_BR_TKN | Conditional branch instructions taken | + | PAPI_BR_NTK | Conditional branch instructions no... | + | PAPI_BR_MSP | Conditional branch instructions mi... | + | PAPI_BR_PRC | Conditional branch instructions co... | + | PAPI_FMA_INS | FMA instructions completed | + | PAPI_TOT_IIS | Instructions issued | + | PAPI_TOT_INS | Instructions completed | + | PAPI_INT_INS | Integer instructions | + | PAPI_FP_INS | Floating point instructions | + | PAPI_LD_INS | Load instructions | + | PAPI_SR_INS | Store instructions | + | PAPI_BR_INS | Branch instructions | + | PAPI_VEC_INS | Vector/SIMD instructions (could in... | + | PAPI_RES_STL | Cycles stalled on any resource | + | PAPI_FP_STAL | Cycles the FP unit(s) are stalled | + | PAPI_TOT_CYC | Total cycles | + | PAPI_LST_INS | Load/store instructions completed | + | PAPI_SYC_INS | Synchronization instructions compl... 
| + | PAPI_L1_DCH | Level 1 data cache hits | + | PAPI_L2_DCH | Level 2 data cache hits | + | PAPI_L1_DCA | Level 1 data cache accesses | + | PAPI_L2_DCA | Level 2 data cache accesses | + | PAPI_L3_DCA | Level 3 data cache accesses | + | PAPI_L1_DCR | Level 1 data cache reads | + | PAPI_L2_DCR | Level 2 data cache reads | + | PAPI_L3_DCR | Level 3 data cache reads | + | PAPI_L1_DCW | Level 1 data cache writes | + | PAPI_L2_DCW | Level 2 data cache writes | + | PAPI_L3_DCW | Level 3 data cache writes | + | PAPI_L1_ICH | Level 1 instruction cache hits | + | PAPI_L2_ICH | Level 2 instruction cache hits | + | PAPI_L3_ICH | Level 3 instruction cache hits | + | PAPI_L1_ICA | Level 1 instruction cache accesses | + | PAPI_L2_ICA | Level 2 instruction cache accesses | + | PAPI_L3_ICA | Level 3 instruction cache accesses | + | PAPI_L1_ICR | Level 1 instruction cache reads | + | PAPI_L2_ICR | Level 2 instruction cache reads | + | PAPI_L3_ICR | Level 3 instruction cache reads | + | PAPI_L1_ICW | Level 1 instruction cache writes | + | PAPI_L2_ICW | Level 2 instruction cache writes | + | PAPI_L3_ICW | Level 3 instruction cache writes | + | PAPI_L1_TCH | Level 1 total cache hits | + | PAPI_L2_TCH | Level 2 total cache hits | + | PAPI_L3_TCH | Level 3 total cache hits | + | PAPI_L1_TCA | Level 1 total cache accesses | + | PAPI_L2_TCA | Level 2 total cache accesses | + | PAPI_L3_TCA | Level 3 total cache accesses | + | PAPI_L1_TCR | Level 1 total cache reads | + | PAPI_L2_TCR | Level 2 total cache reads | + | PAPI_L3_TCR | Level 3 total cache reads | + | PAPI_L1_TCW | Level 1 total cache writes | + | PAPI_L2_TCW | Level 2 total cache writes | + | PAPI_L3_TCW | Level 3 total cache writes | + | PAPI_FML_INS | Floating point multiply instructions | + | PAPI_FAD_INS | Floating point add instructions | + | PAPI_FDV_INS | Floating point divide instructions | + | PAPI_FSQ_INS | Floating point square root instruc... 
| + | PAPI_FNV_INS | Floating point inverse instructions | + | PAPI_FP_OPS | Floating point operations | + | PAPI_SP_OPS | Floating point operations; optimiz... | + | PAPI_DP_OPS | Floating point operations; optimiz... | + | PAPI_VEC_SP | Single precision vector/SIMD instr... | + | PAPI_VEC_DP | Double precision vector/SIMD instr... | + | PAPI_REF_CYC | Reference clock cycles | + | perf::PERF_COUNT_HW_CPU_CYCLES | PERF_COUNT_HW_CPU_CYCLES | + | perf::PERF_COUNT_HW_CPU_CYCLES:u=0 | perf::PERF_COUNT_HW_CPU_CYCLES + m... | + | perf::PERF_COUNT_HW_CPU_CYCLES:k=0 | perf::PERF_COUNT_HW_CPU_CYCLES + m... | + | perf::PERF_COUNT_HW_CPU_CYCLES:h=0 | perf::PERF_COUNT_HW_CPU_CYCLES + m... | + | perf::PERF_COUNT_HW_CPU_CYCLES:per... | perf::PERF_COUNT_HW_CPU_CYCLES + s... | + | perf::PERF_COUNT_HW_CPU_CYCLES:freq=0 | perf::PERF_COUNT_HW_CPU_CYCLES + s... | + | perf::PERF_COUNT_HW_CPU_CYCLES:pre... | perf::PERF_COUNT_HW_CPU_CYCLES + p... | + | perf::PERF_COUNT_HW_CPU_CYCLES:excl=0 | perf::PERF_COUNT_HW_CPU_CYCLES + e... | + | perf::PERF_COUNT_HW_CPU_CYCLES:mg=0 | perf::PERF_COUNT_HW_CPU_CYCLES + m... | + | perf::PERF_COUNT_HW_CPU_CYCLES:mh=0 | perf::PERF_COUNT_HW_CPU_CYCLES + m... | + | perf::PERF_COUNT_HW_CPU_CYCLES:cpu=0 | perf::PERF_COUNT_HW_CPU_CYCLES + C... | + | perf::PERF_COUNT_HW_CPU_CYCLES:pin... | perf::PERF_COUNT_HW_CPU_CYCLES + p... | + | perf::CYCLES | PERF_COUNT_HW_CPU_CYCLES | + | perf::CYCLES:u=0 | perf::CYCLES + monitor at user level | + | perf::CYCLES:k=0 | perf::CYCLES + monitor at kernel l... | + | perf::CYCLES:h=0 | perf::CYCLES + monitor at hypervis... | + | perf::CYCLES:period=0 | perf::CYCLES + sampling period | + | perf::CYCLES:freq=0 | perf::CYCLES + sampling frequency ... | + | perf::CYCLES:precise=0 | perf::CYCLES + precise event sampling | + | perf::CYCLES:excl=0 | perf::CYCLES + exclusive access | + | perf::CYCLES:mg=0 | perf::CYCLES + monitor guest execu... 
| + | perf::CYCLES:mh=0 | perf::CYCLES + monitor host execution | + | perf::CYCLES:cpu=0 | perf::CYCLES + CPU to program | + | perf::CYCLES:pinned=0 | perf::CYCLES + pin event to counters | + | perf::CPU-CYCLES | PERF_COUNT_HW_CPU_CYCLES | + | perf::CPU-CYCLES:u=0 | perf::CPU-CYCLES + monitor at user... | + | perf::CPU-CYCLES:k=0 | perf::CPU-CYCLES + monitor at kern... | + | perf::CPU-CYCLES:h=0 | perf::CPU-CYCLES + monitor at hype... | + | perf::CPU-CYCLES:period=0 | perf::CPU-CYCLES + sampling period | + | perf::CPU-CYCLES:freq=0 | perf::CPU-CYCLES + sampling freque... | + | perf::CPU-CYCLES:precise=0 | perf::CPU-CYCLES + precise event s... | + | perf::CPU-CYCLES:excl=0 | perf::CPU-CYCLES + exclusive access | + | perf::CPU-CYCLES:mg=0 | perf::CPU-CYCLES + monitor guest e... | + | perf::CPU-CYCLES:mh=0 | perf::CPU-CYCLES + monitor host ex... | + | perf::CPU-CYCLES:cpu=0 | perf::CPU-CYCLES + CPU to program | + | perf::CPU-CYCLES:pinned=0 | perf::CPU-CYCLES + pin event to co... | + | perf::PERF_COUNT_HW_INSTRUCTIONS | PERF_COUNT_HW_INSTRUCTIONS | + | perf::PERF_COUNT_HW_INSTRUCTIONS:u=0 | perf::PERF_COUNT_HW_INSTRUCTIONS +... | + | perf::PERF_COUNT_HW_INSTRUCTIONS:k=0 | perf::PERF_COUNT_HW_INSTRUCTIONS +... | + | perf::PERF_COUNT_HW_INSTRUCTIONS:h=0 | perf::PERF_COUNT_HW_INSTRUCTIONS +... | + | perf::PERF_COUNT_HW_INSTRUCTIONS:p... | perf::PERF_COUNT_HW_INSTRUCTIONS +... | + | perf::PERF_COUNT_HW_INSTRUCTIONS:f... | perf::PERF_COUNT_HW_INSTRUCTIONS +... | + | perf::PERF_COUNT_HW_INSTRUCTIONS:p... | perf::PERF_COUNT_HW_INSTRUCTIONS +... | + | perf::PERF_COUNT_HW_INSTRUCTIONS:e... | perf::PERF_COUNT_HW_INSTRUCTIONS +... | + | perf::PERF_COUNT_HW_INSTRUCTIONS:mg=0 | perf::PERF_COUNT_HW_INSTRUCTIONS +... | + | perf::PERF_COUNT_HW_INSTRUCTIONS:mh=0 | perf::PERF_COUNT_HW_INSTRUCTIONS +... | + | perf::PERF_COUNT_HW_INSTRUCTIONS:c... | perf::PERF_COUNT_HW_INSTRUCTIONS +... | + | perf::PERF_COUNT_HW_INSTRUCTIONS:p... | perf::PERF_COUNT_HW_INSTRUCTIONS +... | + | ... etc. ... 
| | + | perf_raw::r0000 | perf_events raw event syntax: r[0-... | + | perf_raw::r0000:u=0 | perf_raw::r0000 + monitor at user ... | + | perf_raw::r0000:k=0 | perf_raw::r0000 + monitor at kerne... | + | perf_raw::r0000:h=0 | perf_raw::r0000 + monitor at hyper... | + | perf_raw::r0000:period=0 | perf_raw::r0000 + sampling period | + | perf_raw::r0000:freq=0 | perf_raw::r0000 + sampling frequen... | + | perf_raw::r0000:precise=0 | perf_raw::r0000 + precise event sa... | + | perf_raw::r0000:excl=0 | perf_raw::r0000 + exclusive access | + | perf_raw::r0000:mg=0 | perf_raw::r0000 + monitor guest ex... | + | perf_raw::r0000:mh=0 | perf_raw::r0000 + monitor host exe... | + | perf_raw::r0000:cpu=0 | perf_raw::r0000 + CPU to program | + | perf_raw::r0000:pinned=0 | perf_raw::r0000 + pin event to cou... | + | perf_raw::r0000:hw_smpl=0 | perf_raw::r0000 + enable hardware ... | + | L1_ITLB_MISS_L2_ITLB_HIT | Number of instruction fetches that... | + | L1_ITLB_MISS_L2_ITLB_HIT:e=0 | L1_ITLB_MISS_L2_ITLB_HIT + edge level | + | L1_ITLB_MISS_L2_ITLB_HIT:i=0 | L1_ITLB_MISS_L2_ITLB_HIT + invert | + | L1_ITLB_MISS_L2_ITLB_HIT:c=0 | L1_ITLB_MISS_L2_ITLB_HIT + counter... | + | L1_ITLB_MISS_L2_ITLB_HIT:g=0 | L1_ITLB_MISS_L2_ITLB_HIT + measure... | + | L1_ITLB_MISS_L2_ITLB_HIT:u=0 | L1_ITLB_MISS_L2_ITLB_HIT + monitor... | + | L1_ITLB_MISS_L2_ITLB_HIT:k=0 | L1_ITLB_MISS_L2_ITLB_HIT + monitor... | + | L1_ITLB_MISS_L2_ITLB_HIT:period=0 | L1_ITLB_MISS_L2_ITLB_HIT + samplin... | + | L1_ITLB_MISS_L2_ITLB_HIT:freq=0 | L1_ITLB_MISS_L2_ITLB_HIT + samplin... | + | L1_ITLB_MISS_L2_ITLB_HIT:excl=0 | L1_ITLB_MISS_L2_ITLB_HIT + exclusi... | + | L1_ITLB_MISS_L2_ITLB_HIT:mg=0 | L1_ITLB_MISS_L2_ITLB_HIT + monitor... | + | L1_ITLB_MISS_L2_ITLB_HIT:mh=0 | L1_ITLB_MISS_L2_ITLB_HIT + monitor... | + | L1_ITLB_MISS_L2_ITLB_HIT:cpu=0 | L1_ITLB_MISS_L2_ITLB_HIT + CPU to ... | + | L1_ITLB_MISS_L2_ITLB_HIT:pinned=0 | L1_ITLB_MISS_L2_ITLB_HIT + pin eve... 
| + | L1_ITLB_MISS_L2_ITLB_MISS | Number of instruction fetches that... | + | L1_ITLB_MISS_L2_ITLB_MISS:IF1G | L1_ITLB_MISS_L2_ITLB_MISS + Number... | + | L1_ITLB_MISS_L2_ITLB_MISS:IF2M | L1_ITLB_MISS_L2_ITLB_MISS + Number... | + | L1_ITLB_MISS_L2_ITLB_MISS:IF4K | L1_ITLB_MISS_L2_ITLB_MISS + Number... | + | L1_ITLB_MISS_L2_ITLB_MISS:e=0 | L1_ITLB_MISS_L2_ITLB_MISS + edge l... | + | L1_ITLB_MISS_L2_ITLB_MISS:i=0 | L1_ITLB_MISS_L2_ITLB_MISS + invert | + | L1_ITLB_MISS_L2_ITLB_MISS:c=0 | L1_ITLB_MISS_L2_ITLB_MISS + counte... | + | L1_ITLB_MISS_L2_ITLB_MISS:g=0 | L1_ITLB_MISS_L2_ITLB_MISS + measur... | + | L1_ITLB_MISS_L2_ITLB_MISS:u=0 | L1_ITLB_MISS_L2_ITLB_MISS + monito... | + | L1_ITLB_MISS_L2_ITLB_MISS:k=0 | L1_ITLB_MISS_L2_ITLB_MISS + monito... | + | L1_ITLB_MISS_L2_ITLB_MISS:period=0 | L1_ITLB_MISS_L2_ITLB_MISS + sampli... | + | L1_ITLB_MISS_L2_ITLB_MISS:freq=0 | L1_ITLB_MISS_L2_ITLB_MISS + sampli... | + | L1_ITLB_MISS_L2_ITLB_MISS:excl=0 | L1_ITLB_MISS_L2_ITLB_MISS + exclus... | + | L1_ITLB_MISS_L2_ITLB_MISS:mg=0 | L1_ITLB_MISS_L2_ITLB_MISS + monito... | + | L1_ITLB_MISS_L2_ITLB_MISS:mh=0 | L1_ITLB_MISS_L2_ITLB_MISS + monito... | + | L1_ITLB_MISS_L2_ITLB_MISS:cpu=0 | L1_ITLB_MISS_L2_ITLB_MISS + CPU to... | + | L1_ITLB_MISS_L2_ITLB_MISS:pinned=0 | L1_ITLB_MISS_L2_ITLB_MISS + pin ev... | + | RETIRED_SSE_AVX_FLOPS | This is a retire-based event. The ... | + | RETIRED_SSE_AVX_FLOPS:ADD_SUB_FLOPS | RETIRED_SSE_AVX_FLOPS + Addition/s... | + | RETIRED_SSE_AVX_FLOPS:MULT_FLOPS | RETIRED_SSE_AVX_FLOPS + Multiplica... | + | RETIRED_SSE_AVX_FLOPS:DIV_FLOPS | RETIRED_SSE_AVX_FLOPS + Division F... | + | RETIRED_SSE_AVX_FLOPS:MAC_FLOPS | RETIRED_SSE_AVX_FLOPS + Double pre... | + | RETIRED_SSE_AVX_FLOPS:ANY | RETIRED_SSE_AVX_FLOPS + Double pre... | + | RETIRED_SSE_AVX_FLOPS:e=0 | RETIRED_SSE_AVX_FLOPS + edge level | + | RETIRED_SSE_AVX_FLOPS:i=0 | RETIRED_SSE_AVX_FLOPS + invert | + | RETIRED_SSE_AVX_FLOPS:c=0 | RETIRED_SSE_AVX_FLOPS + counter-ma... 
| + | RETIRED_SSE_AVX_FLOPS:g=0 | RETIRED_SSE_AVX_FLOPS + measure in... | + | RETIRED_SSE_AVX_FLOPS:u=0 | RETIRED_SSE_AVX_FLOPS + monitor at... | + | RETIRED_SSE_AVX_FLOPS:k=0 | RETIRED_SSE_AVX_FLOPS + monitor at... | + | RETIRED_SSE_AVX_FLOPS:period=0 | RETIRED_SSE_AVX_FLOPS + sampling p... | + | RETIRED_SSE_AVX_FLOPS:freq=0 | RETIRED_SSE_AVX_FLOPS + sampling f... | + | RETIRED_SSE_AVX_FLOPS:excl=0 | RETIRED_SSE_AVX_FLOPS + exclusive ... | + | RETIRED_SSE_AVX_FLOPS:mg=0 | RETIRED_SSE_AVX_FLOPS + monitor gu... | + | RETIRED_SSE_AVX_FLOPS:mh=0 | RETIRED_SSE_AVX_FLOPS + monitor ho... | + | RETIRED_SSE_AVX_FLOPS:cpu=0 | RETIRED_SSE_AVX_FLOPS + CPU to pro... | + | RETIRED_SSE_AVX_FLOPS:pinned=0 | RETIRED_SSE_AVX_FLOPS + pin event ... | + | DIV_CYCLES_BUSY_COUNT | Number of cycles when the divider ... | + | DIV_CYCLES_BUSY_COUNT:e=0 | DIV_CYCLES_BUSY_COUNT + edge level | + | DIV_CYCLES_BUSY_COUNT:i=0 | DIV_CYCLES_BUSY_COUNT + invert | + | DIV_CYCLES_BUSY_COUNT:c=0 | DIV_CYCLES_BUSY_COUNT + counter-ma... | + | DIV_CYCLES_BUSY_COUNT:g=0 | DIV_CYCLES_BUSY_COUNT + measure in... | + | DIV_CYCLES_BUSY_COUNT:u=0 | DIV_CYCLES_BUSY_COUNT + monitor at... | + | DIV_CYCLES_BUSY_COUNT:k=0 | DIV_CYCLES_BUSY_COUNT + monitor at... | + | DIV_CYCLES_BUSY_COUNT:period=0 | DIV_CYCLES_BUSY_COUNT + sampling p... | + | DIV_CYCLES_BUSY_COUNT:freq=0 | DIV_CYCLES_BUSY_COUNT + sampling f... | + | DIV_CYCLES_BUSY_COUNT:excl=0 | DIV_CYCLES_BUSY_COUNT + exclusive ... | + | DIV_CYCLES_BUSY_COUNT:mg=0 | DIV_CYCLES_BUSY_COUNT + monitor gu... | + | DIV_CYCLES_BUSY_COUNT:mh=0 | DIV_CYCLES_BUSY_COUNT + monitor ho... | + | DIV_CYCLES_BUSY_COUNT:cpu=0 | DIV_CYCLES_BUSY_COUNT + CPU to pro... | + | DIV_CYCLES_BUSY_COUNT:pinned=0 | DIV_CYCLES_BUSY_COUNT + pin event ... | + | DIV_OP_COUNT | Number of divide uops. | + | DIV_OP_COUNT:e=0 | DIV_OP_COUNT + edge level | + | DIV_OP_COUNT:i=0 | DIV_OP_COUNT + invert | + | DIV_OP_COUNT:c=0 | DIV_OP_COUNT + counter-mask in ran... 
| + | DIV_OP_COUNT:g=0 | DIV_OP_COUNT + measure in guest | + | DIV_OP_COUNT:u=0 | DIV_OP_COUNT + monitor at user level | + | DIV_OP_COUNT:k=0 | DIV_OP_COUNT + monitor at kernel l... | + | DIV_OP_COUNT:period=0 | DIV_OP_COUNT + sampling period | + | DIV_OP_COUNT:freq=0 | DIV_OP_COUNT + sampling frequency ... | + | DIV_OP_COUNT:excl=0 | DIV_OP_COUNT + exclusive access | + | DIV_OP_COUNT:mg=0 | DIV_OP_COUNT + monitor guest execu... | + | DIV_OP_COUNT:mh=0 | DIV_OP_COUNT + monitor host execution | + | DIV_OP_COUNT:cpu=0 | DIV_OP_COUNT + CPU to program | + | DIV_OP_COUNT:pinned=0 | DIV_OP_COUNT + pin event to counters | + | ... etc. ... | | + | amd64_rapl::RAPL_ENERGY_PKG | Number of Joules consumed by all c... | + | amd64_rapl::RAPL_ENERGY_PKG:u=0 | amd64_rapl::RAPL_ENERGY_PKG + moni... | + | amd64_rapl::RAPL_ENERGY_PKG:k=0 | amd64_rapl::RAPL_ENERGY_PKG + moni... | + | amd64_rapl::RAPL_ENERGY_PKG:period=0 | amd64_rapl::RAPL_ENERGY_PKG + samp... | + | amd64_rapl::RAPL_ENERGY_PKG:freq=0 | amd64_rapl::RAPL_ENERGY_PKG + samp... | + | amd64_rapl::RAPL_ENERGY_PKG:excl=0 | amd64_rapl::RAPL_ENERGY_PKG + excl... | + | amd64_rapl::RAPL_ENERGY_PKG:mg=0 | amd64_rapl::RAPL_ENERGY_PKG + moni... | + | amd64_rapl::RAPL_ENERGY_PKG:mh=0 | amd64_rapl::RAPL_ENERGY_PKG + moni... | + | amd64_rapl::RAPL_ENERGY_PKG:cpu=0 | amd64_rapl::RAPL_ENERGY_PKG + CPU ... | + | amd64_rapl::RAPL_ENERGY_PKG:pinned=0 | amd64_rapl::RAPL_ENERGY_PKG + pin ... | + | appio:::READ_BYTES | Bytes read | + | appio:::READ_CALLS | Number of read calls | + | appio:::READ_ERR | Number of read calls that resulted... | + | appio:::READ_INTERRUPTED | Number of read calls that timed ou... | + | appio:::READ_WOULD_BLOCK | Number of read calls that would ha... | + | appio:::READ_SHORT | Number of read calls that returned... | + | appio:::READ_EOF | Number of read calls that returned... 
| + | appio:::READ_BLOCK_SIZE | Average block size of reads | + | appio:::READ_USEC | Real microseconds spent in reads | + | appio:::WRITE_BYTES | Bytes written | + | appio:::WRITE_CALLS | Number of write calls | + | appio:::WRITE_ERR | Number of write calls that resulte... | + | appio:::WRITE_SHORT | Number of write calls that wrote l... | + | appio:::WRITE_INTERRUPTED | Number of write calls that timed o... | + | appio:::WRITE_WOULD_BLOCK | Number of write calls that would h... | + | appio:::WRITE_BLOCK_SIZE | Mean block size of writes | + | appio:::WRITE_USEC | Real microseconds spent in writes | + | appio:::OPEN_CALLS | Number of open calls | + | appio:::OPEN_ERR | Number of open calls that resulted... | + | appio:::OPEN_FDS | Number of currently open descriptors | + | appio:::SELECT_USEC | Real microseconds spent in select ... | + | appio:::RECV_BYTES | Bytes read in recv/recvmsg/recvfrom | + | appio:::RECV_CALLS | Number of recv/recvmsg/recvfrom calls | + | appio:::RECV_ERR | Number of recv/recvmsg/recvfrom ca... | + | appio:::RECV_INTERRUPTED | Number of recv/recvmsg/recvfrom ca... | + | appio:::RECV_WOULD_BLOCK | Number of recv/recvmsg/recvfrom ca... | + | appio:::RECV_SHORT | Number of recv/recvmsg/recvfrom ca... | + | appio:::RECV_EOF | Number of recv/recvmsg/recvfrom ca... | + | appio:::RECV_BLOCK_SIZE | Average block size of recv/recvmsg... | + | appio:::RECV_USEC | Real microseconds spent in recv/re... | + | appio:::SOCK_READ_BYTES | Bytes read from socket | + | appio:::SOCK_READ_CALLS | Number of read calls on socket | + | appio:::SOCK_READ_ERR | Number of read calls on socket tha... | + | appio:::SOCK_READ_SHORT | Number of read calls on socket tha... | + | appio:::SOCK_READ_WOULD_BLOCK | Number of read calls on socket tha... | + | appio:::SOCK_READ_USEC | Real microseconds spent in read(s)... 
| + | appio:::SOCK_WRITE_BYTES | Bytes written to socket | + | appio:::SOCK_WRITE_CALLS | Number of write calls to socket | + | appio:::SOCK_WRITE_ERR | Number of write calls to socket th... | + | appio:::SOCK_WRITE_SHORT | Number of write calls to socket th... | + | appio:::SOCK_WRITE_WOULD_BLOCK | Number of write calls to socket th... | + | appio:::SOCK_WRITE_USEC | Real microseconds spent in write(s... | + | appio:::SEEK_CALLS | Number of seek calls | + | appio:::SEEK_ABS_STRIDE_SIZE | Average absolute stride size of seeks | + | appio:::SEEK_USEC | Real microseconds spent in seek calls | + | coretemp:::hwmon2:in0_input | V, amdgpu module, label vddgfx | + | coretemp:::hwmon2:temp1_input | degrees C, amdgpu module, label edge | + | coretemp:::hwmon2:temp2_input | degrees C, amdgpu module, label ju... | + | coretemp:::hwmon2:temp3_input | degrees C, amdgpu module, label mem | + | coretemp:::hwmon2:fan1_input | RPM, amdgpu module, label ? | + | coretemp:::hwmon0:temp1_input | degrees C, nvme module, label Comp... | + | coretemp:::hwmon0:temp2_input | degrees C, nvme module, label Sens... | + | coretemp:::hwmon0:temp3_input | degrees C, nvme module, label Sens... | + | coretemp:::hwmon3:temp1_input | degrees C, k10temp module, label Tctl | + | coretemp:::hwmon3:temp2_input | degrees C, k10temp module, label Tdie | + | coretemp:::hwmon3:temp5_input | degrees C, k10temp module, label T... | + | coretemp:::hwmon3:temp7_input | degrees C, k10temp module, label T... | + | coretemp:::hwmon1:temp1_input | degrees C, enp1s0 module, label PH... | + | coretemp:::hwmon1:temp2_input | degrees C, enp1s0 module, label MA... | + | io:::rchar | Characters read. | + | io:::wchar | Characters written. | + | io:::syscr | Characters read by system calls. | + | io:::syscw | Characters written by system calls. | + | io:::read_bytes | Binary bytes read. | + | io:::write_bytes | Binary bytes written. | + | io:::cancelled_write_bytes | Binary write bytes cancelled. 
| + | net:::lo:rx:bytes | lo receive bytes | + | net:::lo:rx:packets | lo receive packets | + | net:::lo:rx:errors | lo receive errors | + | net:::lo:rx:dropped | lo receive dropped | + | net:::lo:rx:fifo | lo receive fifo | + | net:::lo:rx:frame | lo receive frame | + | net:::lo:rx:compressed | lo receive compressed | + | net:::lo:rx:multicast | lo receive multicast | + | net:::lo:tx:bytes | lo transmit bytes | + | net:::lo:tx:packets | lo transmit packets | + | net:::lo:tx:errors | lo transmit errors | + | net:::lo:tx:dropped | lo transmit dropped | + | net:::lo:tx:fifo | lo transmit fifo | + | net:::lo:tx:colls | lo transmit colls | + | net:::lo:tx:carrier | lo transmit carrier | + | net:::lo:tx:compressed | lo transmit compressed | + | net:::enp1s0:rx:bytes | enp1s0 receive bytes | + | net:::enp1s0:rx:packets | enp1s0 receive packets | + | net:::enp1s0:rx:errors | enp1s0 receive errors | + | net:::enp1s0:rx:dropped | enp1s0 receive dropped | + | net:::enp1s0:rx:fifo | enp1s0 receive fifo | + | net:::enp1s0:rx:frame | enp1s0 receive frame | + | net:::enp1s0:rx:compressed | enp1s0 receive compressed | + | net:::enp1s0:rx:multicast | enp1s0 receive multicast | + | net:::enp1s0:tx:bytes | enp1s0 transmit bytes | + | net:::enp1s0:tx:packets | enp1s0 transmit packets | + | net:::enp1s0:tx:errors | enp1s0 transmit errors | + | net:::enp1s0:tx:dropped | enp1s0 transmit dropped | + | net:::enp1s0:tx:fifo | enp1s0 transmit fifo | + | net:::enp1s0:tx:colls | enp1s0 transmit colls | + | net:::enp1s0:tx:carrier | enp1s0 transmit carrier | + | net:::enp1s0:tx:compressed | enp1s0 transmit compressed | + | net:::vxlan.calico:rx:bytes | vxlan.calico receive bytes | + | net:::vxlan.calico:rx:packets | vxlan.calico receive packets | + | net:::vxlan.calico:rx:errors | vxlan.calico receive errors | + | net:::vxlan.calico:rx:dropped | vxlan.calico receive dropped | + | net:::vxlan.calico:rx:fifo | vxlan.calico receive fifo | + | net:::vxlan.calico:rx:frame | vxlan.calico receive 
frame | + | net:::vxlan.calico:rx:compressed | vxlan.calico receive compressed | + | net:::vxlan.calico:rx:multicast | vxlan.calico receive multicast | + | net:::vxlan.calico:tx:bytes | vxlan.calico transmit bytes | + | net:::vxlan.calico:tx:packets | vxlan.calico transmit packets | + | net:::vxlan.calico:tx:errors | vxlan.calico transmit errors | + | net:::vxlan.calico:tx:dropped | vxlan.calico transmit dropped | + | net:::vxlan.calico:tx:fifo | vxlan.calico transmit fifo | + | net:::vxlan.calico:tx:colls | vxlan.calico transmit colls | + | net:::vxlan.calico:tx:carrier | vxlan.calico transmit carrier | + | net:::vxlan.calico:tx:compressed | vxlan.calico transmit compressed | + | net:::cali59d6fabc2aa:rx:bytes | cali59d6fabc2aa receive bytes | + | net:::cali59d6fabc2aa:rx:packets | cali59d6fabc2aa receive packets | + | net:::cali59d6fabc2aa:rx:errors | cali59d6fabc2aa receive errors | + | net:::cali59d6fabc2aa:rx:dropped | cali59d6fabc2aa receive dropped | + | net:::cali59d6fabc2aa:rx:fifo | cali59d6fabc2aa receive fifo | + | net:::cali59d6fabc2aa:rx:frame | cali59d6fabc2aa receive frame | + | net:::cali59d6fabc2aa:rx:compressed | cali59d6fabc2aa receive compressed | + | net:::cali59d6fabc2aa:rx:multicast | cali59d6fabc2aa receive multicast | + | net:::cali59d6fabc2aa:tx:bytes | cali59d6fabc2aa transmit bytes | + | net:::cali59d6fabc2aa:tx:packets | cali59d6fabc2aa transmit packets | + | net:::cali59d6fabc2aa:tx:errors | cali59d6fabc2aa transmit errors | + | net:::cali59d6fabc2aa:tx:dropped | cali59d6fabc2aa transmit dropped | + | net:::cali59d6fabc2aa:tx:fifo | cali59d6fabc2aa transmit fifo | + | net:::cali59d6fabc2aa:tx:colls | cali59d6fabc2aa transmit colls | + | net:::cali59d6fabc2aa:tx:carrier | cali59d6fabc2aa transmit carrier | + | net:::cali59d6fabc2aa:tx:compressed | cali59d6fabc2aa transmit compressed | + |---------------------------------------|---------------------------------------| + | GPU | | + 
|---------------------------------------|---------------------------------------| + | TCC_EA1_WRREQ[0]:device=0 | Number of transactions (either 32-... | + | TCC_EA1_WRREQ[1]:device=0 | Number of transactions (either 32-... | + | TCC_EA1_WRREQ[2]:device=0 | Number of transactions (either 32-... | + | TCC_EA1_WRREQ[3]:device=0 | Number of transactions (either 32-... | + | TCC_EA1_WRREQ[4]:device=0 | Number of transactions (either 32-... | + | TCC_EA1_WRREQ[5]:device=0 | Number of transactions (either 32-... | + | TCC_EA1_WRREQ[6]:device=0 | Number of transactions (either 32-... | + | TCC_EA1_WRREQ[7]:device=0 | Number of transactions (either 32-... | + | TCC_EA1_WRREQ[8]:device=0 | Number of transactions (either 32-... | + | TCC_EA1_WRREQ[9]:device=0 | Number of transactions (either 32-... | + | TCC_EA1_WRREQ[10]:device=0 | Number of transactions (either 32-... | + | TCC_EA1_WRREQ[11]:device=0 | Number of transactions (either 32-... | + | TCC_EA1_WRREQ[12]:device=0 | Number of transactions (either 32-... | + | TCC_EA1_WRREQ[13]:device=0 | Number of transactions (either 32-... | + | TCC_EA1_WRREQ[14]:device=0 | Number of transactions (either 32-... | + | TCC_EA1_WRREQ[15]:device=0 | Number of transactions (either 32-... | + | TCC_EA1_WRREQ_64B[0]:device=0 | Number of 64-byte transactions goi... | + | TCC_EA1_WRREQ_64B[1]:device=0 | Number of 64-byte transactions goi... | + | TCC_EA1_WRREQ_64B[2]:device=0 | Number of 64-byte transactions goi... | + | TCC_EA1_WRREQ_64B[3]:device=0 | Number of 64-byte transactions goi... | + | TCC_EA1_WRREQ_64B[4]:device=0 | Number of 64-byte transactions goi... | + | TCC_EA1_WRREQ_64B[5]:device=0 | Number of 64-byte transactions goi... | + | TCC_EA1_WRREQ_64B[6]:device=0 | Number of 64-byte transactions goi... | + | TCC_EA1_WRREQ_64B[7]:device=0 | Number of 64-byte transactions goi... | + | TCC_EA1_WRREQ_64B[8]:device=0 | Number of 64-byte transactions goi... | + | TCC_EA1_WRREQ_64B[9]:device=0 | Number of 64-byte transactions goi... 
| + | TCC_EA1_WRREQ_64B[10]:device=0 | Number of 64-byte transactions goi... | + | TCC_EA1_WRREQ_64B[11]:device=0 | Number of 64-byte transactions goi... | + | TCC_EA1_WRREQ_64B[12]:device=0 | Number of 64-byte transactions goi... | + | TCC_EA1_WRREQ_64B[13]:device=0 | Number of 64-byte transactions goi... | + | TCC_EA1_WRREQ_64B[14]:device=0 | Number of 64-byte transactions goi... | + | TCC_EA1_WRREQ_64B[15]:device=0 | Number of 64-byte transactions goi... | + | TCC_EA1_WRREQ_STALL[0]:device=0 | Number of cycles a write request w... | + | TCC_EA1_WRREQ_STALL[1]:device=0 | Number of cycles a write request w... | + | TCC_EA1_WRREQ_STALL[2]:device=0 | Number of cycles a write request w... | + | TCC_EA1_WRREQ_STALL[3]:device=0 | Number of cycles a write request w... | + | TCC_EA1_WRREQ_STALL[4]:device=0 | Number of cycles a write request w... | + | TCC_EA1_WRREQ_STALL[5]:device=0 | Number of cycles a write request w... | + | TCC_EA1_WRREQ_STALL[6]:device=0 | Number of cycles a write request w... | + | TCC_EA1_WRREQ_STALL[7]:device=0 | Number of cycles a write request w... | + | TCC_EA1_WRREQ_STALL[8]:device=0 | Number of cycles a write request w... | + | TCC_EA1_WRREQ_STALL[9]:device=0 | Number of cycles a write request w... | + | TCC_EA1_WRREQ_STALL[10]:device=0 | Number of cycles a write request w... | + | TCC_EA1_WRREQ_STALL[11]:device=0 | Number of cycles a write request w... | + | TCC_EA1_WRREQ_STALL[12]:device=0 | Number of cycles a write request w... | + | TCC_EA1_WRREQ_STALL[13]:device=0 | Number of cycles a write request w... | + | TCC_EA1_WRREQ_STALL[14]:device=0 | Number of cycles a write request w... | + | TCC_EA1_WRREQ_STALL[15]:device=0 | Number of cycles a write request w... | + | TCC_EA1_RDREQ[0]:device=0 | Number of TCC/EA read requests (ei... | + | TCC_EA1_RDREQ[1]:device=0 | Number of TCC/EA read requests (ei... | + | TCC_EA1_RDREQ[2]:device=0 | Number of TCC/EA read requests (ei... 
| + | TCC_EA1_RDREQ[3]:device=0 | Number of TCC/EA read requests (ei... | + | TCC_EA1_RDREQ[4]:device=0 | Number of TCC/EA read requests (ei... | + | TCC_EA1_RDREQ[5]:device=0 | Number of TCC/EA read requests (ei... | + | TCC_EA1_RDREQ[6]:device=0 | Number of TCC/EA read requests (ei... | + | TCC_EA1_RDREQ[7]:device=0 | Number of TCC/EA read requests (ei... | + | TCC_EA1_RDREQ[8]:device=0 | Number of TCC/EA read requests (ei... | + | TCC_EA1_RDREQ[9]:device=0 | Number of TCC/EA read requests (ei... | + | TCC_EA1_RDREQ[10]:device=0 | Number of TCC/EA read requests (ei... | + | TCC_EA1_RDREQ[11]:device=0 | Number of TCC/EA read requests (ei... | + | TCC_EA1_RDREQ[12]:device=0 | Number of TCC/EA read requests (ei... | + | TCC_EA1_RDREQ[13]:device=0 | Number of TCC/EA read requests (ei... | + | TCC_EA1_RDREQ[14]:device=0 | Number of TCC/EA read requests (ei... | + | TCC_EA1_RDREQ[15]:device=0 | Number of TCC/EA read requests (ei... | + | TCC_EA1_RDREQ_32B[0]:device=0 | Number of 32-byte TCC/EA read requ... | + | TCC_EA1_RDREQ_32B[1]:device=0 | Number of 32-byte TCC/EA read requ... | + | TCC_EA1_RDREQ_32B[2]:device=0 | Number of 32-byte TCC/EA read requ... | + | TCC_EA1_RDREQ_32B[3]:device=0 | Number of 32-byte TCC/EA read requ... | + | TCC_EA1_RDREQ_32B[4]:device=0 | Number of 32-byte TCC/EA read requ... | + | TCC_EA1_RDREQ_32B[5]:device=0 | Number of 32-byte TCC/EA read requ... | + | TCC_EA1_RDREQ_32B[6]:device=0 | Number of 32-byte TCC/EA read requ... | + | TCC_EA1_RDREQ_32B[7]:device=0 | Number of 32-byte TCC/EA read requ... | + | TCC_EA1_RDREQ_32B[8]:device=0 | Number of 32-byte TCC/EA read requ... | + | TCC_EA1_RDREQ_32B[9]:device=0 | Number of 32-byte TCC/EA read requ... | + | TCC_EA1_RDREQ_32B[10]:device=0 | Number of 32-byte TCC/EA read requ... | + | TCC_EA1_RDREQ_32B[11]:device=0 | Number of 32-byte TCC/EA read requ... | + | TCC_EA1_RDREQ_32B[12]:device=0 | Number of 32-byte TCC/EA read requ... 
| + | TCC_EA1_RDREQ_32B[13]:device=0 | Number of 32-byte TCC/EA read requ... | + | TCC_EA1_RDREQ_32B[14]:device=0 | Number of 32-byte TCC/EA read requ... | + | TCC_EA1_RDREQ_32B[15]:device=0 | Number of 32-byte TCC/EA read requ... | + | GRBM_COUNT:device=0 | Tie High - Count Number of Clocks | + | GRBM_GUI_ACTIVE:device=0 | The GUI is Active | + | SQ_WAVES:device=0 | Count number of waves sent to SQs.... | + | SQ_INSTS_VALU:device=0 | Number of VALU instructions issued... | + | SQ_INSTS_VMEM_WR:device=0 | Number of VMEM write instructions ... | + | SQ_INSTS_VMEM_RD:device=0 | Number of VMEM read instructions i... | + | SQ_INSTS_SALU:device=0 | Number of SALU instructions issued... | + | SQ_INSTS_SMEM:device=0 | Number of SMEM instructions issued... | + | SQ_INSTS_FLAT:device=0 | Number of FLAT instructions issued... | + | SQ_INSTS_FLAT_LDS_ONLY:device=0 | Number of FLAT instructions issued... | + | SQ_INSTS_LDS:device=0 | Number of LDS instructions issued ... | + | SQ_INSTS_GDS:device=0 | Number of GDS instructions issued.... | + | SQ_WAIT_INST_LDS:device=0 | Number of wave-cycles spent waitin... | + | SQ_ACTIVE_INST_VALU:device=0 | regspec 71? Number of cycles the S... | + | SQ_INST_CYCLES_SALU:device=0 | Number of cycles needed to execute... | + | SQ_THREAD_CYCLES_VALU:device=0 | Number of thread-cycles used to ex... | + | SQ_LDS_BANK_CONFLICT:device=0 | Number of cycles LDS is stalled by... | + | TA_TA_BUSY[0]:device=0 | TA block is busy. Perf_Windowing n... | + | TA_TA_BUSY[1]:device=0 | TA block is busy. Perf_Windowing n... | + | TA_TA_BUSY[2]:device=0 | TA block is busy. Perf_Windowing n... | + | TA_TA_BUSY[3]:device=0 | TA block is busy. Perf_Windowing n... | + | TA_TA_BUSY[4]:device=0 | TA block is busy. Perf_Windowing n... | + | TA_TA_BUSY[5]:device=0 | TA block is busy. Perf_Windowing n... | + | TA_TA_BUSY[6]:device=0 | TA block is busy. Perf_Windowing n... | + | TA_TA_BUSY[7]:device=0 | TA block is busy. Perf_Windowing n... 
| + | TA_TA_BUSY[8]:device=0 | TA block is busy. Perf_Windowing n... | + | TA_TA_BUSY[9]:device=0 | TA block is busy. Perf_Windowing n... | + | TA_TA_BUSY[10]:device=0 | TA block is busy. Perf_Windowing n... | + | TA_TA_BUSY[11]:device=0 | TA block is busy. Perf_Windowing n... | + | TA_TA_BUSY[12]:device=0 | TA block is busy. Perf_Windowing n... | + | TA_TA_BUSY[13]:device=0 | TA block is busy. Perf_Windowing n... | + | TA_TA_BUSY[14]:device=0 | TA block is busy. Perf_Windowing n... | + | TA_TA_BUSY[15]:device=0 | TA block is busy. Perf_Windowing n... | + | TA_FLAT_READ_WAVEFRONTS[0]:device=0 | Number of flat opcode reads proces... | + | TA_FLAT_READ_WAVEFRONTS[1]:device=0 | Number of flat opcode reads proces... | + | TA_FLAT_READ_WAVEFRONTS[2]:device=0 | Number of flat opcode reads proces... | + | TA_FLAT_READ_WAVEFRONTS[3]:device=0 | Number of flat opcode reads proces... | + | TA_FLAT_READ_WAVEFRONTS[4]:device=0 | Number of flat opcode reads proces... | + | TA_FLAT_READ_WAVEFRONTS[5]:device=0 | Number of flat opcode reads proces... | + | TA_FLAT_READ_WAVEFRONTS[6]:device=0 | Number of flat opcode reads proces... | + | TA_FLAT_READ_WAVEFRONTS[7]:device=0 | Number of flat opcode reads proces... | + | TA_FLAT_READ_WAVEFRONTS[8]:device=0 | Number of flat opcode reads proces... | + | TA_FLAT_READ_WAVEFRONTS[9]:device=0 | Number of flat opcode reads proces... | + | TA_FLAT_READ_WAVEFRONTS[10]:device=0 | Number of flat opcode reads proces... | + | TA_FLAT_READ_WAVEFRONTS[11]:device=0 | Number of flat opcode reads proces... | + | TA_FLAT_READ_WAVEFRONTS[12]:device=0 | Number of flat opcode reads proces... | + | TA_FLAT_READ_WAVEFRONTS[13]:device=0 | Number of flat opcode reads proces... | + | TA_FLAT_READ_WAVEFRONTS[14]:device=0 | Number of flat opcode reads proces... | + | TA_FLAT_READ_WAVEFRONTS[15]:device=0 | Number of flat opcode reads proces... | + | TA_FLAT_WRITE_WAVEFRONTS[0]:device=0 | Number of flat opcode writes proce... 
| + | TA_FLAT_WRITE_WAVEFRONTS[1]:device=0 | Number of flat opcode writes proce... | + | TA_FLAT_WRITE_WAVEFRONTS[2]:device=0 | Number of flat opcode writes proce... | + | TA_FLAT_WRITE_WAVEFRONTS[3]:device=0 | Number of flat opcode writes proce... | + | TA_FLAT_WRITE_WAVEFRONTS[4]:device=0 | Number of flat opcode writes proce... | + | TA_FLAT_WRITE_WAVEFRONTS[5]:device=0 | Number of flat opcode writes proce... | + | TA_FLAT_WRITE_WAVEFRONTS[6]:device=0 | Number of flat opcode writes proce... | + | TA_FLAT_WRITE_WAVEFRONTS[7]:device=0 | Number of flat opcode writes proce... | + | TA_FLAT_WRITE_WAVEFRONTS[8]:device=0 | Number of flat opcode writes proce... | + | TA_FLAT_WRITE_WAVEFRONTS[9]:device=0 | Number of flat opcode writes proce... | + | TA_FLAT_WRITE_WAVEFRONTS[10]:device=0 | Number of flat opcode writes proce... | + | TA_FLAT_WRITE_WAVEFRONTS[11]:device=0 | Number of flat opcode writes proce... | + | TA_FLAT_WRITE_WAVEFRONTS[12]:device=0 | Number of flat opcode writes proce... | + | TA_FLAT_WRITE_WAVEFRONTS[13]:device=0 | Number of flat opcode writes proce... | + | TA_FLAT_WRITE_WAVEFRONTS[14]:device=0 | Number of flat opcode writes proce... | + | TA_FLAT_WRITE_WAVEFRONTS[15]:device=0 | Number of flat opcode writes proce... | + | TCC_HIT[0]:device=0 | Number of cache hits. | + | TCC_HIT[1]:device=0 | Number of cache hits. | + | TCC_HIT[2]:device=0 | Number of cache hits. | + | TCC_HIT[3]:device=0 | Number of cache hits. | + | TCC_HIT[4]:device=0 | Number of cache hits. | + | TCC_HIT[5]:device=0 | Number of cache hits. | + | TCC_HIT[6]:device=0 | Number of cache hits. | + | TCC_HIT[7]:device=0 | Number of cache hits. | + | TCC_HIT[8]:device=0 | Number of cache hits. | + | TCC_HIT[9]:device=0 | Number of cache hits. | + | TCC_HIT[10]:device=0 | Number of cache hits. | + | TCC_HIT[11]:device=0 | Number of cache hits. | + | TCC_HIT[12]:device=0 | Number of cache hits. | + | TCC_HIT[13]:device=0 | Number of cache hits. 
| + | TCC_HIT[14]:device=0 | Number of cache hits. | + | TCC_HIT[15]:device=0 | Number of cache hits. | + | TCC_MISS[0]:device=0 | Number of cache misses. UC reads c... | + | TCC_MISS[1]:device=0 | Number of cache misses. UC reads c... | + | TCC_MISS[2]:device=0 | Number of cache misses. UC reads c... | + | TCC_MISS[3]:device=0 | Number of cache misses. UC reads c... | + | TCC_MISS[4]:device=0 | Number of cache misses. UC reads c... | + | TCC_MISS[5]:device=0 | Number of cache misses. UC reads c... | + | TCC_MISS[6]:device=0 | Number of cache misses. UC reads c... | + | TCC_MISS[7]:device=0 | Number of cache misses. UC reads c... | + | TCC_MISS[8]:device=0 | Number of cache misses. UC reads c... | + | TCC_MISS[9]:device=0 | Number of cache misses. UC reads c... | + | TCC_MISS[10]:device=0 | Number of cache misses. UC reads c... | + | TCC_MISS[11]:device=0 | Number of cache misses. UC reads c... | + | TCC_MISS[12]:device=0 | Number of cache misses. UC reads c... | + | TCC_MISS[13]:device=0 | Number of cache misses. UC reads c... | + | TCC_MISS[14]:device=0 | Number of cache misses. UC reads c... | + | TCC_MISS[15]:device=0 | Number of cache misses. UC reads c... | + | TCC_EA_WRREQ[0]:device=0 | Number of transactions (either 32-... | + | TCC_EA_WRREQ[1]:device=0 | Number of transactions (either 32-... | + | TCC_EA_WRREQ[2]:device=0 | Number of transactions (either 32-... | + | TCC_EA_WRREQ[3]:device=0 | Number of transactions (either 32-... | + | TCC_EA_WRREQ[4]:device=0 | Number of transactions (either 32-... | + | TCC_EA_WRREQ[5]:device=0 | Number of transactions (either 32-... | + | TCC_EA_WRREQ[6]:device=0 | Number of transactions (either 32-... | + | TCC_EA_WRREQ[7]:device=0 | Number of transactions (either 32-... | + | TCC_EA_WRREQ[8]:device=0 | Number of transactions (either 32-... | + | TCC_EA_WRREQ[9]:device=0 | Number of transactions (either 32-... | + | TCC_EA_WRREQ[10]:device=0 | Number of transactions (either 32-... 
| + | TCC_EA_WRREQ[11]:device=0 | Number of transactions (either 32-... | + | TCC_EA_WRREQ[12]:device=0 | Number of transactions (either 32-... | + | TCC_EA_WRREQ[13]:device=0 | Number of transactions (either 32-... | + | TCC_EA_WRREQ[14]:device=0 | Number of transactions (either 32-... | + | TCC_EA_WRREQ[15]:device=0 | Number of transactions (either 32-... | + | TCC_EA_WRREQ_64B[0]:device=0 | Number of 64-byte transactions goi... | + | TCC_EA_WRREQ_64B[1]:device=0 | Number of 64-byte transactions goi... | + | TCC_EA_WRREQ_64B[2]:device=0 | Number of 64-byte transactions goi... | + | TCC_EA_WRREQ_64B[3]:device=0 | Number of 64-byte transactions goi... | + | TCC_EA_WRREQ_64B[4]:device=0 | Number of 64-byte transactions goi... | + | TCC_EA_WRREQ_64B[5]:device=0 | Number of 64-byte transactions goi... | + | TCC_EA_WRREQ_64B[6]:device=0 | Number of 64-byte transactions goi... | + | TCC_EA_WRREQ_64B[7]:device=0 | Number of 64-byte transactions goi... | + | TCC_EA_WRREQ_64B[8]:device=0 | Number of 64-byte transactions goi... | + | TCC_EA_WRREQ_64B[9]:device=0 | Number of 64-byte transactions goi... | + | TCC_EA_WRREQ_64B[10]:device=0 | Number of 64-byte transactions goi... | + | TCC_EA_WRREQ_64B[11]:device=0 | Number of 64-byte transactions goi... | + | TCC_EA_WRREQ_64B[12]:device=0 | Number of 64-byte transactions goi... | + | TCC_EA_WRREQ_64B[13]:device=0 | Number of 64-byte transactions goi... | + | TCC_EA_WRREQ_64B[14]:device=0 | Number of 64-byte transactions goi... | + | TCC_EA_WRREQ_64B[15]:device=0 | Number of 64-byte transactions goi... | + | TCC_EA_WRREQ_STALL[0]:device=0 | Number of cycles a write request w... | + | TCC_EA_WRREQ_STALL[1]:device=0 | Number of cycles a write request w... | + | TCC_EA_WRREQ_STALL[2]:device=0 | Number of cycles a write request w... | + | TCC_EA_WRREQ_STALL[3]:device=0 | Number of cycles a write request w... | + | TCC_EA_WRREQ_STALL[4]:device=0 | Number of cycles a write request w... 
| + | TCC_EA_WRREQ_STALL[5]:device=0 | Number of cycles a write request w... | + | TCC_EA_WRREQ_STALL[6]:device=0 | Number of cycles a write request w... | + | TCC_EA_WRREQ_STALL[7]:device=0 | Number of cycles a write request w... | + | TCC_EA_WRREQ_STALL[8]:device=0 | Number of cycles a write request w... | + | TCC_EA_WRREQ_STALL[9]:device=0 | Number of cycles a write request w... | + | TCC_EA_WRREQ_STALL[10]:device=0 | Number of cycles a write request w... | + | TCC_EA_WRREQ_STALL[11]:device=0 | Number of cycles a write request w... | + | TCC_EA_WRREQ_STALL[12]:device=0 | Number of cycles a write request w... | + | TCC_EA_WRREQ_STALL[13]:device=0 | Number of cycles a write request w... | + | TCC_EA_WRREQ_STALL[14]:device=0 | Number of cycles a write request w... | + | TCC_EA_WRREQ_STALL[15]:device=0 | Number of cycles a write request w... | + | TCC_EA_RDREQ[0]:device=0 | Number of TCC/EA read requests (ei... | + | TCC_EA_RDREQ[1]:device=0 | Number of TCC/EA read requests (ei... | + | TCC_EA_RDREQ[2]:device=0 | Number of TCC/EA read requests (ei... | + | TCC_EA_RDREQ[3]:device=0 | Number of TCC/EA read requests (ei... | + | TCC_EA_RDREQ[4]:device=0 | Number of TCC/EA read requests (ei... | + | TCC_EA_RDREQ[5]:device=0 | Number of TCC/EA read requests (ei... | + | TCC_EA_RDREQ[6]:device=0 | Number of TCC/EA read requests (ei... | + | TCC_EA_RDREQ[7]:device=0 | Number of TCC/EA read requests (ei... | + | TCC_EA_RDREQ[8]:device=0 | Number of TCC/EA read requests (ei... | + | TCC_EA_RDREQ[9]:device=0 | Number of TCC/EA read requests (ei... | + | TCC_EA_RDREQ[10]:device=0 | Number of TCC/EA read requests (ei... | + | TCC_EA_RDREQ[11]:device=0 | Number of TCC/EA read requests (ei... | + | TCC_EA_RDREQ[12]:device=0 | Number of TCC/EA read requests (ei... | + | TCC_EA_RDREQ[13]:device=0 | Number of TCC/EA read requests (ei... | + | TCC_EA_RDREQ[14]:device=0 | Number of TCC/EA read requests (ei... | + | TCC_EA_RDREQ[15]:device=0 | Number of TCC/EA read requests (ei... 
| + | TCC_EA_RDREQ_32B[0]:device=0 | Number of 32-byte TCC/EA read requ... | + | TCC_EA_RDREQ_32B[1]:device=0 | Number of 32-byte TCC/EA read requ... | + | TCC_EA_RDREQ_32B[2]:device=0 | Number of 32-byte TCC/EA read requ... | + | TCC_EA_RDREQ_32B[3]:device=0 | Number of 32-byte TCC/EA read requ... | + | TCC_EA_RDREQ_32B[4]:device=0 | Number of 32-byte TCC/EA read requ... | + | TCC_EA_RDREQ_32B[5]:device=0 | Number of 32-byte TCC/EA read requ... | + | TCC_EA_RDREQ_32B[6]:device=0 | Number of 32-byte TCC/EA read requ... | + | TCC_EA_RDREQ_32B[7]:device=0 | Number of 32-byte TCC/EA read requ... | + | TCC_EA_RDREQ_32B[8]:device=0 | Number of 32-byte TCC/EA read requ... | + | TCC_EA_RDREQ_32B[9]:device=0 | Number of 32-byte TCC/EA read requ... | + | TCC_EA_RDREQ_32B[10]:device=0 | Number of 32-byte TCC/EA read requ... | + | TCC_EA_RDREQ_32B[11]:device=0 | Number of 32-byte TCC/EA read requ... | + | TCC_EA_RDREQ_32B[12]:device=0 | Number of 32-byte TCC/EA read requ... | + | TCC_EA_RDREQ_32B[13]:device=0 | Number of 32-byte TCC/EA read requ... | + | TCC_EA_RDREQ_32B[14]:device=0 | Number of 32-byte TCC/EA read requ... | + | TCC_EA_RDREQ_32B[15]:device=0 | Number of 32-byte TCC/EA read requ... | + | TCP_TCP_TA_DATA_STALL_CYCLES[0]:de... | TCP stalls TA data interface. Now ... | + | TCP_TCP_TA_DATA_STALL_CYCLES[1]:de... | TCP stalls TA data interface. Now ... | + | TCP_TCP_TA_DATA_STALL_CYCLES[2]:de... | TCP stalls TA data interface. Now ... | + | TCP_TCP_TA_DATA_STALL_CYCLES[3]:de... | TCP stalls TA data interface. Now ... | + | TCP_TCP_TA_DATA_STALL_CYCLES[4]:de... | TCP stalls TA data interface. Now ... | + | TCP_TCP_TA_DATA_STALL_CYCLES[5]:de... | TCP stalls TA data interface. Now ... | + | TCP_TCP_TA_DATA_STALL_CYCLES[6]:de... | TCP stalls TA data interface. Now ... | + | TCP_TCP_TA_DATA_STALL_CYCLES[7]:de... | TCP stalls TA data interface. Now ... | + | TCP_TCP_TA_DATA_STALL_CYCLES[8]:de... | TCP stalls TA data interface. Now ... 
| + | TCP_TCP_TA_DATA_STALL_CYCLES[9]:de... | TCP stalls TA data interface. Now ... | + | TCP_TCP_TA_DATA_STALL_CYCLES[10]:d... | TCP stalls TA data interface. Now ... | + | TCP_TCP_TA_DATA_STALL_CYCLES[11]:d... | TCP stalls TA data interface. Now ... | + | TCP_TCP_TA_DATA_STALL_CYCLES[12]:d... | TCP stalls TA data interface. Now ... | + | TCP_TCP_TA_DATA_STALL_CYCLES[13]:d... | TCP stalls TA data interface. Now ... | + | TCP_TCP_TA_DATA_STALL_CYCLES[14]:d... | TCP stalls TA data interface. Now ... | + | TCP_TCP_TA_DATA_STALL_CYCLES[15]:d... | TCP stalls TA data interface. Now ... | + | TCC_EA1_RDREQ_32B_sum:device=0 | Number of 32-byte TCC/EA read requ... | + | TCC_EA1_RDREQ_sum:device=0 | Number of TCC/EA read requests (ei... | + | TCC_EA1_WRREQ_sum:device=0 | Number of transactions (either 32-... | + | TCC_EA1_WRREQ_64B_sum:device=0 | Number of 64-byte transactions goi... | + | TCC_WRREQ1_STALL_max:device=0 | Number of cycles a write request w... | + | RDATA1_SIZE:device=0 | The total kilobytes fetched from t... | + | WDATA1_SIZE:device=0 | The total kilobytes written to the... | + | FETCH_SIZE:device=0 | The total kilobytes fetched from t... | + | WRITE_SIZE:device=0 | The total kilobytes written to the... | + | WRITE_REQ_32B:device=0 | The total number of 32-byte effect... | + | TA_BUSY_avr:device=0 | TA block is busy. Average over TA ... | + | TA_BUSY_max:device=0 | TA block is busy. Max over TA inst... | + | TA_BUSY_min:device=0 | TA block is busy. Min over TA inst... | + | TA_FLAT_READ_WAVEFRONTS_sum:device=0 | Number of flat opcode reads proces... | + | TA_FLAT_WRITE_WAVEFRONTS_sum:device=0 | Number of flat opcode writes proce... | + | TCC_HIT_sum:device=0 | Number of cache hits. Sum over TCC... | + | TCC_MISS_sum:device=0 | Number of cache misses. Sum over T... | + | TCC_EA_RDREQ_32B_sum:device=0 | Number of 32-byte TCC/EA read requ... | + | TCC_EA_RDREQ_sum:device=0 | Number of TCC/EA read requests (ei... 
| + | TCC_EA_WRREQ_sum:device=0 | Number of transactions (either 32-... | + | TCC_EA_WRREQ_64B_sum:device=0 | Number of 64-byte transactions goi... | + | TCC_WRREQ_STALL_max:device=0 | Number of cycles a write request w... | + | GPUBusy:device=0 | The percentage of time GPU was busy. | + | Wavefronts:device=0 | Total wavefronts. | + | VALUInsts:device=0 | The average number of vector ALU i... | + | SALUInsts:device=0 | The average number of scalar ALU i... | + | VFetchInsts:device=0 | The average number of vector fetch... | + | SFetchInsts:device=0 | The average number of scalar fetch... | + | VWriteInsts:device=0 | The average number of vector write... | + | FlatVMemInsts:device=0 | The average number of FLAT instruc... | + | LDSInsts:device=0 | The average number of LDS read or ... | + | FlatLDSInsts:device=0 | The average number of FLAT instruc... | + | GDSInsts:device=0 | The average number of GDS read or ... | + | VALUUtilization:device=0 | The percentage of active vector AL... | + | VALUBusy:device=0 | The percentage of GPUTime vector A... | + | SALUBusy:device=0 | The percentage of GPUTime scalar A... | + | FetchSize:device=0 | The total kilobytes fetched from t... | + | WriteSize:device=0 | The total kilobytes written to the... | + | MemWrites32B:device=0 | The total number of effective 32B ... | + | L2CacheHit:device=0 | The percentage of fetch, write, at... | + | MemUnitBusy:device=0 | The percentage of GPUTime the memo... | + | MemUnitStalled:device=0 | The percentage of GPUTime the memo... | + | WriteUnitStalled:device=0 | The percentage of GPUTime the Writ... | + | ALUStalledByLDS:device=0 | The percentage of GPUTime ALU unit... | + | LDSBankConflict:device=0 | The percentage of GPUTime LDS is s... | + |---------------------------------------|---------------------------------------| + +Creating a configuration file +======================================== + +Omnitrace supports three configuration file formats: JSON, XML, and plain text. 
+Use ``omnitrace-avail -G -F txt json xml`` to generate default +configuration files in each format. Optionally, +add the ``--all`` flag to include full descriptions and other information. +Configuration files are specified by the ``OMNITRACE_CONFIG_FILE`` environment variable. +By default, Omnitrace looks for ``${HOME}/.omnitrace.cfg`` and ``${HOME}/.omnitrace.json``. +To specify multiple configuration files, join their paths with the ``:`` symbol, for example: + +.. code-block:: shell + + export OMNITRACE_CONFIG_FILE=~/.config/omnitrace.cfg:~/.config/omnitrace.json + +If a configuration variable is specified in both a configuration file and in the environment, +the environment variable takes precedence. + +Sample text configuration file +----------------------------------- + +Text files support very basic variables and are case-insensitive. +Variables are created when an lvalue starts with a ``$`` and are +dereferenced when they appear as rvalues. + +Entries in the text configuration file that do not match a known setting +listed by ``omnitrace-avail`` but are prefixed with ``OMNITRACE_`` are interpreted as +environment variables. They are exported via ``setenv`` +but do not override an existing value for the environment variable. + +.. 
code-block:: shell + + # lvals starting with $ are variables + $ENABLE = ON + $SAMPLE = OFF + + # use fields + OMNITRACE_TRACE = $ENABLE + OMNITRACE_PROFILE = $ENABLE + OMNITRACE_USE_SAMPLING = $SAMPLE + OMNITRACE_USE_PROCESS_SAMPLING = $SAMPLE + + # debug + OMNITRACE_DEBUG = OFF + OMNITRACE_VERBOSE = 1 + + # output fields + OMNITRACE_OUTPUT_PATH = omnitrace-output + OMNITRACE_OUTPUT_PREFIX = %tag%/ + OMNITRACE_TIME_OUTPUT = OFF + OMNITRACE_USE_PID = OFF + + # timemory fields + OMNITRACE_PAPI_EVENTS = PAPI_TOT_INS PAPI_FP_INS + OMNITRACE_TIMEMORY_COMPONENTS = wall_clock peak_rss trip_count + OMNITRACE_MEMORY_UNITS = MB + OMNITRACE_TIMING_UNITS = sec + + # sampling fields + OMNITRACE_SAMPLING_FREQ = 50 + OMNITRACE_SAMPLING_DELAY = 0.1 + OMNITRACE_SAMPLING_CPUS = 0-3 + OMNITRACE_SAMPLING_GPUS = $env:HIP_VISIBLE_DEVICES + + # misc env variables (see metadata JSON file after run) + $env:OMNITRACE_SAMPLING_KEEP_DYNINST_SUFFIX = OFF + +Sample JSON configuration file +----------------------------------- + +The full JSON specification for a configuration value contains a lot of information: + +.. code-block:: json + + { + "omnitrace": { + "settings": { + "OMNITRACE_ADD_SECONDARY": { + "count": -1, + "name": "add_secondary", + "data_type": "bool", + "initial": true, + "value": true, + "max_count": 1, + "cmdline": [ + "--omnitrace-add-secondary" + ], + "environ": "OMNITRACE_ADD_SECONDARY", + "cereal_class_version": 1, + "categories": [ + "component", + "data", + "native" + ], + "description": "Enable/disable components adding secondary (child) entries when available. E.g. suppress individual CUDA kernels, etc. when using Cupti components" + } + } + } + } + +However, when writing a JSON configuration file, the following example is minimally acceptable +for ``OMNITRACE_ADD_SECONDARY``: + +.. 
code-block:: json + + { + "omnitrace": { + "settings": { + "OMNITRACE_ADD_SECONDARY": { + "value": true + } + } + } + } + +Sample XML configuration file +----------------------------------- + +The full XML specification for a configuration value contains the same information as the JSON specification: + +.. code-block:: xml + + + + + + 2 + + + 1 + add_secondary + OMNITRACE_ADD_SECONDARY + ... + -1 + 1 + + --omnitrace-add-secondary + + + component + data + native + + bool + true + true + + + + + + +However, when writing an XML configuration file, it is minimally acceptable +to set ``OMNITRACE_ADD_SECONDARY=false``: + +.. code-block:: xml + + + + + + + false + + + + diff --git a/docs/how-to/configuring-validating-environment.rst b/docs/how-to/configuring-validating-environment.rst new file mode 100644 index 000000000..800976345 --- /dev/null +++ b/docs/how-to/configuring-validating-environment.rst @@ -0,0 +1,71 @@ +.. meta:: + :description: Omnitrace documentation and reference + :keywords: Omnitrace, ROCm, profiler, tracking, visualization, tool, Instinct, accelerator, AMD + +**************************************************** +Configuring and validating the environment +**************************************************** + +After installing `Omnitrace `_, additional steps are required to set up +and validate the environment. + +.. note:: + + The following instructions use the installation path ``/opt/omnitrace``. If + Omnitrace is installed elsewhere, substitute the actual installation path. + +Configuring the environment +======================================== + +After Omnitrace is installed, source the ``setup-env.sh`` script to prefix the +``PATH``, ``LD_LIBRARY_PATH``, and other environment variables: + +.. code-block:: shell + + source /opt/omnitrace/share/omnitrace/setup-env.sh + +Alternatively, if environment modules are supported, add the ``/share/modulefiles`` directory +to ``MODULEPATH``: + +.. 
code-block:: shell + + module use /opt/omnitrace/share/modulefiles + +.. note:: + + As an alternative, the above line can be added to the ``${HOME}/.modulerc`` file. + +After Omnitrace has been added to the ``MODULEPATH``, it can be loaded +using ``module load omnitrace/`` and unloaded using ``module unload omnitrace/``. + +.. code-block:: shell + + module load omnitrace/1.0.0 + module unload omnitrace/1.0.0 + +.. note:: + + You might also need to add the path to the ROCm libraries to ``LD_LIBRARY_PATH``, + for example, ``export LD_LIBRARY_PATH=/opt/rocm/lib:${LD_LIBRARY_PATH}`` + +Validating the environment configuration +======================================== + +If the following commands all run successfully with the expected output, +then you are ready to use Omnitrace: + +.. code-block:: shell + + which omnitrace + which omnitrace-avail + which omnitrace-sample + omnitrace-instrument --help + omnitrace-avail --all + omnitrace-sample --help + +If Omnitrace was built with Python support, validate these additional commands: + +.. code-block:: shell + + which omnitrace-python + omnitrace-python --help diff --git a/docs/how-to/general-tips-using-omnitrace.rst b/docs/how-to/general-tips-using-omnitrace.rst new file mode 100644 index 000000000..da4c5be03 --- /dev/null +++ b/docs/how-to/general-tips-using-omnitrace.rst @@ -0,0 +1,60 @@ +.. meta:: + :description: Omnitrace documentation and reference + :keywords: Omnitrace, ROCm, profiler, tracking, visualization, tool, Instinct, accelerator, AMD + +********************************** +General tips for using Omnitrace +********************************** + +Follow these general guidelines when using Omnitrace. For an explanation of the terms used in this topic, see +the :doc:`Omnitrace glossary <../reference/omnitrace-glossary>`. 
+ +* Use ``omnitrace-avail`` to look up configuration settings, hardware counters, and data collection components + + * Use the ``-d`` flag for descriptions + +* Generate a default configuration with ``omnitrace-avail -G ${HOME}/.omnitrace.cfg`` and adjust it + to the desired default behavior +* **Decide whether binary instrumentation, statistical sampling, or both** provide the desired performance data (for non-Python applications) +* Compile code with optimization enabled (``-O2`` or higher), disable asserts (that is, ``-DNDEBUG``), and include debug info (for instance, ``-g1`` at a minimum) + + * Compiling with debug info does not slow down the code; it only increases compile time and the size of the binary + * In CMake, this is generally done with the settings ``CMAKE_BUILD_TYPE=RelWithDebInfo`` or ``CMAKE_BUILD_TYPE=Release`` and ``CMAKE__FLAGS=-g1`` + +* **Use binary instrumentation for characterizing the performance of every invocation of specific functions** +* **Use statistical sampling to characterize the performance of the entire application while minimizing overhead** +* Enable statistical sampling after binary instrumentation to help "fill in the gaps" between instrumented regions +* Use the user API to create custom regions and enable/disable Omnitrace for specific processes, threads, and regions +* Dynamic symbol interception, callback APIs, and the user API are always available with binary instrumentation and sampling + + * Dynamic symbol interception and callback APIs are (generally) controlled through ``OMNITRACE_USE_`` + options. For example, ``OMNITRACE_USE_KOKKOSP`` and ``OMNITRACE_USE_OMPT`` enable Kokkos-Tools and OpenMP-Tools + callbacks, respectively + +* When generically seeking regions for performance improvement: + + * **Start off by collecting a flat profile** + * Look for functions with high call counts, large cumulative runtimes/values, or large standard deviations + + * When call counts are high, improving the performance of this 
function or "inlining" the function can result in quick and easy performance improvements + * When the standard deviation is high, collect a hierarchical profile and see if the high variation can be attributed to the calling context. + In this scenario, consider creating a specialized version of the function for the longer-running contexts + + * **Collect a hierarchical profile** and verify the functions that are part of the "critical path" of your + application, as indicated in the flat profile + + * For example, functions with high call counts but which are part of a "setup" or "post-processing" + phase that does not consume much time relative to the overall time are generally a lower priority for optimization + +* **Use the information from the profiles when analyzing detailed traces** +* When using binary instrumentation in "trace" mode, **binary rewrites are preferable to runtime instrumentation**. + + * Binary rewrites only instrument the functions defined in the target binary, whereas runtime instrumentation might instrument functions defined in the shared libraries which are linked into the target binary + +* When using binary instrumentation with MPI, avoid runtime instrumentation + + * Runtime instrumentation requires a fork and a ``ptrace``, which is generally incompatible with how MPI applications spawn processes + * Perform a binary rewrite of the MPI executable (and, optionally, of the libraries it uses), then run + the generated instrumented executable using ``omnitrace-run`` instead of the original. + For example, instead of ``mpirun -n 2 ./myexe``, use ``mpirun -n 2 omnitrace-run -- ./myexe.inst``, where + ``myexe.inst`` is the instrumented ``myexe`` executable that was generated. 
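The MPI guidance in the last tip can be sketched as a small shell script. This is a minimal sketch under stated assumptions, not an official workflow: the ``run`` helper and the ``EXE``, ``NPROCS``, and ``DRY_RUN`` variables are hypothetical names introduced here for illustration (they are not Omnitrace settings), and ``./myexe`` stands in for your own executable. With ``DRY_RUN=1``, the default in this sketch, the script only prints the commands it would run.

```shell
#!/bin/sh
# Sketch: binary rewrite of an MPI executable, then launch through omnitrace-run.
# EXE, NPROCS, DRY_RUN, and run() are illustrative assumptions, not Omnitrace settings.
set -eu

EXE="${EXE:-./myexe}"        # placeholder target executable
INST="${EXE}.inst"           # name of the instrumented copy
NPROCS="${NPROCS:-2}"        # number of MPI ranks
DRY_RUN="${DRY_RUN:-1}"      # 1 = print commands only

# Print each command; execute it only when DRY_RUN is not 1.
run() {
    echo "+ $*"
    if [ "${DRY_RUN}" != "1" ]; then
        "$@"
    fi
}

# Step 1: binary rewrite (-o) instead of runtime instrumentation.
run omnitrace-instrument -o "${INST}" -- "${EXE}"

# Step 2: launch the instrumented executable under MPI via omnitrace-run.
run mpirun -n "${NPROCS}" omnitrace-run -- "${INST}"
```

Once the printed commands look right, set ``DRY_RUN=0`` to execute them. Keeping the rewrite as a separate step from the ``mpirun`` launch reflects the tip above: the fork and ``ptrace`` required by runtime instrumentation never happen inside the MPI job.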
diff --git a/docs/how-to/instrumenting-rewriting-binary-application.rst b/docs/how-to/instrumenting-rewriting-binary-application.rst new file mode 100644 index 000000000..c3c3083c1 --- /dev/null +++ b/docs/how-to/instrumenting-rewriting-binary-application.rst @@ -0,0 +1,942 @@ +.. meta:: + :description: Omnitrace documentation and reference + :keywords: Omnitrace, ROCm, profiler, tracking, visualization, tool, Instinct, accelerator, AMD + +**************************************************** +Instrumenting and rewriting a binary application +**************************************************** + +There are three ways to perform instrumentation with the ``omnitrace-instrument`` executable: + +* Runtime instrumentation +* Attaching to an already running process +* Binary rewrite + +Here is a comparison of the three modes: + +* Runtime instrumentation of the application using the ``omnitrace-instrument`` executable + (analogous to ``gdb --args ``) + + * This mode is the default if neither the ``-p`` nor ``-o`` command-line options are used + * Runtime instrumentation supports instrumenting not only the target executable but also + the shared libraries loaded by the target executable. Consequently, this mode consumes more memory, + takes longer to perform the instrumentation, and tends to add more significant overhead to the + runtime of the application. + * This mode is recommended if you want to analyze not only the performance of your executable and/or + libraries but also the performance of the library dependencies + +* Attaching to a process that is currently running (analogous to ``gdb -p ``) + + * This mode is activated using ``-p `` + * The same caveats from the first example apply with respect to memory and overhead + + .. note:: + + Attaching to a running process is an alpha feature and detaching from the target process + without ending the target process is not currently supported. 
+ +* Binary rewrite to generate a new executable or library with the instrumentation built-in + + * This mode is activated through the ``-o `` option + * Binary rewriting is limited to the text section of the target executable or library. It does not instrument + the dynamically-linked libraries. Consequently, this mode performs the + instrumentation significantly faster + and has a much lower overhead when running the instrumented executable and libraries. + * Binary rewriting is the recommended mode when the target executable uses + process-level parallelism (for example, MPI) + * If the target executable has a minimal ``main`` routine and the bulk of your + application is in one specific dynamic library, + see :ref:`binary-rewriting-library-label` for help + +The omnitrace-instrument executable +======================================== + +Instrumentation is performed with the ``omnitrace-instrument`` executable. For more details, use the ``-h`` or ``--help`` option to +view the help menu. + +.. 
code-block:: shell + + $ omnitrace-instrument --help + [omnitrace-instrument] Usage: omnitrace-instrument [ --help (count: 0, dtype: bool) + --version (count: 0, dtype: bool) + --verbose (max: 1, dtype: bool) + --error (max: 1, dtype: boolean) + --debug (max: 1, dtype: bool) + --log (count: 1) + --log-file (count: 1) + --simulate (max: 1, dtype: boolean) + --print-format (min: 1, dtype: string) + --print-dir (count: 1, dtype: string) + --print-available (count: 1) + --print-instrumented (count: 1) + --print-coverage (count: 1) + --print-excluded (count: 1) + --print-overlapping (count: 1) + --print-instructions (max: 1, dtype: bool) + --output (min: 0, dtype: string) + --pid (count: 1, dtype: int) + --mode (count: 1) + --force (max: 1, dtype: bool) + --command (count: 1) + --prefer (count: 1) + --library (count: unlimited) + --main-function (count: 1) + --load (count: unlimited, dtype: string) + --load-instr (count: unlimited, dtype: filepath) + --init-functions (count: unlimited, dtype: string) + --fini-functions (count: unlimited, dtype: string) + --all-functions (max: 1, dtype: boolean) + --function-include (count: unlimited) + --function-exclude (count: unlimited) + --function-restrict (count: unlimited) + --caller-include (count: unlimited) + --module-include (count: unlimited) + --module-exclude (count: unlimited) + --module-restrict (count: unlimited) + --internal-function-include (count: unlimited) + --internal-module-include (count: unlimited) + --instruction-exclude (count: unlimited) + --internal-library-deps (min: 0, dtype: boolean) + --internal-library-append (count: unlimited) + --internal-library-remove (count: unlimited) + --linkage (min: 1) + --visibility (min: 1) + --label (count: unlimited, dtype: string) + --config (min: 1, dtype: string) + --default-components (count: unlimited, dtype: string) + --env (count: unlimited) + --mpi (max: 1, dtype: bool) + --instrument-loops (max: 1, dtype: boolean) + --min-instructions (count: 1, dtype: int) + 
--min-address-range (count: 1, dtype: int) + --min-instructions-loop (count: 1, dtype: int) + --min-address-range-loop (count: 1, dtype: int) + --coverage (max: 1, dtype: bool) + --dynamic-callsites (max: 1, dtype: boolean) + --traps (max: 1, dtype: boolean) + --loop-traps (max: 1, dtype: boolean) + --allow-overlapping (max: 1, dtype: bool) + --parse-all-modules (max: 1, dtype: bool) + --batch-size (count: 1, dtype: int) + --dyninst-rt (min: 1, dtype: filepath) + --dyninst-options (count: unlimited) + ] -- + + Options: + -h, -?, --help Shows this page + --version Prints the version and exit + + [DEBUG OPTIONS] + + -v, --verbose Verbose output + -e, --error All warnings produce runtime errors + --debug Debug output + --log Number of log entries to display after an error. Any value < 0 will emit the entire log + --log-file Write the log out the specified file during the run + --simulate Exit after outputting diagnostic {available,instrumented,excluded,overlapping} module + function lists, e.g. available.txt + --print-format [ json | txt | xml ] + Output format for diagnostic {available,instrumented,excluded,overlapping} module + function lists, e.g. {print-dir}/available.txt + --print-dir Output directory for diagnostic {available,instrumented,excluded,overlapping} module + function lists, e.g. 
{print-dir}/available.txt + --print-available [ functions | functions+ | modules | pair | pair+ ] + Print the available entities for instrumentation (functions, modules, or module-function + pair) to stdout after applying regular expressions + --print-instrumented [ functions | functions+ | modules | pair | pair+ ] + Print the instrumented entities (functions, modules, or module-function pair) to stdout + after applying regular expressions + --print-coverage [ functions | functions+ | modules | pair | pair+ ] + Print the instrumented coverage entities (functions, modules, or module-function pair) to + stdout after applying regular expressions + --print-excluded [ functions | functions+ | modules | pair | pair+ ] + Print the entities for instrumentation (functions, modules, or module-function pair) + which are excluded from the instrumentation to stdout after applying regular expressions + --print-overlapping [ functions | functions+ | modules | pair | pair+ ] + Print the entities for instrumentation (functions, modules, or module-function pair) + which overlap other function calls or have multiple entry points to stdout after applying + regular expressions + --print-instructions Print the instructions for each basic-block in the JSON/XML outputs + + [MODE OPTIONS] + + -o, --output Enable generation of a new executable (binary-rewrite). If a filename is not provided, + omnitrace will use the basename and output to the cwd, unless the target binary is in the + cwd. In the latter case, omnitrace will either use ${PWD}/.inst (non-libraries) + or ${PWD}/instrumented/ (libraries) + -p, --pid Connect to running process + -M, --mode [ coverage | sampling | trace ] + Instrumentation mode. \'trace\' mode instruments the selected functions, \'sampling\' mode + only instruments the main function to start and stop the sampler. + -f, --force Force the command-line argument configuration, i.e. don't get cute. 
Useful for forcing + runtime instrumentation of an executable that [A] Dyninst thinks is a library after + reading ELF and [B] whose name makes it look like a library (e.g. starts with 'lib' + and/or ends in \'.so\', \'.so.*\', or \'.a\') + -c, --command Input executable and arguments (if \'-- \' not provided) + + [LIBRARY OPTIONS] + + --prefer [ shared | static ] Prefer this library types when available + -L, --library Libraries with instrumentation routines (default: "libomnitrace-dl") + -m, --main-function The primary function to instrument around, e.g. \'main\' + --load Supplemental instrumentation library names w/o extension (e.g. \'libinstr\' for + \'libinstr.so\' or \'libinstr.a\') + --load-instr Load {available,instrumented,excluded,overlapping}-instr JSON or XML file(s) and override + what is read from the binary + --init-functions Initialization function(s) for supplemental instrumentation libraries (see \'--load\' + option) + --fini-functions Finalization function(s) for supplemental instrumentation libraries (see \'--load\' option) + --all-functions When finding functions, include the functions which are not instrumentable. 
This is + purely diagnostic for the available/excluded functions output + + [SYMBOL SELECTION OPTIONS] + + -I, --function-include Regex(es) for including functions (despite heuristics) + -E, --function-exclude Regex(es) for excluding functions (always applied) + -R, --function-restrict Regex(es) for restricting functions only to those that match the provided + regular-expressions + --caller-include Regex(es) for including functions that call the listed functions (despite heuristics) + -MI, --module-include Regex(es) for selecting modules/files/libraries (despite heuristics) + -ME, --module-exclude Regex(es) for excluding modules/files/libraries (always applied) + -MR, --module-restrict Regex(es) for restricting modules/files/libraries only to those that match the provided + regular-expressions + --internal-function-include Regex(es) for including functions which are (likely) utilized by omnitrace itself. Use + this option with care. + --internal-module-include Regex(es) for including modules/libraries which are (likely) utilized by omnitrace + itself. Use this option with care. + --instruction-exclude Regex(es) for excluding functions containing certain instructions + --internal-library-deps Treat the libraries linked to the internal libraries as internal libraries. This increase + the internal library processing time and consume more memory (so use with care) but may + be useful when the application uses Boost libraries and Dyninst is dynamically linked + against the same boost libraries + --internal-library-append Append to the list of libraries which omnitrace treats as being used internally, e.g. + OmniTrace will find all the symbols in this library and prevent them from being + instrumented. 
+ --internal-library-remove [ ld-linux-x86-64.so.2 + libBrokenLocale.so.1 + libanl.so.1 + libbfd.so + libbz2.so + libc.so.6 + libcaliper.so + libcommon.so + libcrypt.so.1 + libdl.so.2 + libdw.so + libdwarf.so + libdyninstAPI_RT.so + libelf.so + libgcc_s.so.1 + libgotcha.so + liblikwid.so + liblzma.so + libnsl.so.1 + libnss_compat.so.2 + libnss_db.so.2 + libnss_dns.so.2 + libnss_files.so.2 + libnss_hesiod.so.2 + libnss_ldap.so.2 + libnss_nis.so.2 + libnss_nisplus.so.2 + libnss_test1.so.2 + libnss_test2.so.2 + libpapi.so + libpfm.so + libprofiler.so + libpthread.so.0 + libresolv.so.2 + librocm_smi64.so + librocmtools.so + librocprofiler64.so + libroctracer64.so + libroctx64.so + librt.so.1 + libstdc++.so.6 + libtbb.so + libtbbmalloc.so + libtbbmalloc_proxy.so + libtcmalloc.so + libtcmalloc_and_profiler.so + libtcmalloc_debug.so + libtcmalloc_minimal.so + libtcmalloc_minimal_debug.so + libthread_db.so.1 + libunwind-coredump.so + libunwind-generic.so + libunwind-ptrace.so + libunwind-setjmp.so + libunwind-x86_64.so + libunwind.so + libutil.so.1 + libz.so + libzstd.so ] + Remove the specified libraries from being treated as being used internally, e.g. + OmniTrace will permit all the symbols in these libraries to be eligible for + instrumentation. + --linkage [ global | local | unique | unknown | weak ] + Only instrument functions with specified linkage (default: global, local, unique) + --visibility [ default | hidden | internal | protected | unknown ] + Only instrument functions with specified visibility (default: default, internal, hidden, + protected) + + [RUNTIME OPTIONS] + + --label [ args | file | line | return ] + Labeling info for functions. By default, just the function name is recorded. 
Use these + options to gain more information about the function signature or location of the + functions + -C, --config Read in a configuration file and encode these values as the defaults in the executable + -d, --default-components Default components to instrument (only useful when timemory is enabled in omnitrace + library) + --env Environment variables to add to the runtime in form VARIABLE=VALUE. E.g. use \'--env + OMNITRACE_PROFILE=ON\' to default to using timemory instead of perfetto + --mpi Enable MPI support (requires omnitrace built w/ full or partial MPI support). NOTE: this + will automatically be activated if MPI_Init, MPI_Init_thread, MPI_Finalize, + MPI_Comm_rank, or MPI_Comm_size are found in the symbol table of target + + [GRANULARITY OPTIONS] + + -l, --instrument-loops Instrument at the loop level + -i, --min-instructions If the number of instructions in a function is less than this value, exclude it from + instrumentation + -r, --min-address-range If the address range of a function is less than this value, exclude it from + instrumentation + --min-instructions-loop If the number of instructions in a function containing a loop is less than this value, + exclude it from instrumentation + --min-address-range-loop If the address range of a function containing a loop is less than this value, exclude it + from instrumentation + --coverage [ basic_block | function | none ] + Enable recording the code coverage. If instrumenting in coverage mode (\'-M converage\'), + this simply specifies the granularity. If instrumenting in trace or sampling mode, this + enables recording code-coverage in addition to the instrumentation of that mode (if any). + --dynamic-callsites Force instrumentation if a function has dynamic callsites (e.g. function pointers) + --traps Instrument points which require using a trap. 
On the x86 architecture, because + instructions are of variable size, the instruction at a point may be too small for + Dyninst to replace it with the normal code sequence used to call instrumentation. Also, + when instrumentation is placed at points other than subroutine entry, exit, or call + points, traps may be used to ensure the instrumentation fits. In this case, Dyninst + replaces the instruction with a single-byte instruction that generates a trap. + --loop-traps Instrument points within a loop which require using a trap (only relevant when + --instrument-loops is enabled). + --allow-overlapping Allow dyninst to instrument either multiple functions which overlap (share part of same + function body) or single functions with multiple entry points. For more info, see Section + 2 of the DyninstAPI documentation. + --parse-all-modules By default, omnitrace simply requests Dyninst to provide all the procedures in the + application image. If this option is enabled, omnitrace will iterate over all the modules + and extract the functions. Theoretically, it should be the same but the data is slightly + different, possibly due to weak binding scopes. In general, enabling option will probably + have no visible effect + + [DYNINST OPTIONS] + + -b, --batch-size Dyninst supports batch insertion of multiple points during runtime instrumentation. If + one large batch insertion fails, this value will be used to create smaller batches. + Larger batches generally decrease the instrumentation time + --dyninst-rt Path(s) to the dyninstAPI_RT library + --dyninst-options [ BaseTrampDeletion + DebugParsing + DelayedParsing + InstrStackFrames + MergeTramp + SaveFPR + TrampRecursive + TypeChecking ] + Advanced dyninst options: BPatch::set