[amdgpu] LLVM 20 updates for AMD MI3xx GPUs #8793
tmm77 wants to merge 53 commits into taichi-dev:master from
Parameterize microbenchmarks and vulkan sdk update
fix: Patch to avoid the need to fetch source to build Taichi wheel
Taichi Dockerfile
Co-authored-by: Bhavesh Lad <Bhavesh.Lad@amd.com> Co-authored-by: Tiffany Mintz <tiffany.mintz@amd.com>
Merge latest upstream
Merge master updates
Merge latest Updates
…TX handling, and implement new pass manager setup
Mintz/llvm20 update
Syncing latest release branch with amd-integration branch
// but to insert passes in the middle, we construct it manually. A simpler way is to
// use `parsePassPipeline`. For now, we build the default pipeline first.
if (config.opt_level > 0) {
  MPM = PB.buildPerModuleDefaultPipeline(opt_level);
DX12 intrinsic lowering pass lost on reassignment
High Severity
When config.opt_level > 0, MPM is reassigned via MPM = PB.buildPerModuleDefaultPipeline(opt_level), which completely discards the previously added createTaichiIntrinsicLowerPass. The original code added this pass first, then populated optimization passes on the same manager. Now the intrinsic lowering pass never runs for DX12 when optimizations are enabled.
Reviewed by Cursor Bugbot for commit f47d1b8. Configure here.
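The reassignment problem described above can be illustrated with a toy model. This is a sketch only: the pipeline is modelled as a plain Python list, and the names `build_default_pipeline`, `buggy_pipeline`, and `fixed_pipeline` are hypothetical, not part of Taichi or LLVM. It assumes the lowering pass should stay ahead of the default pipeline, as the comment says the original code did.

```python
def build_default_pipeline(opt_level):
    # Hypothetical stand-in for a builder that returns a fresh pipeline.
    return [f"default-O{opt_level}"]


def buggy_pipeline(opt_level):
    passes = ["taichi-intrinsic-lower"]
    if opt_level > 0:
        # Reassignment replaces the whole list, so the lowering pass is lost.
        passes = build_default_pipeline(opt_level)
    return passes


def fixed_pipeline(opt_level):
    passes = ["taichi-intrinsic-lower"]
    if opt_level > 0:
        # Appending keeps the lowering pass in front of the default pipeline.
        passes += build_default_pipeline(opt_level)
    return passes
```

In the real code the equivalent change would presumably be to add the default pipeline to the existing manager instead of assigning over it; that mapping is an assumption and has not been checked against the PR.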
machine_gen_gcn->registerPassBuilderCallbacks(module_gen_gcn_pass_manager);

builder.run(*module_clone, MAM);
AMDGPU GCN output empty for LLVM 17+
Medium Severity
In the print_kernel_amdgcn path for LLVM_VERSION_MAJOR >= 17, the code sets up a new pass manager and runs optimization passes on the cloned module, but never calls addPassesToEmitFile to write assembly to llvm_stream_gcn. The gcnstr buffer remains empty, so the written GCN file will contain no content. The legacy path correctly emits assembly via addPassesToEmitFile.
Reviewed by Cursor Bugbot for commit f47d1b8. Configure here.
if ((u.system, u.machine) not in (("Linux", "arm64"), ("Linux", "aarch64"))) and not (cmake_args.get_effective("TI_WITH_AMDGPU")):
    os.environ["LLVM_DIR"] = "/usr/lib/llvm-20/cmake"
    os.environ["CUDA_HOME"] = "/usr/local/cuda"
    os.environ["CPATH"] = "/usr/local/cuda/include"
LLVM_DIR hardcoded to Linux path for all platforms
Medium Severity
The final LLVM_DIR assignment unconditionally sets it to /usr/lib/llvm-20/cmake for all non-ARM-Linux, non-AMDGPU platforms, including macOS and Windows. The original code used str(out) which pointed to the platform-specific downloaded LLVM path. This overwrites the correct out-based paths for Darwin and Windows, breaking LLVM discovery on those platforms. Similarly, CUDA_HOME and CPATH are set to Linux-specific paths.
Reviewed by Cursor Bugbot for commit f47d1b8. Configure here.
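One possible shape for a platform-aware version of this logic is sketched below. The function name `toolchain_env` and its parameters are hypothetical, and the fallback to the downloaded LLVM path for every non-matching platform is an assumption based on the comment's description of the original `str(out)` behaviour. It is not the build script's actual code.

```python
from pathlib import Path


def toolchain_env(system, machine, with_amdgpu, downloaded_llvm):
    """Return the environment variables to export for the given platform."""
    # Only non-ARM Linux builds without AMDGPU use the system-provided paths.
    use_system_paths = (
        system == "Linux"
        and machine not in ("arm64", "aarch64")
        and not with_amdgpu
    )
    if use_system_paths:
        return {
            "LLVM_DIR": "/usr/lib/llvm-20/cmake",
            "CUDA_HOME": "/usr/local/cuda",
            "CPATH": "/usr/local/cuda/include",
        }
    # Assumption: every other platform keeps the downloaded prebuilt LLVM
    # and does not get Linux-specific CUDA paths.
    return {"LLVM_DIR": str(Path(downloaded_llvm))}
```

A caller could then apply the result with `os.environ.update(toolchain_env(u.system, u.machine, with_amdgpu, out))`, where `with_amdgpu` stands for the `TI_WITH_AMDGPU` lookup.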
    f.read())
if not match:
    raise ValueError("VERSION not found!")
version_number = match[1]
Docs conf.py searches for nonexistent CMake function
Medium Severity
The docs/conf.py searches for rocm_setup_version(VERSION ...) in CMakeLists.txt, but the project's CMakeLists.txt does not contain this function call. This causes a ValueError("VERSION not found!") to be raised every time the documentation is built, completely breaking the docs build pipeline.
Reviewed by Cursor Bugbot for commit f47d1b8. Configure here.
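A more forgiving lookup could fall back to a placeholder instead of aborting the docs build. The sketch below is illustrative only: the regular expression is an assumption (the pattern used in `docs/conf.py` is not shown in the excerpt), and `find_version` and its `default` parameter are hypothetical names.

```python
import re

# Assumed pattern for a call such as: rocm_setup_version(VERSION 1.2.3)
VERSION_PATTERN = re.compile(r"rocm_setup_version\(\s*VERSION\s+([0-9][0-9.]*)")


def find_version(cmake_text, default="unknown"):
    """Return the version found in the CMake text, or `default` if absent."""
    match = VERSION_PATTERN.search(cmake_text)
    if match is None:
        # Fall back rather than raising, so a missing call does not
        # break the documentation build.
        return default
    return match[1]
```

Whether a silent fallback is preferable to a hard error depends on how the version string is used in the rendered docs.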
This is to address AMD security concerns
Cursor Bugbot has reviewed your changes and found 1 potential issue.
There are 5 total unresolved issues (including 4 from previous reviews).
Reviewed by Cursor Bugbot for commit 440fcc2. Configure here.
llvm::ModulePassManager builder =
    module_pass_manager.buildPerModuleDefaultPipeline(llvm::OptimizationLevel::O3);

machine->registerPassBuilderCallbacks(module_pass_manager);
AMDGPU target callbacks registered after pipeline is built
High Severity
machine->registerPassBuilderCallbacks() is called after buildPerModuleDefaultPipeline(), so AMDGPU target-specific passes are never included in the optimization pipeline. Both the CPU (codegen_cpu.cpp:311) and CUDA (jit_cuda.cpp:201) implementations correctly call registerPassBuilderCallbacks before building the pipeline. This same ordering mistake occurs twice — in the GCN printing path and the main optimization path.
Reviewed by Cursor Bugbot for commit 440fcc2. Configure here.
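Why the order matters can be shown with a toy model in which building a pipeline only consults the callbacks registered so far. `ToyPassBuilder` and `register_target_callbacks` are hypothetical names for illustration; this is not LLVM's API, only a sketch of the ordering dependency the comment describes.

```python
class ToyPassBuilder:
    """Toy builder: callbacks extend the pipeline at build time."""

    def __init__(self):
        self._callbacks = []

    def register_callback(self, callback):
        self._callbacks.append(callback)

    def build_default_pipeline(self):
        pipeline = ["generic-opt"]
        # Only callbacks registered before this call can contribute passes.
        for callback in self._callbacks:
            callback(pipeline)
        return pipeline


def register_target_callbacks(builder):
    # Stand-in for a target registering its own passes with the builder.
    builder.register_callback(lambda p: p.append("target-specific-opt"))


# Registering after the build: the target pass is missing.
late = ToyPassBuilder()
late_pipeline = late.build_default_pipeline()
register_target_callbacks(late)

# Registering before the build: the target pass is included.
early = ToyPassBuilder()
register_target_callbacks(early)
early_pipeline = early.build_default_pipeline()
```

Under this model, moving the registration call above the pipeline construction is the whole fix, which matches the ordering the comment attributes to the CPU and CUDA paths.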


Issue: #
Brief Summary
These code changes update LLVM to version 20 for AMD GPU code generation to enable Taichi on MI300X, MI325X, and MI355X.
Note
High Risk
High risk because it changes LLVM integration across AMDGPU/CUDA/CPU/DX12 backends (pass pipelines, pointer types, intrinsics), which can affect code generation correctness and runtime stability across platforms.
Overview
Updates build and CI tooling to prefer Clang/LLVM 20 (including Linux compiler discovery) and adjusts the build scripts to use system-provided LLVM/CUDA paths rather than always downloading prebuilts.
Modernizes multiple backends for LLVM 16–20 compatibility: switches CPU/CUDA/AMDGPU/DX12 codegen and JIT paths to the New Pass Manager, adapts to removed/renamed LLVM headers/APIs, replaces CUDA `nvvm_ldg` intrinsics with an address-space load + `!invariant.load` metadata, and updates various pointer casts toward opaque pointers.
Adds new math ops `erf`/`erfc` end-to-end (IR builder, expression ops, LLVM/CUDA codegen, Python API exports), introduces a ROCm multi-stage `Dockerfile.rocm` plus ReadTheDocs/Sphinx docs for ROCm-Simulation packaging, and tweaks microbenchmarks to support `amdgpu` and CLI-selected plans.
Reviewed by Cursor Bugbot for commit 440fcc2. Bugbot is set up for automated code reviews on this repo.