
Rocm jaxlib v0.4.35 qa matmulpass #128

Open

wants to merge 38 commits into base: rocm-jaxlib-v0.4.35

Conversation

zoranjovanovic-ns

No description provided.

ScXfjiang and others added 30 commits November 28, 2024 14:36
Passing amdgpu targets to the crosstool wrapper that calls hipcc can
restrict the generated kernels to a specific set of supported amdgpu
architectures.
Launch dimensions should be of the form
((block.x, 1, 1), (thread.x, thread.y, 1)) to accommodate the checks in
[parallel_loop_emitter.cc](https://github.com/openxla/xla/blob/main/xla/service/gpu/parallel_loop_emitter.cc#L169-L171).
An ir_emitter/elemental_ir_emitter clean-up will follow.

PiperOrigin-RevId: 691766033
…xla/service/gpu/te…

Imported from GitHub PR openxla#19484

…sts:gpu_input_fusible_slice_test

Copybara import of the project:

--
0d30738 by Dragan Mladjenovic <[email protected]>:

[ROCm] Fix //xla/tests:complex_unary_op_test and //xla/service/gpu/tests:gpu_input_fusible_slice_test

Merging this change closes openxla#19484

COPYBARA_INTEGRATE_REVIEW=openxla#19484 from ROCm:mlir_tests_new 0d30738
PiperOrigin-RevId: 698374588
Imported from GitHub PR openxla#19426

After the change to the test inputs in openxla@b10653f, the "too many blocks" exception is no longer triggered (the shape is not big enough).

Due to the test's low importance, it was decided to disable it.
Copybara import of the project:

--
ee36ca0 by Milica Makevic <[email protected]>:

Disable gpu_too_many_blocks_test for rocm

Merging this change closes openxla#19426

COPYBARA_INTEGRATE_REVIEW=openxla#19426 from ROCm:disable_too_many_blocks_test ee36ca0
PiperOrigin-RevId: 697974812
…thm.

Now that we have all the pieces of the puzzle for the X3 algorithm, we can easily add its equivalent for X6.

PiperOrigin-RevId: 688294267
Imported from GitHub PR openxla#19342

Triton is currently disabled on ROCm. Skipping the following subtests in `dot_algorithms_test`:
- TritonAlgorithmTest.Algorithm_BF16_BF16_F32_X3
- TritonAlgorithmTest.Algorithm_BF16_BF16_F32_X6
- TritonAlgorithmTest.Algorithm_TF32_TF32_F32
- TritonAlgorithmTest.Algorithm_TF32_TF32_F32_X3
- TritonAlgorithmTest.Algorithm_BF16_BF16_F32
Copybara import of the project:

--
32bd775 by Milica Makevic <[email protected]>:

Disable unsupported Triton subtests

Merging this change closes openxla#19342

COPYBARA_INTEGRATE_REVIEW=openxla#19342 from ROCm:disable_triton_tests 32bd775
PiperOrigin-RevId: 696740956
…nels

Add the NCCL_MAX_NCHANNELS env variable to multi-GPU tests
Avoid lazy init of BLAS handles; fix for non-canonical dots
Fix an issue with capturing a local variable from a lambda
zoranjovanovic-ns and others added 8 commits February 25, 2025 10:08
Add gfx1101 support to XLA
Respect the HIP runtime constraint that the max worksize must not exceed INT_MAX
This change fixes the flaky GPU compiler test that runs on the ROCm CI pipeline gate.
The Triton pipeline was wrongly using the TritonGPUAccelerateMatmul pass, which supports CUDA only.
ROCm has a different pass, which is now used in the ROCm pipeline.

by Alexandros Theodoridis
10 participants