forked from openxla/xla
Rocm jaxlib v0.4.35 qa matmulpass #128
Open
zoranjovanovic-ns wants to merge 38 commits into rocm-jaxlib-v0.4.35 from rocm-jaxlib-v0.4.35-qa-matmulpass
+1,183
−10,697
Conversation
Passing amdgpu targets to the crosstool wrapper, which calls hipcc, can restrict the generated kernels to a specific set of supported amdgpu architectures.
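For context, XLA's actual crosstool wrapper is a Python script; the C++ sketch below only illustrates the flag-forwarding idea. `--offload-arch` is real hipcc/clang syntax, while `BuildHipccArgs` and the target list are hypothetical:

```cpp
// Illustrative sketch: forward one --offload-arch flag per requested
// architecture so hipcc generates code only for that set of GPUs.
#include <string>
#include <vector>

std::vector<std::string> BuildHipccArgs(
    const std::vector<std::string>& compile_args,
    const std::vector<std::string>& amdgpu_targets) {
  std::vector<std::string> args = {"hipcc"};
  for (const std::string& target : amdgpu_targets) {
    args.push_back("--offload-arch=" + target);
  }
  args.insert(args.end(), compile_args.begin(), compile_args.end());
  return args;
}

// e.g. BuildHipccArgs({"-c", "kernel.cc"}, {"gfx90a", "gfx942"}) yields
//   hipcc --offload-arch=gfx90a --offload-arch=gfx942 -c kernel.cc
```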
Launch dimensions should be of the form ((block.x, 1, 1), (thread.x, thread.y, 1)) to accommodate the checks in [parallel_loop_emitter.cc](https://github.com/openxla/xla/blob/main/xla/service/gpu/parallel_loop_emitter.cc#L169-L171).
[ROCm] Fix kernel launch dimension
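A minimal sketch of that invariant, using stand-in types rather than XLA's actual `LaunchDimensions` class:

```cpp
// Launch dimensions must look like ((block.x, 1, 1), (thread.x, thread.y, 1)):
// grid y/z and block z stay 1, mirroring the checks linked above.
#include <cassert>

struct Dim3 { long long x, y, z; };

struct LaunchDimensions {
  Dim3 block_counts;             // grid: only x may exceed 1
  Dim3 thread_counts_per_block;  // block: x and y may vary, z must be 1
};

void CheckLaunchDimensions(const LaunchDimensions& dims) {
  assert(dims.block_counts.y == 1 && dims.block_counts.z == 1);
  assert(dims.thread_counts_per_block.z == 1);
}

int main() {
  // A conforming shape: ((160, 1, 1), (256, 4, 1)).
  CheckLaunchDimensions(LaunchDimensions{{160, 1, 1}, {256, 4, 1}});
  return 0;
}
```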
ir_emitter/elemental_ir_emitter clean-up will follow. PiperOrigin-RevId: 691766033
[ROCm] Fix //xla/tests:complex_unary_op_test and //xla/service/gpu/tests:gpu_input_fusible_slice_test. Imported from GitHub PR openxla#19484. Copybara import of the project: -- 0d30738 by Dragan Mladjenovic <[email protected]>: [ROCm] Fix //xla/tests:complex_unary_op_test and //xla/service/gpu/tests:gpu_input_fusible_slice_test. Merging this change closes openxla#19484. COPYBARA_INTEGRATE_REVIEW=openxla#19484 from ROCm:mlir_tests_new 0d30738 PiperOrigin-RevId: 698374588
Imported from GitHub PR openxla#19426. After the change to the test inputs in openxla@b10653f, the "too many blocks" exception is no longer triggered (the shape is not big enough). Given the low importance of the test, it was decided to disable it. Copybara import of the project: -- ee36ca0 by Milica Makevic <[email protected]>: Disable gpu_too_many_blocks_test for rocm. Merging this change closes openxla#19426. COPYBARA_INTEGRATE_REVIEW=openxla#19426 from ROCm:disable_too_many_blocks_test ee36ca0 PiperOrigin-RevId: 697974812
…thm. Now that we have all the pieces of the puzzle for the X3 algorithm, we can easily add its equivalent for X6. PiperOrigin-RevId: 688294267
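For intuition, here is my reading of the bf16x3 idea behind BF16_BF16_F32_X3 (a sketch, not XLA's implementation): split each f32 operand into a bf16 high part plus a bf16 residual, then accumulate three bf16 products in f32. X6 extends this with a third limb and more cross terms:

```cpp
#include <cstdint>
#include <cstdio>
#include <cstring>

// Truncate an f32 to bf16 by dropping the low 16 mantissa bits
// (round-toward-zero, used here for simplicity).
float ToBf16(float x) {
  uint32_t bits;
  std::memcpy(&bits, &x, sizeof(bits));
  bits &= 0xFFFF0000u;
  std::memcpy(&x, &bits, sizeof(bits));
  return x;
}

// Approximate an f32 product with three bf16 x bf16 products,
// dropping the smallest term a_lo * b_lo.
float MulX3(float a, float b) {
  float a_hi = ToBf16(a), a_lo = ToBf16(a - a_hi);
  float b_hi = ToBf16(b), b_lo = ToBf16(b - b_hi);
  return a_hi * b_hi + a_hi * b_lo + a_lo * b_hi;
}

int main() {
  float a = 1.2345678f, b = 3.1415927f;
  std::printf("exact f32: %.9g\n", a * b);
  std::printf("bf16 only: %.9g\n", ToBf16(a) * ToBf16(b));
  std::printf("bf16 x3:   %.9g\n", MulX3(a, b));
}
```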
Imported from GitHub PR openxla#19342. Triton is currently disabled on ROCm. Skipping the following subtests in `dot_algorithms_test`:
- TritonAlgorithmTest.Algorithm_BF16_BF16_F32_X3
- TritonAlgorithmTest.Algorithm_BF16_BF16_F32_X6
- TritonAlgorithmTest.Algorithm_TF32_TF32_F32
- TritonAlgorithmTest.Algorithm_TF32_TF32_F32_X3
- TritonAlgorithmTest.Algorithm_BF16_BF16_F32
Copybara import of the project: -- 32bd775 by Milica Makevic <[email protected]>: Disable unsupported Triton subtests. Merging this change closes openxla#19342. COPYBARA_INTEGRATE_REVIEW=openxla#19342 from ROCm:disable_triton_tests 32bd775 PiperOrigin-RevId: 696740956
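The PR summary does not show the skip mechanism; a common pattern for conditionally disabling such subtests is an early GTEST_SKIP(), sketched here with a hypothetical IsRocm() helper (XLA's tests use their own backend queries):

```cpp
#include <gtest/gtest.h>

// Hypothetical stand-in for a real backend query; hard-wired to true
// purely for illustration.
bool IsRocm() { return true; }

TEST(TritonAlgorithmTest, Algorithm_BF16_BF16_F32_X3) {
  if (IsRocm()) {
    GTEST_SKIP() << "Triton is currently disabled on ROCm.";
  }
  // ... would exercise the BF16_BF16_F32_X3 dot algorithm here ...
}
// Link against gtest_main to run.
```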
Rocm jaxlib v0.4.35 qa misc backport
Enable Triton Auto-tuning in XLA
…nSupportedExecutesCorrectlyForDot
Rocm jaxlib v0.4.35 qa triton cleanup
…nels Add NCCL_MAX_NCHANNELS env variable to multi-GPU tests
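NCCL_MAX_NCHANNELS is a standard NCCL environment variable that caps the number of channels a communicator may use. A sketch of wiring it into a test harness (the value 4 is an illustrative choice, not necessarily what the commit sets):

```cpp
#include <cstdlib>

// Cap NCCL channel count before the test spawns communicators; must be set
// before NCCL initializes.
void ConfigureNcclForTests() {
  setenv("NCCL_MAX_NCHANNELS", "4", /*overwrite=*/1);
}
```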
Avoid lazy init of Blas handles, fix for non-canonical dots
Fixed an issue with capturing a local variable in a lambda.
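The PR summary does not include the offending code; the classic shape of this bug class, sketched for illustration:

```cpp
#include <functional>

std::function<int()> MakeCallback() {
  int local = 42;
  // Bug: capturing `local` by reference; the lambda outlives the stack
  // frame and would read a dangling reference.
  //   return [&local] { return local; };

  // Fix: capture by value so the lambda owns a copy.
  return [local] { return local; };
}
```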
R0.4.35 fix test scripts
…ests-2 Fix Triton-related tests
Add gfx1101 support to XLA
Respect the HIP runtime constraint that the max work size must not exceed INT_MAX
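A sketch of the clamping the commit title implies (stand-in function name; the actual XLA call site is not shown in the PR):

```cpp
#include <algorithm>
#include <cstdint>
#include <limits>

// The HIP runtime takes int-sized work sizes, so clamp a 64-bit request
// to INT_MAX before passing it down.
int64_t ClampWorkSize(int64_t requested) {
  const int64_t kHipMax = std::numeric_limits<int>::max();
  return std::min(requested, kHipMax);
}
```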
This change fixes the flaky GPU compiler test that used to run on the ROCm CI pipeline gate. The Triton pipeline was wrongly using the TritonGPUAccelerateMatmul pass, which supports CUDA only. ROCm has a different pass, which is now used in the ROCm pipeline. (by Alexandros Theodoridis)
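Schematically, the fix amounts to selecting the matmul-acceleration pass by platform instead of unconditionally adding the CUDA-only one. The commented factory names below follow Triton's layout at the time but vary across versions, so treat them as assumptions rather than exact API:

```cpp
#include <string>

#include "mlir/Pass/PassManager.h"

// Sketch only: branch on the target platform when building the Triton
// pass pipeline.
void AddAccelerateMatmulPass(mlir::OpPassManager& pm, bool is_rocm,
                             const std::string& arch_or_cc) {
  if (is_rocm) {
    // ROCm path, e.g. (assumed name):
    // pm.addPass(mlir::createTritonAMDGPUAccelerateMatmulPass(arch_or_cc));
  } else {
    // Previously unconditional, CUDA-only path, e.g. (assumed name):
    // pm.addPass(mlir::triton::gpu::createTritonGPUAccelerateMatmulPass(cc));
  }
}
```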
i-chaochen approved these changes on Mar 11, 2025