This repository has been archived by the owner on Jan 13, 2025. It is now read-only.

Refined transpose kernel configurations for CPU target & down-sized some copy-op benchmarks #485

Merged


OuadiElfarouki
Collaborator

This patch updates the kernel configurations of both _transpose_outplace and _transpose_add for default CPU targets, resulting in an overall performance increase (benchmark numbers shared internally).
The PR also refactors the batched omatcopy/omatadd operator benchmarks, down-sizing the most time-consuming ones.

Collaborator

@s-Nick s-Nick left a comment

Thank you @OuadiElfarouki, LGTM.

@muhammad-tanvir-1211 muhammad-tanvir-1211 merged commit 39e3747 into codeplaysoftware:master Dec 19, 2023
3 checks passed
3 participants