Cache swizzled tensor for tuning #1686

Open
Serge45 wants to merge 5 commits into develop from feature/tensilelite-cache-swizzle

Conversation

Serge45 (Collaborator) commented Feb 20, 2025

  • Cache the swizzled tensor according to its datatype and size, to avoid repeating the host-side swizzle.
  • An LRUCache was introduced to balance memory usage and performance (a minimal sketch of such a cache follows this list).
  • The re-layout is skipped if validation is disabled.
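
For reference, here is a minimal sketch of the kind of LRU cache described above, keyed by datatype and size. The class name, members, and template parameters are illustrative assumptions and not code from this PR.

```cpp
#include <cstddef>
#include <functional>
#include <list>
#include <unordered_map>
#include <utility>

// Hypothetical LRU cache keyed by an arbitrary Key (e.g. datatype plus element
// count) and holding an arbitrary Value (e.g. a host buffer in the swizzled
// layout). Purely illustrative; not the implementation in this PR.
template <typename Key, typename Value, typename Hash = std::hash<Key>>
class LRUCache
{
public:
    explicit LRUCache(std::size_t capacity) : m_capacity(capacity) {}

    // Returns a pointer to the cached value and marks it most-recently-used,
    // or nullptr on a miss.
    Value* get(Key const& key)
    {
        auto it = m_index.find(key);
        if(it == m_index.end())
            return nullptr;
        m_entries.splice(m_entries.begin(), m_entries, it->second); // move to front
        return &it->second->second;
    }

    // Inserts (or overwrites) an entry and evicts the least-recently-used
    // entry if the capacity is exceeded.
    void put(Key const& key, Value value)
    {
        if(auto it = m_index.find(key); it != m_index.end())
        {
            it->second->second = std::move(value);
            m_entries.splice(m_entries.begin(), m_entries, it->second);
            return;
        }
        m_entries.emplace_front(key, std::move(value));
        m_index[key] = m_entries.begin();
        if(m_entries.size() > m_capacity)
        {
            m_index.erase(m_entries.back().first);
            m_entries.pop_back();
        }
    }

private:
    using Entry = std::pair<Key, Value>;

    std::size_t                                                        m_capacity;
    std::list<Entry>                                                   m_entries; // front = most recent
    std::unordered_map<Key, typename std::list<Entry>::iterator, Hash> m_index;
};
```

The std::list plus std::unordered_map combination keeps both lookups and recency updates O(1), which is the usual trade-off for an LRU cache: a bounded amount of host memory in exchange for skipping repeated swizzles of identical tensors.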

jichangjichang previously approved these changes Feb 21, 2025
// TODO: Support more swizzling types, such as 32x32x8; currently we only have 16x16x8.
if(needSwizzle)

//if no validation, skip the swizzle
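
The 16x16x8 in the TODO above presumably refers to the block shape targeted by the host-side re-layout. As a purely illustrative aid (not the actual layout used by hipBLASLt), the sketch below pads a row-major M x K matrix up to block multiples and then permutes it into contiguous blocks; the function name and the MI_M/MI_K/PACK_K parameters are assumptions.

```cpp
#include <cstddef>
#include <vector>

// Illustrative pad-and-permute re-layout of a row-major M x K matrix into
// contiguous MI_M x (MI_K * PACK_K) blocks. The block traversal order and the
// names (swizzleHost, miM, miK, packK) are assumptions for illustration; the
// real 16x16x8 swizzle may order elements differently.
template <typename T>
std::vector<T> swizzleHost(std::vector<T> const& src,
                           std::size_t           m,
                           std::size_t           k,
                           std::size_t           miM   = 16,
                           std::size_t           miK   = 16,
                           std::size_t           packK = 8)
{
    std::size_t const blockK  = miK * packK;
    std::size_t const mPadded = (m + miM - 1) / miM * miM;          // pad M to a multiple of miM
    std::size_t const kPadded = (k + blockK - 1) / blockK * blockK; // pad K to a multiple of miK * packK

    std::vector<T> dst(mPadded * kPadded, T(0)); // padded region stays zero-filled

    std::size_t out = 0;
    for(std::size_t mb = 0; mb < mPadded; mb += miM)        // block row
        for(std::size_t kb = 0; kb < kPadded; kb += blockK) // block column
            for(std::size_t i = 0; i < miM; ++i)            // row inside the block
                for(std::size_t j = 0; j < blockK; ++j)     // column inside the block
                {
                    std::size_t const row = mb + i;
                    std::size_t const col = kb + j;
                    dst[out++] = (row < m && col < k) ? src[row * k + col] : T(0);
                }
    return dst;
}
```

In this illustration the padding and the permutation happen in the same pass, which is why skipping the re-layout when validation is disabled raises the question below about whether the padding can be skipped as well.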
Contributor

I understand that we don't need to permute the tensor when there is no validation, but is it correct to skip the padding?

Collaborator Author

After running some experiments with swizzled and non-swizzled tensors on the STA problem, I observed a performance disparity. The latest commit uses an LRUCache to manage the cached tensors, balancing memory usage against runtime performance, and it now always performs the swizzle for STA/STB problems.
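
Building on the two sketches above, here is a rough illustration of how a lookup keyed by datatype and size could wrap the host-side swizzle during tuning. SwizzleKey, SwizzleKeyHash, and getOrSwizzle are hypothetical names and not the PR's API.

```cpp
#include <cstddef>
#include <functional>
#include <string>
#include <vector>

// Requires the LRUCache and swizzleHost sketches shown earlier in this thread.

// Hypothetical cache key: a datatype tag plus the tensor extents. The PR only
// describes the key as "datatype and size".
struct SwizzleKey
{
    std::string dataType; // e.g. "f16", "bf16"
    std::size_t m;
    std::size_t k;

    bool operator==(SwizzleKey const& o) const
    {
        return dataType == o.dataType && m == o.m && k == o.k;
    }
};

struct SwizzleKeyHash
{
    std::size_t operator()(SwizzleKey const& key) const
    {
        std::size_t h = std::hash<std::string>{}(key.dataType);
        h ^= std::hash<std::size_t>{}(key.m) + 0x9e3779b9 + (h << 6) + (h >> 2);
        h ^= std::hash<std::size_t>{}(key.k) + 0x9e3779b9 + (h << 6) + (h >> 2);
        return h;
    }
};

// Swizzle on a cache miss, reuse the cached buffer on a hit.
// Illustrative only; assumes a non-zero cache capacity.
template <typename T>
std::vector<T> const& getOrSwizzle(LRUCache<SwizzleKey, std::vector<T>, SwizzleKeyHash>& cache,
                                   SwizzleKey const&                                     key,
                                   std::vector<T> const&                                 src)
{
    if(auto* hit = cache.get(key))
        return *hit;                                    // reuse the previously swizzled tensor
    cache.put(key, swizzleHost(src, key.m, key.k));     // pay the host-side swizzle cost once
    return *cache.get(key);
}
```

On a hit the previously swizzled buffer is reused; on a miss the swizzle cost is paid once and the result stays cached until the LRU policy evicts it.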

@Serge45 force-pushed the feature/tensilelite-cache-swizzle branch from 0120521 to 4948354 on February 24, 2025 at 09:37
@geotseng-amd self-requested a review on February 26, 2025 at 04:18