Is this decision made entirely at compile-time, or is there something in the runtime tensor metadata that affects it?

No, these are fully static decisions. You can, however, have branches in your program that dispatch to different copy loops.
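
To make the "branch to different copy loops" idea concrete, here is a minimal host-side C++ sketch. It is not taken from any library: `copy_loop`, `aligned_to`, and `dispatch_copy` are hypothetical names, and the 16/8/1-byte widths are assumptions chosen for illustration. Each loop has its access width fixed at compile time, and the only runtime decision is which of those precompiled loops to call. The same pattern could presumably be applied inside a kernel.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical copy loop whose access width is a compile-time constant.
template <std::size_t VecBytes>
void copy_loop(const unsigned char* src, unsigned char* dst, std::size_t n) {
    std::size_t i = 0;
    // Main loop: fixed-size chunks; VecBytes is known statically.
    for (; i + VecBytes <= n; i += VecBytes) {
        std::memcpy(dst + i, src + i, VecBytes);
    }
    // Tail: copy any remaining bytes one at a time.
    for (; i < n; ++i) {
        dst[i] = src[i];
    }
}

// Runtime alignment check, used only to choose among the static variants.
inline bool aligned_to(const void* p, std::size_t bytes) {
    return reinterpret_cast<std::uintptr_t>(p) % bytes == 0;
}

// Dispatches to one of the statically compiled loops and returns the
// width in bytes of the variant that was selected.
std::size_t dispatch_copy(const unsigned char* src, unsigned char* dst,
                          std::size_t n) {
    if (aligned_to(src, 16) && aligned_to(dst, 16)) {
        copy_loop<16>(src, dst, n);
        return 16;
    }
    if (aligned_to(src, 8) && aligned_to(dst, 8)) {
        copy_loop<8>(src, dst, n);
        return 8;
    }
    copy_loop<1>(src, dst, n);
    return 1;
}
```

The runtime metadata (here, pointer alignment) never changes how any individual loop is compiled; it only selects which loop runs.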

Is there any performance difference between ld.global.v2.u64 and ld.global.v4.b32? They both read 128 bits, but are there cases where one would be preferable?

Not sure, but as long as they compile to the same SASS it should not matter.

Answer selected by N1GHTR0