int64 indexing and max grid size #370
iclementine
started this conversation in
General
Indexing & Task partitioning in Triton jit functions
In Triton kernels, we combine indices and strides to compute the offsets of elements in a tensor, then add those offsets to the tensor's data pointer to get pointers to the elements. Almost everything starts from `tl.program_id`, `tl.arange` and `tl.num_programs`.

As for task partitioning, the most common approach is the ceil-div pattern: each CTA takes one tile of data, and the number of CTAs needed to process the whole tensor is `cdiv(size, TILE_SIZE)`.

Almost every tutorial tells us to write Triton jit functions like this, but in practice you have to take care of a lot of other things.
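To make the ceil-div pattern concrete, here is a minimal sketch in plain Python rather than Triton, so it runs without a GPU. The names `cdiv`, `tile_offsets`, `SIZE` and `TILE_SIZE` are illustrative assumptions; `tile_offsets` only mimics what a kernel would compute from `tl.program_id` and `tl.arange`, followed by a bounds mask.

```python
def cdiv(size, tile_size):
    """Ceiling division: number of tiles needed to cover `size` elements."""
    return (size + tile_size - 1) // tile_size


def tile_offsets(pid, size, tile_size):
    """Offsets handled by program `pid`.

    Mimics `pid * TILE_SIZE + arange(0, TILE_SIZE)` followed by the
    mask `offsets < size`, which drops the out-of-range tail.
    """
    offsets = [pid * tile_size + i for i in range(tile_size)]
    return [off for off in offsets if off < size]


SIZE, TILE_SIZE = 10, 4
num_programs = cdiv(SIZE, TILE_SIZE)  # one program (CTA) per tile

# Every element is visited exactly once across all programs.
covered = [
    off
    for pid in range(num_programs)
    for off in tile_offsets(pid, SIZE, TILE_SIZE)
]
```

With 10 elements and a tile size of 4, three programs are launched and the last one only handles the two remaining elements.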
Q1: Will `offsets` exceed the limit of int32 indexing? Yes, it can. `int32` has a maximum value of `2147483647`, which is `2 * 1024 * 1024 * 1024 - 1`, so a contiguous tensor with more than `2 * 1024 * 1024 * 1024` elements makes `offsets` exceed the limit, wrap around to a negative value, and access an incorrect memory address. If the tensor has arbitrary strides, things get worse, because the largest offset then depends on the strides and not just on the number of elements.

But how to avoid this?
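The wraparound, and a host-side check for it, can be sketched in plain Python. This is only an illustration under assumptions: `wrap_int32`, `max_offset` and `needs_int64_index` are hypothetical helper names, not part of Triton, and the check assumes a non-empty tensor with non-negative strides.

```python
import ctypes

INT32_MAX = 2**31 - 1  # 2147483647 == 2 * 1024 * 1024 * 1024 - 1


def wrap_int32(value):
    """Reinterpret a Python integer as a 32-bit signed integer."""
    return ctypes.c_int32(value).value


def max_offset(shape, strides):
    """Largest element offset reachable in the tensor.

    Assumes every dimension is non-empty and every stride is >= 0.
    """
    return sum((size - 1) * stride for size, stride in zip(shape, strides))


def needs_int64_index(shape, strides):
    """Hypothetical check: True if some element offset does not fit in int32."""
    return max_offset(shape, strides) > INT32_MAX


# One step past the int32 maximum wraps to the most negative value.
wrapped = wrap_int32(INT32_MAX + 1)

# A contiguous 1-D tensor with more than 2**31 elements needs int64 offsets.
big_contiguous = needs_int64_index((2**31 + 1,), (1,))

# A strided view with only 1024 * 1024 elements can need them as well.
small_strided = needs_int64_index((1024, 1024), (4 * 1024 * 1024, 1))
```

In a real kernel, the unmasked offsets of the last tile can run slightly past the largest valid offset, so a check like this may need some headroom.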
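Writing the address computation as `base_ptr + offset1 * stride1 + offset2 * stride2 + ...` would avoid this, since the first addition promotes the second argument to 64-bit, and so do the subsequent additions. Each product `offset_i * stride_i` is still evaluated in its operands' own integer type, so this guards the running sum rather than the individual terms.

Compare the largest offset each tensor can reach with `2 * 1024 * 1024 * 1024 - 1`. If none of them has such a large offset, use int32 indexing as shown above; otherwise, cast the results of `tl.program_id`, `tl.arange` and `tl.num_programs` to int64 from the beginning.

Q2: Will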
`triton.cdiv(N, TILE_SIZE)` exceed the limit of int32 indexing? Yes, it can. On NVIDIA GPUs, the maximum grid size in the `x` dimension is also `2147483647`, so you should also take care not to launch an excessively large grid.

Answer:
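One possible way to keep the grid within the limit, sketched here in plain Python under assumptions, is to cap the number of programs and let each program walk over several tiles with a stride equal to the number of programs (the value a kernel would read from `tl.num_programs`). The helpers `plan_grid` and `tiles_for_program` are hypothetical names used only for illustration.

```python
MAX_GRID_X = 2**31 - 1  # maximum grid size in the x dimension on NVIDIA GPUs


def cdiv(a, b):
    """Ceiling division."""
    return (a + b - 1) // b


def plan_grid(size, tile_size, max_programs=MAX_GRID_X):
    """Return (num_programs, tiles_per_program) with num_programs <= max_programs."""
    num_tiles = cdiv(size, tile_size)
    num_programs = max(1, min(num_tiles, max_programs))
    tiles_per_program = cdiv(num_tiles, num_programs)
    return num_programs, tiles_per_program


def tiles_for_program(pid, num_programs, num_tiles):
    """Tiles visited by program `pid` when it steps by `num_programs` tiles."""
    return list(range(pid, num_tiles, num_programs))


# 100 elements in tiles of 4 give 25 tiles; cap the grid at 8 programs.
num_programs, tiles_per_program = plan_grid(100, 4, max_programs=8)
num_tiles = cdiv(100, 4)
visited = sorted(
    tile
    for pid in range(num_programs)
    for tile in tiles_for_program(pid, num_programs, num_tiles)
)
```

Here 8 programs cover all 25 tiles, with each program handling at most 4 of them. The cap of 8 is artificially small so the example stays readable.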
We want these best practices to be followed by Triton developers, but taking care of indexing code and task partitioning in Triton jit functions is sometimes tedious, and not everyone follows them. Do you have any good ideas?