Open
Description
While inspecting the code, I noticed a small bug in the buffer protocol implementation: the buffer's .shape is set to num_tokens * buffer.itemsize, but it should be num_tokens, so that math.prod(buffer.shape) * buffer.itemsize == len(buffer) holds, as the official buffer protocol spec requires (here len(buffer) means the buffer's length in bytes). As a result, memoryview(buffer).tolist() returns an incorrect result. Luckily, NumPy ignores .shape (unlike CPython) and builds the result from len(buffer) // buffer.itemsize.
Even though the encode_to_tiktoken_buffer API is somewhat private, this bug is probably still worth fixing to follow the spec.
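To illustrate the invariant, here is a minimal sketch that uses a plain array.array as a stand-in for the token buffer. It is not the library's actual buffer object: the token values and the names tokens, view, and buggy_shape are hypothetical, and the itemsize is whatever the platform uses for an unsigned int.

```python
import math
from array import array

# Hypothetical token ids stored as unsigned ints (stand-in for the real buffer).
tokens = array("I", [101, 2023, 2003, 102])
view = memoryview(tokens)
num_tokens = len(tokens)

# A well-formed 1-D buffer reports its shape in elements, not bytes.
assert view.shape == (num_tokens,)

# The invariant from the buffer protocol: product(shape) * itemsize == byte length.
assert math.prod(view.shape) * view.itemsize == view.nbytes

# With a consistent shape, tolist() yields exactly the stored tokens.
assert view.tolist() == [101, 2023, 2003, 102]

# The shape described in this report counts bytes instead of elements,
# so the same invariant no longer holds.
buggy_shape = (num_tokens * view.itemsize,)
assert math.prod(buggy_shape) * view.itemsize != view.nbytes
```

Consumers that trust .shape, such as memoryview, would interpret the byte-counted shape as the number of elements, which is why the reported tolist() result is wrong.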