Fixed StridedBufferView::to_numpy on Metal devices #207
Fixed 4 elements alignment issue on Metal devices.
This comment was marked as outdated.
This reverts commit ff18b9b.
Fix the issue in to_ndarray instead. This version should be more efficient and flexible.
@kaizhangNV This new fix should solve the issue more elegantly. 3xn matrix buffers can now be converted to numpy and torch correctly on macOS. However, because of the alignment on Metal devices, a contiguous StridedBuffer is converted to a non-contiguous ndarray, and I don't have a clear idea of how to solve this. One possible simple solution would be removing
This is a demonstration of what currently happens if we copy a numpy array to a float3x3 buffer on macOS:
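To make the contiguity point concrete, here is a minimal NumPy-only sketch (it does not call SGL). It assumes that each float3 row of a float3x3 is padded to 4 floats (16 bytes), which is how this thread describes the Metal layout. The helper name `view_padded_matrices` and the fake buffer contents are hypothetical and exist only for illustration.

```python
import numpy as np

# Assumption: each 3-float row is padded to 4 floats (16 bytes).
ROWS, COLS, PADDED_COLS = 3, 3, 4


def view_padded_matrices(raw: np.ndarray, count: int) -> np.ndarray:
    """View a padded float32 buffer as (count, 3, 3) without copying."""
    padded = raw.view(np.float32).reshape(count, ROWS, PADDED_COLS)
    # Slicing off the padding column keeps the padded strides,
    # so the result is a non-contiguous view over the same memory.
    return padded[:, :, :COLS]


# Stand-in for a GPU readback: two matrices, padding slots filled with -1.
raw = np.full(2 * ROWS * PADDED_COLS, -1.0, dtype=np.float32)
raw.reshape(2, ROWS, PADDED_COLS)[:, :, :COLS] = np.arange(
    18, dtype=np.float32
).reshape(2, ROWS, COLS)

mats = view_padded_matrices(raw, 2)

# An explicit copy is needed if the consumer requires dense memory.
packed = np.ascontiguousarray(mats)
```

The view is cheap but its row stride is 16 bytes rather than 12, which is why it is reported as non-contiguous; `np.ascontiguousarray` trades one extra copy for a dense layout.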
Fixed matrix alignment stride by copying the data from ndarray to a temporary buffer.
As I mentioned on Discord, the fix should not be in SGL, because the alignment on Metal is known to differ from other platforms. We therefore recommend that developers use the shader cursor to read and write GPU data. We provide helper functions for arrays and vectors but not for matrices, though I think it's trivial to extend them. We should always avoid this kind of special-case handling for Metal, since the shader cursor already handles these alignment issues correctly, and users should already avoid using raw data directly.
I've been thinking about this one for the past couple of days. The first thing we need to decide is how intelligent these to_numpy-style functions should be. There's a perfectly valid argument that they should detect when (regardless of platform) the user is attempting something that isn't a simple memcpy, and throw an exception explaining that the user needs to use BufferCursor to copy correctly. The flip side is that our goal could be to be as cross-platform as possible: if it works on one platform, we should do everything possible to make it work on the others.
If we were to address it in to_numpy, the proposed fixes are probably not the right ones. I'd suggest falling back on a reflection-based mechanism (potentially using buffer cursor), as we can then aim to handle all non-trivial use cases.
For now I suggest we keep this branch around for reference, and modify the 2 functions to detect when the copy would return invalid data and throw an exception accordingly.
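A rough sketch of the "detect and throw" option, written in plain Python rather than against the SGL API. The function name, the exception type, and the idea of receiving the reflected strides as a tuple of byte counts are all assumptions made for illustration; the real check would read the strides from the buffer's reflected layout.

```python
class NonTrivialLayoutError(RuntimeError):
    """Hypothetical exception for layouts that a plain memcpy cannot handle."""


def require_memcpy_compatible(shape, itemsize, strides):
    """Raise unless `strides` (in bytes) describe a tightly packed C layout.

    shape    -- element shape, e.g. (3, 3) for a float3x3
    itemsize -- size in bytes of one scalar, e.g. 4 for float32
    strides  -- per-dimension byte strides as reported by reflection
    """
    expected = itemsize
    # Walk from the innermost dimension outwards.
    for dim, stride in zip(reversed(shape), reversed(strides)):
        # A dimension of length 1 is never stepped, so its stride is irrelevant.
        if dim > 1 and stride != expected:
            raise NonTrivialLayoutError(
                f"stride {stride} != packed stride {expected}; "
                "use a cursor-based copy instead of a raw memcpy"
            )
        expected *= dim


# Tightly packed float3x3: rows are 12 bytes apart, so this passes silently.
require_memcpy_compatible((3, 3), 4, (12, 4))

# Padded float3x3 (16-byte rows, as described for Metal): this raises.
try:
    require_memcpy_compatible((3, 3), 4, (16, 4))
    padded_rejected = False
except NonTrivialLayoutError:
    padded_rejected = True
```

The check is a handful of integer comparisons per call, so it should add little cost compared with the copy itself.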
I changed the to_numpy fix to a reflection-based mechanism this Monday. It now uses buffer_type_layout to reflect the stride: 8e12752
I think the current version of to_numpy does what you proposed (using buffer cursor for reflection is unnecessary here, as the buffer_desc already contains the layout reflection needed for the stride calculation).
The only issue is that copy_from_numpy is a little tricky to fix, because it doesn't support non-contiguous copying at all; all it does is memcpy the entire buffer. My current fix for that is a little wacky, but it has still been migrated to the reflection mechanism instead of platform-dependent code. You can take a look at my commits from this Monday and decide whether it's a good idea (I think the to_numpy fix is ready to go, but copy_from_numpy still needs more work).
possible One thing I thought about fixing
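For the upload direction, the staging-buffer idea can be sketched with NumPy alone. This is not the code from the commits referenced above: `pack_with_row_stride` is a hypothetical helper, and the 16-byte row stride is an assumed value standing in for whatever the reflected layout reports.

```python
import numpy as np


def pack_with_row_stride(array: np.ndarray, row_stride_bytes: int) -> np.ndarray:
    """Copy `array` into a byte buffer whose rows are `row_stride_bytes` apart.

    The last axis of `array` is treated as one row; padding bytes are zeroed.
    """
    src = np.ascontiguousarray(array)
    row_bytes = src.shape[-1] * src.itemsize
    if row_stride_bytes < row_bytes:
        raise ValueError("row stride is smaller than one packed row")

    # Reinterpret every row as raw bytes: shape (n_rows, row_bytes).
    rows = src.reshape(-1, src.shape[-1]).view(np.uint8)

    # One vectorised copy into the padded staging buffer (no Python loop).
    staging = np.zeros((rows.shape[0], row_stride_bytes), dtype=np.uint8)
    staging[:, :row_bytes] = rows
    return staging.reshape(-1)


matrix = np.arange(9, dtype=np.float32).reshape(3, 3)
# Assumed reflected stride: 16 bytes per float3 row.
staged = pack_with_row_stride(matrix, 16)
```

The resulting `staged` array is what would then be handed to a plain memcpy-style upload. It costs one extra host-side copy, which is the overhead the later review comments are concerned about.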
@kaizhangNV - Can we have another review pass here?
* Buffer+tensor use full type name for signature
* Try fixing pyright issue with declrefs
I would still rather we favour simply throwing an exception for invalid operations, and detecting them efficiently. Whilst I can see the logic in where we've got to, the suggested modifications would make all construction of buffers and tensors more expensive and complex, just to handle float3x3 on Mac. That doesn't seem like a good trade-off to me.
A draft fix for the matrix buffer alignment issue mentioned in #206.
This is an extremely naive implementation and might be slow, as the buffer is copied in a for loop. The fix for copy_from_numpy is not implemented for now. This matrix alignment issue also affects BufferCursor.to_numpy and potentially Buffer.to_numpy() as well. A general fix for the to_numpy alignment issue on Metal will be required in the future. I'll keep looking into this issue.
to_numpy() now produces correct 3x3 matrix buffers: