Fixed StridedBufferView::to_numpy on Metal devices #207

fangjunzhou · 2025-05-10T13:58:01Z

A draft fix for the matrix buffer alignment issue mentioned in #206.

This is an extremely naive implementation and might be slow, as the buffer is copied in a for loop. A fix for copy_from_numpy is not implemented yet. This matrix alignment issue also affects BufferCursor.to_numpy and potentially Buffer.to_numpy() as well. A general fix for the to_numpy alignment issue on Metal will be required in the future. I'll keep looking into this issue.

to_numpy() now produces correct 3x3 matrix buffers:

(base) ~/Documents/stanford/bvhgs (main ✗) pytest tests --log-cli-level=DEBUG --benchmark-skip
============================================================================================================= test session starts ==============================================================================================================
platform darwin -- Python 3.12.9, pytest-8.3.5, pluggy-1.5.0
benchmark: 5.1.0 (defaults: timer=time.perf_counter disable_gc=False min_rounds=5 min_time=0.000005 max_time=1.0 calibration_precision=10 warmup=False warmup_iterations=100000)
rootdir: /Users/fangjun/Documents/stanford/bvhgs
configfile: pyproject.toml
plugins: anyio-4.9.0, benchmark-5.1.0
collecting ...
------------------------------------------------------------------------------------------------------------- live log collection --------------------------------------------------------------------------------------------------------------
INFO     bvhgs:__init__.py:9 Slang device created: Device(
  type = metal,
  adapter_name = "default",
  adapter_luid = 00000000000000000000000000000000,
  enable_debug_layers = false,
  supported_shader_model = sm_6_0,
  shader_cache_enabled = false,
  shader_cache_path = ""
)
collected 11 items

tests/math/test_quaternion.py::test_quaternion_multiply[4] PASSED                                                                                                                                                                        [  9%]
tests/math/test_quaternion.py::test_quaternion_conjugate[4] PASSED                                                                                                                                                                       [ 18%]
tests/math/test_quaternion.py::test_quaternion_inverse[4] PASSED                                                                                                                                                                         [ 27%]
tests/math/test_quaternion.py::test_quaternion_from_axis_angle[4] PASSED                                                                                                                                                                 [ 36%]
tests/math/test_quaternion.py::test_quaternion_to_axis_angle[4] PASSED                                                                                                                                                                   [ 45%]
tests/math/test_quaternion.py::test_rotate_vector[4] PASSED                                                                                                                                                                              [ 54%]
tests/math/test_quaternion.py::test_quaternion_as_rotation_matrix[4]
---------------------------------------------------------------------------------------------------------------- live log call -----------------------------------------------------------------------------------------------------------------
DEBUG    test_quaternion:test_quaternion.py:206 [[[-0.43706563 -0.7254428   0.53170127]
  [ 0.754113    0.02661575  0.65620506]
  [-0.49019092  0.6877675   0.5354331 ]]

 [[ 0.87763655 -0.04220945  0.47746468]
  [ 0.41413367  0.5683256  -0.71098477]
  [-0.24134511  0.8217204   0.51626366]]

 [[-0.55375576 -0.4672291  -0.68923986]
  [ 0.07288799 -0.8517591   0.51883894]
  [-0.8294829   0.23707275  0.5057219 ]]

 [[ 0.03341324 -0.91043794  0.41229412]
  [ 0.9988376   0.04476     0.01789208]
  [-0.03474391  0.41121697  0.9108751 ]]]
DEBUG    test_quaternion:test_quaternion.py:207 [[[-0.43706562 -0.72544289  0.53170129]
  [ 0.75411305  0.02661575  0.65620509]
  [-0.49019094  0.68776756  0.53543312]]

 [[ 0.87763651 -0.04220944  0.47746467]
  [ 0.41413366  0.56832555 -0.7109848 ]
  [-0.2413451   0.82172041  0.5162636 ]]

 [[-0.55375578 -0.46722909 -0.68923981]
  [ 0.072888   -0.85175905  0.51883895]
  [-0.8294829   0.23707276  0.50572189]]

 [[ 0.03341322 -0.91043788  0.41229409]
  [ 0.99883753  0.04475997  0.01789208]
  [-0.0347439   0.41121698  0.9108751 ]]]
PASSED                                                                                                                                                                                                                                   [ 63%]
tests/math/test_quaternion.py::test_quaternion_as_rotation_matrix_benchmark[1024] SKIPPED (Skipping benchmark (--benchmark-skip active).)                                                                                                [ 72%]
tests/math/test_quaternion.py::test_quaternion_as_rotation_matrix_benchmark[2048] SKIPPED (Skipping benchmark (--benchmark-skip active).)                                                                                                [ 81%]
tests/math/test_quaternion.py::test_quaternion_as_rotation_matrix_benchmark[4096] SKIPPED (Skipping benchmark (--benchmark-skip active).)                                                                                                [ 90%]
tests/math/test_quaternion.py::test_quaternion_as_rotation_matrix_benchmark[8192] SKIPPED (Skipping benchmark (--benchmark-skip active).)                                                                                                [100%]

========================================================================================================= 7 passed, 4 skipped in 0.87s =========================================================================================================

fangjunzhou · 2025-05-11T00:19:23Z

@kaizhangNV This new fix should solve the issue in a more elegant way. 3xn matrix buffers can now be converted to numpy and torch correctly on macOS. However, due to the alignment on Metal devices, a contiguous StridedBuffer will be converted to a non-contiguous ndarray, and copy_from_numpy requires both the ndarray and the buffer to be contiguous in the current implementation. This makes copy_from_numpy hard to fix within the current implementation.
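To make the contiguity problem concrete, here is a minimal NumPy-only sketch. It does not call slangpy; it simulates the buffer under the assumption that each 3-float matrix row is padded to 4 floats (16 bytes), so the names `padded`, `view`, and `naive` are purely illustrative.

```python
import numpy as np

n = 2  # number of float3x3 matrices in the simulated buffer

# Assumed padded layout: every 3-float row occupies 4 floats (16 bytes).
padded = np.zeros((n, 3, 4), dtype=np.float32)
matrices = np.arange(n * 9, dtype=np.float32).reshape(n, 3, 3)
padded[:, :, :3] = matrices

# Selecting only the meaningful columns yields the right values,
# but the result is a strided (non-contiguous) view of the padded data.
view = padded[:, :, :3]
print(view.flags["C_CONTIGUOUS"])  # False
print(view.strides)                # (48, 16, 4)

# Treating the raw floats as tightly packed 3x3 matrices mixes padding
# into the data, which is the kind of mismatch reported in #206.
naive = padded.reshape(-1)[: n * 9].reshape(n, 3, 3)
print(np.array_equal(naive, matrices))  # False
```

A tightly packed (n, 3, 3) float32 array would have strides (36, 12, 4), which is why a contiguity check on the NumPy side and a padded layout on the buffer side cannot both be satisfied by a plain memcpy.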

I don't have a clear idea of how to solve this issue. One simple solution would be to remove SGL_CHECK(is_ndarray_contiguous(data), "Source Numpy array must be contiguous"); as the underlying buffer is not contiguous anyway. But this requires users to manually allocate matrix-aligned numpy buffers on each platform, which is confusing. Another idea is to implement a new version of copy_from_numpy with strided numpy array support from the ground up, and to implement platform-specific matrix alignment inside slangpy_ext. This could also make it easier to send a numpy array slice to a StridedBufferView.

This is a demonstration of what would happen if we copy a numpy array to a float3x3 buffer on macOS currently:

kaizhangNV · 2025-05-12T02:12:47Z

As I mentioned on Discord, the fix should not be in SGL, as the alignment on Metal is known to differ from other platforms. We therefore recommend that developers use a shader cursor to write data to and read data from the GPU.

We provide helper functions for arrays and vectors, but not for matrices, though I think it's trivial to extend them.

We should always avoid such special-case handling on Metal, as the shader cursor already handles these alignment issues correctly. And users should already be avoiding direct use of raw data.

ccummingsNV · 2025-05-15T16:45:45Z

I've been thinking on this one for the past couple of days. The intelligence of these to_numpy style functions is the first thing we need to decide. There's a perfectly valid argument for saying they should detect when (regardless of platform) the user is attempting to do something that isn't a simple memcpy, and throw an exception explaining that the user would need to use BufferCursor to copy effectively.

The flip side is that you could argue our goal should be to be as 'cross platform' as possible, and if it works on one platform, we should do everything possible to make it work on the others.

If we were to address it in to_numpy, the proposed fixes are probably not the right ones. I'd suggest that we fall back on a reflection-based mechanism (potentially using buffer cursor) for this, as we can then aim to handle all non-trivial use cases.

For now I suggest we keep this branch around for reference, and modify the 2 functions to detect when the copy would return invalid data and throw an exception accordingly.
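As a rough illustration of the "detect and throw" option, the check could compare the tightly packed size of one element with the stride the buffer actually uses. This is only a sketch: `check_memcpy_compatible` and its `element_stride` argument are hypothetical names, and the stride is assumed to come from whatever layout reflection the real code has available.

```python
import math
import numpy as np

def check_memcpy_compatible(element_shape, dtype, element_stride):
    """Raise if one buffer element is not tightly packed.

    element_shape:  per-element shape, e.g. (3, 3) for float3x3
    dtype:          scalar type of the element
    element_stride: bytes between consecutive elements in the buffer
                    (hypothetical input, assumed to come from reflection)
    """
    packed_size = math.prod(element_shape) * np.dtype(dtype).itemsize
    if element_stride != packed_size:
        raise ValueError(
            f"Element occupies {element_stride} bytes in the buffer but "
            f"{packed_size} bytes when tightly packed; this copy is not a "
            "simple memcpy, use BufferCursor instead."
        )

# A packed float3x3 (36 bytes) passes silently.
check_memcpy_compatible((3, 3), np.float32, 36)

# A padded float3x3 (assumed 48 bytes here) is rejected.
try:
    check_memcpy_compatible((3, 3), np.float32, 48)
except ValueError as err:
    print(err)
```

The check is a single integer comparison per call, so it would not add per-element cost to the copy path.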

fangjunzhou · 2025-05-15T17:00:26Z

I changed the to_numpy fix to a reflection-based mechanism this Monday. It now uses buffer_type_layout to reflect the stride: 8e12752

I think the current version of to_numpy does what you proposed (using a buffer cursor for reflection is unnecessary here, as the buffer_desc already contains the layout reflection needed for the stride calculation).
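The general idea of a stride-driven read can be sketched in plain NumPy. This is not the code from the commit: `strided_to_numpy` is a hypothetical helper, `row_stride` stands in for whatever value the layout reflection reports, and the element stride is assumed to be `rows * row_stride`.

```python
import numpy as np

def strided_to_numpy(raw, count, rows, cols, row_stride, dtype=np.float32):
    """Interpret raw buffer bytes as `count` matrices of shape (rows, cols).

    row_stride is the number of bytes between consecutive rows
    (hypothetical input, assumed to come from layout reflection).
    """
    itemsize = np.dtype(dtype).itemsize
    element_stride = rows * row_stride  # assumption: no extra per-matrix padding
    view = np.ndarray(
        shape=(count, rows, cols),
        dtype=dtype,
        buffer=raw,
        strides=(element_stride, row_stride, itemsize),
    )
    # Copy into a tightly packed array so callers get a contiguous result.
    return np.ascontiguousarray(view)

# Simulated padded buffer: 2 matrices, each row padded from 3 to 4 floats.
expected = np.arange(18, dtype=np.float32).reshape(2, 3, 3)
padded = np.zeros((2, 3, 4), dtype=np.float32)
padded[:, :, :3] = expected
raw = padded.tobytes()

result = strided_to_numpy(raw, count=2, rows=3, cols=3, row_stride=16)
print(result.flags["C_CONTIGUOUS"], np.array_equal(result, expected))
```

With `row_stride=12` the same helper handles a tightly packed buffer, so no platform-specific branch is needed in this sketch.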

The only issue is that copy_from_numpy is a little tricky to fix, because it doesn't support non-contiguous copying at all; all it does is memcpy the entire buffer. My current fix for that is a little bit wacky, but it has still been migrated to the reflection mechanism instead of platform-dependent code.

You can take a look at my commits from this Monday and decide whether it's a good idea (I think the to_numpy fix is ready to go, but copy_from_numpy still needs more work).

possible copy_from_numpy fix

One thought about fixing copy_from_numpy: this should be fairly easy in numpy. If we have an ndarray of shape (b1, b2, n, 3), we can expand it to (b1, b2, n, 4) for alignment and copy the original data into the slice [:, :, :, :3] of the expanded array. The underlying buffer should then be correct. However, I'm not familiar with nanobind, and I didn't find any documentation on how to do sliced reads and writes on a nanobind ndarray. If we implement sliced memcpy in C++, it might be a lot of work, and we would need to look at how numpy implements it.
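On the Python side, the expand-and-slice idea looks roughly like the sketch below. `pad_last_axis` is a hypothetical helper, not part of slangpy, and the padded width of 4 is the assumed alignment from the discussion above.

```python
import numpy as np

def pad_last_axis(data, padded_width=4):
    """Return a contiguous copy of `data` with its last axis zero-padded."""
    if data.shape[-1] > padded_width:
        raise ValueError("last axis is already wider than the padded width")
    padded = np.zeros(data.shape[:-1] + (padded_width,), dtype=data.dtype)
    # Sliced assignment also accepts non-contiguous sources, e.g. array slices.
    padded[..., : data.shape[-1]] = data
    return padded

b1, b2, n = 2, 3, 5
src = np.arange(b1 * b2 * n * 3, dtype=np.float32).reshape(b1, b2, n, 3)

padded = pad_last_axis(src)
raw = padded.tobytes()  # bytes in the assumed padded layout, ready for a plain memcpy

print(padded.shape, padded.flags["C_CONTIGUOUS"], len(raw))
```

The cost is one extra allocation and copy per upload, which is the trade-off against implementing a strided memcpy in C++.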

mkeshavaNV · 2025-05-21T11:17:53Z

@kaizhangNV - Can we have another review pass here?

ccummingsNV · 2025-06-05T11:20:25Z

I would still rather we favour simply throwing an exception for invalid operations, and to detect it efficiently. Whilst I can see the logic in where we've got, the suggested modifications will make all construction of buffers and tensors more expensive and complex, to handle float3x3 on Mac. That doesn't seem like a good trade off to me.

Fixed StridedBufferView::to_numpy on Metal devices

1b2ec24

Fixed 4 elements alignment issue on Metal devices.

fangjunzhou added 2 commits May 10, 2025 17:03

Revert "Fixed StridedBufferView::to_numpy on Metal devices"

2915a51

This reverts commit ff18b9b.

Fixed StridedBufferView matrix alignment issue on Metal devices

b95db93

Fix the issue in to_ndarray instead. This version should be more efficient and flexible.

Fixed StridedBufferView matrix alignment issue for copy_from_numpy

587332c

Fixed matrix alignment stride by copying the data from the ndarray to a temporary buffer.

fangjunzhou marked this pull request as ready for review May 11, 2025 01:03

fangjunzhou marked this pull request as draft May 12, 2025 03:42

fangjunzhou added 5 commits May 13, 2025 00:02

Revert "Fixed StridedBufferView matrix alignment issue for copy_from_…

b2f0fb4

…numpy" This reverts commit 3f9d5bd.

Revert "Fixed StridedBufferView matrix alignment issue on Metal devices"

0762b70

This reverts commit 25c9e9f.

Calculate to_numpy dtype_stride using element type layout reflection

8e12752

Convert the returned ndarray as contiguous as the stride might be off…

5e1756d

… on metal device

Fix StridedBufferView::copy_from_numpy without platform specific macro

ca156a9

ccummingsNV mentioned this pull request May 15, 2025

float3x3 Allocates 4x4 Buffers on Metal Device #206

Closed

oliver-batchelor pushed a commit to oliver-batchelor/slangpy that referenced this pull request Jun 3, 2025

Buffer+tensor use full type name for signature (shader-slang#207)

e13229f

* Buffer+tensor use full type name for signature * Try fixing pyright issue with declrefs
