Support structured datatypes #1164

Open
manopapad opened this issue Jan 17, 2025 · 4 comments

Comments

@manopapad
Contributor

Reported by @elliottslaughter

When you save a complex-typed array to an HDF5 file in MATLAB, it is stored with the structured datatype [('real', '<f8'), ('imag', '<f8')] rather than complex128. If you globally replace NumPy with cuPyNumeric, that HDF5 file can no longer be read into an array, because cuPyNumeric rejects this datatype at conversion time:

TypeError: cuPyNumeric does not support dtype=[('real', '<f8'), ('imag', '<f8')]

We should at least accept this datatype, even if we fall back to NumPy for it. We can then re-enter accelerated territory once the data is converted to np.complex128.
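Until cuPyNumeric accepts this dtype, the NumPy-side fallback could look roughly like the minimal sketch below. It assumes the data has already been read into a plain NumPy array; `to_complex128` and `MATLAB_COMPLEX` are hypothetical names, not part of either library. Building the result from the two fields, rather than reinterpreting the bytes, avoids any assumption about byte order.

```python
import numpy as np

# Field layout MATLAB uses for complex data, as reported in this issue.
MATLAB_COMPLEX = np.dtype([("real", "<f8"), ("imag", "<f8")])


def to_complex128(arr):
    """Hypothetical helper: turn a (real, imag) structured array into complex128.

    Arrays with any other dtype are returned unchanged.
    """
    if arr.dtype == MATLAB_COMPLEX:
        # Combine the two float64 fields into a new complex128 array (a copy).
        return arr["real"] + 1j * arr["imag"]
    return arr


raw = np.array([(1.0, 2.0), (3.0, -4.0)], dtype=MATLAB_COMPLEX)
z = to_complex128(raw)  # plain complex128 array, ready for an accelerated library
```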

@seberg
Contributor

seberg commented Jan 17, 2025

Can you easily create a cupynumeric array backed by an equivalent StructType that corresponds to the NumPy dtype? I imagine users can get very far even if the only operations that work are:

  • arr["real"] needs to fetch a view into the real part.
  • arr.view(np.complex128) here needs to work.
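In plain NumPy both operations already return views of the original data, which is the behavior cuPyNumeric would need to match. A small NumPy-only illustration (variable names are arbitrary; the complex view assumes a little-endian host):

```python
import numpy as np

arr = np.zeros(3, dtype=[("real", "<f8"), ("imag", "<f8")])

# arr["real"] is a strided view of the parent: writing to it updates arr.
real_part = arr["real"]
real_part[:] = [1.0, 2.0, 3.0]

# arr.view(np.complex128) reinterprets each 16-byte record in place
# (real first, then imag), so this write also lands in arr.
z = arr.view(np.complex128)
z[0] = 5 + 6j
```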

NumPy additionally supports comparisons and casts on/between these. My guess would be that you can get quite far with just those in practice (including custom kernels that actually ingest the structured dtype).
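For reference, this is what those comparisons and casts look like in NumPy today (NumPy only, not cuPyNumeric): `==` on two arrays with the same structured dtype compares record by record, and `astype` to another structured dtype converts each field.

```python
import numpy as np

pair_f8 = np.dtype([("real", "<f8"), ("imag", "<f8")])
pair_f4 = np.dtype([("real", "<f4"), ("imag", "<f4")])

a = np.array([(1.0, 2.0), (3.0, 4.0)], dtype=pair_f8)
b = np.array([(1.0, 2.0), (3.0, 0.0)], dtype=pair_f8)

eq = a == b                 # True only where every field matches
narrow = a.astype(pair_f4)  # each field cast from float64 to float32
```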

In NumPy arr[0] returns a mutable scalar (very awkward but it allows arr[0]["field"] = value). But I think cupynumeric always sticks with 0-D arrays already, so that shouldn't be a problem.
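A minimal NumPy illustration of the mutable-scalar behavior: integer indexing of a structured array yields an `np.void` scalar that writes through to the parent array.

```python
import numpy as np

arr = np.zeros(2, dtype=[("real", "<f8"), ("imag", "<f8")])

rec = arr[0]            # np.void structured scalar, acting as a view
rec["real"] = 7.0       # modifies arr itself
arr[1]["imag"] = -1.0   # the arr[0]["field"] = value idiom works the same way
```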

@magnatelee
Contributor

Can you easily create a cupynumeric array backed by an equivalent StructType that corresponds to the NumPy dtype?

Yes, that's trivial. StructType even has to_numpy_dtype(), which converts it to the equivalent NumPy structured dtype: https://github.com/nv-legate/legate.internal/blob/main/src/python/legate/core/_lib/type/type_info.pyx#L623-L655. It might be nice to teach from_np_dtype() to do the inverse (at the moment it only supports primitive types): https://github.com/nv-legate/legate.internal/blob/main/src/python/legate/core/_lib/type/type_info.pyx#L434-L451
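As a rough sketch of what a struct-aware from_np_dtype() would have to read on the NumPy side, the hypothetical helper below walks a structured dtype and lists each field's name, byte offset, base dtype, and subarray shape. It uses only NumPy; mapping the result onto Legate's StructType is left out because that constructor's signature is not shown in this thread.

```python
import numpy as np


def struct_layout(dtype):
    """Hypothetical helper: (name, offset, base dtype, subarray shape) per field."""
    dtype = np.dtype(dtype)
    if dtype.names is None:
        raise TypeError(f"{dtype} is not a structured dtype")
    layout = []
    for name in dtype.names:
        # fields[name] is (dtype, offset) or (dtype, offset, title).
        field_dtype, offset = dtype.fields[name][:2]
        # Subarray fields report their element type in .base and shape in .shape.
        layout.append((name, offset, field_dtype.base, field_dtype.shape))
    return layout, dtype.itemsize


layout, itemsize = struct_layout([("real", "<f8"), ("imag", "<f8")])
```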

arr["real"] needs to fetch a view into the real part.

We need to make two changes for this:

(1) We should keep track of field names and offsets for DeferredArrays when they are constructed from field slicing.
(2) We should pass the field offset when creating an accessor in a cuPyNumeric task; Legion already supports this: https://github.com/StanfordLegion/legion/blob/stable/runtime/legion.h#L2823. We should also update the dense check in some of the tasks.
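To illustrate why the field offset matters and why the dense check needs updating, here is a NumPy analogy (not Legion's accessor API): a field view is the parent buffer shifted by the field's byte offset and stepped by the full record size.

```python
import numpy as np

arr = np.zeros(4, dtype=[("real", "<f8"), ("imag", "<f8")])
field_dtype, offset = arr.dtype.fields["imag"][:2]  # float64 at byte offset 8

# Rebuild arr["imag"] by hand: start at the field's byte offset inside the
# parent buffer and step by the full record size (16 bytes) per element.
imag = np.ndarray(
    shape=arr.shape,
    dtype=field_dtype,
    buffer=arr,
    offset=offset,
    strides=(arr.dtype.itemsize,),
)
imag[:] = 9.0  # writes land in the parent's "imag" field

# The stride (16) is larger than the element size (8), so this view is not
# dense; code that assumes contiguous float64 data would read the wrong bytes.
is_dense = imag.flags["C_CONTIGUOUS"]
```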

arr.view(np.complex128) here needs to work.

reinterpret_as() has been added to the Legate API recently, so implementing ndarray.view() should be straightforward. https://github.com/nv-legate/legate.internal/blob/main/src/cpp/legate/data/logical_store.h#L159-L182
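Whatever ndarray.view() ends up calling, it would presumably need to reproduce NumPy's shape rule for reinterpretation, shown here in NumPy: when the new itemsize equals the record size the shape is unchanged, and when it is smaller the last axis grows proportionally.

```python
import numpy as np

arr = np.zeros((3, 2), dtype=[("real", "<f8"), ("imag", "<f8")])

as_complex = arr.view(np.complex128)  # 16-byte items: shape stays (3, 2)
as_float = arr.view(np.float64)       # 8-byte items: last axis doubles to 4
```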

NumPy additionally supports comparisons and casts on/between these. My guess would be that you can get quite far with just those in practice (including custom kernels that actually ingest the structured dtype).

How does NumPy implement those operators? The easiest way I can think of is doing them field-wise and aggregating the results.

In NumPy arr[0] returns a mutable scalar (very awkward but it allows arr[0]["field"] = value). But I think cupynumeric always sticks with 0-D arrays already, so that shouldn't be a problem.

In fact, there's unfortunately no easy way to support that. With the way ndarray is currently set up in cuPyNumeric, arr[0]["field"] = value would assign the value to a field of a temporary 0-D array constructed from arr[0], because the arr[0] expression calls the __getitem__() special method, which returns a fresh array.

@magnatelee
Contributor

@manopapad given that we have enough building blocks to support structured types, let's scope the work and assign it to someone.

@seberg
Contributor

seberg commented Jan 17, 2025

the easiest way I can think of is doing them field-wise and aggregating the results.

Yes, nothing much to it:

  • Comparisons are just field-wise comparisons with the results aggregated (not ideal, but NumPy's C code does it the same way you would write it in Python).
  • Casts work basically the same way, but at a lower level (in Legate terms, within the task).
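A minimal sketch of the field-wise approach in NumPy, assuming scalar (non-subarray) fields; `fieldwise_equal` and `fieldwise_cast` are hypothetical helpers, checked against NumPy's own `==` and `astype`. Fields are paired by position, which matches how NumPy assigns between structured dtypes.

```python
import numpy as np


def fieldwise_equal(a, b):
    """Compare two structured arrays field by field and AND the results."""
    if a.dtype.names != b.dtype.names:
        raise TypeError("field names differ")
    result = np.ones(np.broadcast(a, b).shape, dtype=bool)
    for name in a.dtype.names:
        result &= a[name] == b[name]
    return result


def fieldwise_cast(arr, new_dtype):
    """Cast one field at a time, pairing source and target fields by position."""
    new_dtype = np.dtype(new_dtype)
    if len(arr.dtype.names) != len(new_dtype.names):
        raise TypeError("field counts differ")
    out = np.empty(arr.shape, dtype=new_dtype)
    for src, dst in zip(arr.dtype.names, new_dtype.names):
        out[dst] = arr[src]  # per-field cast, e.g. float64 -> float32
    return out
```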

For memory access optimization, a lower-level approach is better of course, since you would want to do the operation in cache-friendly chunks.
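One way to read "cache-friendly chunks": process a block of records at a time and finish every field for that block before moving on. The sketch below is a hypothetical, NumPy-level illustration of that loop order for equality, restricted to 1-D inputs; a real kernel would do this inside the task rather than in Python.

```python
import numpy as np


def chunked_fieldwise_equal(a, b, chunk=4096):
    """Field-wise equality computed one block of records at a time.

    All fields of a block are compared before moving to the next block, so the
    block's bytes are revisited while they are likely still in cache.
    """
    if a.ndim != 1 or a.shape != b.shape or a.dtype.names != b.dtype.names:
        raise ValueError("expects 1-D arrays with matching shape and fields")
    out = np.empty(a.shape, dtype=bool)
    for start in range(0, a.shape[0], chunk):
        stop = min(start + chunk, a.shape[0])
        block_a, block_b = a[start:stop], b[start:stop]
        acc = np.ones(stop - start, dtype=bool)
        for name in a.dtype.names:
            acc &= block_a[name] == block_b[name]
        out[start:stop] = acc
    return out
```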

One thing comparisons enable in NumPy is sorting and searchsorted. In the end, though, even comparisons are probably something users rarely need or can work around if they are missing; they are also pretty easy to implement if you mostly care about support rather than speed.
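For context on the sorting point, NumPy sorts structured arrays by comparing fields in turn, and `order=` selects which fields are compared first (NumPy only):

```python
import numpy as np

arr = np.array(
    [(2.0, 1.0), (1.0, 5.0), (2.0, 0.0)],
    dtype=[("real", "<f8"), ("imag", "<f8")],
)

by_fields = np.sort(arr, order=["real", "imag"])  # real first, imag breaks ties
by_imag = np.sort(arr, order="imag")              # imag first, real breaks ties
perm = np.argsort(arr, order=["real", "imag"])    # indices of the sorted records
```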

of a temporary 0D array constructed from arr[0]

I think this is completely fine. If possible, it may help users if you make that 0-D array read-only and raise an error message explaining why. (It might also end up being more trouble than it's worth, although I suspect it is a good idea.)
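One low-effort version of this suggestion, sketched in NumPy with a hypothetical `getitem_readonly`: marking the temporary as non-writeable turns the silent no-op into an error. A friendlier message would still need a wrapper that catches and re-raises it.

```python
import numpy as np

arr = np.zeros(2, dtype=[("real", "<f8"), ("imag", "<f8")])


def getitem_readonly(a, index):
    """Hypothetical: return a fresh 0-D copy that refuses writes."""
    tmp = np.array(a[index])
    tmp.flags.writeable = False
    return tmp


tmp = getitem_readonly(arr, 0)
try:
    tmp["real"] = 5.0
except ValueError as exc:
    # A wrapper could re-raise here with a hint such as suggesting
    # arr["field"][i] = value instead of arr[i]["field"] = value.
    error_text = str(exc)
```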


Another thing NumPy supports is subarray dtypes (i.e. a field whose value is a fixed-shape, C-order array of some base dtype).
It may be nice to support these too (if only for arr["field"]). The weird thing NumPy does is that if you have:

arr = np.ones(10, dtype="(2)i,i")
arr["f0"]

Then arr["f0"] "integrates" the subarray dimensions to make its shape (10, 2).
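The same behavior in runnable form (NumPy only): the view of a subarray field picks up the extra (2,) dimension and still writes through to the parent.

```python
import numpy as np

arr = np.ones(10, dtype="(2)i,i")

f0 = arr["f0"]  # subarray dims appended to the array shape: (10, 2)
f1 = arr["f1"]  # plain scalar field: (10,)
f0[3, 1] = 7    # writes through to the parent record
```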

@manopapad changed the title from "Not accepting structured datatypes" to "Support structured datatypes" on Jan 24, 2025