Skip to content

GH-46611: [Python][C++] Allow building float16 arrays without numpy #46618

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 13 commits into
base: main
Choose a base branch
from

Conversation

raulcd
Copy link
Member

@raulcd raulcd commented May 27, 2025

Rationale for this change

When we added Float16 we did not update pyarrow to be able to convert from Python objects to Arrow. Float16 required numpy and it crashed if numpy was not present.

What changes are included in this PR?

Allow to not require numpy to generate float16 scalars and arrays on pyarrow and do not fail if numpy is not present.

Are these changes tested?

Yes, new tests have been added

Are there any user-facing changes?

No changes for old functionality. Users will be allowed to use float16 without requiring to use np.float16 and directly from Python objects

@raulcd
Copy link
Member Author

raulcd commented May 27, 2025

@WillAyd @pitrou what would be your expectation here:

        arr = np.array([1.5, np.nan], dtype=np.float16)
        a = pa.array(arr, type=pa.float16())
        x, y = a.to_pylist()

isinstance(x, float) or isinstance(x, np.float16)?
Currently we convert to np.float16 but in my opinion we should return a Python Float otherwise we unnecessarily require numpy.

@pitrou
Copy link
Member

pitrou commented May 27, 2025

I agree a Python float should be returned. There is no reason for float16 to be different from float32 in that regard.

@github-actions github-actions bot added awaiting changes Awaiting changes and removed awaiting committer review Awaiting committer review labels May 28, 2025
@raulcd raulcd marked this pull request as ready for review May 28, 2025 09:58
@raulcd raulcd requested a review from AlenkaF as a code owner May 28, 2025 09:58
@raulcd raulcd requested review from pitrou and WillAyd May 28, 2025 09:58
@github-actions github-actions bot added awaiting change review Awaiting change review and removed awaiting changes Awaiting changes labels May 28, 2025
@github-actions github-actions bot added awaiting changes Awaiting changes and removed awaiting change review Awaiting change review labels May 28, 2025
@github-actions github-actions bot added awaiting change review Awaiting change review and removed awaiting changes Awaiting changes labels May 28, 2025
@raulcd raulcd requested a review from pitrou May 28, 2025 14:19
@raulcd raulcd requested a review from rok May 28, 2025 14:19
@@ -87,15 +86,18 @@ NPY_INT_DECL(ULONGLONG, UInt64, uint64_t);

template <>
struct npy_traits<NPY_FLOAT16> {
typedef npy_half value_type;
typedef uint16_t value_type;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note this could also be arrow::util::Float16, if that's easy to do.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is requiring quite a lot of changes around pyarrow/src/arrow/python/python_to_arrow.cc, pyarrow/src/arrow/python/arrow_to_pandas.cc and haven't been able to make it work yet. I would prefer to explore updating it on a different issue

@github-actions github-actions bot added awaiting changes Awaiting changes and removed awaiting change review Awaiting change review labels May 29, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants