Fix normalization of np.str_ and np.bytes_ types #2827

yashwantbezawada · 2025-12-27T09:09:09Z

Reference Issues/PRs

What does this implement or fix?

When you try to write a DataFrame with np.str_ values, you get this error:

ArcticDbNotYetImplemented: Failed to normalize column 'col' with dtype 'object'. 
Found first non-null value of type '<class 'numpy.str_'>', but only strings, unicode, 
and Timestamps are supported.

The issue is in _accept_array_string(): it checks type(v) in (str, bytes), which is an exact type match. np.str_ is a distinct class that inherits from str, so the check fails even though the value is a string.
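A minimal sketch of the difference between the two checks, using only NumPy and the standard library. The variable names are illustrative and not taken from the ArcticDB source:

```python
import numpy as np

v = np.str_("hello")

# Exact type match: type(v) is numpy.str_, which is not in the tuple.
exact_match = type(v) in (str, bytes)

# Subclass-aware check: numpy.str_ inherits from str, so this succeeds.
subclass_match = isinstance(v, (str, bytes))

print(exact_match, subclass_match)
```

The exact check is the one that rejects the value, which is what triggers the normalization error quoted above.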

Switched to isinstance(v, (str, bytes)), which handles subclasses properly. Same for coerce_string_column_to_fixed_length_array(): changed to_type == str to issubclass(to_type, str).
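The same distinction applies when comparing types rather than values. A small sketch, assuming to_type holds the class of the first non-null value (the variable name mirrors the description above; the surrounding ArcticDB function is not reproduced here):

```python
import numpy as np

to_type = np.str_

# Equality between classes is identity, so numpy.str_ is not equal to str.
equal_to_str = to_type == str

# issubclass() walks the class hierarchy and finds str among the bases.
subclass_of_str = issubclass(to_type, str)

print(equal_to_str, subclass_of_str)
```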

Any other comments?

Quick sanity check that this makes sense:

>>> import numpy as np
>>> isinstance(np.str_("hello"), str)
True
>>> isinstance(np.bytes_(b"hello"), bytes)
True

The C++ side already does the right thing - PyUnicode_Check and PyBytes_Check both check the type hierarchy, so np.str_ values work fine once they get past the Python normalization.

Added some tests to cover this.

Checklist

Checklist for code changes...

Have you updated the relevant docstrings, documentation and copyright notice?
Is this contribution tested against all ArcticDB's features?
Do all exceptions introduced raise appropriate error messages?
Are API changes highlighted in the PR description?
Is the PR labelled as enhancement or bug so it appears in autogenerated release notes?

The normalization code was using exact type checks (type(v) in (str, bytes)), which failed for numpy string scalar types because np.str_ and np.bytes_ are distinct classes that inherit from str and bytes respectively.

Changed to use isinstance() checks, which properly handle the type hierarchy, allowing numpy string types to be normalized correctly. Also updated coerce_string_column_to_fixed_length_array to use issubclass() for consistent handling when dynamic_strings=False.

Added regression tests for the fix.

Fixes man-group#2800

Signed-Off By: Yashwant Bezawada <[email protected]>. By including this sign-off line I agree to the terms of the Contributor License Agreement.

IvoDD

Thank you for contributing! Looks good! Just one suggestion for an extra test

python/arcticdb/version_store/_normalization.py

Per reviewer feedback referencing issue man-group#704, using isinstance() to accept all str/bytes subclasses has been problematic before. Changed to use strict type equality with explicit numpy types:

- type(v) in (str, bytes, np.str_, np.bytes_)
- to_type in (str, np.str_)

This supports numpy string types while avoiding issues with arbitrary string subclasses.
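A rough sketch of how the strict allow-list behaves compared with isinstance(). The names ACCEPTED_STRING_TYPES, TaggedStr and is_accepted_string are hypothetical and exist only for this illustration; they are not ArcticDB's actual code:

```python
import numpy as np

# Hypothetical allow-list mirroring the tuple in the commit message.
ACCEPTED_STRING_TYPES = (str, bytes, np.str_, np.bytes_)


class TaggedStr(str):
    """An arbitrary str subclass, standing in for the problematic case."""


def is_accepted_string(v):
    # Strict equality on the concrete type: subclasses outside the list fail.
    return type(v) in ACCEPTED_STRING_TYPES


for value in ("a", b"a", np.str_("a"), np.bytes_(b"a"), TaggedStr("a")):
    print(type(value).__name__, is_accepted_string(value), isinstance(value, (str, bytes)))
```

The numpy scalar types pass both checks, while the arbitrary subclass passes only isinstance(), which is the behaviour the reviewer wanted to avoid.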

yashwantbezawada requested review from IvoDD, alexowens90 and poodlewars as code owners December 27, 2025 09:09

phoebusm force-pushed the fix-numpy-str-normalization branch from 4b2dc27 to 2380eea Compare December 29, 2025 18:30

IvoDD reviewed Dec 30, 2025

python/arcticdb/version_store/_normalization.py (outdated)

IvoDD requested changes Dec 30, 2025

python/arcticdb/version_store/_normalization.py (outdated)

IvoDD approved these changes Jan 5, 2026
