Fix normalization of np.str_ and np.bytes_ types #2827
Reference Issues/PRs
Fixes #2800
What does this implement or fix?
Writing a DataFrame with `np.str_` values fails with an error. The issue is in `_accept_array_string()`: it checks `type(v) in (str, bytes)`, which does an exact type match. But `np.str_` is its own class that inherits from `str` (likewise, `np.bytes_` inherits from `bytes`), so the check fails even though every `np.str_` value is a `str` instance.

Switched to `isinstance(v, (str, bytes))`, which handles subclasses properly. Made the same kind of change in `coerce_string_column_to_fixed_length_array()`: changed `to_type == str` to `issubclass(to_type, str)`.

Any other comments?
Quick sanity check that this makes sense: the C++ side already does the right thing. `PyUnicode_Check` and `PyBytes_Check` both check the type hierarchy, so `np.str_` values work fine once they get past the Python normalization.

Added some tests to cover this.
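A minimal sketch of the difference between the old and new checks, assuming NumPy is installed. The helper names below are hypothetical and isolate only the comparisons quoted above; they are not the real bodies of `_accept_array_string()` or `coerce_string_column_to_fixed_length_array()`. At the Python level, `isinstance`/`issubclass` play the same role as `PyUnicode_Check`/`PyBytes_Check` on the C++ side, while the old exact `type(...)` comparison behaves like the `PyUnicode_CheckExact`/`PyBytes_CheckExact` variants.

```python
import numpy as np

# Hypothetical stand-ins for the checks described in this PR. They are NOT the
# actual bodies of _accept_array_string() or
# coerce_string_column_to_fixed_length_array(); they only isolate the comparison.


def value_is_string_old(v) -> bool:
    # Exact type match: np.str_ / np.bytes_ fail because their type is not str/bytes.
    return type(v) in (str, bytes)


def value_is_string_new(v) -> bool:
    # isinstance() walks the class hierarchy, like PyUnicode_Check / PyBytes_Check in C.
    return isinstance(v, (str, bytes))


def type_is_str_old(to_type) -> bool:
    # Equality only matches the str class itself.
    return to_type == str


def type_is_str_new(to_type) -> bool:
    # issubclass() also accepts subclasses such as np.str_.
    return issubclass(to_type, str)


values = ["a", b"b", np.str_("c"), np.bytes_(b"d")]
print([value_is_string_old(v) for v in values])  # [True, True, False, False]
print([value_is_string_new(v) for v in values])  # [True, True, True, True]
print(type_is_str_old(np.str_), type_is_str_new(np.str_))  # False True
```

Because `isinstance()` accepts a tuple of types, a single call covers both `str` and `bytes`, and any further subclass of either is accepted without listing it explicitly.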
Checklist
Checklist for code changes...