TST(string dtype): Resolve xfail in test_base.py #60713

Merged
merged 1 commit into from
Jan 14, 2025

rhshadrach
Member

  • closes #xxxx (Replace xxxx with the GitHub issue number)
  • Tests added and passed if fixing a bug or adding a new feature
  • All code checks passed.
  • Added type annotations to new arguments/methods/functions.
  • Added an entry in the latest doc/source/whatsnew/vX.X.X.rst file if fixing a bug or adding a new feature.

rhshadrach added the Testing, Strings, ExtensionArray, and Copy / view semantics labels Jan 13, 2025
rhshadrach added this to the 2.3 milestone Jan 13, 2025
Comment on lines +537 to +538
if dtype is not None:
    raise TypeError("Cannot change data-type for string array.")
Member Author

Are there any dtypes we do want to accept here?

Member

I don't think so - the point of NumPy's view method seems incompatible with the string data types (both object-based and Arrow-based)

def view(self, dtype: Dtype | None = None) -> ArrayLike:
    if dtype is not None:
        raise TypeError("Cannot change data-type for string array.")
    return super().view(dtype=dtype)
Member

As an aside, I noticed NumPy's documentation on view says:

Passing None for dtype is different from omitting the parameter, since the former invokes dtype(None) which is an alias for dtype('float64').

So I'm assuming we overload the meaning of dtype=None in our extension class hierarchy somewhere, otherwise this would fail (?)

Member Author

Not sure what you mean by "otherwise this would fail". I think we have behavior that differs from NumPy here:

if dtype is not None:
    raise NotImplementedError(dtype)
return self[:]
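
For illustration, here is a minimal stdlib-only sketch of the contract in the snippet above: an explicit dtype is rejected, and `dtype=None` returns a new object over the same data. `ToyExtensionArray` is a hypothetical stand-in, not the pandas class, and it uses a `memoryview` so that `self[:]` shares memory the way a view is expected to.

```python
from array import array


class ToyExtensionArray:
    """Hypothetical stand-in for an extension array; not pandas' real class."""

    def __init__(self, buffer):
        # Slicing a memoryview yields another memoryview over the same memory.
        self._data = memoryview(buffer)

    def __getitem__(self, key):
        if isinstance(key, slice):
            return type(self)(self._data[key])
        return self._data[key]

    def __setitem__(self, key, value):
        self._data[key] = value

    def view(self, dtype=None):
        # Mirrors the quoted snippet: any explicit dtype is rejected,
        # while dtype=None returns a new object over the same data.
        if dtype is not None:
            raise NotImplementedError(dtype)
        return self[:]


original = ToyExtensionArray(array("q", [1, 2, 3]))
alias = original.view()
alias[0] = 99  # write through the view

shares_memory = original[0] == 99
is_new_object = alias is not original

try:
    original.view("float64")
except NotImplementedError:
    rejected = True
else:
    rejected = False
```

In this toy, the write through `alias` is visible in `original`, which is the property that distinguishes a view from a copy.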

Member

OK good to know we override. It wouldn't make sense otherwise to reinterpret these bytes using NumPy's default behavior (float)
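
To make the "reinterpret these bytes" point concrete, here is a small sketch using only the standard library. It uses `memoryview.cast` as an analogue of a dtype-changing view, not NumPy itself: the same memory is read as integers and then as floats, with no value conversion.

```python
from array import array

ints = array("q", [1, 2, 3])      # three 8-byte signed integers
raw = memoryview(ints).cast("B")  # the same memory exposed as raw bytes
as_floats = raw.cast("d")         # the same bytes read as 8-byte floats

# The floats are bit-level reinterpretations, not 1.0, 2.0, 3.0.
values = as_floats.tolist()
```

For an object-based string array the underlying buffer holds references to Python objects rather than numbers, so reading those bytes as floats would be even less meaningful.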

@rhshadrach rhshadrach marked this pull request as ready for review January 14, 2025 02:41
@@ -533,6 +533,11 @@ def _str_map_nan_semantics(
        else:
            return self._str_map_str_or_object(dtype, na_value, arr, f, mask)

    def view(self, dtype: Dtype | None = None) -> ArrayLike:
Member

Does overriding this here do anything other than change the error from NotImplementedError to TypeError, compared to the implementation in the base class?

Member

Ah, for the non-Arrow StringArray, this avoids getting the NDArrayBackedExtensionArray.view implementation, which behaves differently from the base class ExtensionArray.view
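
A toy model of that point, assuming a simplified class layout: all `Toy*` names are hypothetical, and the method bodies are stand-ins rather than the pandas implementations. It shows how an override on the string layer sits ahead of the ndarray-backed layer in the method resolution order, so an explicit dtype never reaches the reinterpreting code path.

```python
class ToyBase:
    """Hypothetical base layer: rejects any explicit dtype."""

    def view(self, dtype=None):
        if dtype is not None:
            raise NotImplementedError(dtype)
        return "same-data view"


class ToyNDArrayBacked(ToyBase):
    """Hypothetical ndarray-backed layer: accepts a dtype and reinterprets."""

    def view(self, dtype=None):
        if dtype is None:
            return "same-data view"
        return f"reinterpreted as {dtype}"


class ToyStringLayer(ToyBase):
    """Hypothetical string layer carrying the override from this PR."""

    def view(self, dtype=None):
        if dtype is not None:
            raise TypeError("Cannot change data-type for string array.")
        return super().view(dtype=dtype)


class ToyUnguardedStringArray(ToyNDArrayBacked):
    """Without the override, the ndarray-backed view is inherited."""


class ToyStringArray(ToyStringLayer, ToyNDArrayBacked):
    """With the override, the string layer comes first in the MRO."""


def try_view(obj, dtype=None):
    # Return the result, or the exception class name if view() raises.
    try:
        return obj.view(dtype)
    except (TypeError, NotImplementedError) as exc:
        return type(exc).__name__


unguarded = try_view(ToyUnguardedStringArray(), "float64")
guarded = try_view(ToyStringArray(), "float64")
default = try_view(ToyStringArray())
```

Here `unguarded` is the reinterpreted result, `guarded` is a `TypeError`, and the no-argument call still falls through to the next `view` in the chain.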

@WillAyd WillAyd merged commit 8bc8c0a into pandas-dev:main Jan 14, 2025
59 checks passed
Member

WillAyd commented Jan 14, 2025

Thanks @rhshadrach

lumberbot-app bot commented Jan 14, 2025

Owee, I'm MrMeeseeks, Look at me.

There seems to be a conflict; please backport manually. Here are approximate instructions:

  1. Check out the backport branch and update it:
     git checkout 2.3.x
     git pull
  2. Cherry-pick the first parent branch of this PR on top of the older branch:
     git cherry-pick -x -m1 8bc8c0a6119b053e520f5018dc1350863f7277e4
  3. You will likely have some merge/cherry-pick conflicts here; fix them and commit:
     git commit -am 'Backport PR #60713: TST(string dtype): Resolve xfail in test_base.py'
  4. Push to a named branch:
     git push YOURFORK 2.3.x:auto-backport-of-pr-60713-on-2.3.x
  5. Create a PR against branch 2.3.x. I would have named this PR:

"Backport PR #60713 on branch 2.3.x (TST(string dtype): Resolve xfail in test_base.py)"

And apply the correct labels and milestones.

Congratulations — you did some good work! Hopefully your backport PR will be tested by the continuous integration and merged soon!

Remember to remove the Still Needs Manual Backport label once the PR gets merged.

If these instructions are inaccurate, feel free to suggest an improvement.

rhshadrach added a commit to rhshadrach/pandas that referenced this pull request Jan 20, 2025
Member Author

rhshadrach commented Jan 20, 2025

Backport PR: #60742

mroeschke pushed a commit that referenced this pull request Jan 21, 2025
…60742)

* Backport PR #60615: TST(string dtype): Resolve some HDF5 xfails

* Backport PR #60713: TST(string dtype): Resolve xfail in test_base.py
Labels
  • Copy / view semantics
  • ExtensionArray: Extending pandas with custom dtypes or arrays.
  • Strings: String extension data type and string data
  • Testing: pandas testing functions or related to the test suite
3 participants