TST(string dtype): Resolve xfail in groupby.test_size #60711

Draft: wants to merge 1 commit into main

rhshadrach
Member

  • closes #xxxx (Replace xxxx with the GitHub issue number)
  • Tests added and passed if fixing a bug or adding a new feature
  • All code checks passed.
  • Added type annotations to new arguments/methods/functions.
  • Added an entry in the latest doc/source/whatsnew/vX.X.X.rst file if fixing a bug or adding a new feature.

groupby performs dtype inference on the group labels across the board. For example, integer keys stored with object dtype come back as an int64 result index:

import pandas as pd

# The key column "a" holds integers but is stored with object dtype.
df = pd.DataFrame({"a": [1, 1, 2], "b": [3, 4, 5]}, dtype="object")
gb = df.groupby("a")
result = gb.sum()
print(result.index.dtype)
# int64

While in the long term I would prefer to preserve object dtype, I do not think we should change this behavior at this point.
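Until that changes, a caller who needs the original key dtype can cast the result index back after the aggregation. This is a minimal user-side sketch, not a pandas option and not the approach taken in this PR; it assumes the dtype of the key column is the dtype the caller wants restored.

```python
import pandas as pd

# Object-dtype key column, as in the example above.
df = pd.DataFrame({"a": [1, 1, 2], "b": [3, 4, 5]}, dtype="object")
result = df.groupby("a").sum()

# groupby infers the result index dtype from the unique labels.
# Workaround (assumption: the caller wants the key column's dtype back):
# cast the index explicitly; Index.astype keeps the index name.
result.index = result.index.astype(df["a"].dtype)
print(result.index.dtype)  # object
```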

@rhshadrach added the labels Testing (pandas testing functions or related to the test suite), Groupby, and Strings (String extension data type and string data) on Jan 13, 2025
@rhshadrach added this to the 2.3 milestone on Jan 13, 2025
 expected = Series(
     [2, 1],
-    index=Index(["a", "b"], name="a", dtype=dtype),
+    index=Index(["a", "b"], name="a", dtype=exp_index_dtype),
Member

Why doesn't it work if you just remove the dtype argument and let the constructor infer?
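A minimal sketch of what "letting the constructor infer" produces compared with an explicit string dtype. Here `"string"` is only a stand-in for the test's `dtype` parameter, whose actual values are not shown in this thread. The inferred dtype depends on the pandas version and on whether the `future.infer_string` option is enabled, so no printed output is claimed.

```python
import pandas as pd

labels = ["a", "b"]

# No dtype argument: the Index constructor infers one
# (object by default in pandas 2.x, or a string dtype when
# future.infer_string is enabled).
inferred = pd.Index(labels, name="a")

# Explicit dtype, standing in for the parametrized test dtype.
explicit = pd.Index(labels, name="a", dtype="string")

# The two dtypes are not equal, so an expected Index built without
# dtype= would not match a result carrying the explicit string dtype.
print(inferred.dtype, explicit.dtype)
```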

Member Author

Good question. This was introduced in #55627, but I do not see why the result would be Int64 if the values are string[pyarrow].

cc @phofl

Member

@rhshadrach, the Int64 is for exp_dtype on the line below, not for the dtype of the Index being constructed on this line, so I don't fully understand your comment/question.
(The construction of exp_dtype is not touched in this PR.)
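The distinction drawn here, between the dtype of the index and the dtype of the values, can be seen in a standalone sketch of an expected Series. Only `[2, 1]` and the `Index(...)` call come from the diff; `name="b"`, `"string"`, and `"Int64"` are hypothetical stand-ins for the test's other arguments, its `dtype` parameter, and `exp_dtype`.

```python
import pandas as pd

expected = pd.Series(
    [2, 1],
    # Index dtype: what dtype= on the Index controls (the line under review).
    index=pd.Index(["a", "b"], name="a", dtype="string"),
    name="b",  # hypothetical
    # Values dtype: the role exp_dtype plays on the following line.
    dtype="Int64",
)
print(expected.index.dtype)  # string
print(expected.dtype)  # Int64
```

Changing one of the two dtypes leaves the other untouched, which is why a question about the Int64 values does not bear on the Index dtype.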

@rhshadrach marked this pull request as draft on January 14, 2025 02:38