Boolean indexing doesn't work with subclass and TypeVar #1069

Open
davetapley opened this issue Dec 10, 2024 · 1 comment

Describe the bug
As suggested in #908 (comment), I'm trying to use DataFrameT = TypeVar("DataFrameT", bound=DataFrame),
but with boolean indexing instead of a pipe.

To Reproduce

from typing import TypeVar, reveal_type

from pandas import DataFrame, Series


class SubDF(DataFrame):
    # https://pandas.pydata.org/pandas-docs/stable/development/extending.html#override-constructor-properties
    @property
    def _constructor(self):
        return SubDF

    @property
    def _constructor_sliced(self):
        return Series


sub = SubDF({'a': [1, 2, 3]})

DataFrameT = TypeVar("DataFrameT", bound=DataFrame)


def func(df: DataFrameT) -> DataFrameT:
    index = Series([True, False, True])

    df_ = df.loc[index]
    reveal_type(df_)

    return df_  # Type "DataFrame" is not assignable to return type "DataFrameT@func"


reveal_type(func(sub))

pyright:

  /workspaces/ng/repro.py:27:17 - information: Type of "df_" is "DataFrame"
  /workspaces/ng/repro.py:29:12 - error: Type "DataFrame" is not assignable to return type "DataFrameT@func"
    Type "DataFrame" is not assignable to type "DataFrameT@func" (reportReturnType)
  /workspaces/ng/repro.py:32:13 - information: Type of "func(sub)" is "SubDF"
1 error, 0 warnings, 2 informations 

Please complete the following information:

  • OS: Linux
  • OS Version 20.04.6
  • python version 3.12.2
  • version of type checker 1.1.390
  • version of installed pandas-stubs 2.2.3.241126

Additional context

Repro inspired by:

Might be same root cause as:

Dr-Irv commented Dec 10, 2024

It's not a .loc issue. You get a similar result with any DataFrame method that returns a DataFrame, e.g.:

def func2(df: DataFrameT) -> DataFrameT:
    df_ = df.query("x <= 10")

    return df_   # Type "DataFrame" is not assignable to return type "DataFrameT@func2"

reveal_type(func2(sub))

The type revealed is still correct (SubDF) in this case.

That particular example can be fixed by changing query() to return Self instead of DataFrame. I imagine we'd have to do the same for every method in class DataFrame that currently returns DataFrame, i.e., change each of them to return Self.

But loc is different, because it returns an instance of the class _LocIndexerFrame. I think that class would have to become generic, i.e. a subclass of Generic[_T], with Self passed in as the type parameter.

I tried this idea and it worked on your example.

PR with tests welcome.
