Woodwork Incorrectly Infers Boolean #1486

Open
chukarsten opened this issue Aug 4, 2022 · 2 comments
Labels
bug Something isn't working

Comments

@chukarsten
Contributor

I would expect the following test to pass. Within concat_columns, when a DataFrame containing a column of mixed nulls and integers is passed the Integer logical type during inference, the init fails. This is expected, and an MR was put up to make concat_columns resilient to it. When we extended the test to cover Boolean/BooleanNullable, we discovered that the init imputes the missing boolean value instead of raising an error about the attempted coercion to a non-nullable type.

I would expect that the following test would pass and also be extendable to Integer/IntegerNullable (and float64/Float64 when they're a thing).

import numpy as np
import pandas as pd
import pytest
import woodwork  # noqa: F401 (importing registers the DataFrame.ww accessor)
from woodwork.logical_types import Boolean, BooleanNullable


@pytest.mark.parametrize("none_type", [None, np.nan, pd.NA])
@pytest.mark.parametrize("pass_logical_types", [True, False])
def test_boolean_inference(none_type, pass_logical_types):
    df = pd.DataFrame({"boolean": [none_type, True, False, True]})
    if pass_logical_types:
        with pytest.raises(Exception):
            # Expect init to fail: a column containing a null cannot be
            # coerced to the non-nullable bool dtype.
            df.ww.init(logical_types={"boolean": Boolean})
    else:
        # Without an explicit type, inference should pick the nullable type.
        df.ww.init()
        assert isinstance(df.ww.logical_types["boolean"], BooleanNullable)
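
For context, here is a minimal pandas-only sketch of the kind of silent imputation described above. It covers only the None case and shows plain pandas casting; whether Woodwork's init follows this exact code path is an assumption, not something confirmed in this issue.

```python
import pandas as pd

# Object column with one missing value (None case only).
series = pd.Series([None, True, False, True], dtype="object")

# NumPy's bool dtype has no slot for missing values, so the cast
# evaluates the truthiness of each element and None becomes False.
coerced = series.astype("bool")

# The nullable extension dtype keeps the missing entry as pd.NA.
nullable = series.astype("boolean")

print(coerced.tolist())
print(nullable.isna().tolist())
```

The first cast gives no signal that a value was filled in, which is why an explicit error on the non-nullable path seems preferable.
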
@jeff-hernandez
Contributor

@chukarsten @ParthivNaresh The pandas library added a method called convert_dtypes in version 1.0.0 that may provide better inference for nullable types. (docs)

from woodwork.logical_types import BooleanNullable
import pandas as pd
import numpy as np


for none_type in [None, np.nan, pd.NA]:
    # initial dtype is object
    series = pd.Series([none_type, True, True], dtype='object')

    # method infers dtype to boolean nullable
    inferred_dtype = series.convert_dtypes().dtype
    assert str(inferred_dtype) == BooleanNullable.primary_dtype 
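
A rough sketch of the same check for the integer case mentioned in the issue, using only pandas. The expectation that "Int64" is the dtype an IntegerNullable column would map to is an assumption here, so the comparison is against the plain dtype string rather than a Woodwork attribute.

```python
import numpy as np
import pandas as pd

# Map each missing-value marker to the dtype convert_dtypes infers.
inferred = {}
for none_type in [None, np.nan, pd.NA]:
    # initial dtype is object, as in the boolean example
    series = pd.Series([none_type, 1, 2], dtype="object")
    inferred[repr(none_type)] = str(series.convert_dtypes().dtype)

print(inferred)
```
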

@ParthivNaresh
Collaborator

@jeff-hernandez Wow, nice catch! We should definitely explore this and see where we can use it. In EvalML, if we need quick, high-level type inference, we might be able to use this. In Woodwork, we can use the extension concept they provided on top of the smarter inference we're doing for nulls now.

3 participants