Woodwork Incorrectly Infers Boolean #1486

chukarsten · 2022-08-04T01:20:58Z

I would expect the following test to pass. In concat_columns, we're seeing that when a DataFrame column with mixed nulls and integers is given the Integer logical type at init, the init fails. This is expected, and a PR was put up to make concat_columns resilient to it. When we extended the test to cover Boolean/BooleanNullable, we discovered that init imputes the missing boolean value instead of raising an error about the attempted coercion to a non-nullable type.

The test should also be extendable to Integer/IntegerNullable (and float64/Float64 when they're a thing).

import numpy as np
import pandas as pd
import pytest
import woodwork  # noqa: F401 (registers the `ww` accessor on DataFrames)
from woodwork.logical_types import Boolean, BooleanNullable


@pytest.mark.parametrize("none_type", [None, np.nan, pd.NA])
@pytest.mark.parametrize("pass_logical_types", [True, False])
def test_boolean_inference(none_type, pass_logical_types):
    df = pd.DataFrame({"boolean": [none_type, True, False, True]})
    if pass_logical_types:
        with pytest.raises(Exception):
            # Init should fail: a column containing a null cannot be
            # coerced to the non-nullable Boolean (bool) type.
            df.ww.init(logical_types={"boolean": Boolean})
    else:
        # Without explicit types, inference should pick the nullable type.
        df.ww.init()
        assert isinstance(df.ww.logical_types["boolean"], BooleanNullable)
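
The behavior this test asks for can be sketched in plain pandas. The helper below, `strict_bool_cast`, is a hypothetical name used only for illustration and is not part of Woodwork. It assumes the desired rule is "refuse a non-nullable bool cast whenever the column holds a missing value", which is roughly what the test expects from init.

```python
import numpy as np
import pandas as pd


def strict_bool_cast(series: pd.Series) -> pd.Series:
    """Cast to non-nullable bool, raising instead of filling in nulls.

    Hypothetical helper for illustration; not a Woodwork API.
    """
    if series.isna().any():
        raise ValueError(
            "column contains missing values; use the nullable 'boolean' dtype"
        )
    return series.astype("bool")


for none_type in [None, np.nan, pd.NA]:
    series = pd.Series([none_type, True, False, True], dtype="object")
    try:
        strict_bool_cast(series)
    except ValueError:
        # The nullable extension dtype keeps the missing value as <NA>
        # instead of replacing it with True or False.
        nullable = series.astype("boolean")
        assert nullable.isna().sum() == 1
```

A column with no nulls passes through unchanged, so the strict check only affects the case described in this issue.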


jeff-hernandez · 2022-08-05T16:13:40Z

@chukarsten @ParthivNaresh pandas has a method called convert_dtypes, added in version 1.0.0, which can possibly provide better inference for nullable types.

from woodwork.logical_types import BooleanNullable
import pandas as pd
import numpy as np


for none_type in [None, np.nan, pd.NA]:
    # initial dtype is object
    series = pd.Series([none_type, True, True], dtype='object')

    # method infers dtype to boolean nullable
    inferred_dtype = series.convert_dtypes().dtype
    assert str(inferred_dtype) == BooleanNullable.primary_dtype
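
Since the original report hopes the same check extends to Integer/IntegerNullable, here is a small sketch of the same convert_dtypes approach applied to an integer column with a missing value. It uses only pandas. The helper name `inferred_dtype_name` is made up for this example, and whether Woodwork should rely on this is still an open question in this thread.

```python
import pandas as pd


def inferred_dtype_name(values) -> str:
    """Return the name of the dtype convert_dtypes picks for an object Series.

    Hypothetical helper for illustration; not a Woodwork or pandas API.
    """
    series = pd.Series(values, dtype="object")
    return str(series.convert_dtypes().dtype)


# Integers with a missing value map to the nullable integer dtype.
print(inferred_dtype_name([None, 1, 2]))  # expected: Int64

# Booleans with a missing value map to the nullable boolean dtype.
print(inferred_dtype_name([None, True, False]))  # expected: boolean
```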

ParthivNaresh · 2022-08-05T16:34:49Z

@jeff-hernandez Wow, nice catch! We should definitely explore this and see where we can use it. I'm thinking that in EvalML, if we need quick, high-level type inference, we might be able to use this. In Woodwork, we can use the extension concept they provided on top of the smarter inference we're doing for nulls now.

chukarsten added the bug Something isn't working label Aug 4, 2022

ParthivNaresh mentioned this issue Aug 4, 2022

Fixed treatment of Boolean coercion for boolean column with nulls #1487

Closed

chukarsten mentioned this issue Aug 4, 2022

concat_columns handles DataFrames with differing rows #1485

Merged
