Confusion about domain_label_colname in tabletshift/core/features.py #8

lisa-lthorrold · 2024-01-18T11:20:33Z

Describe the bug
There is some confusion about domain_label_colname in tabletshift/core/features.py. What is its purpose, and how is it different from domain_split_varname?

Is there a reason it is not added in the self.get_passthrough_columns call? In get_passthrough_columns it seems this is an optional attribute, but it is only being called from one place.

In any case, without it being added, the columns in the datasets are transformed (one-hot encoded or binned), and the column names are adjusted accordingly. At the point this code is being run, domain_label_colname == domain_label_varname.

If domain_label_colname is a categorical attribute (as is the case for the anes dataset), then the transformation butchers its column name, so by the time this code is called straight afterwards:

    if domain_label_colname:
        # Case: fit the domain label transformer and apply it.
        transformed.loc[:, domain_label_colname] = \
            self.fit_transform_domain_labels(
                transformed.loc[:, domain_label_colname])

we get an exception, because the column name no longer exists (4 new columns with an extended version of that name are present instead). In the diabetes readmission dataset, the domain label column is an int, so it retains its column name when this code is called, and no exception is thrown.

    # Fit the feature transformer and apply it.
    self.fit_feature_transformer(data, train_idxs, passthrough_columns)
    transformed = self.transform_features(data)

    transformed = self._post_transform(
        transformed, cast_dtypes=post_transform_cast_dtypes)
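
The failure mode described above can be illustrated in isolation. This is only a minimal sketch, not the library's actual transformer: it assumes pandas' `pd.get_dummies` as a stand-in for the one-hot encoding step, and the toy data, the `encode` helper, and the `region` and `party` columns are hypothetical. It shows that encoding a categorical domain label column replaces it with one column per category, and that excluding it as a passthrough column keeps the original name.

```python
import pandas as pd

# Hypothetical toy data. "region" stands in for a categorical domain label
# column such as the one in the anes dataset.
df = pd.DataFrame({
    "age": [23, 35, 41, 58],
    "party": ["a", "b", "a", "b"],
    "region": ["north", "south", "east", "west"],
})
domain_label_colname = "region"


def encode(data, passthrough_columns=()):
    """One-hot encode every non-numeric column not listed as passthrough.

    Hypothetical helper: a simplified stand-in for the feature transformer.
    """
    to_encode = [
        col for col in data.columns
        if not pd.api.types.is_numeric_dtype(data[col])
        and col not in passthrough_columns
    ]
    return pd.get_dummies(data, columns=to_encode)


# Without passthrough, "region" is replaced by one column per category.
encoded = encode(df)
region_columns = [c for c in encoded.columns if c.startswith("region_")]

# Selecting the original name now fails, as in the reported exception.
try:
    encoded.loc[:, domain_label_colname]
    lookup_failed = False
except KeyError:
    lookup_failed = True

# With passthrough, the original column name survives the transformation.
kept = encode(df, passthrough_columns=[domain_label_colname])
```

With this toy data, `region_columns` holds four derived column names and `lookup_failed` is `True`, while `kept` still contains a `region` column. Whether adding domain_label_colname to the passthrough columns is the intended fix is a question for the maintainers.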

To Reproduce
Change the dataset to 'anes' in run_expt.py and run it.
