Describe the bug
The purpose of domain_label_colname in tabletshift/core/features.py is unclear. What is it for, and how is it different from domain_split_varname?
Is there a reason it is not passed in the self.get_passthrough_columns call? In get_passthrough_columns it appears to be an optional argument, but that method is only called from one place.
In any case, because it is not passed, the columns in the dataset are transformed (one-hot encoded or binned), and the column names are adjusted accordingly. At the point this code runs, domain_label_colname == domain_label_varname.
If domain_label_colname is a categorical attribute (as is the case for the anes dataset), the transformation replaces its column name, so by the time this code is called straight afterwards:
    if domain_label_colname:
        # Case: fit the domain label transformer and apply it.
        transformed.loc[:, domain_label_colname] = \
            self.fit_transform_domain_labels(
                transformed.loc[:, domain_label_colname])
we get an exception, because the column name no longer exists (4 new columns with extended versions of that name are present instead). In the diabetes readmission dataset, the column that is domain_label_column is an int, so it retains its column name when this code is called, and no exception is thrown.
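The failure mode can be reproduced outside the library with plain pandas. This is only a minimal sketch of the behaviour described above, not the project's actual preprocessing: the column names (`region`, `site_id`, `party`) and the helpers `encode` and `has_column` are hypothetical, and `pd.get_dummies` stands in for whatever one-hot encoder the feature pipeline uses.

```python
import pandas as pd

# Toy data; the column names are hypothetical, not taken from the anes dataset.
df = pd.DataFrame({
    "region": ["north", "south", "east", "west"],  # categorical candidate domain column
    "site_id": [1, 2, 1, 3],                        # integer candidate domain column
    "party": ["a", "b", "a", "b"],                  # ordinary categorical feature
})


def encode(frame, passthrough=()):
    """One-hot encode every non-numeric column except those in `passthrough`."""
    to_encode = [
        c for c in frame.columns
        if c not in passthrough and not pd.api.types.is_numeric_dtype(frame[c])
    ]
    return pd.get_dummies(frame, columns=to_encode)


def has_column(frame, name):
    """Return True if `frame.loc[:, name]` succeeds, False if it raises KeyError."""
    try:
        frame.loc[:, name]
        return True
    except KeyError:
        return False


transformed = encode(df)
print(list(transformed.columns))
print(has_column(transformed, "region"))   # categorical name was replaced by dummy columns
print(has_column(transformed, "site_id"))  # integer column kept its name

kept = encode(df, passthrough=("region",))
print(has_column(kept, "region"))          # excluded from encoding, so the name survives
```

If the domain label column were included in the passthrough set, as in the last call, the later `.loc[:, domain_label_colname]` lookup would still find it. Whether that is the intended design is exactly the open question in this report.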
To Reproduce
Change the dataset to 'anes' in run_expt.py and run it