Open
Labels: ADO (issue is documented on MSFT ADO for internal tracking), Pipelines, product-question
Description
I am trying to replace the dataset in the sample at https://docs.microsoft.com/en-us/azure/machine-learning/how-to-use-automlstep-in-pipelines, but the run fails with the following error:
All columns were automatically detected to be dropped by AutoML as no useful information could be inferred from the input data. The detected column purposes are the following,
Column Column1 identified as Hashes.
Column Time identified as Ignore.
Column V1 identified as Hashes.
Column V2 identified as Hashes.
Column V3 identified as Hashes.
Column V4 identified as Hashes.
Column V5 identified as Hashes.
Column V6 identified as Hashes.
Column V7 identified as Hashes.
Column V8 identified as Hashes.
Column V9 identified as Hashes.
Column V10 identified as Hashes.
Column V11 identified as Hashes.
Column V12 identified as Hashes.
Column V13 identified as Hashes.
Column V14 identified as Hashes.
Column V15 identified as Hashes.
Column V16 identified as Hashes.
Column V17 identified as Hashes.
Column V18 identified as Hashes.
Column V19 identified as Hashes.
Column V20 identified as Hashes.
Column V21 identified as Hashes.
Column V22 identified as Hashes.
Column V23 identified as Hashes.
Column V24 identified as Hashes.
Column V25 identified as Hashes.
Column V26 identified as Hashes.
Column V27 identified as Hashes.
Column V28 identified as Hashes.
Column Amount identified as Ignore.
Please either inspect your input data or use featurization config to give hints about the desired data transformation.
I tried adding a featurization config to the AutoML configuration, but it still does not work. In the 70_driver_log.txt file for the featurization run (under "Outputs + Logs" for the AutoMLStep node in the UI), I found the following:
2021-05-29 17:32:03.515 - INFO - Start updating column purposes using customized feature type settings.
2021-05-29 17:32:03.516 - WARNING - Could not update column number 2 to Numeric since pandas.api.types.infer_dtype returned string. Setting back to Hashes. Please check your column before overriding feature type.
2021-05-29 17:32:03.634 - WARNING - Could not update column number 3 to Numeric since pandas.api.types.infer_dtype returned string. Setting back to Hashes. Please check your column before overriding feature type.
2021-05-29 17:32:03.634 - WARNING - Could not update column number 4 to Numeric since pandas.api.types.infer_dtype returned string. Setting back to Hashes. Please check your column before overriding feature type.
2021-05-29 17:32:03.635 - WARNING - Could not update column number 5 to Numeric since pandas.api.types.infer_dtype returned string. Setting back to Hashes. Please check your column before overriding feature type.
2021-05-29 17:32:03.635 - WARNING - Could not update column number 6 to Numeric since pandas.api.types.infer_dtype returned string. Setting back to Hashes. Please check your column before overriding feature type.
2021-05-29 17:32:03.635 - WARNING - Could not update column number 7 to Numeric since pandas.api.types.infer_dtype returned string. Setting back to Hashes. Please check your column before overriding feature type.
2021-05-29 17:32:03.636 - WARNING - Could not update column number 8 to Numeric since pandas.api.types.infer_dtype returned string. Setting back to Hashes. Please check your column before overriding feature type.
2021-05-29 17:32:03.636 - WARNING - Could not update column number 9 to Numeric since pandas.api.types.infer_dtype returned string. Setting back to Hashes. Please check your column before overriding feature type.
2021-05-29 17:32:03.636 - WARNING - Could not update column number 10 to Numeric since pandas.api.types.infer_dtype returned string. Setting back to Hashes. Please check your column before overriding feature type.
2021-05-29 17:32:03.637 - WARNING - Could not update column number 11 to Numeric since pandas.api.types.infer_dtype returned string. Setting back to Hashes. Please check your column before overriding feature type.
2021-05-29 17:32:03.637 - WARNING - Could not update column number 12 to Numeric since pandas.api.types.infer_dtype returned string. Setting back to Hashes. Please check your column before overriding feature type.
2021-05-29 17:32:03.637 - WARNING - Could not update column number 13 to Numeric since pandas.api.types.infer_dtype returned string. Setting back to Hashes. Please check your column before overriding feature type.
2021-05-29 17:32:03.638 - WARNING - Could not update column number 14 to Numeric since pandas.api.types.infer_dtype returned string. Setting back to Hashes. Please check your column before overriding feature type.
2021-05-29 17:32:03.638 - WARNING - Could not update column number 15 to Numeric since pandas.api.types.infer_dtype returned string. Setting back to Hashes. Please check your column before overriding feature type.
2021-05-29 17:32:03.638 - WARNING - Could not update column number 16 to Numeric since pandas.api.types.infer_dtype returned string. Setting back to Hashes. Please check your column before overriding feature type.
2021-05-29 17:32:03.638 - WARNING - Could not update column number 17 to Numeric since pandas.api.types.infer_dtype returned string. Setting back to Hashes. Please check your column before overriding feature type.
2021-05-29 17:32:03.639 - WARNING - Could not update column number 18 to Numeric since pandas.api.types.infer_dtype returned string. Setting back to Hashes. Please check your column before overriding feature type.
2021-05-29 17:32:03.639 - WARNING - Could not update column number 19 to Numeric since pandas.api.types.infer_dtype returned string. Setting back to Hashes. Please check your column before overriding feature type.
2021-05-29 17:32:03.639 - WARNING - Could not update column number 20 to Numeric since pandas.api.types.infer_dtype returned string. Setting back to Hashes. Please check your column before overriding feature type.
2021-05-29 17:32:03.640 - WARNING - Could not update column number 21 to Numeric since pandas.api.types.infer_dtype returned string. Setting back to Hashes. Please check your column before overriding feature type.
2021-05-29 17:32:03.640 - WARNING - Could not update column number 22 to Numeric since pandas.api.types.infer_dtype returned string. Setting back to Hashes. Please check your column before overriding feature type.
2021-05-29 17:32:03.640 - WARNING - Could not update column number 23 to Numeric since pandas.api.types.infer_dtype returned string. Setting back to Hashes. Please check your column before overriding feature type.
2021-05-29 17:32:03.640 - WARNING - Could not update column number 24 to Numeric since pandas.api.types.infer_dtype returned string. Setting back to Hashes. Please check your column before overriding feature type.
2021-05-29 17:32:03.641 - WARNING - Could not update column number 25 to Numeric since pandas.api.types.infer_dtype returned string. Setting back to Hashes. Please check your column before overriding feature type.
2021-05-29 17:32:03.641 - WARNING - Could not update column number 26 to Numeric since pandas.api.types.infer_dtype returned string. Setting back to Hashes. Please check your column before overriding feature type.
2021-05-29 17:32:03.641 - WARNING - Could not update column number 27 to Numeric since pandas.api.types.infer_dtype returned string. Setting back to Hashes. Please check your column before overriding feature type.
2021-05-29 17:32:03.642 - WARNING - Could not update column number 28 to Numeric since pandas.api.types.infer_dtype returned string. Setting back to Hashes. Please check your column before overriding feature type.
2021-05-29 17:32:03.642 - WARNING - Could not update column number 29 to Numeric since pandas.api.types.infer_dtype returned string. Setting back to Hashes. Please check your column before overriding feature type.
2021-05-29 17:32:03.643 - WARNING - Could not update column number 30 to Numeric since pandas.api.types.infer_dtype returned string. Setting back to Ignore. Please check your column before overriding feature type.
2021-05-29 17:32:03.643 - INFO - End updating column purposes using customized feature type settings.
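The warnings above report that `pandas.api.types.infer_dtype` returned `string` for each column, which suggests the values reached AutoML as text rather than as numbers. A minimal sketch of that check in isolation, assuming a small made-up column (the sample values and the `pd.to_numeric` conversion are illustrative only and are not taken from the pipeline):

```python
import pandas as pd
from pandas.api.types import infer_dtype

# Hypothetical column: numeric-looking values that are stored as text.
as_text = pd.Series(["-1.359807", "1.191857", "-0.966272"], name="V1")
dtype_before = infer_dtype(as_text)  # inferred from the Python objects in the column

# Converting the values to numbers changes what infer_dtype reports.
as_numbers = pd.to_numeric(as_text)
dtype_after = infer_dtype(as_numbers)

print(dtype_before, dtype_after)
```

Running the same `infer_dtype` call on a column of the prepared data would show whether it is being read as text before it reaches the AutoML step.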
Here is how I defined my training step.
from azureml.train.automl import AutoMLConfig
from azureml.pipeline.steps import AutoMLStep
from azureml.automl.core.featurization import FeaturizationConfig
featurization_config = FeaturizationConfig()
featurization_config.add_column_purpose('V1', 'Numeric')
featurization_config.add_column_purpose('V2', 'Numeric')
featurization_config.add_column_purpose('V3', 'Numeric')
featurization_config.add_column_purpose('V4', 'Numeric')
featurization_config.add_column_purpose('V5', 'Numeric')
featurization_config.add_column_purpose('V6', 'Numeric')
featurization_config.add_column_purpose('V7', 'Numeric')
featurization_config.add_column_purpose('V8', 'Numeric')
featurization_config.add_column_purpose('V9', 'Numeric')
featurization_config.add_column_purpose('V10', 'Numeric')
featurization_config.add_column_purpose('V11', 'Numeric')
featurization_config.add_column_purpose('V12', 'Numeric')
featurization_config.add_column_purpose('V13', 'Numeric')
featurization_config.add_column_purpose('V14', 'Numeric')
featurization_config.add_column_purpose('V15', 'Numeric')
featurization_config.add_column_purpose('V16', 'Numeric')
featurization_config.add_column_purpose('V17', 'Numeric')
featurization_config.add_column_purpose('V18', 'Numeric')
featurization_config.add_column_purpose('V19', 'Numeric')
featurization_config.add_column_purpose('V20', 'Numeric')
featurization_config.add_column_purpose('V21', 'Numeric')
featurization_config.add_column_purpose('V22', 'Numeric')
featurization_config.add_column_purpose('V23', 'Numeric')
featurization_config.add_column_purpose('V24', 'Numeric')
featurization_config.add_column_purpose('V25', 'Numeric')
featurization_config.add_column_purpose('V26', 'Numeric')
featurization_config.add_column_purpose('V27', 'Numeric')
featurization_config.add_column_purpose('V28', 'Numeric')
featurization_config.add_column_purpose('Amount', 'Numeric')
featurization_config.add_column_purpose('Class', 'CategoricalHash')
# Change iterations to a reasonable number (50) to get better accuracy
automl_settings = {
    "iteration_timeout_minutes": 10,
    "iterations": 2,
    "experiment_timeout_hours": 0.25,
    # "featurization": featurization_config,
    "primary_metric": 'AUC_weighted'
}

automl_config = AutoMLConfig(task='classification',
                             debug_log='automated_ml_errors.log',
                             compute_target=compute_target,
                             run_configuration=aml_run_config,
                             featurization=featurization_config,
                             training_data=prepped_data,
                             # label_column_name='Survived',
                             label_column_name='Class',
                             **automl_settings)

train_step = AutoMLStep(name='AutoML_Classification',
                        automl_config=automl_config,
                        passthru_automl_config=False,
                        outputs=[metrics_data, model_data],
                        enable_default_model_output=False,
                        enable_default_metrics_output=False,
                        allow_reuse=True)
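The thirty `add_column_purpose` calls above follow one pattern, so the column-to-purpose mapping could also be built in a loop. A sketch, assuming the column names listed in the error message; `column_purposes` is a hypothetical name, and the commented-out lines show how it might be applied to the `featurization_config` object defined earlier:

```python
# Hypothetical mapping; the column names are the ones listed in the error message.
column_purposes = {f"V{i}": "Numeric" for i in range(1, 29)}  # V1 .. V28
column_purposes["Amount"] = "Numeric"
column_purposes["Class"] = "CategoricalHash"

# Applying it needs the azureml SDK, so it is left commented out here:
# for column, purpose in column_purposes.items():
#     featurization_config.add_column_purpose(column, purpose)

print(len(column_purposes))
```

This only shortens the configuration code; it does not change which purposes are requested, so the warnings in the log would be expected to stay the same.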
The new dataset is https://automlsamplenotebookdata.blob.core.windows.net/automl-sample-notebook-data/creditcard.csv