Description
The following transform functions require analysis. The analysis must be done first so that the transform logic becomes concrete: each analyzer result fills a placeholder in the feature column template.
| Name | Feature Column Template | Analyzer |
|---|---|---|
| STANDARDIZE(x) | numeric_column({var_name}, normalizer_fn=lambda x: (x - {mean}) / {std}) | MEAN, STDDEV |
| NORMALIZE(x) | numeric_column({var_name}, normalizer_fn=lambda x: (x - {min}) / ({max} - {min})) | MAX, MIN |
| BUCKETIZE(x, bucket_num=y) | bucketized_column({var_name}, boundaries={percentiles}) | PERCENTILE |
| APPLY_VOCAB(x) | categorical_column_with_vocabulary_list({var_name}, vocabulary_list={vocabulary_list}) | DISTINCT |
| HASH(x, hash_bucket_size) | categorical_column_with_hash_bucket | COUNT(DISTINCT) |
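
The analyze-then-fill flow for the numeric rows of the table above could be sketched as follows. This is only an illustration, not SQLFlow's implementation: `TEMPLATES`, `analyze`, and `render` are hypothetical names, the statistics are computed in memory with Python's `statistics` module rather than by SQL analyzers, and STDDEV is assumed to mean the population standard deviation. The normalizer expressions are parenthesized so that the subtraction happens before the division.

```python
import statistics

# Hypothetical templates mirroring the numeric rows of the table.
# Placeholders in braces are filled with analysis results.
TEMPLATES = {
    "STANDARDIZE": "numeric_column({var_name!r}, normalizer_fn=lambda x: (x - {mean}) / {std})",
    "NORMALIZE": "numeric_column({var_name!r}, normalizer_fn=lambda x: (x - {min}) / ({max} - {min}))",
    "BUCKETIZE": "bucketized_column({var_name!r}, boundaries={percentiles})",
}


def analyze(values, bucket_num=4):
    """Compute the statistics that the templates need (the analysis step)."""
    return {
        "mean": statistics.fmean(values),
        "std": statistics.pstdev(values),  # assumption: population stddev
        "min": min(values),
        "max": max(values),
        # bucket_num buckets need bucket_num - 1 boundaries
        "percentiles": statistics.quantiles(values, n=bucket_num),
    }


def render(transform, var_name, stats):
    """Fill a template with analysis results (the transform step)."""
    return TEMPLATES[transform].format(var_name=var_name, **stats)


values = [2.0, 4.0, 6.0, 8.0]
stats = analyze(values)
normalize_code = render("NORMALIZE", "age", stats)
print(normalize_code)
```

The point of the split is that `render` cannot run until `analyze` has produced concrete values, which is why the analysis has to come first.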
The SQLFlow syntax for data transform and analysis is discussed in #1664.