Description
Describe the bug
- I am trying to extend the feature_preprocessing module with various things and I am confused about the SIGNED vs. UNSIGNED designations. The documentation indicates these are mutually exclusive, yet in the code there are multiple instances of both being specified, including the LDA example provided. The PCA preprocessor is registered with UNSIGNED data for both input and output, which confuses me since the "data" (X) should be able to include both positive and negative values.
- For an algorithm to be considered deterministic it must produce identical results (with the same random number seed / state). The PCA, for example, sets random_state=None and so is not deterministic, as indicated in autosklearn.pipeline.components.feature_preprocessing.pca.PCA. However, if you just provided a fixed value (random_state=0), would it be correct to set deterministic=True in the properties?
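To make the determinism question concrete, here is a minimal sketch of what I mean by "identical results with the same seed". It uses only the standard library; `noisy_projection` and `is_repeatable` are hypothetical names I made up for illustration and are not part of auto-sklearn or scikit-learn.

```python
import random


def noisy_projection(values, random_state=None):
    """Hypothetical stand-in for a stochastic preprocessor (not auto-sklearn code)."""
    # random.Random(None) seeds itself from the OS or the clock,
    # so each call without a seed draws different noise.
    rng = random.Random(random_state)
    return [v + rng.gauss(0.0, 1.0) for v in values]


def is_repeatable(fn, values, random_state, runs=3):
    """Return True if repeated calls with the same random_state give identical output."""
    first = fn(values, random_state=random_state)
    return all(
        fn(values, random_state=random_state) == first for _ in range(runs - 1)
    )


data = [1.0, -2.0, 3.5]
fixed_seed_repeatable = is_repeatable(noisy_projection, data, random_state=0)
unseeded_repeatable = is_repeatable(noisy_projection, data, random_state=None)
print(fixed_seed_repeatable, unseeded_repeatable)
```

With a fixed seed the repeated calls should agree, while the unseeded calls should differ. My question is whether passing this kind of check is enough to justify deterministic=True.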
To Reproduce
Steps to reproduce the behavior:
- Inspect autosklearn.pipeline.components.feature_preprocessing.pca.PCA
- Look at get_properties() method
Expected behavior
I would have expected:
'input': (DENSE, SIGNED_DATA),
'output': (DENSE, SIGNED_DATA)
whereas in the code these are specified as 'UNSIGNED_DATA'
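For clarity, the difference I am describing can be sketched as follows. The constants below are placeholder strings standing in for the ones in autosklearn.pipeline.constants (their real values are not reproduced here), and `differing_keys` is a hypothetical helper, so treat this as an illustration rather than the actual auto-sklearn code.

```python
# Placeholder constants: stand-ins for those in autosklearn.pipeline.constants.
# The real constants may have different values; only the names matter here.
DENSE = "DENSE"
SIGNED_DATA = "SIGNED_DATA"
UNSIGNED_DATA = "UNSIGNED_DATA"

# What the issue reports get_properties() as declaring for PCA.
current_properties = {
    "input": (DENSE, UNSIGNED_DATA),
    "output": (DENSE, UNSIGNED_DATA),
}

# What I would have expected instead.
expected_properties = {
    "input": (DENSE, SIGNED_DATA),
    "output": (DENSE, SIGNED_DATA),
}


def differing_keys(current, expected):
    """Return the sorted keys whose declared data types differ."""
    return sorted(key for key in expected if current.get(key) != expected[key])


mismatches = differing_keys(current_properties, expected_properties)
print(mismatches)
```

Both the 'input' and 'output' entries differ between the two dictionaries, which is the discrepancy I am asking about.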
Environment and installation:
Ubuntu 18.04 LTS, Anaconda python 3.7 environment, auto-sklearn version 0.7.1