Mutual exclusivity of SIGNED vs UNSIGNED data #894

@mahynski

Describe the bug

  1. I am trying to extend the feature_preprocessing module and I am confused about the SIGNED vs. UNSIGNED designations. The documentation indicates these are mutually exclusive, yet the code contains multiple instances where both are specified, including the provided LDA example. The PCA preprocessor is registered with UNSIGNED data for both input and output, which confuses me, since the "data" (X) should be able to include both positive and negative values.

  2. For an algorithm to be considered deterministic, it must produce identical results given the same random number seed / state. The PCA, for example, sets random_state=None and so is not deterministic, as indicated in autosklearn.pipeline.components.feature_preprocessing.pca.PCA. However, if you provided a fixed value instead (random_state=0), would it be correct to set deterministic=True in the properties?
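To make the second question concrete, here is a minimal sketch of what "deterministic given a fixed seed" means. It uses only the Python standard library; `random_projection` is a hypothetical stand-in for a randomized transform and is not auto-sklearn's or scikit-learn's PCA.

```python
import random


def random_projection(values, random_state=None):
    """Hypothetical randomized transform: scale each value by a random draw."""
    # random.Random(None) seeds itself from the OS or the clock,
    # so results change from call to call unless a seed is given.
    rng = random.Random(random_state)
    return [v * rng.random() for v in values]


data = [1.0, -2.0, 3.0]

# Fixed seed: repeated calls yield identical output.
fixed_a = random_projection(data, random_state=0)
fixed_b = random_projection(data, random_state=0)
print("fixed seed, identical:", fixed_a == fixed_b)

# No seed: repeated calls will almost surely differ.
free_a = random_projection(data)
free_b = random_projection(data)
print("no seed, identical:", free_a == free_b)
```

Under this definition, the question is whether hard-coding the seed inside the component is enough to justify flagging it as deterministic.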

To Reproduce

Steps to reproduce the behavior:

  1. Inspect autosklearn.pipeline.components.feature_preprocessing.pca.PCA
  2. Look at get_properties() method

Expected behavior

I would have expected:
'input': (DENSE, SIGNED_DATA),
'output': (DENSE, SIGNED_DATA)
whereas in the code both are specified as UNSIGNED_DATA.
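A small sketch of the properties I expected, together with a check for the "mutually exclusive" rule as I understand it from the documentation. The string constants below are stand-ins so the snippet runs without auto-sklearn installed (the real constants are the ones imported by the component modules), and `signedness_conflicts` is a hypothetical helper, not part of auto-sklearn.

```python
# Stand-in constants (assumption): placeholders for auto-sklearn's own
# DENSE / SIGNED_DATA / UNSIGNED_DATA constants.
DENSE = "DENSE"
SIGNED_DATA = "SIGNED_DATA"
UNSIGNED_DATA = "UNSIGNED_DATA"


def signedness_conflicts(properties):
    """Return the keys whose tuple lists both SIGNED_DATA and UNSIGNED_DATA."""
    return [
        key
        for key in ("input", "output")
        if SIGNED_DATA in properties.get(key, ())
        and UNSIGNED_DATA in properties.get(key, ())
    ]


# What I expected PCA's get_properties() to report for input/output.
expected_pca = {
    "input": (DENSE, SIGNED_DATA),
    "output": (DENSE, SIGNED_DATA),
}

# A made-up entry that lists both designations, as some components do.
both_flags = {
    "input": (DENSE, SIGNED_DATA, UNSIGNED_DATA),
    "output": (DENSE, SIGNED_DATA),
}

print(signedness_conflicts(expected_pca))
print(signedness_conflicts(both_flags))
```

If the two designations really are mutually exclusive, a check like this should come back empty for every registered component.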

Environment and installation:

Ubuntu 18.04 LTS, Anaconda Python 3.7 environment, auto-sklearn version 0.7.1
