
Feature Request: Allow users to add local dependencies as arguments to SKLearnProcessor #3599

Open
@WouterJTB

Description


Describe the feature you'd like
Add a dependencies argument to the SKLearnProcessor class's run() method (or potentially to its parent, ScriptProcessor).
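
A hypothetical sketch of what the requested call could look like; the dependencies argument on SKLearnProcessor.run() does not exist today (the name mirrors the existing FrameworkProcessor.run() parameter), and the role, framework version, bucket names, and my_transformers module are placeholders:

```python
from sagemaker.processing import ProcessingInput, ProcessingOutput
from sagemaker.sklearn.processing import SKLearnProcessor

sklearn_processor = SKLearnProcessor(
    framework_version="1.0-1",
    role="<sagemaker-execution-role>",
    instance_type="ml.m5.xlarge",
    instance_count=1,
)

sklearn_processor.run(
    code="preprocessing.py",
    # Proposed (hypothetical) argument, mirroring FrameworkProcessor.run():
    # ship these local modules alongside the entry point so the custom
    # transformers can be imported by both preprocessing.py and inference.py.
    dependencies=["my_transformers"],
    inputs=[ProcessingInput(source="s3://my-bucket/raw",
                            destination="/opt/ml/processing/input")],
    outputs=[ProcessingOutput(source="/opt/ml/processing/output",
                              destination="s3://my-bucket/processed")],
)
```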

How would this feature be used? Please describe.
Currently, it is not possible to define custom sklearn transformers (classes based on sklearn.base.TransformerMixin) for an sklearn.pipeline.Pipeline in the preprocessing script run by SKLearnProcessor, save the fitted pipeline as a .joblib or .pickle file, and then load it in an inference server via SKLearnModel (even though, as far as I know, both classes use the same container: https://github.com/aws/sagemaker-scikit-learn-container/). Pickle looks for the custom classes in the gunicorn server module rather than in the inference.py module and raises an error.
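
An illustrative sketch of the failure, assuming a toy LogTransformer; paths and names are placeholders, and the exact error text may differ:

```python
# preprocessing.py -- the SKLearnProcessor entry point, i.e. the __main__
# module inside the processing container.
import joblib
import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.pipeline import Pipeline


class LogTransformer(BaseEstimator, TransformerMixin):
    """Custom transformer; pickle records it as __main__.LogTransformer."""

    def fit(self, X, y=None):
        return self

    def transform(self, X):
        return np.log1p(X)


pipeline = Pipeline([("log", LogTransformer())])
# ... fit the pipeline on the processing input data ...
joblib.dump(pipeline, "/opt/ml/processing/output/model.joblib")

# When the SKLearnModel inference container later runs something like
#   model = joblib.load(os.path.join(model_dir, "model.joblib"))
# the __main__ module there is the gunicorn worker, not preprocessing.py,
# so unpickling fails with an error along the lines of:
#   AttributeError: Can't get attribute 'LogTransformer' on <module '__main__' ...>
```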

Adding a dependencies argument to SKLearnProcessor (similar to the one in FrameworkProcessor) would let you declare the custom transformers in a local module that is installed separately and can be imported in both the preprocessing and inference scripts, as sketched below.
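
For comparison, a minimal sketch of the shared-module layout this would enable; the my_transformers module name is illustrative:

```python
# my_transformers/__init__.py -- shared module holding the custom transformers
import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin


class LogTransformer(BaseEstimator, TransformerMixin):
    def fit(self, X, y=None):
        return self

    def transform(self, X):
        return np.log1p(X)

# preprocessing.py and inference.py then both do:
#   from my_transformers import LogTransformer
# so pickle records the class as my_transformers.LogTransformer, which is
# importable in either container once the module ships as a dependency.
```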

Describe alternatives you've considered
Using a FrameworkProcessor instead for preprocessing/training can work as a workaround, but this is not very intuitive.
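
A rough sketch of that workaround, assuming the custom transformers live in a local my_transformers module; the role, framework version, and bucket names are placeholders:

```python
from sagemaker.processing import (FrameworkProcessor, ProcessingInput,
                                  ProcessingOutput)
from sagemaker.sklearn.estimator import SKLearn

# FrameworkProcessor exposes source_dir/dependencies on run(), so local
# modules get packaged and installed alongside the entry point; with
# estimator_cls=SKLearn it uses the scikit-learn framework image.
processor = FrameworkProcessor(
    estimator_cls=SKLearn,
    framework_version="1.0-1",
    role="<sagemaker-execution-role>",
    instance_type="ml.m5.xlarge",
    instance_count=1,
)

processor.run(
    code="preprocessing.py",
    source_dir=".",
    dependencies=["my_transformers"],  # local module with the custom transformers
    inputs=[ProcessingInput(source="s3://my-bucket/raw",
                            destination="/opt/ml/processing/input")],
    outputs=[ProcessingOutput(source="/opt/ml/processing/output",
                              destination="s3://my-bucket/processed")],
)
```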

Additional context
More context: aws/sagemaker-scikit-learn-container#157
