
Feature Request: Allow users to add local dependencies as arguments to SKLearnProcessor #3599

Open
@WouterJTB

Description


Describe the feature you'd like
Add a dependencies argument to the SKLearnProcessor class's run() method (or potentially to its parent, ScriptProcessor).
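
A hypothetical sketch of what the requested call could look like; the dependencies argument on SKLearnProcessor.run() does not exist today (the name mirrors the existing FrameworkProcessor.run() parameter), and the role, framework version, bucket names, and my_transformers module are placeholders:

```python
from sagemaker.processing import ProcessingInput, ProcessingOutput
from sagemaker.sklearn.processing import SKLearnProcessor

sklearn_processor = SKLearnProcessor(
    framework_version="1.0-1",
    role="<sagemaker-execution-role>",
    instance_type="ml.m5.xlarge",
    instance_count=1,
)

sklearn_processor.run(
    code="preprocessing.py",
    # Proposed (hypothetical) argument, mirroring FrameworkProcessor.run():
    # ship these local modules alongside the entry point so the custom
    # transformers can be imported by both preprocessing.py and inference.py.
    dependencies=["my_transformers"],
    inputs=[ProcessingInput(source="s3://my-bucket/raw",
                            destination="/opt/ml/processing/input")],
    outputs=[ProcessingOutput(source="/opt/ml/processing/output",
                              destination="s3://my-bucket/processed")],
)
```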

How would this feature be used? Please describe.
Currently, it is not possible to define custom sklearn transformers (classes based on sklearn.base.TransformerMixin) for an sklearn.pipeline.Pipeline in the preprocessing script run by SKLearnProcessor, save the fitted pipeline as a .joblib or .pickle file, and then load it in an inference server via SKLearnModel (even though, as far as I know, both classes use the same container: https://github.com/aws/sagemaker-scikit-learn-container/). Pickle looks for the custom classes in the gunicorn server module rather than in the inference.py module and raises an error.
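
An illustrative sketch of the failure, assuming a toy LogTransformer; paths and names are placeholders, and the exact error text may differ:

```python
# preprocessing.py -- the SKLearnProcessor entry point, i.e. the __main__
# module inside the processing container.
import joblib
import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.pipeline import Pipeline


class LogTransformer(BaseEstimator, TransformerMixin):
    """Custom transformer; pickle records it as __main__.LogTransformer."""

    def fit(self, X, y=None):
        return self

    def transform(self, X):
        return np.log1p(X)


pipeline = Pipeline([("log", LogTransformer())])
# ... fit the pipeline on the processing input data ...
joblib.dump(pipeline, "/opt/ml/processing/output/model.joblib")

# When the SKLearnModel inference container later runs something like
#   model = joblib.load(os.path.join(model_dir, "model.joblib"))
# the __main__ module there is the gunicorn worker, not preprocessing.py,
# so unpickling fails with an error along the lines of:
#   AttributeError: Can't get attribute 'LogTransformer' on <module '__main__' ...>
```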

Adding a dependencies argument to SKLearnProcessor (similar to the one in FrameworkProcessor) would let you declare the custom transformers in a local module that is installed separately and can be imported in both the preprocessing and inference scripts, as sketched below.
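
For comparison, a minimal sketch of the shared-module layout this would enable; the my_transformers module name is illustrative:

```python
# my_transformers/__init__.py -- shared module holding the custom transformers
import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin


class LogTransformer(BaseEstimator, TransformerMixin):
    def fit(self, X, y=None):
        return self

    def transform(self, X):
        return np.log1p(X)

# preprocessing.py and inference.py then both do:
#   from my_transformers import LogTransformer
# so pickle records the class as my_transformers.LogTransformer, which is
# importable in either container once the module ships as a dependency.
```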

Describe alternatives you've considered
Using a FrameworkProcessor instead for preprocessing/training can work as a workaround, but this is not very intuitive.
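
A rough sketch of that workaround, assuming the custom transformers live in a local my_transformers module; the role, framework version, and bucket names are placeholders:

```python
from sagemaker.processing import (FrameworkProcessor, ProcessingInput,
                                  ProcessingOutput)
from sagemaker.sklearn.estimator import SKLearn

# FrameworkProcessor exposes source_dir/dependencies on run(), so local
# modules get packaged and installed alongside the entry point; with
# estimator_cls=SKLearn it uses the scikit-learn framework image.
processor = FrameworkProcessor(
    estimator_cls=SKLearn,
    framework_version="1.0-1",
    role="<sagemaker-execution-role>",
    instance_type="ml.m5.xlarge",
    instance_count=1,
)

processor.run(
    code="preprocessing.py",
    source_dir=".",
    dependencies=["my_transformers"],  # local module with the custom transformers
    inputs=[ProcessingInput(source="s3://my-bucket/raw",
                            destination="/opt/ml/processing/input")],
    outputs=[ProcessingOutput(source="/opt/ml/processing/output",
                              destination="s3://my-bucket/processed")],
)
```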

Additional context
More context: aws/sagemaker-scikit-learn-container#157
