Description
Describe the feature you'd like
Add a dependencies argument to the SKLearnProcessor class's run() method (or potentially to its parent, ScriptProcessor).
How would this feature be used? Please describe.
Currently, it is not possible to define custom sklearn transformers (classes based on sklearn.base.TransformerMixin) in a preprocessing script run on an SKLearnProcessor, include them in an sklearn.pipeline.Pipeline, save the pipeline as a .joblib or .pickle file, and then load it in an inference server via an SKLearnModel (even though, as far as I know, all of these classes use the same container: https://github.com/aws/sagemaker-scikit-learn-container/). At load time, pickle looks for the custom classes in the gunicorn server module rather than in the inference.py module and raises an error.
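The failure mode can be reproduced outside SageMaker, because pickle serializes class instances by reference (module name plus class name), not by value. A minimal sketch, with invented module and class names standing in for the preprocessing script and the custom transformer:

```python
import pickle
import sys
import types

# Simulate the preprocessing script: a module that defines a custom transformer.
prep = types.ModuleType("preprocessing")
exec(
    "class MyTransformer:\n"
    "    def transform(self, X):\n"
    "        return X\n",
    prep.__dict__,
)
sys.modules["preprocessing"] = prep

blob = pickle.dumps(prep.MyTransformer())
# The pickle stores only a reference -- module name + class name -- not the code:
assert b"preprocessing" in blob and b"MyTransformer" in blob

# Simulate the inference server: the defining module is not importable there
# (gunicorn never imports it), so unpickling fails even though the bytes are intact.
del sys.modules["preprocessing"]
try:
    pickle.loads(blob)
except (ModuleNotFoundError, AttributeError) as exc:
    print("unpickling failed:", exc)
```

This is why the transformer classes must live in a module that is importable under the same name in both the preprocessing and inference environments.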
Adding a dependencies argument to SKLearnProcessor (similar to the one on FrameworkProcessor) would allow the custom transformers to be declared in a local module that is packaged with the job and can be imported in both the preprocessing and inference scripts.
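A hypothetical sketch of the requested API (the dependencies parameter on run() does not exist today; the module name and role are placeholders):

```python
from sagemaker.sklearn.processing import SKLearnProcessor

processor = SKLearnProcessor(
    framework_version="1.2-1",
    role=role,  # placeholder: your SageMaker execution role ARN
    instance_type="ml.m5.xlarge",
    instance_count=1,
)

processor.run(
    code="preprocessing.py",
    # Proposed argument: a local package with the custom TransformerMixin
    # subclasses, installed into the job so both preprocessing.py and
    # inference.py can import it under the same module name.
    dependencies=["my_transformers"],
)
```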
Describe alternatives you've considered
Using a FrameworkProcessor for preprocessing/training works as a workaround, but it is not very intuitive.
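For reference, a sketch of that workaround, assuming the sklearn framework and a hypothetical local package src/my_transformers (FrameworkProcessor already accepts source_dir and dependencies):

```python
from sagemaker.processing import FrameworkProcessor
from sagemaker.sklearn.estimator import SKLearn

processor = FrameworkProcessor(
    estimator_cls=SKLearn,
    framework_version="1.2-1",
    role=role,  # placeholder: your SageMaker execution role ARN
    instance_type="ml.m5.xlarge",
    instance_count=1,
)

processor.run(
    code="preprocessing.py",
    source_dir="src",  # directory containing preprocessing.py
    # Hypothetical local package shipped with the job, so the custom
    # transformer classes are importable in both preprocessing and inference:
    dependencies=["src/my_transformers"],
)
```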
Additional context
More context: aws/sagemaker-scikit-learn-container#157