No DataProcessing in local transform job #4757

Open
@sephib

Description

Describe the bug
When running with instance_type="local", the DataProcessing configuration is not used, and all of the input data is sent to the model for prediction.

In the _perform_batch_inference function in entities.py, the DataProcessing key from kwargs is never read, so the input_data / item is sent as-is without any filtering.
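For reference, a minimal sketch (not the SDK's actual code) of what applying the InputFilter to each CSV record would look like for the simple "$[n]" index form used in the repro below; apply_input_filter is a hypothetical helper name:

```python
def apply_input_filter(record: str, input_filter: str) -> str:
    """Apply a JSONPath-style filter of the form "$" or "$[n]" to one CSV line."""
    filter_expr = input_filter.strip()
    if filter_expr == "$":
        # Identity filter: pass the whole record through unchanged.
        return record
    # "$[4]" -> column index 4; split the CSV line and keep only that field.
    index = int(filter_expr[2:-1])
    return record.split(",")[index]


# With input_filter="$[4]", only the fifth column reaches the model.
print(apply_input_filter("1,2,3,4,5", "$[4]"))
```

In local mode today this filtering step simply never happens, so the model container receives the full record.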

To reproduce

from sagemaker.model import Model
from sagemaker.local import LocalSession
import boto3

model = Model(
    model_data='file://to/my/model_data',
    role='MY_ROLE',
    image_uri='IMAGE_URI',
    sagemaker_session=LocalSession(boto3.Session(region_name='my-region')),
)
transformer = model.transformer(
    instance_count=1,
    instance_type="local",
    strategy="MultiRecord",
    assemble_with="Line",
    output_path="file://my/output/path",
    accept="text/csv",
    max_concurrent_transforms=1,
)
transformer.transform(
    data="file://path/to/my/data/file",
    content_type="text/csv",
    split_type="Line",
    input_filter="$[4]",  # this currently seems not to work in local mode
    join_source="Input",
    output_filter="$[0]",
)
transformer.wait()

Expected behavior
The input CSV should be filtered using the input_filter value. The output should likewise be joined with the input (join_source="Input") and filtered using the output_filter value.
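For concreteness, a hedged sketch of the expected end-to-end behavior for one CSV row, assuming join_source="Input" appends the model output to the original input record before output_filter is applied (expected_pipeline and its parameters are illustrative names, not SDK API):

```python
def expected_pipeline(record, predict, input_index, output_index):
    """Expected per-record flow: input_filter -> predict -> join -> output_filter."""
    fields = record.split(",")
    model_input = fields[input_index]     # input_filter="$[4]" keeps one column
    prediction = predict(model_input)     # the model sees only the filtered input
    joined = fields + [prediction]        # join_source="Input" appends the output
    return joined[output_index]           # output_filter="$[0]" selects from the joined record


# e.g. a toy "model" that uppercases its input:
print(expected_pipeline("a,b,c,d,e", str.upper, input_index=4, output_index=0))
```

In local mode the first step is skipped, so predict receives the whole record instead of the filtered column.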

System information
A description of your system. Please provide:

  • SageMaker Python SDK version: sagemaker==2.219.0
  • Framework name (eg. PyTorch) or algorithm (eg. KMeans): tested with a PyTorch model
  • Python version: 3.10.14
  • Custom Docker image (Y/N): Y (custom image)
