
Support different input and output types for Model Monitor data quality monitoring jobs #2785

Open
@caitriggs

Description

Discussed in #2393

Currently, when capturing data for a scheduled Model Monitor job, the data inputs and outputs must be encoded using the same content type. Otherwise, the following error occurs:

Error: Encoding mismatch: Encoding is CSV for endpointInput, but Encoding is JSON for endpointOutput. We currently only support the same type of input and output encoding at the moment.

Here is an example of the captureData returned for a ModelMonitor.list_executions() call where the input is CSV and the output is JSON:
{"captureData":{"endpointInput":{"observedContentType":"text/csv","mode":"INPUT","data":"0.5629733055182888,0.3018707225866159,0.5824503894753207","encoding":"CSV"},"endpointOutput":{"observedContentType":"application/json","mode":"OUTPUT","data":"{\"predictions\": [{\"score\": 0.012620825320482254, \"predicted_label\": 0}]}","encoding":"JSON"}},"eventMetadata":{"eventId":"28cc8646-bb47-4a96-92fc-d04fc2651286","inferenceTime":"2021-11-30T04:32:05Z"},"eventVersion":"0"}
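To make the failing condition concrete, the sketch below reads capture records shaped like the one above and flags any record whose `endpointInput.encoding` differs from its `endpointOutput.encoding`. This is an assumption about what the "Encoding mismatch" check compares, inferred from the error text. The helpers `encodings_match` and `find_mismatches` are hypothetical and are not part of the SageMaker SDK.

```python
import json


def encodings_match(record_line: str) -> bool:
    """Return True if a capture record's input and output share an encoding.

    Hypothetical helper: mirrors the condition described by the
    "Encoding mismatch" error; it is not a SageMaker SDK function.
    """
    capture = json.loads(record_line)["captureData"]
    return capture["endpointInput"]["encoding"] == capture["endpointOutput"]["encoding"]


def find_mismatches(record_lines):
    """Return the eventIds of capture records with mismatched encodings."""
    return [
        json.loads(line)["eventMetadata"]["eventId"]
        for line in record_lines
        if not encodings_match(line)
    ]


# Rebuild the captureData example from this issue (CSV input, JSON output).
sample = {
    "captureData": {
        "endpointInput": {
            "observedContentType": "text/csv",
            "mode": "INPUT",
            "data": "0.5629733055182888,0.3018707225866159,0.5824503894753207",
            "encoding": "CSV",
        },
        "endpointOutput": {
            "observedContentType": "application/json",
            "mode": "OUTPUT",
            "data": json.dumps(
                {"predictions": [{"score": 0.012620825320482254, "predicted_label": 0}]}
            ),
            "encoding": "JSON",
        },
    },
    "eventMetadata": {
        "eventId": "28cc8646-bb47-4a96-92fc-d04fc2651286",
        "inferenceTime": "2021-11-30T04:32:05Z",
    },
    "eventVersion": "0",
}
line = json.dumps(sample)

print(encodings_match(line))    # False
print(find_mismatches([line]))  # ['28cc8646-bb47-4a96-92fc-d04fc2651286']
```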

Please support different input and output content types for Model Monitor data quality monitoring jobs.

This is also needed because neither the SageMaker examples nor the documentation show any apparent way to set both the input and the output of the endpoint to the same encoding.

Setting the serializer and deserializer to the same content type when deploying the endpoint does not appear to work: the endpoint still records endpointOutput as JSON.

import sagemaker

# Data capture config object
data_capture_config = sagemaker.model_monitor.DataCaptureConfig(
    enable_capture=True,
    sampling_percentage=100,
    capture_options=["REQUEST", "RESPONSE"],
    # Content types listed here are captured with CSV encoding
    csv_content_types=["text/csv"],
    destination_s3_uri=s3_capture_upload_path,
    sagemaker_session=sm_sess,
)

model.deploy(
    initial_instance_count=endpoint_instance_count,
    instance_type=endpoint_instance_type,
    model_name=model_name,
    endpoint_name=endpoint_name,
    data_capture_config=data_capture_config,
    # Requests are sent to the endpoint as text/csv
    serializer=sagemaker.serializers.CSVSerializer(),
    # accept="application/json" is sent as the Accept header on each request
    deserializer=sagemaker.deserializers.CSVDeserializer(accept="application/json"),
    tags=[{"Key": "demo-configs", "Value": prefix}],
)

As a result, every scheduled data quality monitoring job fails with the same "Encoding mismatch" error.
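Until mixed encodings are supported, one possible stopgap is to rewrite the captured records so that the output is CSV-encoded before a monitoring job reads them. The sketch below does this for a single record. It assumes the output payload has the `{"predictions": [{"score": ..., "predicted_label": ...}]}` shape shown in the captureData example and emits one `score,predicted_label` row per prediction. The helper `json_output_to_csv` is hypothetical, and whether a scheduled Model Monitor job accepts records rewritten this way has not been verified.

```python
import json


def json_output_to_csv(record_line: str) -> str:
    """Rewrite endpointOutput of one capture record from JSON to CSV.

    Hypothetical workaround, not a SageMaker API. Assumes the JSON output
    looks like {"predictions": [{"score": ..., "predicted_label": ...}]}.
    """
    record = json.loads(record_line)
    output = record["captureData"]["endpointOutput"]
    if output["encoding"] != "JSON":
        return record_line  # Nothing to convert; leave the record unchanged
    predictions = json.loads(output["data"])["predictions"]
    # One "score,predicted_label" row per prediction
    rows = [f'{p["score"]},{p["predicted_label"]}' for p in predictions]
    output["data"] = "\n".join(rows)
    output["observedContentType"] = "text/csv"
    output["encoding"] = "CSV"
    return json.dumps(record)


# Minimal record with CSV input and JSON output, as in this issue
line = json.dumps({
    "captureData": {
        "endpointInput": {"observedContentType": "text/csv", "mode": "INPUT",
                          "data": "0.56,0.30,0.58", "encoding": "CSV"},
        "endpointOutput": {"observedContentType": "application/json", "mode": "OUTPUT",
                           "data": json.dumps({"predictions": [
                               {"score": 0.012620825320482254, "predicted_label": 0}]}),
                           "encoding": "JSON"},
    },
    "eventMetadata": {"eventId": "28cc8646-bb47-4a96-92fc-d04fc2651286",
                      "inferenceTime": "2021-11-30T04:32:05Z"},
    "eventVersion": "0",
})
converted = json.loads(json_output_to_csv(line))
print(converted["captureData"]["endpointOutput"]["encoding"])  # CSV
```

In practice this would be applied line by line to each captured JSONL file; the conversion touches only `endpointOutput` and keeps `eventMetadata` intact.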
