@fen-qin fen-qin commented Oct 28, 2025

Description

  • Add deployment scripts for an asymmetric embedding SageMaker endpoint
  • Refactor the semantic highlighting SageMaker endpoint deployment to support more general model deployment
examples/
├── common/
│   ├── deploy.py              # Shared deployment script
│   └── README.md        # Shared readme file
├── semantic_highlighting/     # Highlighting models
│   ├── api_types.py          # Highlighting-specific types
│   ├── modernbert/
│   └── opensearch-semantic-highlighter/
└── embedding_models/          # Embedding models
    ├── api_types.py          # Embedding-specific types
    └── asymmetric_e5/
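
The layout above suggests that the shared deploy.py maps a --model value to a directory under one of the task folders. A minimal sketch of that lookup follows. The MODEL_DIRS mapping and the resolve_model_dir helper are hypothetical names, and only asymmetric_e5 is confirmed as a --model value in this PR; the real script may discover models differently.

```python
from pathlib import Path

# Hypothetical registry: --model value -> path relative to examples/.
# Directory names come from the tree above; the mapping itself and the
# two highlighting keys are assumptions made for illustration.
MODEL_DIRS = {
    "asymmetric_e5": Path("embedding_models") / "asymmetric_e5",
    "modernbert": Path("semantic_highlighting") / "modernbert",
    "opensearch-semantic-highlighter": (
        Path("semantic_highlighting") / "opensearch-semantic-highlighter"
    ),
}


def resolve_model_dir(model: str, examples_root: Path = Path("examples")) -> Path:
    """Return the directory that holds the files for `model`."""
    try:
        return examples_root / MODEL_DIRS[model]
    except KeyError:
        known = ", ".join(sorted(MODEL_DIRS))
        raise ValueError(f"unknown model {model!r}; expected one of: {known}") from None
```

Keeping the lookup in one table means a new model only needs a new directory and one registry entry, which matches the stated goal of a more general deployment script.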

Tests

  • Deploy the SageMaker endpoint: python3 deploy.py --model asymmetric_e5 --instance-type ml.m5.large
  • Detailed logs:
2025-11-13 21:08:34,966 - INFO - Starting deployment for model: asymmetric_e5
2025-11-13 21:08:34,967 - INFO - Using model: intfloat/multilingual-e5-small
2025-11-13 21:08:34,967 - INFO - Downloading model from HuggingFace...
...
2025-11-13 21:11:03,791 - INFO - Found credentials in shared credentials file: ~/.aws/credentials
2025-11-13 21:11:04,493 - INFO - Using existing role: SageMakerExecutionRole
2025-11-13 21:11:04,535 - INFO - Uploading to S3...
2025-11-13 21:11:08,657 - INFO - Model uploaded to: s3://sagemaker-us-east-1-863635919674/asymmetric-e5/20251113-210834/model.tar.gz
2025-11-13 21:13:45,595 - INFO - Creating model with name: pytorch-inference-2025-11-13-21-13-45-595
2025-11-13 21:13:46,626 - INFO - Creating endpoint-config with name asymmetric-e5-20251113-210834-866f6617
2025-11-13 21:13:47,084 - INFO - Creating endpoint with name asymmetric-e5-20251113-210834-866f6617
Deploying endpoint: asymmetric-e5-20251113-210834-866f6617
Using instance type: ml.m5.large
Instance count: 1
2025-11-13 21:20:51,112 - INFO - Deployment completed successfully!
--------------!
Endpoint deployed successfully: asymmetric-e5-20251113-210834-866f6617
Endpoint URL: asymmetric-e5-20251113-210834-866f6617
 ● Completed in 736.827s
  • Validate the SageMaker endpoint: ./validate.sh asymmetric-e5-20251113-210834-866f6617
  • Detailed logs:
Testing embedding endpoint: asymmetric-e5-20251113-210834-866f6617
✓ All requests successful!
✓ Query response saved to query_response.json
✓ Passage response saved to passage_response.json
✓ Batch response saved to batch_response.json
✓ Connector response saved to connector_response.json
✓ Query response size: 8134 bytes
✓ Passage response size: 8149 bytes
✓ Batch response size: 16282 bytes
✓ Connector response size: 8172 bytes
✓ Sample query embedding (first 5 values):
   (raw response preview)
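
The validation run sends separate query and passage requests because E5 models are asymmetric: the model card for intfloat/multilingual-e5-small asks for a "query: " or "passage: " prefix on every input text. A minimal sketch of building such inputs follows. The build_payload helper and the {"inputs": [...]} request shape are hypothetical; this PR description does not show the request format the endpoint accepts, nor whether the prefix is added by the caller or inside the inference handler.

```python
import json

# Prefixes documented on the E5 model card.
E5_PREFIXES = {"query": "query: ", "passage": "passage: "}


def build_payload(texts, content_type):
    """Serialize `texts` as a JSON body, adding the E5 prefix for `content_type`.

    The {"inputs": [...]} shape is an assumption made for illustration only.
    """
    if content_type not in E5_PREFIXES:
        raise ValueError(
            f"content_type must be 'query' or 'passage', got {content_type!r}"
        )
    prefix = E5_PREFIXES[content_type]
    return json.dumps({"inputs": [prefix + text for text in texts]})
```

For example, build_payload(["what is OpenSearch?"], "query") produces a body whose single input is "query: what is OpenSearch?".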

Issues Resolved

Check List

  • New functionality includes testing.
    • All tests pass
  • New functionality has been documented.
    • New functionality has javadoc added
  • Commits are signed per the DCO using --signoff

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

Signed-off-by: Fen Qin <[email protected]>