Commit e43bedb

add asymmetric customized e5 model

Signed-off-by: Fen Qin <[email protected]>

1 parent 423b9b5 commit e43bedb

8 files changed: +368 -79 lines changed
Lines changed: 142 additions & 0 deletions
@@ -0,0 +1,142 @@
# Common Model Deployment

Shared deployment infrastructure for SageMaker endpoints across different model types.

## Structure

```
examples/
├── common/
│   ├── deploy.py                # Shared deployment script
│   └── README.md                # This file
├── semantic_highlighting/       # Highlighting models
│   ├── api_types.py             # Highlighting-specific types
│   ├── modernbert/
│   └── opensearch-semantic-highlighter/
└── embedding_models/            # Embedding models
    ├── api_types.py             # Embedding-specific types
    ├── validate.sh              # Validation script
    └── asymmetric_e5/
```
## Usage

### Deploy a Model

```bash
cd common
python3 deploy.py --model <model_name> [options]
```

The deployment script will:

1. Download the model from HuggingFace
2. Create a model package with the inference code
3. Deploy it to a SageMaker endpoint
4. **Output the endpoint name** for validation
### Validate Deployment

After deployment, you'll see output like:

```
Endpoint deployed successfully: asymmetric-e5-20251113-210834-866f6617
```

Use this endpoint name to validate:

```bash
# For embedding models ONLY
cd ../embedding_models
./validate.sh asymmetric-e5-20251113-210834-866f6617

# For semantic highlighting models
# (no validation script yet; test manually via the AWS console or CLI)
```
**Why specify the endpoint name?**

- SageMaker generates unique endpoint names with timestamps
- Multiple deployments can exist simultaneously
- Naming the endpoint lets you test a specific endpoint version
- It prevents accidentally validating the wrong endpoint

**Note:** The `validate.sh` script is designed specifically for embedding models and tests embedding-specific payloads (query/passage embeddings, OpenSearch connector format). Semantic highlighting models require different validation payloads; see the manual invocation sketch below.
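For a semantic highlighting endpoint, a manual check with the AWS CLI might look like the following sketch. The endpoint name and output file are placeholders, and `--cli-binary-format raw-in-base64-out` is assumed to be needed (AWS CLI v2) so the JSON body is sent raw:

```bash
# Hypothetical manual validation of a highlighting endpoint; adjust names to your deployment.
aws sagemaker-runtime invoke-endpoint \
  --endpoint-name <your-highlighter-endpoint> \
  --content-type application/json \
  --cli-binary-format raw-in-base64-out \
  --body '{"question": "What is the treatment?", "context": "Traditional treatments include cholinesterase inhibitors."}' \
  response.json
cat response.json
```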
### Available Models

**Semantic Highlighting:**

- `opensearch-semantic-highlighter`
- `modernbert`

**Embedding Models:**

- `asymmetric_e5`
### Options

- `--model`: Model to deploy (required)
- `--instance-type`: SageMaker instance type (default: `ml.g5.xlarge`)
- `--instance-count`: Number of instances (default: 1)

### Examples

```bash
# Deploy the asymmetric E5 embedding model
python3 deploy.py --model asymmetric_e5 --instance-type ml.m5.large

# Deploy the semantic highlighter
python3 deploy.py --model opensearch-semantic-highlighter
```
## Environment Variables

- `AWS_REGION`: AWS region (default: `us-east-1`)
- `INSTANCE_TYPE`: Default instance type
- `INSTANCE_COUNT`: Default instance count
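For example, to override the defaults for a single deployment (illustrative values):

```bash
# Illustrative: set the defaults via the environment, then deploy.
export AWS_REGION=us-west-2
export INSTANCE_TYPE=ml.m5.large
python3 deploy.py --model asymmetric_e5
```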
## Finding Existing Endpoints

To list existing endpoints:

```bash
aws sagemaker list-endpoints --region us-east-1
```
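The same lookup from Python, as a minimal boto3 sketch:

```python
# Minimal sketch: list SageMaker endpoints and their status with boto3.
import boto3

sm = boto3.client("sagemaker", region_name="us-east-1")
for endpoint in sm.list_endpoints()["Endpoints"]:
    print(endpoint["EndpointName"], endpoint["EndpointStatus"])
```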
## API Formats

### Embedding Models (asymmetric_e5)

**Request:**

```json
{
  "texts": ["how much protein should a female eat"],
  "content_type": "query"
}
```

**Response:**

```json
[[0.21125227, -0.19419950, ...]]
```
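To send this request from Python rather than through `validate.sh`, a minimal sketch with the SageMaker runtime client (the endpoint name is a placeholder taken from the deployment output above):

```python
# Minimal sketch: invoke the embedding endpoint and parse the returned vectors.
import json

import boto3

runtime = boto3.client("sagemaker-runtime", region_name="us-east-1")
payload = {"texts": ["how much protein should a female eat"], "content_type": "query"}
response = runtime.invoke_endpoint(
    EndpointName="asymmetric-e5-20251113-210834-866f6617",  # placeholder endpoint name
    ContentType="application/json",
    Body=json.dumps(payload),
)
embeddings = json.loads(response["Body"].read())  # one vector per input text
print(len(embeddings), len(embeddings[0]))
```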
### Semantic Highlighting Models

**Request:**

```json
{
  "question": "What is the treatment?",
  "context": "Traditional treatments include cholinesterase inhibitors."
}
```

**Response:**

```json
{
  "highlights": [{"start": 0, "end": 50}],
  "processing_time_ms": 22.4,
  "device": "cuda"
}
```
## Adding New Models

1. Create a model directory under the appropriate task type
2. Add `inference.py` and `requirements.txt`
3. Update `MODEL_CONFIGS` in `deploy.py` (see the sketch below)
4. Ensure a proper `api_types.py` exists for the task type
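Based on the entries this commit adds to `MODEL_CONFIGS` in `deploy.py`, a new registration follows the same shape. All values below are made-up placeholders:

```python
# Hypothetical MODEL_CONFIGS entry; every value is a placeholder.
'my_new_model': {
    'model_name': 'my-org/my-new-model',    # HuggingFace model ID to download
    'endpoint_prefix': 'my-new-model',      # prefix for the generated endpoint name
    's3_prefix': 'my-new-model',            # S3 key prefix for the model package
    'task_type': 'embedding_models'         # picks ../<task_type>/<model_key>/ for code
}
```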

docs/source/examples/semantic_highlighting/deploy.py renamed to docs/source/examples/common/deploy.py

Lines changed: 18 additions & 7 deletions
```diff
@@ -26,12 +26,20 @@
     'opensearch-semantic-highlighter': {
         'model_name': 'opensearch-project/opensearch-semantic-highlighter-v1',
         'endpoint_prefix': 'opensearch-semantic-highlighter',
-        's3_prefix': 'opensearch-semantic-highlighter'
+        's3_prefix': 'opensearch-semantic-highlighter',
+        'task_type': 'semantic_highlighting'
     },
     'modernbert': {
         'model_name': 'answerdotai/ModernBERT-base',
         'endpoint_prefix': 'modernbert-highlighter',
-        's3_prefix': 'modernbert-highlighter'
+        's3_prefix': 'modernbert-highlighter',
+        'task_type': 'semantic_highlighting'
+    },
+    'asymmetric_e5': {
+        'model_name': 'intfloat/multilingual-e5-small',
+        'endpoint_prefix': 'asymmetric-e5',
+        's3_prefix': 'asymmetric-e5',
+        'task_type': 'embedding_models'
     }
 }
@@ -52,7 +60,7 @@ def create_sagemaker_role():
     sts = boto3.client('sts')
     account_id = sts.get_caller_identity()["Account"]
     role_name = 'SageMakerExecutionRole'
-    role_arn = f'arn:aws:iam::{account_id}:role/{role_name}'
+    role_arn = 'arn:aws:iam::{}:role/{}'.format(account_id, role_name)

     try:
         iam.get_role(RoleName=role_name)
@@ -117,12 +125,15 @@ def prepare_model_files(model_key):

 def create_model_tar(work_dir, model_key):
     """Create model.tar.gz with model files and inference code."""
+    config = MODEL_CONFIGS[model_key]
+    task_type = config['task_type']
+
     os.makedirs(f"{work_dir}/code", exist_ok=True)

     # Copy model-specific inference code
-    inference_src = f"{model_key}/inference.py"
-    requirements_src = f"{model_key}/requirements.txt"
-    api_types_src = "api_types.py"
+    inference_src = f"../{task_type}/{model_key}/inference.py"
+    requirements_src = f"../{task_type}/{model_key}/requirements.txt"
+    api_types_src = f"../{task_type}/api_types.py"

     if not os.path.exists(inference_src):
         raise FileNotFoundError(f"Inference code not found: {inference_src}")
@@ -131,7 +142,7 @@ def create_model_tar(work_dir, model_key):
     if os.path.exists(requirements_src):
         shutil.copy(requirements_src, f"{work_dir}/code/requirements.txt")

-    # Copy shared API types
+    # Copy task-specific API types
     if os.path.exists(api_types_src):
         shutil.copy(api_types_src, f"{work_dir}/code/api_types.py")
```
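Concretely, the new `task_type` field drives where `create_model_tar` looks for model code. A worked example of the path resolution (not code from the commit):

```python
# Worked example: path resolution for asymmetric_e5 after this change.
model_key = 'asymmetric_e5'
task_type = MODEL_CONFIGS[model_key]['task_type']         # 'embedding_models'
inference_src = f"../{task_type}/{model_key}/inference.py"
# -> '../embedding_models/asymmetric_e5/inference.py', relative to examples/common/
```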

Lines changed: 20 additions & 0 deletions
@@ -0,0 +1,20 @@
```python
"""API type definitions for embedding model inference"""
from typing import TypedDict, List, Union, Optional


class EmbeddingRequest(TypedDict):
    """Single embedding request"""
    texts: List[str]
    content_type: Optional[str]  # "query" or "passage"


class BatchEmbeddingRequest(TypedDict):
    """Batch embedding request (OpenSearch connector format)"""
    parameters: EmbeddingRequest


class EmbeddingResponse(TypedDict):
    """Standard embedding response format"""
    embeddings: Union[List[float], List[List[float]]]
    processing_time_ms: float
    device: str
```
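For reference, a request built against these types mirrors the README examples. A sketch (the annotations are checked only by static tooling such as mypy):

```python
# Sketch: constructing typed requests matching the definitions above.
from api_types import BatchEmbeddingRequest, EmbeddingRequest

request: EmbeddingRequest = {
    "texts": ["how much protein should a female eat"],
    "content_type": "query",
}
# The OpenSearch connector wraps the same fields under "parameters".
batch: BatchEmbeddingRequest = {"parameters": request}
```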
Lines changed: 82 additions & 0 deletions
@@ -0,0 +1,82 @@
```python
import os
import sys
import json
import time
import logging
import torch
from transformers import AutoTokenizer, AutoModel

sys.path.insert(0, os.path.join(os.path.dirname(__file__), '..'))
from api_types import EmbeddingRequest, BatchEmbeddingRequest, EmbeddingResponse

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)

DEVICE = torch.device("cuda" if torch.cuda.is_available() else "cpu")
logger.info(f"Device: {DEVICE}")

def model_fn(model_dir):
    """Load model and tokenizer"""
    model_name = "intfloat/multilingual-e5-small"
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModel.from_pretrained(model_name).to(DEVICE)
    return {"model": model, "tokenizer": tokenizer}

def input_fn(request_body, request_content_type):
    """Parse input and return texts for embedding"""
    if request_content_type != "application/json":
        raise ValueError(f"Unsupported content type: {request_content_type}")

    input_data = json.loads(request_body)

    # Handle OpenSearch connector format
    if "parameters" in input_data:
        params = input_data["parameters"]
        texts = params.get("texts", [])
        content_type = params.get("content_type")
    else:
        texts = input_data.get("texts", [])
        content_type = input_data.get("content_type")

    # Add content type prefix if specified
    if content_type:
        texts = [f"{content_type}: {text}" for text in texts]

    return texts

def predict_fn(input_data, model_dict):
    """Generate embeddings"""
    start_time = time.time()

    model = model_dict["model"]
    tokenizer = model_dict["tokenizer"]

    inputs = tokenizer(input_data, padding=True, truncation=True,
                       return_tensors="pt", max_length=512).to(DEVICE)

    with torch.no_grad():
        outputs = model(**inputs)
        embeddings = outputs.last_hidden_state.mean(dim=1)

    processing_time = (time.time() - start_time) * 1000

    return {
        "embeddings": embeddings.cpu().numpy(),
        "processing_time_ms": processing_time,
        "device": str(DEVICE)
    }

def output_fn(prediction, content_type):
    """Format output for OpenSearch compatibility"""
    if content_type != "application/json":
        raise ValueError(f"Unsupported content type: {content_type}")

    embeddings = prediction["embeddings"]

    # Return simple array format for OpenSearch
    if len(embeddings.shape) == 2:  # Batch
        result = [embedding.tolist() for embedding in embeddings]
    else:  # Single
        result = embeddings.tolist()

    return json.dumps(result)
```
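A quick local smoke test of the handler chain might look like the following sketch. SageMaker normally drives these functions itself, and this `model_fn` ignores its `model_dir` argument, so any path works:

```python
# Sketch: exercise the SageMaker handlers locally (downloads the model from HuggingFace).
import json

model_dict = model_fn("/opt/ml/model")  # path is unused by this model_fn
body = json.dumps({"texts": ["how much protein should a female eat"],
                   "content_type": "query"})
texts = input_fn(body, "application/json")
prediction = predict_fn(texts, model_dict)
print(output_fn(prediction, "application/json")[:120])
```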
Lines changed: 2 additions & 0 deletions
@@ -0,0 +1,2 @@
```
torch>=2.0.0
transformers>=4.28.0
```
