AzureOpenAI-compatible RESTful APIs for Amazon Bedrock.
This project is a fork of Bedrock Access Gateway that adds dedicated support for the AzureOpenAI client in place of the OpenAI client.
Enhanced Features:
- AzureOpenAI client
- Deployment on Amazon EKS
- JSON mode response format
- AWS Distro for OpenTelemetry support, which helps with debugging and observability.
These steps assume you already have an EKS cluster running. If you don't, run make cluster to create a new one.
First, set up the permissions for the Bedrock Proxy API. To simplify the process, this guide uses the EKS Pod Identity feature and the EKS CLI tool, eksctl.
eksctl create addon --cluster <cluster-name> --name eks-pod-identity-agent
eksctl create podidentityassociation \
--cluster <cluster-name> \
--namespace bedrock-proxy-api \
--service-account-name bedrock-proxy-api \
--permission-policy-arns="arn:aws:iam::aws:policy/AmazonBedrockFullAccess"
Then deploy the Bedrock Proxy API:
make deploy
We highly recommend building the image yourself and storing it in your own ECR repository. The command below builds the image and pushes it to the ECR repo bedrock-proxy-api, in the us-west-2 region by default. The ECR repo is created if it doesn't exist.
make build
Make sure to update the image of the k8s deployment in deployment/k8s/manifest.yaml, then run make deploy to deploy again.
Tip
To change the target ECR repo name or region, edit scripts/push-to-ecr.sh.
Once bedrock-proxy-api is deployed and running, you can use kubectl port-forward to forward the service to your local machine:
kubectl port-forward -n bedrock-proxy-api svc/bedrock-proxy-api 8000:80
Then you can call the service with a simple curl command:
export BASE_URL=http://localhost:8000
curl "$BASE_URL/openai/deployments/gpt-4o/chat/completions?api-version=2024-06-01" \
-H "Content-Type: application/json" \
-H "Authorization: Bearer bedrock" \
-d '{
"messages": [
{
"role": "user",
"content": "Hello!"
}
]
}'
You should see output similar to the following:
{"id":"chatcmpl-51c271ff","created":1723135203,"model":"anthropic.claude-3-sonnet-20240229-v1:0","system_fingerprint":"fp","choices":[{"index":0,"finish_reason":"stop","logprobs":null,"message":{"role":"assistant","content":"Hello! How can I assist you today?"}}],"object":"chat.completion","usage":{"prompt_tokens":9,"completion_tokens":12,"total_tokens":21}}
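The same call can be made from Python. The following is a minimal sketch using only the standard library, assuming the port-forward above is running on localhost:8000 and that the default key bedrock is in use. The helper names build_chat_request and extract_reply are illustrative and not part of this project.

```python
import json
import urllib.request


def build_chat_request(base_url, deployment, api_version, api_key, messages):
    """Build the same POST request as the curl example (nothing is sent here)."""
    url = (
        f"{base_url.rstrip('/')}/openai/deployments/{deployment}"
        f"/chat/completions?api-version={api_version}"
    )
    body = json.dumps({"messages": messages}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )


def extract_reply(response_json):
    """Return (assistant_text, total_tokens) from a chat.completion payload."""
    payload = json.loads(response_json)
    text = payload["choices"][0]["message"]["content"]
    return text, payload["usage"]["total_tokens"]


request = build_chat_request(
    base_url="http://localhost:8000",
    deployment="gpt-4o",
    api_version="2024-06-01",
    api_key="bedrock",
    messages=[{"role": "user", "content": "Hello!"}],
)

# To actually send it (requires the port-forward to be running):
# with urllib.request.urlopen(request) as resp:
#     print(extract_reply(resp.read()))
```

Keeping request construction separate from sending makes it easy to inspect the URL and headers before any traffic reaches the cluster.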
Amazon Bedrock offers a wide range of foundation models (such as Claude 3 Opus/Sonnet/Haiku, Llama 2/3, Mistral/Mixtral, etc.) and a broad set of capabilities for you to build generative AI applications. Check the Amazon Bedrock landing page for additional information.
Sometimes, you might have applications developed using OpenAI APIs or SDKs, and you want to experiment with Amazon Bedrock without modifying your codebase. Or you may simply wish to evaluate the capabilities of these foundation models in tools such as AutoGen. This repository lets you access Amazon Bedrock models through OpenAI APIs and SDKs, so you can test these models without code changes.
If you find this GitHub repository useful, please consider giving it a free star ⭐ to show your appreciation and support for the project.
Features:
- AzureOpenAI client
- Deployment on Amazon EKS
- Support streaming response via server-sent events (SSE)
- Support Model APIs
- Support Chat Completion APIs
- Support Tool Call (new)
- Support Embedding API (new)
- Support Multimodal API (new)
Please check Usage Guide for more details about how to use the new APIs.
Note: The legacy text completion API is not supported; use the chat completion API instead.
Supported Amazon Bedrock model families:
- Anthropic Claude 2 / 3 (Haiku/Sonnet/Opus)
- Meta Llama 2 / 3
- Mistral / Mixtral
- Cohere Command R / R+
- Cohere Embedding
You can call the models API to get the full list of supported model IDs.
Note: The default model is anthropic.claude-3-sonnet-20240229-v1:0, which can be changed via the Lambda environment variable DEFAULT_MODEL.
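As a rough illustration of reading the models API result, the sketch below assumes the endpoint lives at the OpenAI-style /models path under the API base URL and returns the OpenAI list shape ({"data": [{"id": ...}]}). Both are assumptions rather than something this document confirms, and the sample body is made up for the example.

```python
import json


def models_url(base_url):
    """URL of the models listing endpoint (path assumed from the OpenAI API)."""
    return f"{base_url.rstrip('/')}/models"


def model_ids(response_json):
    """Extract model IDs from an OpenAI-style list response."""
    payload = json.loads(response_json)
    return sorted(item["id"] for item in payload.get("data", []))


# Hypothetical response body, shaped like the OpenAI "list models" result.
sample = json.dumps({
    "object": "list",
    "data": [
        {"id": "anthropic.claude-3-sonnet-20240229-v1:0", "object": "model"},
        {"id": "anthropic.claude-3-haiku-20240307-v1:0", "object": "model"},
    ],
})
print(models_url("http://localhost:8000/api/v1"))
print(model_ids(sample))
```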
Make sure you meet the following prerequisites:
- Access to Amazon Bedrock foundation models.
For more information on how to request model access, refer to the Amazon Bedrock User Guide (Set Up > Model access).
The following diagram illustrates the reference architecture. Note that it also includes a new VPC with two public subnets only for the Application Load Balancer (ALB).
You can also choose to use AWS Fargate behind the ALB instead of AWS Lambda; the main difference is the time to first byte for streaming responses (Fargate's is lower).
Alternatively, you can use a Lambda Function URL in place of the ALB (see example).
Follow the steps below to deploy the Bedrock Proxy APIs into your AWS account. Only regions where Amazon Bedrock is available (such as us-west-2) are supported. The deployment takes approximately 3-5 minutes.
Step 1: Create your own custom API key (Optional)
Note: In this step you create a custom API key (credential), using any string without spaces, that will be used to access the proxy API later. This key does not have to match your actual OpenAI key, and you don't need to have an OpenAI API key. We recommend taking this step and keeping the key safe and private.
- Open the AWS Management Console and navigate to the Systems Manager service.
- In the left-hand navigation pane, click on "Parameter Store".
- Click on the "Create parameter" button.
- In the "Create parameter" window, select the following options:
- Name: Enter a descriptive name for your parameter (e.g., "BedrockProxyAPIKey").
- Description: Optionally, provide a description for the parameter.
- Tier: Select Standard.
- Type: Select SecureString.
- Value: Any string (without spaces).
- Click "Create parameter".
- Make a note of the parameter name you used (e.g., "BedrockProxyAPIKey"). You'll need this in the next step.
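The console steps above can also be scripted. The following is a sketch, assuming boto3 is installed and your credentials allow ssm:PutParameter. The helper names are illustrative, and the "no spaces" check simply mirrors the note in this step.

```python
def api_key_parameter(name, value, description=""):
    """Build keyword arguments for SSM put_parameter, mirroring the console steps."""
    if not value or any(ch.isspace() for ch in value):
        raise ValueError("API key must be a non-empty string without spaces")
    params = {
        "Name": name,
        "Value": value,
        "Type": "SecureString",  # stored encrypted, as selected in the console
        "Tier": "Standard",
    }
    if description:
        params["Description"] = description
    return params


def create_api_key_parameter(name, value, region):
    """Create the parameter in the given region (performs a real AWS call)."""
    import boto3  # imported lazily so the helper above works without boto3

    ssm = boto3.client("ssm", region_name=region)
    return ssm.put_parameter(**api_key_parameter(name, value))


# Example (not executed here):
# create_api_key_parameter("BedrockProxyAPIKey", "<your key>", "us-west-2")
```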
Step 2: Deploy the CloudFormation stack
- Sign in to the AWS Management Console and switch to the region where you want to deploy the CloudFormation stack.
- Click the following button to launch the CloudFormation Stack in that region. Choose one of the following:
- Click "Next".
- On the "Specify stack details" page, provide the following information:
- Stack name: Change the stack name if needed.
- ApiKeyParam (if you set up an API key in Step 1): Enter the parameter name you used for storing the API key (e.g., BedrockProxyAPIKey). If you did not set up an API key, leave this field blank. Click "Next".
- On the "Configure stack options" page, you can leave the default settings or customize them according to your needs.
- Click "Next".
- On the "Review" page, review the details of the stack you're about to create. Check the "I acknowledge that AWS CloudFormation might create IAM resources" checkbox at the bottom.
- Click "Create stack".
That's it! Once the stack is deployed, open it in the CloudFormation console and go to the Outputs tab. The API Base URL is the value of APIBaseUrl, which should look like http://xxxx.xxx.elb.amazonaws.com/api/v1.
All you need is the API Key and the API Base URL. If you didn't set up your own key, then the default API Key (bedrock) will be used.
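If you prefer to read the API Base URL programmatically rather than from the console, the sketch below shows one way, assuming boto3 is available. The helper names and the sample stack name are hypothetical; the response shape is that of CloudFormation's describe_stacks call.

```python
def stack_output(describe_stacks_response, key="APIBaseUrl"):
    """Return the value of one stack output from a describe_stacks response."""
    for stack in describe_stacks_response.get("Stacks", []):
        for output in stack.get("Outputs", []):
            if output.get("OutputKey") == key:
                return output["OutputValue"]
    raise KeyError(f"stack output {key!r} not found")


def fetch_api_base_url(stack_name, region):
    """Look up APIBaseUrl for a deployed stack (performs a real AWS call)."""
    import boto3  # imported lazily so stack_output works without boto3

    cfn = boto3.client("cloudformation", region_name=region)
    return stack_output(cfn.describe_stacks(StackName=stack_name))


# Hypothetical response, trimmed to the fields used above.
sample_response = {
    "Stacks": [
        {
            "StackName": "BedrockProxyAPI",  # placeholder stack name
            "Outputs": [
                {
                    "OutputKey": "APIBaseUrl",
                    "OutputValue": "http://xxxx.xxx.elb.amazonaws.com/api/v1",
                }
            ],
        }
    ]
}
print(stack_output(sample_response))
```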
Now you can try out the proxy APIs. Let's say you want to test the Claude 3 Sonnet model (model ID: anthropic.claude-3-sonnet-20240229-v1:0)...
Example API Usage
export OPENAI_API_KEY=<API key>
export OPENAI_BASE_URL=<API base url>
# For older versions
# https://github.com/openai/openai-python/issues/624
export OPENAI_API_BASE=<API base url>
curl $OPENAI_BASE_URL/chat/completions \
-H "Content-Type: application/json" \
-H "Authorization: Bearer $OPENAI_API_KEY" \
-d '{
"model": "anthropic.claude-3-sonnet-20240229-v1:0",
"messages": [
{
"role": "user",
"content": "Hello!"
}
]
}'
Example SDK Usage
from openai import OpenAI
client = OpenAI()
completion = client.chat.completions.create(
model="anthropic.claude-3-sonnet-20240229-v1:0",
messages=[{"role": "user", "content": "Hello!"}],
)
print(completion.choices[0].message.content)
Please check the Usage Guide for more details about how to use the embedding API, the multimodal API, and tool calls.
Below is an image of setting up the model in AutoGen studio.
Make sure you use ChatOpenAI(...) instead of OpenAI(...).
# pip install langchain-openai
import os
from langchain.chains import LLMChain
from langchain.prompts import PromptTemplate
from langchain_openai import ChatOpenAI
chat = ChatOpenAI(
model="anthropic.claude-3-sonnet-20240229-v1:0",
temperature=0,
openai_api_key=os.environ['OPENAI_API_KEY'],
openai_api_base=os.environ['OPENAI_BASE_URL'],
)
template = """Question: {question}
Answer: Let's think step by step."""
prompt = PromptTemplate.from_template(template)
llm_chain = LLMChain(prompt=prompt, llm=chat)
question = "What NFL team won the Super Bowl in the year Justin Bieber was born?"
response = llm_chain.invoke(question)
print(response)
This application does not collect any of your data. Furthermore, it does not log any requests or responses by default.
The short answer is that API Gateway does not support server-sent events (SSE) for streaming responses.
This solution supports only the regions where Amazon Bedrock is available. At the time of writing, these are:
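For context, an SSE response is a single long-lived HTTP response that delivers events as data: lines separated by blank lines. The sketch below parses such a stream, assuming the proxy follows the OpenAI convention of JSON payloads terminated by a data: [DONE] marker. That convention and the sample lines are assumptions for illustration.

```python
import json


def iter_sse_json(lines):
    """Yield decoded JSON payloads from the `data:` lines of an SSE stream."""
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # blank separators, comments, and other fields
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":  # OpenAI-style end marker (assumed here)
            break
        yield json.loads(payload)


# Hypothetical stream contents, shaped like OpenAI chat completion chunks.
sample_stream = [
    'data: {"choices":[{"index":0,"delta":{"content":"Hel"}}]}',
    "",
    'data: {"choices":[{"index":0,"delta":{"content":"lo!"}}]}',
    "",
    "data: [DONE]",
]
text = "".join(
    event["choices"][0]["delta"]["content"] for event in iter_sse_json(sample_stream)
)
print(text)  # prints "Hello!"
```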
- US East (N. Virginia): us-east-1
- US West (Oregon): us-west-2
- Asia Pacific (Singapore): ap-southeast-1
- Asia Pacific (Sydney): ap-southeast-2
- Asia Pacific (Tokyo): ap-northeast-1
- Europe (Frankfurt): eu-central-1
- Europe (Paris): eu-west-3
Generally speaking, every region that Amazon Bedrock supports is also supported here. If one is not, please raise an issue on GitHub.
Note that not all models are available in every region.
Yes, you can clone the repo, build the container image yourself (src/Dockerfile), and push it to your ECR repo. You can use scripts/push-to-ecr.sh for this.
Replace the repo URL in the CloudFormation template before you deploy.
Yes, you can run this locally.
The API base URL should look like http://localhost:8000/api/v1.
Compared with calling the AWS SDK directly, the reference architecture adds some latency to responses. You can test this on your own.
You can also use Lambda Web Adapter + Function URL (see example) to replace the ALB, or AWS Fargate to replace Lambda, to get better performance on streaming responses.
Currently, there is no plan to support SageMaker models. This may change if there is demand from customers.
Fine-tuned models and models with Provisioned Throughput are currently not supported. You can clone the repo and make the customization if needed.
To use the latest features, you don't need to redeploy the CloudFormation stack. You simply need to pull the latest image.
How to do so depends on which version you deployed:
- Lambda version: Go to the AWS Lambda console, find the Lambda function, click the Deploy new image button, and then click Save.
- Fargate version: Go to the ECS console, click the ECS cluster, go to the Tasks tab, select the only running task, and click Stop selected. A new task with the latest image will start automatically.
See CONTRIBUTING for more information.
This library is licensed under the MIT-0 License. See the LICENSE file.