From a2775a3c9ccbd2ef18bc38242c4cfb00dc616013 Mon Sep 17 00:00:00 2001 From: Artem Astafev Date: Wed, 30 Apr 2025 15:18:17 +0700 Subject: [PATCH 1/6] Refine Readme for multimodalQnA Signed-off-by: Artem Astafev --- .../amd/gpu/rocm/.README.md.kate-swp | Bin 0 -> 90 bytes .../docker_compose/amd/gpu/rocm/README.md | 456 +++++----------- .../docker_compose/amd/gpu/rocm/README_old.md | 513 ++++++++++++++++++ .../amd/gpu/rocm/set_env_vllm.sh | 2 +- 4 files changed, 657 insertions(+), 314 deletions(-) create mode 100644 MultimodalQnA/docker_compose/amd/gpu/rocm/.README.md.kate-swp create mode 100644 MultimodalQnA/docker_compose/amd/gpu/rocm/README_old.md diff --git a/MultimodalQnA/docker_compose/amd/gpu/rocm/.README.md.kate-swp b/MultimodalQnA/docker_compose/amd/gpu/rocm/.README.md.kate-swp new file mode 100644 index 0000000000000000000000000000000000000000..f5bb5f48c346759db7c4383fc516b65120cae7bf GIT binary patch literal 90 zcmZQzU=Z?7EJ;-eE>A2_aLdd|RWQ;sU|?VnIeAsmKyZ)y{G|qSRqN_6R3_~2k_-+4 d%1i=c5a@6X4h2$^T*0Cf!D0*yj1bX@t^h|X6>R_j literal 0 HcmV?d00001 diff --git a/MultimodalQnA/docker_compose/amd/gpu/rocm/README.md b/MultimodalQnA/docker_compose/amd/gpu/rocm/README.md index 14e66d989a..c9d5193011 100644 --- a/MultimodalQnA/docker_compose/amd/gpu/rocm/README.md +++ b/MultimodalQnA/docker_compose/amd/gpu/rocm/README.md @@ -1,139 +1,99 @@ -# Build and Deploy MultimodalQnA Application on AMD GPU (ROCm) +# Deploying MultimodalQnA on AMD ROCm GPU -This document outlines the deployment process for a MultimodalQnA application utilizing the [GenAIComps](https://github.com/opea-project/GenAIComps.git) microservice pipeline on AMD server with ROCm GPUs. The steps include Docker image creation, container deployment via Docker Compose, and service execution to integrate microservices such as `multimodal_embedding` that employs [BridgeTower](https://huggingface.co/BridgeTower/bridgetower-large-itm-mlm-gaudi) model as embedding model, `multimodal_retriever`, `lvm`, and `multimodal-data-prep`. We will publish the Docker images to Docker Hub soon, it will simplify the deployment process for this service. +This document outlines the single node deployment process for a Multimodal application utilizing the [GenAIComps](https://github.com/opea-project/GenAIComps.git) microservices on Intel Xeon server and AMD GPU. The steps include pulling Docker images, container deployment via Docker Compose, and service execution using microservices `llm`. -For detailed information about these instance types, you can refer to this [link](https://aws.amazon.com/ec2/instance-types/m7i/). Once you've chosen the appropriate instance type, proceed with configuring your instance settings, including network configurations, security groups, and storage options. +Note: The default LLM is `Xkev/Llama-3.2V-11B-co`. Before deploying the application, please make sure either you've requested and been granted the access to it on [Huggingface](https://huggingface.co/meta-llama/Meta-Llama-3-8B-Instruct) or you've downloaded the model locally from [ModelScope](https://www.modelscope.cn/models). -After launching your instance, you can connect to it using SSH (for Linux instances) or Remote Desktop Protocol (RDP) (for Windows instances). From there, you'll have full access to your Xeon server, allowing you to install, configure, and manage your applications as needed. +## Table of Contents -## Build Docker Images +1. [MultimodalQnA Quick Start Deployment](#multimodalqna-quick-start-deployment) +2. 
[MultimodalQnA Docker Compose Files](#multimodalqna-docker-compose-files) +3. [Validate Microservices](#validate-microservices) +4. [Conclusion](#conclusion) -### 1. Build Docker Image +## MultimodalQnA Quick Start Deployment -- #### Create application install directory and go to it: +This section describes how to quickly deploy and test the MultimodalQnAservice manually on an AMD ROCm GPU. The basic steps are: - ```bash - mkdir ~/multimodalqna-install && cd multimodalqna-install - ``` +1. [Access the Code](#access-the-code) +2. [Configure the Deployment Environment](#configure-the-deployment-environment) +3. [Deploy the Services Using Docker Compose](#deploy-the-services-using-docker-compose) +4. [Check the Deployment Status](#check-the-deployment-status) +5. [Validate the Pipeline](#validate-the-pipeline) +6. [Cleanup the Deployment](#cleanup-the-deployment) -- #### Clone the repository GenAIExamples (the default repository branch "main" is used here): +### Access the Code - ```bash - git clone https://github.com/opea-project/GenAIExamples.git - ``` +Clone the GenAIExample repository and access the MultimodalQnA AMD ROCm GPU platform Docker Compose files and supporting scripts: - If you need to use a specific branch/tag of the GenAIExamples repository, then (v1.3 replace with its own value): - - ```bash - git clone https://github.com/opea-project/GenAIExamples.git && cd GenAIExamples && git checkout v1.3 - ``` - - We remind you that when using a specific version of the code, you need to use the README from this version: - -- #### Go to build directory: - - ```bash - cd ~/multimodalqna-install/GenAIExamples/MultimodalQnA/docker_image_build - ``` - -- Cleaning up the GenAIComps repository if it was previously cloned in this directory. - This is necessary if the build was performed earlier and the GenAIComps folder exists and is not empty: - - ```bash - echo Y | rm -R GenAIComps - ``` - -- #### Clone the repository GenAIComps (the default repository branch "main" is used here): - - ```bash - git clone https://github.com/opea-project/GenAIComps.git - ``` - - If you use a specific tag of the GenAIExamples repository, - then you should also use the corresponding tag for GenAIComps. (v1.3 replace with its own value): - - ```bash - git clone https://github.com/opea-project/GenAIComps.git && cd GenAIComps && git checkout v1.3 - ``` - - We remind you that when using a specific version of the code, you need to use the README from this version. - -- #### Setting the list of images for the build (from the build file.yaml) +```bash +git clone https://github.com/opea-project/GenAIExamples.git +cd GenAIExamples/MultimodalQnA +``` - If you want to deploy a vLLM-based or TGI-based application, then the set of services is installed as follows: +Then checkout a released version, such as v1.3: - #### vLLM-based application +```bash +git checkout v1.3 +``` - ```bash - service_list="multimodalqna multimodalqna-ui embedding-multimodal-bridgetower embedding retriever lvm dataprep whisper vllm-rocm" - ``` +### Configure the Deployment Environment - #### TGI-based application +To set up environment variables for deploying MultimodalQnA services, set up some parameters specific to the deployment environment and source the `set_env_*.sh` script in this directory: - ```bash - service_list="multimodalqna multimodalqna-ui embedding-multimodal-bridgetower embedding retriever lvm dataprep whisper" - ``` +- if used vLLM - set_env_vllm.sh +- if used TGI - set_env.sh -- #### Optional. 
Pull TGI Docker Image (Do this if you want to use TGI) +Set the values of the variables: - ```bash - docker pull ghcr.io/huggingface/text-generation-inference:2.3.1-rocm - ``` +- **HOST_IP, HOST_IP_EXTERNAL** - These variables are used to configure the name/address of the service in the operating system environment for the application services to interact with each other and with the outside world. -- #### Build Docker Images + If your server uses only an internal address and is not accessible from the Internet, then the values for these two variables will be the same and the value will be equal to the server's internal name/address. - ```bash - docker compose -f build.yaml build ${service_list} --no-cache - ``` + If your server uses only an external, Internet-accessible address, then the values for these two variables will be the same and the value will be equal to the server's external name/address. - After the build, we check the list of images with the command: + If your server is located on an internal network, has an internal address, but is accessible from the Internet via a proxy/firewall/load balancer, then the HOST_IP variable will have a value equal to the internal name/address of the server, and the EXTERNAL_HOST_IP variable will have a value equal to the external name/address of the proxy/firewall/load balancer behind which the server is located. - ```bash - docker image ls - ``` + We set these values in the file set_env\*\*\*\*.sh - The list of images should include: +- **Variables with names like "**\*\*\*\*\*\*\_PORT"\*\* - These variables set the IP port numbers for establishing network connections to the application services. + The values shown in the file set_env.sh or set_env_vllm.sh they are the values used for the development and testing of the application, as well as configured for the environment in which the development is performed. These values must be configured in accordance with the rules of network access to your environment's server, and must not overlap with the IP ports of other applications that are already in use. - ##### vLLM-based application: +Setting variables in the operating system environment: - - opea/vllm-rocm:latest - - opea/lvm:latest - - opea/multimodalqna:latest - - opea/multimodalqna-ui:latest - - opea/dataprep:latest - - opea/embedding:latest - - opea/embedding-multimodal-bridgetower:latest - - opea/retriever:latest - - opea/whisper:latest +```bash +export HUGGINGFACEHUB_API_TOKEN="Your_HuggingFace_API_Token" +source ./set_env_*.sh # replace the script name with the appropriate one +``` - ##### TGI-based application: +Consult the section on [MultimodalQnA Service configuration](#multimodalqna-configuration) for information on how service specific configuration parameters affect deployments. - - ghcr.io/huggingface/text-generation-inference:2.4.1-rocm - - opea/lvm:latest - - opea/multimodalqna:latest - - opea/multimodalqna-ui:latest - - opea/dataprep:latest - - opea/embedding:latest - - opea/embedding-multimodal-bridgetower:latest - - opea/retriever:latest - - opea/whisper:latest +### Deploy the Services Using Docker Compose ---- +To deploy the MultimodalQnA services, execute the `docker compose up` command with the appropriate arguments. For a default deployment with TGI, execute the command below. It uses the 'compose.yaml' file. 
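+
+Before bringing the stack up, it can help to confirm that the host actually exposes the ROCm devices the compose files forward to the containers (`/dev/kfd` and `/dev/dri`). A minimal sanity check, assuming the ROCm driver stack and `rocm-smi` are installed on the host:
+
+```bash
+# GPUs visible to the ROCm runtime
+rocm-smi
+
+# Device nodes that the compose files map into the containers
+ls -l /dev/kfd /dev/dri
+
+# The video group referenced by group_add in the compose files
+getent group video
+```
+
+If the devices are present, proceed with the compose commands below.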
-## Deploy the MultimodalQnA Application
-
-### Docker Compose Configuration for AMD GPUs
+```bash
+cd docker_compose/amd/gpu/rocm
+# if used TGI
+docker compose -f compose.yaml up -d
+
+# if used vLLM
+docker compose -f compose_vllm.yaml up -d
+```

 To enable GPU support for AMD GPUs, the following configuration is added to the Docker Compose file:

 - compose_vllm.yaml - for the vLLM-based application
 - compose.yaml - for the TGI-based application

 ```yaml
 shm_size: 1g
 devices:
   - /dev/kfd:/dev/kfd
-  - /dev/dri/:/dev/dri/
+  - /dev/dri:/dev/dri
 cap_add:
   - SYS_PTRACE
 group_add:
@@ -149,7 +109,7 @@ shm_size: 1g
 devices:
   - /dev/kfd:/dev/kfd
   - /dev/dri/card0:/dev/dri/card0
   - /dev/dri/renderD128:/dev/dri/renderD128
 cap_add:
   - SYS_PTRACE
 group_add:
@@ -161,222 +121,122 @@ security_opt:
 **How to Identify GPU Device IDs:**
 Use AMD GPU driver utilities to determine the correct `cardN` and `renderN` IDs for your GPU.

-### Set deploy environment variables
-
-#### Setting variables in the operating system environment:
-
-##### Set variable HUGGINGFACEHUB_API_TOKEN:
+> **Note**: Developers should build the Docker images from source when:
+>
+> - Developing off the git main branch (the container ports in the repository may differ from those in the published Docker images).
+> - The published Docker image cannot be downloaded.
+> - A specific version of the Docker image is required.
+
+Please refer to the table below to build the different microservices from source:
+
+| Microservice    | Deployment Guide                                                                                                    |
+| --------------- | ------------------------------------------------------------------------------------------------------------------- |
+| vLLM            | [vLLM build guide](https://github.com/opea-project/GenAIComps/tree/main/comps/third_parties/vllm#build-docker)       |
+| TGI             | [TGI project](https://github.com/huggingface/text-generation-inference.git)                                           |
+| LLM             | [LLM build guide](https://github.com/opea-project/GenAIComps/tree/main/comps/llms)                                    |
+| Redis Vector DB | [Redis](https://github.com/redis/redis.git)                                                                           |
+| Dataprep        | [Dataprep build guide](https://github.com/opea-project/GenAIComps/tree/main/comps/dataprep/src/README_redis.md)       |
+| TEI Embedding   | [TEI guide](https://github.com/huggingface/text-embeddings-inference.git)                                             |
+| Retriever       | [Retriever build guide](https://github.com/opea-project/GenAIComps/tree/main/comps/retrievers/src/README_redis.md)    |
+| TEI Reranking   | [TEI guide](https://github.com/huggingface/text-embeddings-inference.git)                                              |
+| MegaService     | [MegaService guide](../../../../README.md)                                                                              |
+| whisper-service | [whisper build guide](https://github.com/opea-project/GenAIComps/tree/main/comps/third_parties/whisper/src)            |
+| LVM             | [lvm build guide](https://github.com/opea-project/GenAIComps/blob/main/comps/lvms/src/)                                |
+| Nginx           | [Nginx guide](https://github.com/opea-project/GenAIComps/tree/main/comps/third_parties/nginx)                          |
+
+### Check the Deployment Status
+
+After running docker compose, check if all the containers launched via docker compose have started:

 ```bash
-### Replace the string 'your_huggingfacehub_token' with your HuggingFacehub repository access token.
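+# List every container and confirm each one shows an "Up" status.
+# For a condensed view (standard docker CLI formatting flags):
+#   docker ps --format "table {{.Names}}\t{{.Status}}\t{{.Ports}}"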
-export HUGGINGFACEHUB_API_TOKEN='your_huggingfacehub_token' +docker ps -a ``` -#### Set variables value in set_env\*\*\*\*.sh file: +For the default deployment with TGI, the following 9 containers should have started: -Go to Docker Compose directory: - -```bash -cd ~/multimodalqna-install/GenAIExamples/MultimodalQnA/docker_compose/amd/gpu/rocm ``` - -The example uses the Nano text editor. You can use any convenient text editor: - -#### If you use vLLM - -```bash -nano set_env_vllm.sh +CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES +427c4dde68e2 opea/multimodalqna-ui:latest "python multimodalqn…" 2 minutes ago Up About a minute 0.0.0.0:5173->5173/tcp, :::5173->5173/tcp multimodalqna-gradio-ui-server +5cb2186d961a opea/multimodalqna:latest "python multimodalqn…" 2 minutes ago Up About a minute 0.0.0.0:8888->8888/tcp, :::8888->8888/tcp multimodalqna-backend-server +70e512d483e0 opea/dataprep:latest "sh -c 'python $( [ …" 2 minutes ago Up 2 minutes (healthy) 0.0.0.0:6007->5000/tcp, [::]:6007->5000/tcp dataprep-multimodal-redis +86927af3de6d opea/retriever:latest "python opea_retriev…" 2 minutes ago Up 2 minutes 0.0.0.0:7000->7000/tcp, :::7000->7000/tcp retriever-redis +74cc8db43457 opea/embedding:latest "sh -c 'python $( [ …" 2 minutes ago Up About a minute 0.0.0.0:6000->6000/tcp, :::6000->6000/tcp embedding +49a7c128752d opea/lvm:latest "python opea_lvm_mic…" 2 minutes ago Up 2 minutes 0.0.0.0:9399->9399/tcp, :::9399->9399/tcp lvm +0d66fda22534 ghcr.io/huggingface/text-generation-inference:2.4.1-rocm "/tgi-entrypoint.sh …" 2 minutes ago Up 24 seconds 0.0.0.0:8399->80/tcp, [::]:8399->80/tcp tgi-llava-rocm-server +55908da067d7 opea/embedding-multimodal-bridgetower:latest "python bridgetower_…" 2 minutes ago Up 2 minutes (healthy) 0.0.0.0:6006->6006/tcp, :::6006->6006/tcp embedding-multimodal-bridgetower +8a4283f61cf5 redis/redis-stack:7.2.0-v9 "/entrypoint.sh" 2 minutes ago Up 2 minutes 0.0.0.0:6379->6379/tcp, :::6379->6379/tcp, 0.0.0.0:8001->8001/tcp, :::8001->8001/tcp redis-vector-db +e9e5f1f3b57a opea/whisper:latest "python whisper_serv…" 2 minutes ago Up 2 minutes 0.0.0.0:7066->7066/tcp, :::7066->7066/tcp whisper-service +3923edad3acc opea/vllm-rocm:latest "python3 /workspace/…" 24 hours ago Up 24 hours (healthy) 0.0.0.0:8086->8011/tcp, [::]:8086->8011/tcp vllm-service ``` -#### If you use TGI -```bash -nano set_env.sh -``` -If you are in a proxy environment, also set the proxy-related environment variables: +if used vLLM: -```bash -export http_proxy="Your_HTTP_Proxy" -export https_proxy="Your_HTTPs_Proxy" ``` - -Set the values of the variables: - -- **HOST_IP, HOST_IP_EXTERNAL** - These variables are used to configure the name/address of the service in the operating system environment for the application services to interact with each other and with the outside world. - - If your server uses only an internal address and is not accessible from the Internet, then the values for these two variables will be the same and the value will be equal to the server's internal name/address. - - If your server uses only an external, Internet-accessible address, then the values for these two variables will be the same and the value will be equal to the server's external name/address. 
- - If your server is located on an internal network, has an internal address, but is accessible from the Internet via a proxy/firewall/load balancer, then the HOST_IP variable will have a value equal to the internal name/address of the server, and the EXTERNAL_HOST_IP variable will have a value equal to the external name/address of the proxy/firewall/load balancer behind which the server is located. - - We set these values in the file set_env\*\*\*\*.sh - -- **Variables with names like "**\*\*\*\*\*\*\_PORT"\*\* - These variables set the IP port numbers for establishing network connections to the application services. - The values shown in the file set_env.sh or set_env_vllm they are the values used for the development and testing of the application, as well as configured for the environment in which the development is performed. These values must be configured in accordance with the rules of network access to your environment's server, and must not overlap with the IP ports of other applications that are already in use. - -#### Required Models - -By default, the multimodal-embedding and LVM models are set to a default value as listed below: - -| Service | Model | -| --------- | ------------------------------------------- | -| embedding | BridgeTower/bridgetower-large-itm-mlm-gaudi | -| LVM | llava-hf/llava-1.5-7b-hf | -| LVM | Xkev/Llama-3.2V-11B-cot | - -Note: - -For AMD ROCm System "Xkev/Llama-3.2V-11B-cot" is recommended to run on ghcr.io/huggingface/text-generation-inference:2.4.1-rocm - -#### Set variables with script set_env\*\*\*\*.sh - -#### If you use vLLM - -```bash -. set_env_vllm.sh +CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES +54f8a87b82be opea/multimodalqna-ui:latest "python multimodalqn…" 42 seconds ago Up 9 seconds 0.0.0.0:5173->5173/tcp, :::5173->5173/tcp multimodalqna-gradio-ui-server +e533d862cf8e opea/multimodalqna:latest "python multimodalqn…" 42 seconds ago Up 9 seconds 0.0.0.0:8888->8888/tcp, :::8888->8888/tcp multimodalqna-backend-server +e11e96d03e54 opea/dataprep:latest "sh -c 'python $( [ …" 42 seconds ago Up 40 seconds (healthy) 0.0.0.0:6007->5000/tcp, [::]:6007->5000/tcp dataprep-multimodal-redis +f5e670dae343 opea/lvm:latest "python opea_lvm_mic…" 42 seconds ago Up 40 seconds 0.0.0.0:9399->9399/tcp, :::9399->9399/tcp lvm +c81bd4792b22 opea/retriever:latest "python opea_retriev…" 42 seconds ago Up 40 seconds 0.0.0.0:7000->7000/tcp, :::7000->7000/tcp retriever-redis +220c1111d5e4 opea/embedding:latest "sh -c 'python $( [ …" 42 seconds ago Up 15 seconds 0.0.0.0:6000->6000/tcp, :::6000->6000/tcp embedding +ece52eea577f opea/vllm-rocm:latest "python3 /workspace/…" 42 seconds ago Up 41 seconds 0.0.0.0:8081->8011/tcp, [::]:8081->8011/tcp multimodalqna-vllm-service +82bd3be58052 opea/embedding-multimodal-bridgetower:latest "python bridgetower_…" 42 seconds ago Up 41 seconds (healthy) 0.0.0.0:6006->6006/tcp, :::6006->6006/tcp embedding-multimodal-bridgetower +bac14dac272d opea/whisper:latest "python whisper_serv…" 42 seconds ago Up 41 seconds 0.0.0.0:7066->7066/tcp, :::7066->7066/tcp whisper-service +7d603688fc56 redis/redis-stack:7.2.0-v9 "/entrypoint.sh" 42 seconds ago Up 41 seconds 0.0.0.0:6379->6379/tcp, :::6379->6379/tcp, 0.0.0.0:8001->8001/tcp, :::8001->8001/tcp redis-vector-db ``` -#### If you use TGI - -```bash -. set_env.sh -``` +If any issues are encountered during deployment, refer to the [Troubleshooting](../../../../README_miscellaneous.md#troubleshooting) section. 
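+
+If a container shows an `Exited` or `Restarting` status, its log output is usually the quickest pointer to the cause. For example, for the model-serving backends (container names as listed in the `docker ps` output above):
+
+```bash
+# TGI-based deployment
+docker logs --tail 100 tgi-llava-rocm-server
+
+# vLLM-based deployment
+docker logs --tail 100 multimodalqna-vllm-service
+```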
-### Start the services: +### Validate the Pipeline -#### If you use vLLM +Once the MultimodalQnA services are running, test the pipeline using the following command: ```bash -docker compose -f compose_vllm.yaml up -d -``` - -#### If you use TGI +DATA='{"messages": [{"role": "user", "content": [{"type": "audio", "audio": "UklGRigAAABXQVZFZm10IBIAAAABAAEARKwAAIhYAQACABAAAABkYXRhAgAAAAEA"}]}]}' -```bash -docker compose -f compose.yaml up -d +curl http://${HOST_IP}:${MULTIMODALQNA_BACKEND_SERVICE_PORT}/v1/multimodalqna \ + -H "Content-Type: application/json" \ + -d "$DATA" ``` -All containers should be running and should not restart: - -##### If you use vLLM: - -- multimodalqna-vllm-service -- multimodalqna-lvm -- multimodalqna-backend-server -- multimodalqna-gradio-ui-server -- whisper-service -- embedding-multimodal-bridgetower -- redis-vector-db -- embedding -- retriever-redis -- dataprep-multimodal-redis - -##### If you use TGI: +Checking the response from the service. The response should be similar to text: -- tgi-llava-rocm-server -- multimodalqna-lvm -- multimodalqna-backend-server -- multimodalqna-gradio-ui-server -- whisper-service -- embedding-multimodal-bridgetower -- redis-vector-db -- embedding -- retriever-redis -- dataprep-multimodal-redis +```textmate +{"id":"chatcmpl-75aK2KWCfxZmVcfh5tiiHj","object":"chat.completion","created":1743568232,"model":"multimodalqna","choices":[{"index":0,"message":{"role":"assistant","content":"There is no video segments retrieved given the query!"},"finish_reason":"stop","metadata":{"audio":"you"}}],"usage":{"prompt_tokens":0,"total_tokens":0,"completion_tokens":0}} +``` ---- +If the output lines in the "choices.text" keys contain words (tokens) containing meaning, then the service is considered launched successfully. -## Validate the Services -### 1. Validate the vLLM/TGI Service +### Cleanup the Deployment -#### If you use vLLM: +To stop the containers associated with the deployment, execute the following command: ```bash -DATA='{"model": "Xkev/Llama-3.2V-11B-cot", '\ -'"messages": [{"role": "user", "content": "What is Deep Learning?"}], "max_tokens": 256}' - -curl http://${HOST_IP}:${MULTIMODALQNA_VLLM_SERVICE_PORT}/v1/chat/completions \ - -X POST \ - -d "$DATA" \ - -H 'Content-Type: application/json' -``` +# if used TGI +docker compose -f compose.yaml down -Checking the response from the service. 
The response should be similar to JSON: - -```json -{ - "id": "chatcmpl-a3761920c4034131b3cab073b8e8b841", - "object": "chat.completion", - "created": 1742959065, - "model": "Intel/neural-chat-7b-v3-3", - "choices": [ - { - "index": 0, - "message": { - "role": "assistant", - "content": " Deep Learning refers to a modern approach of Artificial Intelligence that aims to replicate the way human brains process information by teaching computers to learn from data without extensive programming", - "tool_calls": [] - }, - "logprobs": null, - "finish_reason": "length", - "stop_reason": null - } - ], - "usage": { "prompt_tokens": 15, "total_tokens": 47, "completion_tokens": 32, "prompt_tokens_details": null }, - "prompt_logprobs": null -} +# if used vLLM + docker compose -f compose_vllm.yaml down ``` -If the service response has a meaningful response in the value of the "choices.message.content" key, -then we consider the vLLM service to be successfully launched - -#### If you use TGI: - -```bash -DATA='{"inputs":"What is Deep Learning?",'\ -'"parameters":{"max_new_tokens":256,"do_sample": true}}' - -curl http://${HOST_IP}:${MULTIMODALQNA_TGI_SERVICE_PORT}/generate \ - -X POST \ - -d "$DATA" \ - -H 'Content-Type: application/json' -``` +## MultimodalQnA Docker Compose Files -Checking the response from the service. The response should be similar to JSON: +In the context of deploying an MultimodalQnA pipeline on an AMD ROCm platform, we can pick and choose different large language model serving frameworks, or single English TTS/multi-language TTS component. The table below outlines the various configurations that are available as part of the application. These configurations can be used as templates and can be extended to different components available in [GenAIComps](https://github.com/opea-project/GenAIComps.git). -```json -{ - "generated_text": "\n\nDeep Learning is a subset of machine learning, which focuses on developing methods inspired by the functioning of the human brain; more specifically, the way it processes and acquires various types of knowledge and information. To enable deep learning, the networks are composed of multiple processing layers that form a hierarchy, with each layer learning more complex and abstraction levels of data representation.\n\nThe principle of Deep Learning is to emulate the structure of neurons in the human brain to construct artificial neural networks capable to accomplish complicated pattern recognition tasks more effectively and accurately. Therefore, these neural networks contain a series of hierarchical components, where units in earlier layers receive simple inputs and are activated by these inputs. The activation of the units in later layers are the results of multiple nonlinear transformations generated from reconstructing and integrating the information in previous layers. 
In other words, by combining various pieces of information at each layer, a Deep Learning network can extract the input features that best represent the structure of data, providing their outputs at the last layer or final level of abstraction.\n\nThe main idea of using these 'deep' networks in contrast to regular algorithms is that they are capable of representing hierarchical relationships that exist within the data and learn these representations by" -} -``` +| File | Description | +| ------------------------------------------------------ | ------------------------------------------------------------------------------------------------------------------------ | +| [compose.yaml](./compose.yaml) | The LLM serving framework is TGI. Default compose file using TGI as serving framework and redis as vector database | | +| [compose_vllm.yaml](./compose_vllm.yaml) | The LLM serving framework is vLLM. Compose file using vllm as serving framework and redis as vector database | -If the service response has a meaningful response in the value of the "generated_text" key, -then we consider the TGI service to be successfully launched -### 2. Validate the LVM Service +## Validate MicroServices -```bash -curl http://${host_ip}:${MULTIMODALQNA_LVM_PORT}/v1/lvm \ - -X POST \ - -H 'Content-Type: application/json' \ - -d '{"retrieved_docs": [], "initial_query": "What is this?", "top_n": 1, "metadata": [], "chat_template":"The caption of the image is: '\''{context}'\''. {question}"}' -``` - -Checking the response from the service. The response should be similar to JSON: - -```textmate -{"downstream_black_list":[],"id":"1b17e903e8c773be909bde0e7cfdb53f","text":" I will analyze the image and provide a detailed description based on its visual characteristics. I will then compare these characteristics to the standard answer provided to ensure accuracy.\n\n1. **Examine the Image**: The image is a solid color, which appears to be a shade of yellow. There are no additional elements or patterns present in the image.\n\n2. **Compare with Standard Answer**: The standard answer describes the image as a \"yellow image\" without any additional details or context. This matches the observed characteristics of the image being a single, uniform yellow color.\n\n3. **Conclusion**: Based on the visual analysis and comparison with the standard answer, the image can be accurately described as a \"yellow image.\" There are no other features or elements present that would alter this description.\n\nFINAL ANSWER: The image is a yellow image.","metadata":{"video_id":"8c7461df-b373-4a00-8696-9a2234359fe0","source_video":"WeAreGoingOnBullrun_8c7461df-b373-4a00-8696-9a2234359fe0.mp4","time_of_frame_ms":"37000000","transcript_for_inference":"yellow image"}} -``` - -If the service response has a meaningful response in the value of the "choices.text" key, -then we consider the vLLM service to be successfully launched - -### 3. Validate MicroServices - -#### embedding-multimodal-bridgetower +### 1. Embedding-multimodal-bridgetower Text example: @@ -408,7 +268,7 @@ Checking the response from the service. The response should be similar to text: {"embedding":[0.024372786283493042,-0.003916610032320023,0.07578050345182419,...,-0.046543147414922714]} ``` -#### embedding +### 2. Embedding Text example: @@ -440,7 +300,7 @@ Checking the response from the service. 
The response should be similar to text: {"id":"cce4eab623255c4c632fb920e277dcf7","text":"This is some sample text.","embedding":[0.02613169699907303,-0.049398183822631836,...,0.03544217720627785],"search_type":"similarity","k":4,"distance_threshold":null,"fetch_k":20,"lambda_mult":0.5,"score_threshold":0.2,"constraints":null,"url":"https://github.com/docarray/docarray/blob/main/tests/toydata/image-data/apple.png?raw=true","base64_image":"iVBORw0KGgoAAAANSUhEUgAAAoEAAAJqCAMAAABjDmrLAAAABGdBTUEAALGPC/.../BCU5wghOc4AQnOMEJTnCCE5zgBCc4wQlOcILzqvO/ARWd2ns+lvHkAAAAAElFTkSuQmCC"} ``` -#### retriever-multimodal-redis +### 3. Retriever-multimodal-redis set "your_embedding" variable: @@ -463,7 +323,7 @@ Checking the response from the service. The response should be similar to text: {"id":"80a4f3fc5f5d5cd31ab1e3912f6b6042","retrieved_docs":[],"initial_query":"test","top_n":1,"metadata":[]} ``` -#### whisper service +### 4. Whisper service ```bash curl http://${host_ip}:7066/v1/asr \ @@ -478,36 +338,6 @@ Checking the response from the service. The response should be similar to text: {"asr_result":"you"} ``` -### 4. Validate the MegaService - -```bash -DATA='{"messages": [{"role": "user", "content": [{"type": "audio", "audio": "UklGRigAAABXQVZFZm10IBIAAAABAAEARKwAAIhYAQACABAAAABkYXRhAgAAAAEA"}]}]}' - -curl http://${HOST_IP}:${MULTIMODALQNA_BACKEND_SERVICE_PORT}/v1/multimodalqna \ - -H "Content-Type: application/json" \ - -d "$DATA" -``` - -Checking the response from the service. The response should be similar to text: - -```textmate -{"id":"chatcmpl-75aK2KWCfxZmVcfh5tiiHj","object":"chat.completion","created":1743568232,"model":"multimodalqna","choices":[{"index":0,"message":{"role":"assistant","content":"There is no video segments retrieved given the query!"},"finish_reason":"stop","metadata":{"audio":"you"}}],"usage":{"prompt_tokens":0,"total_tokens":0,"completion_tokens":0}} -``` - -If the output lines in the "choices.text" keys contain words (tokens) containing meaning, then the service is considered launched successfully. - -### 5. Stop application +## Conclusion -#### If you use vLLM - -```bash -cd ~/multimodalqna-install/GenAIExamples/MultimodalQnA/docker_compose/amd/gpu/rocm -docker compose -f compose_vllm.yaml down -``` - -#### If you use TGI - -```bash -cd ~/multimodalqna-install/GenAIExamples/MultimodalQnA/docker_compose/amd/gpu/rocm -docker compose -f compose.yaml down -``` +This guide should enable developers to deploy the default configuration or any of the other compose yaml files for different configurations. It also highlights the configurable parameters that can be set before deployment. diff --git a/MultimodalQnA/docker_compose/amd/gpu/rocm/README_old.md b/MultimodalQnA/docker_compose/amd/gpu/rocm/README_old.md new file mode 100644 index 0000000000..14e66d989a --- /dev/null +++ b/MultimodalQnA/docker_compose/amd/gpu/rocm/README_old.md @@ -0,0 +1,513 @@ +# Build and Deploy MultimodalQnA Application on AMD GPU (ROCm) + +This document outlines the deployment process for a MultimodalQnA application utilizing the [GenAIComps](https://github.com/opea-project/GenAIComps.git) microservice pipeline on AMD server with ROCm GPUs. The steps include Docker image creation, container deployment via Docker Compose, and service execution to integrate microservices such as `multimodal_embedding` that employs [BridgeTower](https://huggingface.co/BridgeTower/bridgetower-large-itm-mlm-gaudi) model as embedding model, `multimodal_retriever`, `lvm`, and `multimodal-data-prep`. 
We will publish the Docker images to Docker Hub soon, it will simplify the deployment process for this service. + +For detailed information about these instance types, you can refer to this [link](https://aws.amazon.com/ec2/instance-types/m7i/). Once you've chosen the appropriate instance type, proceed with configuring your instance settings, including network configurations, security groups, and storage options. + +After launching your instance, you can connect to it using SSH (for Linux instances) or Remote Desktop Protocol (RDP) (for Windows instances). From there, you'll have full access to your Xeon server, allowing you to install, configure, and manage your applications as needed. + +## Build Docker Images + +### 1. Build Docker Image + +- #### Create application install directory and go to it: + + ```bash + mkdir ~/multimodalqna-install && cd multimodalqna-install + ``` + +- #### Clone the repository GenAIExamples (the default repository branch "main" is used here): + + ```bash + git clone https://github.com/opea-project/GenAIExamples.git + ``` + + If you need to use a specific branch/tag of the GenAIExamples repository, then (v1.3 replace with its own value): + + ```bash + git clone https://github.com/opea-project/GenAIExamples.git && cd GenAIExamples && git checkout v1.3 + ``` + + We remind you that when using a specific version of the code, you need to use the README from this version: + +- #### Go to build directory: + + ```bash + cd ~/multimodalqna-install/GenAIExamples/MultimodalQnA/docker_image_build + ``` + +- Cleaning up the GenAIComps repository if it was previously cloned in this directory. + This is necessary if the build was performed earlier and the GenAIComps folder exists and is not empty: + + ```bash + echo Y | rm -R GenAIComps + ``` + +- #### Clone the repository GenAIComps (the default repository branch "main" is used here): + + ```bash + git clone https://github.com/opea-project/GenAIComps.git + ``` + + If you use a specific tag of the GenAIExamples repository, + then you should also use the corresponding tag for GenAIComps. (v1.3 replace with its own value): + + ```bash + git clone https://github.com/opea-project/GenAIComps.git && cd GenAIComps && git checkout v1.3 + ``` + + We remind you that when using a specific version of the code, you need to use the README from this version. + +- #### Setting the list of images for the build (from the build file.yaml) + + If you want to deploy a vLLM-based or TGI-based application, then the set of services is installed as follows: + + #### vLLM-based application + + ```bash + service_list="multimodalqna multimodalqna-ui embedding-multimodal-bridgetower embedding retriever lvm dataprep whisper vllm-rocm" + ``` + + #### TGI-based application + + ```bash + service_list="multimodalqna multimodalqna-ui embedding-multimodal-bridgetower embedding retriever lvm dataprep whisper" + ``` + +- #### Optional. 
Pull TGI Docker Image (Do this if you want to use TGI) + + ```bash + docker pull ghcr.io/huggingface/text-generation-inference:2.3.1-rocm + ``` + +- #### Build Docker Images + + ```bash + docker compose -f build.yaml build ${service_list} --no-cache + ``` + + After the build, we check the list of images with the command: + + ```bash + docker image ls + ``` + + The list of images should include: + + ##### vLLM-based application: + + - opea/vllm-rocm:latest + - opea/lvm:latest + - opea/multimodalqna:latest + - opea/multimodalqna-ui:latest + - opea/dataprep:latest + - opea/embedding:latest + - opea/embedding-multimodal-bridgetower:latest + - opea/retriever:latest + - opea/whisper:latest + + ##### TGI-based application: + + - ghcr.io/huggingface/text-generation-inference:2.4.1-rocm + - opea/lvm:latest + - opea/multimodalqna:latest + - opea/multimodalqna-ui:latest + - opea/dataprep:latest + - opea/embedding:latest + - opea/embedding-multimodal-bridgetower:latest + - opea/retriever:latest + - opea/whisper:latest + +--- + +## Deploy the MultimodalQnA Application + +### Docker Compose Configuration for AMD GPUs + +To enable GPU support for AMD GPUs, the following configuration is added to the Docker Compose file: + +- compose_vllm.yaml - for vLLM-based application +- compose.yaml - for TGI-based + +```yaml +shm_size: 1g +devices: + - /dev/kfd:/dev/kfd + - /dev/dri/:/dev/dri/ +cap_add: + - SYS_PTRACE +group_add: + - video +security_opt: + - seccomp:unconfined +``` + +This configuration forwards all available GPUs to the container. To use a specific GPU, specify its `cardN` and `renderN` device IDs. For example: + +```yaml +shm_size: 1g +devices: + - /dev/kfd:/dev/kfd + - /dev/dri/card0:/dev/dri/card0 + - /dev/dri/renderD128:/dev/dri/renderD128 +cap_add: + - SYS_PTRACE +group_add: + - video +security_opt: + - seccomp:unconfined +``` + +**How to Identify GPU Device IDs:** +Use AMD GPU driver utilities to determine the correct `cardN` and `renderN` IDs for your GPU. + +### Set deploy environment variables + +#### Setting variables in the operating system environment: + +##### Set variable HUGGINGFACEHUB_API_TOKEN: + +```bash +### Replace the string 'your_huggingfacehub_token' with your HuggingFacehub repository access token. +export HUGGINGFACEHUB_API_TOKEN='your_huggingfacehub_token' +``` + +#### Set variables value in set_env\*\*\*\*.sh file: + +Go to Docker Compose directory: + +```bash +cd ~/multimodalqna-install/GenAIExamples/MultimodalQnA/docker_compose/amd/gpu/rocm +``` + +The example uses the Nano text editor. You can use any convenient text editor: + +#### If you use vLLM + +```bash +nano set_env_vllm.sh +``` + +#### If you use TGI + +```bash +nano set_env.sh +``` + +If you are in a proxy environment, also set the proxy-related environment variables: + +```bash +export http_proxy="Your_HTTP_Proxy" +export https_proxy="Your_HTTPs_Proxy" +``` + +Set the values of the variables: + +- **HOST_IP, HOST_IP_EXTERNAL** - These variables are used to configure the name/address of the service in the operating system environment for the application services to interact with each other and with the outside world. + + If your server uses only an internal address and is not accessible from the Internet, then the values for these two variables will be the same and the value will be equal to the server's internal name/address. 
+ + If your server uses only an external, Internet-accessible address, then the values for these two variables will be the same and the value will be equal to the server's external name/address. + + If your server is located on an internal network, has an internal address, but is accessible from the Internet via a proxy/firewall/load balancer, then the HOST_IP variable will have a value equal to the internal name/address of the server, and the EXTERNAL_HOST_IP variable will have a value equal to the external name/address of the proxy/firewall/load balancer behind which the server is located. + + We set these values in the file set_env\*\*\*\*.sh + +- **Variables with names like "**\*\*\*\*\*\*\_PORT"\*\* - These variables set the IP port numbers for establishing network connections to the application services. + The values shown in the file set_env.sh or set_env_vllm they are the values used for the development and testing of the application, as well as configured for the environment in which the development is performed. These values must be configured in accordance with the rules of network access to your environment's server, and must not overlap with the IP ports of other applications that are already in use. + +#### Required Models + +By default, the multimodal-embedding and LVM models are set to a default value as listed below: + +| Service | Model | +| --------- | ------------------------------------------- | +| embedding | BridgeTower/bridgetower-large-itm-mlm-gaudi | +| LVM | llava-hf/llava-1.5-7b-hf | +| LVM | Xkev/Llama-3.2V-11B-cot | + +Note: + +For AMD ROCm System "Xkev/Llama-3.2V-11B-cot" is recommended to run on ghcr.io/huggingface/text-generation-inference:2.4.1-rocm + +#### Set variables with script set_env\*\*\*\*.sh + +#### If you use vLLM + +```bash +. set_env_vllm.sh +``` + +#### If you use TGI + +```bash +. set_env.sh +``` + +### Start the services: + +#### If you use vLLM + +```bash +docker compose -f compose_vllm.yaml up -d +``` + +#### If you use TGI + +```bash +docker compose -f compose.yaml up -d +``` + +All containers should be running and should not restart: + +##### If you use vLLM: + +- multimodalqna-vllm-service +- multimodalqna-lvm +- multimodalqna-backend-server +- multimodalqna-gradio-ui-server +- whisper-service +- embedding-multimodal-bridgetower +- redis-vector-db +- embedding +- retriever-redis +- dataprep-multimodal-redis + +##### If you use TGI: + +- tgi-llava-rocm-server +- multimodalqna-lvm +- multimodalqna-backend-server +- multimodalqna-gradio-ui-server +- whisper-service +- embedding-multimodal-bridgetower +- redis-vector-db +- embedding +- retriever-redis +- dataprep-multimodal-redis + +--- + +## Validate the Services + +### 1. Validate the vLLM/TGI Service + +#### If you use vLLM: + +```bash +DATA='{"model": "Xkev/Llama-3.2V-11B-cot", '\ +'"messages": [{"role": "user", "content": "What is Deep Learning?"}], "max_tokens": 256}' + +curl http://${HOST_IP}:${MULTIMODALQNA_VLLM_SERVICE_PORT}/v1/chat/completions \ + -X POST \ + -d "$DATA" \ + -H 'Content-Type: application/json' +``` + +Checking the response from the service. 
The response should be similar to JSON: + +```json +{ + "id": "chatcmpl-a3761920c4034131b3cab073b8e8b841", + "object": "chat.completion", + "created": 1742959065, + "model": "Intel/neural-chat-7b-v3-3", + "choices": [ + { + "index": 0, + "message": { + "role": "assistant", + "content": " Deep Learning refers to a modern approach of Artificial Intelligence that aims to replicate the way human brains process information by teaching computers to learn from data without extensive programming", + "tool_calls": [] + }, + "logprobs": null, + "finish_reason": "length", + "stop_reason": null + } + ], + "usage": { "prompt_tokens": 15, "total_tokens": 47, "completion_tokens": 32, "prompt_tokens_details": null }, + "prompt_logprobs": null +} +``` + +If the service response has a meaningful response in the value of the "choices.message.content" key, +then we consider the vLLM service to be successfully launched + +#### If you use TGI: + +```bash +DATA='{"inputs":"What is Deep Learning?",'\ +'"parameters":{"max_new_tokens":256,"do_sample": true}}' + +curl http://${HOST_IP}:${MULTIMODALQNA_TGI_SERVICE_PORT}/generate \ + -X POST \ + -d "$DATA" \ + -H 'Content-Type: application/json' +``` + +Checking the response from the service. The response should be similar to JSON: + +```json +{ + "generated_text": "\n\nDeep Learning is a subset of machine learning, which focuses on developing methods inspired by the functioning of the human brain; more specifically, the way it processes and acquires various types of knowledge and information. To enable deep learning, the networks are composed of multiple processing layers that form a hierarchy, with each layer learning more complex and abstraction levels of data representation.\n\nThe principle of Deep Learning is to emulate the structure of neurons in the human brain to construct artificial neural networks capable to accomplish complicated pattern recognition tasks more effectively and accurately. Therefore, these neural networks contain a series of hierarchical components, where units in earlier layers receive simple inputs and are activated by these inputs. The activation of the units in later layers are the results of multiple nonlinear transformations generated from reconstructing and integrating the information in previous layers. In other words, by combining various pieces of information at each layer, a Deep Learning network can extract the input features that best represent the structure of data, providing their outputs at the last layer or final level of abstraction.\n\nThe main idea of using these 'deep' networks in contrast to regular algorithms is that they are capable of representing hierarchical relationships that exist within the data and learn these representations by" +} +``` + +If the service response has a meaningful response in the value of the "generated_text" key, +then we consider the TGI service to be successfully launched + +### 2. Validate the LVM Service + +```bash +curl http://${host_ip}:${MULTIMODALQNA_LVM_PORT}/v1/lvm \ + -X POST \ + -H 'Content-Type: application/json' \ + -d '{"retrieved_docs": [], "initial_query": "What is this?", "top_n": 1, "metadata": [], "chat_template":"The caption of the image is: '\''{context}'\''. {question}"}' +``` + +Checking the response from the service. The response should be similar to JSON: + +```textmate +{"downstream_black_list":[],"id":"1b17e903e8c773be909bde0e7cfdb53f","text":" I will analyze the image and provide a detailed description based on its visual characteristics. 
I will then compare these characteristics to the standard answer provided to ensure accuracy.\n\n1. **Examine the Image**: The image is a solid color, which appears to be a shade of yellow. There are no additional elements or patterns present in the image.\n\n2. **Compare with Standard Answer**: The standard answer describes the image as a \"yellow image\" without any additional details or context. This matches the observed characteristics of the image being a single, uniform yellow color.\n\n3. **Conclusion**: Based on the visual analysis and comparison with the standard answer, the image can be accurately described as a \"yellow image.\" There are no other features or elements present that would alter this description.\n\nFINAL ANSWER: The image is a yellow image.","metadata":{"video_id":"8c7461df-b373-4a00-8696-9a2234359fe0","source_video":"WeAreGoingOnBullrun_8c7461df-b373-4a00-8696-9a2234359fe0.mp4","time_of_frame_ms":"37000000","transcript_for_inference":"yellow image"}} +``` + +If the service response has a meaningful response in the value of the "choices.text" key, +then we consider the vLLM service to be successfully launched + +### 3. Validate MicroServices + +#### embedding-multimodal-bridgetower + +Text example: + +```bash +curl http://${host_ip}:${EMM_BRIDGETOWER_PORT}/v1/encode \ + -X POST \ + -H "Content-Type:application/json" \ + -d '{"text":"This is example"}' +``` + +Checking the response from the service. The response should be similar to text: + +```textmate +{"embedding":[0.036936961114406586,-0.0022056063171476126,0.0891181230545044,-0.019263656809926033,-0.049174826592206955,-0.05129311606287956,-0.07172256708145142,0.04365323856472969,0.03275766223669052,0.0059910244308412075,-0.0301326...,-0.0031989417038857937,0.042092420160770416]} +``` + +Image example: + +```bash +curl http://${host_ip}:${EMM_BRIDGETOWER_PORT}/v1/encode \ + -X POST \ + -H "Content-Type:application/json" \ + -d '{"text":"This is example", "img_b64_str": "iVBORw0KGgoAAAANSUhEUgAAAAoAAAAKCAYAAACNMs+9AAAAFUlEQVR42mP8/5+hnoEIwDiqkL4KAcT9GO0U4BxoAAAAAElFTkSuQmCC"}' +``` + +Checking the response from the service. The response should be similar to text: + +```textmate +{"embedding":[0.024372786283493042,-0.003916610032320023,0.07578050345182419,...,-0.046543147414922714]} +``` + +#### embedding + +Text example: + +```bash +curl http://${host_ip}:$MM_EMBEDDING_PORT_MICROSERVICE/v1/embeddings \ + -X POST \ + -H "Content-Type: application/json" \ + -d '{"text" : "This is some sample text."}' +``` + +Checking the response from the service. The response should be similar to text: + +```textmate +{"id":"4fb722012a2719e38188190e1cb37ed3","text":"This is some sample text.","embedding":[0.043303076177835464,-0.051807764917612076,...,-0.0005179636646062136,-0.0027774290647357702],"search_type":"similarity","k":4,"distance_threshold":null,"fetch_k":20,"lambda_mult":0.5,"score_threshold":0.2,"constraints":null,"url":null,"base64_image":null} +``` + +Image example: + +```bash +curl http://${host_ip}:${EMM_BRIDGETOWER_PORT}/v1/encode \ + -X POST \ + -H "Content-Type:application/json" \ + -d '{"text":"This is example", "img_b64_str": "iVBORw0KGgoAAAANSUhEUgAAAAoAAAAKCAYAAACNMs+9AAAAFUlEQVR42mP8/5+hnoEIwDiqkL4KAcT9GO0U4BxoAAAAAElFTkSuQmCC"}' +``` + +Checking the response from the service. 
The response should be similar to text: + +```textmate +{"id":"cce4eab623255c4c632fb920e277dcf7","text":"This is some sample text.","embedding":[0.02613169699907303,-0.049398183822631836,...,0.03544217720627785],"search_type":"similarity","k":4,"distance_threshold":null,"fetch_k":20,"lambda_mult":0.5,"score_threshold":0.2,"constraints":null,"url":"https://github.com/docarray/docarray/blob/main/tests/toydata/image-data/apple.png?raw=true","base64_image":"iVBORw0KGgoAAAANSUhEUgAAAoEAAAJqCAMAAABjDmrLAAAABGdBTUEAALGPC/.../BCU5wghOc4AQnOMEJTnCCE5zgBCc4wQlOcILzqvO/ARWd2ns+lvHkAAAAAElFTkSuQmCC"} +``` + +#### retriever-multimodal-redis + +set "your_embedding" variable: + +```bash +export your_embedding=$(python3 -c "import random; embedding = [random.uniform(-1, 1) for _ in range(512)]; print(embedding)") +``` + +Test Redis retriever + +```bash +curl http://${host_ip}:${REDIS_RETRIEVER_PORT}/v1/retrieval \ + -X POST \ + -H "Content-Type: application/json" \ + -d "{\"text\":\"test\",\"embedding\":${your_embedding}}" +``` + +Checking the response from the service. The response should be similar to text: + +```textmate +{"id":"80a4f3fc5f5d5cd31ab1e3912f6b6042","retrieved_docs":[],"initial_query":"test","top_n":1,"metadata":[]} +``` + +#### whisper service + +```bash +curl http://${host_ip}:7066/v1/asr \ + -X POST \ + -d '{"audio": "UklGRigAAABXQVZFZm10IBIAAAABAAEARKwAAIhYAQACABAAAABkYXRhAgAAAAEA"}' \ + -H 'Content-Type: application/json' +``` + +Checking the response from the service. The response should be similar to text: + +```textmate +{"asr_result":"you"} +``` + +### 4. Validate the MegaService + +```bash +DATA='{"messages": [{"role": "user", "content": [{"type": "audio", "audio": "UklGRigAAABXQVZFZm10IBIAAAABAAEARKwAAIhYAQACABAAAABkYXRhAgAAAAEA"}]}]}' + +curl http://${HOST_IP}:${MULTIMODALQNA_BACKEND_SERVICE_PORT}/v1/multimodalqna \ + -H "Content-Type: application/json" \ + -d "$DATA" +``` + +Checking the response from the service. The response should be similar to text: + +```textmate +{"id":"chatcmpl-75aK2KWCfxZmVcfh5tiiHj","object":"chat.completion","created":1743568232,"model":"multimodalqna","choices":[{"index":0,"message":{"role":"assistant","content":"There is no video segments retrieved given the query!"},"finish_reason":"stop","metadata":{"audio":"you"}}],"usage":{"prompt_tokens":0,"total_tokens":0,"completion_tokens":0}} +``` + +If the output lines in the "choices.text" keys contain words (tokens) containing meaning, then the service is considered launched successfully. + +### 5. 
Stop application + +#### If you use vLLM + +```bash +cd ~/multimodalqna-install/GenAIExamples/MultimodalQnA/docker_compose/amd/gpu/rocm +docker compose -f compose_vllm.yaml down +``` + +#### If you use TGI + +```bash +cd ~/multimodalqna-install/GenAIExamples/MultimodalQnA/docker_compose/amd/gpu/rocm +docker compose -f compose.yaml down +``` diff --git a/MultimodalQnA/docker_compose/amd/gpu/rocm/set_env_vllm.sh b/MultimodalQnA/docker_compose/amd/gpu/rocm/set_env_vllm.sh index 623d0c5272..657ce6391f 100644 --- a/MultimodalQnA/docker_compose/amd/gpu/rocm/set_env_vllm.sh +++ b/MultimodalQnA/docker_compose/amd/gpu/rocm/set_env_vllm.sh @@ -19,7 +19,7 @@ export INDEX_NAME="mm-rag-redis" export VLLM_SERVER_PORT=8081 export LVM_ENDPOINT="http://${HOST_IP}:${VLLM_SERVER_PORT}" export EMBEDDING_MODEL_ID="BridgeTower/bridgetower-large-itm-mlm-itc" -export LVM_MODEL_ID="Xkev/Llama-3.2V-11B-cot" +export MULTIMODAL_LLM_MODEL_ID="Xkev/Llama-3.2V-11B-cot" export WHISPER_MODEL="base" export MM_EMBEDDING_SERVICE_HOST_IP=${HOST_IP} export MM_RETRIEVER_SERVICE_HOST_IP=${HOST_IP} From 58457711ff7ce92e92a30713c520dd3f9c477a21 Mon Sep 17 00:00:00 2001 From: Artem Astafev Date: Wed, 30 Apr 2025 16:51:43 +0700 Subject: [PATCH 2/6] Delete .README.md.kate-swp Signed-off-by: Artem Astafev --- .../amd/gpu/rocm/.README.md.kate-swp | Bin 90 -> 0 bytes 1 file changed, 0 insertions(+), 0 deletions(-) delete mode 100644 MultimodalQnA/docker_compose/amd/gpu/rocm/.README.md.kate-swp diff --git a/MultimodalQnA/docker_compose/amd/gpu/rocm/.README.md.kate-swp b/MultimodalQnA/docker_compose/amd/gpu/rocm/.README.md.kate-swp deleted file mode 100644 index f5bb5f48c346759db7c4383fc516b65120cae7bf..0000000000000000000000000000000000000000 GIT binary patch literal 0 HcmV?d00001 literal 90 zcmZQzU=Z?7EJ;-eE>A2_aLdd|RWQ;sU|?VnIeAsmKyZ)y{G|qSRqN_6R3_~2k_-+4 d%1i=c5a@6X4h2$^T*0Cf!D0*yj1bX@t^h|X6>R_j From 5ede8b18ab86bc73f6af0f8b3939599fe57ba58d Mon Sep 17 00:00:00 2001 From: Artem Astafev Date: Wed, 30 Apr 2025 16:54:27 +0700 Subject: [PATCH 3/6] Delete README_old.md Signed-off-by: Artem Astafev --- .../docker_compose/amd/gpu/rocm/README_old.md | 513 ------------------ 1 file changed, 513 deletions(-) delete mode 100644 MultimodalQnA/docker_compose/amd/gpu/rocm/README_old.md diff --git a/MultimodalQnA/docker_compose/amd/gpu/rocm/README_old.md b/MultimodalQnA/docker_compose/amd/gpu/rocm/README_old.md deleted file mode 100644 index 14e66d989a..0000000000 --- a/MultimodalQnA/docker_compose/amd/gpu/rocm/README_old.md +++ /dev/null @@ -1,513 +0,0 @@ -# Build and Deploy MultimodalQnA Application on AMD GPU (ROCm) - -This document outlines the deployment process for a MultimodalQnA application utilizing the [GenAIComps](https://github.com/opea-project/GenAIComps.git) microservice pipeline on AMD server with ROCm GPUs. The steps include Docker image creation, container deployment via Docker Compose, and service execution to integrate microservices such as `multimodal_embedding` that employs [BridgeTower](https://huggingface.co/BridgeTower/bridgetower-large-itm-mlm-gaudi) model as embedding model, `multimodal_retriever`, `lvm`, and `multimodal-data-prep`. We will publish the Docker images to Docker Hub soon, it will simplify the deployment process for this service. - -For detailed information about these instance types, you can refer to this [link](https://aws.amazon.com/ec2/instance-types/m7i/). 
Once you've chosen the appropriate instance type, proceed with configuring your instance settings, including network configurations, security groups, and storage options. - -After launching your instance, you can connect to it using SSH (for Linux instances) or Remote Desktop Protocol (RDP) (for Windows instances). From there, you'll have full access to your Xeon server, allowing you to install, configure, and manage your applications as needed. - -## Build Docker Images - -### 1. Build Docker Image - -- #### Create application install directory and go to it: - - ```bash - mkdir ~/multimodalqna-install && cd multimodalqna-install - ``` - -- #### Clone the repository GenAIExamples (the default repository branch "main" is used here): - - ```bash - git clone https://github.com/opea-project/GenAIExamples.git - ``` - - If you need to use a specific branch/tag of the GenAIExamples repository, then (v1.3 replace with its own value): - - ```bash - git clone https://github.com/opea-project/GenAIExamples.git && cd GenAIExamples && git checkout v1.3 - ``` - - We remind you that when using a specific version of the code, you need to use the README from this version: - -- #### Go to build directory: - - ```bash - cd ~/multimodalqna-install/GenAIExamples/MultimodalQnA/docker_image_build - ``` - -- Cleaning up the GenAIComps repository if it was previously cloned in this directory. - This is necessary if the build was performed earlier and the GenAIComps folder exists and is not empty: - - ```bash - echo Y | rm -R GenAIComps - ``` - -- #### Clone the repository GenAIComps (the default repository branch "main" is used here): - - ```bash - git clone https://github.com/opea-project/GenAIComps.git - ``` - - If you use a specific tag of the GenAIExamples repository, - then you should also use the corresponding tag for GenAIComps. (v1.3 replace with its own value): - - ```bash - git clone https://github.com/opea-project/GenAIComps.git && cd GenAIComps && git checkout v1.3 - ``` - - We remind you that when using a specific version of the code, you need to use the README from this version. - -- #### Setting the list of images for the build (from the build file.yaml) - - If you want to deploy a vLLM-based or TGI-based application, then the set of services is installed as follows: - - #### vLLM-based application - - ```bash - service_list="multimodalqna multimodalqna-ui embedding-multimodal-bridgetower embedding retriever lvm dataprep whisper vllm-rocm" - ``` - - #### TGI-based application - - ```bash - service_list="multimodalqna multimodalqna-ui embedding-multimodal-bridgetower embedding retriever lvm dataprep whisper" - ``` - -- #### Optional. 
Pull TGI Docker Image (Do this if you want to use TGI) - - ```bash - docker pull ghcr.io/huggingface/text-generation-inference:2.3.1-rocm - ``` - -- #### Build Docker Images - - ```bash - docker compose -f build.yaml build ${service_list} --no-cache - ``` - - After the build, we check the list of images with the command: - - ```bash - docker image ls - ``` - - The list of images should include: - - ##### vLLM-based application: - - - opea/vllm-rocm:latest - - opea/lvm:latest - - opea/multimodalqna:latest - - opea/multimodalqna-ui:latest - - opea/dataprep:latest - - opea/embedding:latest - - opea/embedding-multimodal-bridgetower:latest - - opea/retriever:latest - - opea/whisper:latest - - ##### TGI-based application: - - - ghcr.io/huggingface/text-generation-inference:2.4.1-rocm - - opea/lvm:latest - - opea/multimodalqna:latest - - opea/multimodalqna-ui:latest - - opea/dataprep:latest - - opea/embedding:latest - - opea/embedding-multimodal-bridgetower:latest - - opea/retriever:latest - - opea/whisper:latest - ---- - -## Deploy the MultimodalQnA Application - -### Docker Compose Configuration for AMD GPUs - -To enable GPU support for AMD GPUs, the following configuration is added to the Docker Compose file: - -- compose_vllm.yaml - for vLLM-based application -- compose.yaml - for TGI-based - -```yaml -shm_size: 1g -devices: - - /dev/kfd:/dev/kfd - - /dev/dri/:/dev/dri/ -cap_add: - - SYS_PTRACE -group_add: - - video -security_opt: - - seccomp:unconfined -``` - -This configuration forwards all available GPUs to the container. To use a specific GPU, specify its `cardN` and `renderN` device IDs. For example: - -```yaml -shm_size: 1g -devices: - - /dev/kfd:/dev/kfd - - /dev/dri/card0:/dev/dri/card0 - - /dev/dri/renderD128:/dev/dri/renderD128 -cap_add: - - SYS_PTRACE -group_add: - - video -security_opt: - - seccomp:unconfined -``` - -**How to Identify GPU Device IDs:** -Use AMD GPU driver utilities to determine the correct `cardN` and `renderN` IDs for your GPU. - -### Set deploy environment variables - -#### Setting variables in the operating system environment: - -##### Set variable HUGGINGFACEHUB_API_TOKEN: - -```bash -### Replace the string 'your_huggingfacehub_token' with your HuggingFacehub repository access token. -export HUGGINGFACEHUB_API_TOKEN='your_huggingfacehub_token' -``` - -#### Set variables value in set_env\*\*\*\*.sh file: - -Go to Docker Compose directory: - -```bash -cd ~/multimodalqna-install/GenAIExamples/MultimodalQnA/docker_compose/amd/gpu/rocm -``` - -The example uses the Nano text editor. You can use any convenient text editor: - -#### If you use vLLM - -```bash -nano set_env_vllm.sh -``` - -#### If you use TGI - -```bash -nano set_env.sh -``` - -If you are in a proxy environment, also set the proxy-related environment variables: - -```bash -export http_proxy="Your_HTTP_Proxy" -export https_proxy="Your_HTTPs_Proxy" -``` - -Set the values of the variables: - -- **HOST_IP, HOST_IP_EXTERNAL** - These variables are used to configure the name/address of the service in the operating system environment for the application services to interact with each other and with the outside world. - - If your server uses only an internal address and is not accessible from the Internet, then the values for these two variables will be the same and the value will be equal to the server's internal name/address. 
- - If your server uses only an external, Internet-accessible address, then the values for these two variables will be the same and the value will be equal to the server's external name/address. - - If your server is located on an internal network, has an internal address, but is accessible from the Internet via a proxy/firewall/load balancer, then the HOST_IP variable will have a value equal to the internal name/address of the server, and the EXTERNAL_HOST_IP variable will have a value equal to the external name/address of the proxy/firewall/load balancer behind which the server is located. - - We set these values in the file set_env\*\*\*\*.sh - -- **Variables with names like "**\*\*\*\*\*\*\_PORT"\*\* - These variables set the IP port numbers for establishing network connections to the application services. - The values shown in the file set_env.sh or set_env_vllm they are the values used for the development and testing of the application, as well as configured for the environment in which the development is performed. These values must be configured in accordance with the rules of network access to your environment's server, and must not overlap with the IP ports of other applications that are already in use. - -#### Required Models - -By default, the multimodal-embedding and LVM models are set to a default value as listed below: - -| Service | Model | -| --------- | ------------------------------------------- | -| embedding | BridgeTower/bridgetower-large-itm-mlm-gaudi | -| LVM | llava-hf/llava-1.5-7b-hf | -| LVM | Xkev/Llama-3.2V-11B-cot | - -Note: - -For AMD ROCm System "Xkev/Llama-3.2V-11B-cot" is recommended to run on ghcr.io/huggingface/text-generation-inference:2.4.1-rocm - -#### Set variables with script set_env\*\*\*\*.sh - -#### If you use vLLM - -```bash -. set_env_vllm.sh -``` - -#### If you use TGI - -```bash -. set_env.sh -``` - -### Start the services: - -#### If you use vLLM - -```bash -docker compose -f compose_vllm.yaml up -d -``` - -#### If you use TGI - -```bash -docker compose -f compose.yaml up -d -``` - -All containers should be running and should not restart: - -##### If you use vLLM: - -- multimodalqna-vllm-service -- multimodalqna-lvm -- multimodalqna-backend-server -- multimodalqna-gradio-ui-server -- whisper-service -- embedding-multimodal-bridgetower -- redis-vector-db -- embedding -- retriever-redis -- dataprep-multimodal-redis - -##### If you use TGI: - -- tgi-llava-rocm-server -- multimodalqna-lvm -- multimodalqna-backend-server -- multimodalqna-gradio-ui-server -- whisper-service -- embedding-multimodal-bridgetower -- redis-vector-db -- embedding -- retriever-redis -- dataprep-multimodal-redis - ---- - -## Validate the Services - -### 1. Validate the vLLM/TGI Service - -#### If you use vLLM: - -```bash -DATA='{"model": "Xkev/Llama-3.2V-11B-cot", '\ -'"messages": [{"role": "user", "content": "What is Deep Learning?"}], "max_tokens": 256}' - -curl http://${HOST_IP}:${MULTIMODALQNA_VLLM_SERVICE_PORT}/v1/chat/completions \ - -X POST \ - -d "$DATA" \ - -H 'Content-Type: application/json' -``` - -Checking the response from the service. 
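If you prefer to inspect only the generated text rather than the full JSON body, one option (assuming the `jq` utility is installed on the host) is to filter the same request, as in the sketch below.

```bash
# Hypothetical convenience check (assumes jq is installed): send the same
# request as above and print only the generated answer from the chat completion.
curl -s http://${HOST_IP}:${MULTIMODALQNA_VLLM_SERVICE_PORT}/v1/chat/completions \
  -X POST \
  -d "$DATA" \
  -H 'Content-Type: application/json' | jq -r '.choices[0].message.content'
```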
The response should be similar to JSON: - -```json -{ - "id": "chatcmpl-a3761920c4034131b3cab073b8e8b841", - "object": "chat.completion", - "created": 1742959065, - "model": "Intel/neural-chat-7b-v3-3", - "choices": [ - { - "index": 0, - "message": { - "role": "assistant", - "content": " Deep Learning refers to a modern approach of Artificial Intelligence that aims to replicate the way human brains process information by teaching computers to learn from data without extensive programming", - "tool_calls": [] - }, - "logprobs": null, - "finish_reason": "length", - "stop_reason": null - } - ], - "usage": { "prompt_tokens": 15, "total_tokens": 47, "completion_tokens": 32, "prompt_tokens_details": null }, - "prompt_logprobs": null -} -``` - -If the service response has a meaningful response in the value of the "choices.message.content" key, -then we consider the vLLM service to be successfully launched - -#### If you use TGI: - -```bash -DATA='{"inputs":"What is Deep Learning?",'\ -'"parameters":{"max_new_tokens":256,"do_sample": true}}' - -curl http://${HOST_IP}:${MULTIMODALQNA_TGI_SERVICE_PORT}/generate \ - -X POST \ - -d "$DATA" \ - -H 'Content-Type: application/json' -``` - -Checking the response from the service. The response should be similar to JSON: - -```json -{ - "generated_text": "\n\nDeep Learning is a subset of machine learning, which focuses on developing methods inspired by the functioning of the human brain; more specifically, the way it processes and acquires various types of knowledge and information. To enable deep learning, the networks are composed of multiple processing layers that form a hierarchy, with each layer learning more complex and abstraction levels of data representation.\n\nThe principle of Deep Learning is to emulate the structure of neurons in the human brain to construct artificial neural networks capable to accomplish complicated pattern recognition tasks more effectively and accurately. Therefore, these neural networks contain a series of hierarchical components, where units in earlier layers receive simple inputs and are activated by these inputs. The activation of the units in later layers are the results of multiple nonlinear transformations generated from reconstructing and integrating the information in previous layers. In other words, by combining various pieces of information at each layer, a Deep Learning network can extract the input features that best represent the structure of data, providing their outputs at the last layer or final level of abstraction.\n\nThe main idea of using these 'deep' networks in contrast to regular algorithms is that they are capable of representing hierarchical relationships that exist within the data and learn these representations by" -} -``` - -If the service response has a meaningful response in the value of the "generated_text" key, -then we consider the TGI service to be successfully launched - -### 2. Validate the LVM Service - -```bash -curl http://${host_ip}:${MULTIMODALQNA_LVM_PORT}/v1/lvm \ - -X POST \ - -H 'Content-Type: application/json' \ - -d '{"retrieved_docs": [], "initial_query": "What is this?", "top_n": 1, "metadata": [], "chat_template":"The caption of the image is: '\''{context}'\''. {question}"}' -``` - -Checking the response from the service. The response should be similar to JSON: - -```textmate -{"downstream_black_list":[],"id":"1b17e903e8c773be909bde0e7cfdb53f","text":" I will analyze the image and provide a detailed description based on its visual characteristics. 
I will then compare these characteristics to the standard answer provided to ensure accuracy.\n\n1. **Examine the Image**: The image is a solid color, which appears to be a shade of yellow. There are no additional elements or patterns present in the image.\n\n2. **Compare with Standard Answer**: The standard answer describes the image as a \"yellow image\" without any additional details or context. This matches the observed characteristics of the image being a single, uniform yellow color.\n\n3. **Conclusion**: Based on the visual analysis and comparison with the standard answer, the image can be accurately described as a \"yellow image.\" There are no other features or elements present that would alter this description.\n\nFINAL ANSWER: The image is a yellow image.","metadata":{"video_id":"8c7461df-b373-4a00-8696-9a2234359fe0","source_video":"WeAreGoingOnBullrun_8c7461df-b373-4a00-8696-9a2234359fe0.mp4","time_of_frame_ms":"37000000","transcript_for_inference":"yellow image"}} -``` - -If the service response has a meaningful response in the value of the "choices.text" key, -then we consider the vLLM service to be successfully launched - -### 3. Validate MicroServices - -#### embedding-multimodal-bridgetower - -Text example: - -```bash -curl http://${host_ip}:${EMM_BRIDGETOWER_PORT}/v1/encode \ - -X POST \ - -H "Content-Type:application/json" \ - -d '{"text":"This is example"}' -``` - -Checking the response from the service. The response should be similar to text: - -```textmate -{"embedding":[0.036936961114406586,-0.0022056063171476126,0.0891181230545044,-0.019263656809926033,-0.049174826592206955,-0.05129311606287956,-0.07172256708145142,0.04365323856472969,0.03275766223669052,0.0059910244308412075,-0.0301326...,-0.0031989417038857937,0.042092420160770416]} -``` - -Image example: - -```bash -curl http://${host_ip}:${EMM_BRIDGETOWER_PORT}/v1/encode \ - -X POST \ - -H "Content-Type:application/json" \ - -d '{"text":"This is example", "img_b64_str": "iVBORw0KGgoAAAANSUhEUgAAAAoAAAAKCAYAAACNMs+9AAAAFUlEQVR42mP8/5+hnoEIwDiqkL4KAcT9GO0U4BxoAAAAAElFTkSuQmCC"}' -``` - -Checking the response from the service. The response should be similar to text: - -```textmate -{"embedding":[0.024372786283493042,-0.003916610032320023,0.07578050345182419,...,-0.046543147414922714]} -``` - -#### embedding - -Text example: - -```bash -curl http://${host_ip}:$MM_EMBEDDING_PORT_MICROSERVICE/v1/embeddings \ - -X POST \ - -H "Content-Type: application/json" \ - -d '{"text" : "This is some sample text."}' -``` - -Checking the response from the service. The response should be similar to text: - -```textmate -{"id":"4fb722012a2719e38188190e1cb37ed3","text":"This is some sample text.","embedding":[0.043303076177835464,-0.051807764917612076,...,-0.0005179636646062136,-0.0027774290647357702],"search_type":"similarity","k":4,"distance_threshold":null,"fetch_k":20,"lambda_mult":0.5,"score_threshold":0.2,"constraints":null,"url":null,"base64_image":null} -``` - -Image example: - -```bash -curl http://${host_ip}:${EMM_BRIDGETOWER_PORT}/v1/encode \ - -X POST \ - -H "Content-Type:application/json" \ - -d '{"text":"This is example", "img_b64_str": "iVBORw0KGgoAAAANSUhEUgAAAAoAAAAKCAYAAACNMs+9AAAAFUlEQVR42mP8/5+hnoEIwDiqkL4KAcT9GO0U4BxoAAAAAElFTkSuQmCC"}' -``` - -Checking the response from the service. 
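To try the image example with your own file, one way to build the `img_b64_str` value (assuming a small local PNG and the GNU coreutils `base64` tool, whose `-w 0` flag disables line wrapping) is sketched below.

```bash
# Hypothetical helper (assumes ./example.png exists and GNU coreutils base64):
# encode the image as a single-line base64 string and send it to the same
# encode endpoint shown above.
IMG_B64=$(base64 -w 0 ./example.png)
curl http://${host_ip}:${EMM_BRIDGETOWER_PORT}/v1/encode \
  -X POST \
  -H "Content-Type:application/json" \
  -d "{\"text\":\"This is example\", \"img_b64_str\": \"${IMG_B64}\"}"
```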
The response should be similar to text: - -```textmate -{"id":"cce4eab623255c4c632fb920e277dcf7","text":"This is some sample text.","embedding":[0.02613169699907303,-0.049398183822631836,...,0.03544217720627785],"search_type":"similarity","k":4,"distance_threshold":null,"fetch_k":20,"lambda_mult":0.5,"score_threshold":0.2,"constraints":null,"url":"https://github.com/docarray/docarray/blob/main/tests/toydata/image-data/apple.png?raw=true","base64_image":"iVBORw0KGgoAAAANSUhEUgAAAoEAAAJqCAMAAABjDmrLAAAABGdBTUEAALGPC/.../BCU5wghOc4AQnOMEJTnCCE5zgBCc4wQlOcILzqvO/ARWd2ns+lvHkAAAAAElFTkSuQmCC"} -``` - -#### retriever-multimodal-redis - -set "your_embedding" variable: - -```bash -export your_embedding=$(python3 -c "import random; embedding = [random.uniform(-1, 1) for _ in range(512)]; print(embedding)") -``` - -Test Redis retriever - -```bash -curl http://${host_ip}:${REDIS_RETRIEVER_PORT}/v1/retrieval \ - -X POST \ - -H "Content-Type: application/json" \ - -d "{\"text\":\"test\",\"embedding\":${your_embedding}}" -``` - -Checking the response from the service. The response should be similar to text: - -```textmate -{"id":"80a4f3fc5f5d5cd31ab1e3912f6b6042","retrieved_docs":[],"initial_query":"test","top_n":1,"metadata":[]} -``` - -#### whisper service - -```bash -curl http://${host_ip}:7066/v1/asr \ - -X POST \ - -d '{"audio": "UklGRigAAABXQVZFZm10IBIAAAABAAEARKwAAIhYAQACABAAAABkYXRhAgAAAAEA"}' \ - -H 'Content-Type: application/json' -``` - -Checking the response from the service. The response should be similar to text: - -```textmate -{"asr_result":"you"} -``` - -### 4. Validate the MegaService - -```bash -DATA='{"messages": [{"role": "user", "content": [{"type": "audio", "audio": "UklGRigAAABXQVZFZm10IBIAAAABAAEARKwAAIhYAQACABAAAABkYXRhAgAAAAEA"}]}]}' - -curl http://${HOST_IP}:${MULTIMODALQNA_BACKEND_SERVICE_PORT}/v1/multimodalqna \ - -H "Content-Type: application/json" \ - -d "$DATA" -``` - -Checking the response from the service. The response should be similar to text: - -```textmate -{"id":"chatcmpl-75aK2KWCfxZmVcfh5tiiHj","object":"chat.completion","created":1743568232,"model":"multimodalqna","choices":[{"index":0,"message":{"role":"assistant","content":"There is no video segments retrieved given the query!"},"finish_reason":"stop","metadata":{"audio":"you"}}],"usage":{"prompt_tokens":0,"total_tokens":0,"completion_tokens":0}} -``` - -If the output lines in the "choices.text" keys contain words (tokens) containing meaning, then the service is considered launched successfully. - -### 5. 
Stop application - -#### If you use vLLM - -```bash -cd ~/multimodalqna-install/GenAIExamples/MultimodalQnA/docker_compose/amd/gpu/rocm -docker compose -f compose_vllm.yaml down -``` - -#### If you use TGI - -```bash -cd ~/multimodalqna-install/GenAIExamples/MultimodalQnA/docker_compose/amd/gpu/rocm -docker compose -f compose.yaml down -``` From b15cb2a3e591bc97e5b2c29b3f549d68a55889d4 Mon Sep 17 00:00:00 2001 From: "pre-commit-ci[bot]" <66853113+pre-commit-ci[bot]@users.noreply.github.com> Date: Wed, 30 Apr 2025 09:57:13 +0000 Subject: [PATCH 4/6] [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --- .../docker_compose/amd/gpu/rocm/README.md | 23 ++++++++----------- 1 file changed, 10 insertions(+), 13 deletions(-) diff --git a/MultimodalQnA/docker_compose/amd/gpu/rocm/README.md b/MultimodalQnA/docker_compose/amd/gpu/rocm/README.md index c9d5193011..b8f8e55645 100644 --- a/MultimodalQnA/docker_compose/amd/gpu/rocm/README.md +++ b/MultimodalQnA/docker_compose/amd/gpu/rocm/README.md @@ -78,7 +78,7 @@ cd docker_compose/amd/gpu/rocm docker compose -f compose.yaml up -d # if used vLLM - + docker compose -f compose_vllm.yaml up -d ``` @@ -139,12 +139,13 @@ Please refer to the table below to build different microservices from source: | TEI Embedding | [TEI guide](https://github.com/huggingface/text-embeddings-inference.git) | | Retriever | [Retriever build guide](https://github.com/opea-project/GenAIComps/tree/main/comps/retrievers/src/README_redis.md) | | TEI Reranking | [TEI guide](https://github.com/huggingface/text-embeddings-inference.git) | -| MegaService | [MegaService guide](../../../../README.md) | -| whisper-service | [whisper build guide](https://github.com/opea-project/GenAIComps/tree/main/comps/third_parties/whisper/src) +| MegaService | [MegaService guide](../../../../README.md) | +| whisper-service | [whisper build guide](https://github.com/opea-project/GenAIComps/tree/main/comps/third_parties/whisper/src) | + | -| LVM | [lvm build guide](https://github.com/opea-project/GenAIComps/blob/main/comps/lvms/src/) +| LVM | [lvm build guide](https://github.com/opea-project/GenAIComps/blob/main/comps/lvms/src/) | -| Nginx | [Nginx guide](https://github.com/opea-project/GenAIComps/tree/main/comps/third_parties/nginx) | +| Nginx | [Nginx guide](https://github.com/opea-project/GenAIComps/tree/main/comps/third_parties/nginx) | ### Check the Deployment Status @@ -171,8 +172,6 @@ e9e5f1f3b57a opea/whisper:latest "pytho 3923edad3acc opea/vllm-rocm:latest "python3 /workspace/…" 24 hours ago Up 24 hours (healthy) 0.0.0.0:8086->8011/tcp, [::]:8086->8011/tcp vllm-service ``` - - if used vLLM: ``` @@ -211,7 +210,6 @@ Checking the response from the service. The response should be similar to text: If the output lines in the "choices.text" keys contain words (tokens) containing meaning, then the service is considered launched successfully. - ### Cleanup the Deployment To stop the containers associated with the deployment, execute the following command: @@ -228,11 +226,10 @@ docker compose -f compose.yaml down In the context of deploying an MultimodalQnA pipeline on an AMD ROCm platform, we can pick and choose different large language model serving frameworks, or single English TTS/multi-language TTS component. The table below outlines the various configurations that are available as part of the application. 
These configurations can be used as templates and can be extended to different components available in [GenAIComps](https://github.com/opea-project/GenAIComps.git). -| File | Description | -| ------------------------------------------------------ | ------------------------------------------------------------------------------------------------------------------------ | -| [compose.yaml](./compose.yaml) | The LLM serving framework is TGI. Default compose file using TGI as serving framework and redis as vector database | | -| [compose_vllm.yaml](./compose_vllm.yaml) | The LLM serving framework is vLLM. Compose file using vllm as serving framework and redis as vector database | - +| File | Description | +| ---------------------------------------- | ------------------------------------------------------------------------------------------------------------------ | --- | +| [compose.yaml](./compose.yaml) | The LLM serving framework is TGI. Default compose file using TGI as serving framework and redis as vector database | | +| [compose_vllm.yaml](./compose_vllm.yaml) | The LLM serving framework is vLLM. Compose file using vllm as serving framework and redis as vector database | ## Validate MicroServices From 0027154f7284a4d32326ed097cc84aca34c448a3 Mon Sep 17 00:00:00 2001 From: Artem Astafev Date: Wed, 30 Apr 2025 17:02:06 +0700 Subject: [PATCH 5/6] Update README.md Signed-off-by: Artem Astafev --- MultimodalQnA/docker_compose/amd/gpu/rocm/README.md | 2 -- 1 file changed, 2 deletions(-) diff --git a/MultimodalQnA/docker_compose/amd/gpu/rocm/README.md b/MultimodalQnA/docker_compose/amd/gpu/rocm/README.md index c9d5193011..1688c88c55 100644 --- a/MultimodalQnA/docker_compose/amd/gpu/rocm/README.md +++ b/MultimodalQnA/docker_compose/amd/gpu/rocm/README.md @@ -189,8 +189,6 @@ bac14dac272d opea/whisper:latest "python whisper_se 7d603688fc56 redis/redis-stack:7.2.0-v9 "/entrypoint.sh" 42 seconds ago Up 41 seconds 0.0.0.0:6379->6379/tcp, :::6379->6379/tcp, 0.0.0.0:8001->8001/tcp, :::8001->8001/tcp redis-vector-db ``` -If any issues are encountered during deployment, refer to the [Troubleshooting](../../../../README_miscellaneous.md#troubleshooting) section. - ### Validate the Pipeline Once the MultimodalQnA services are running, test the pipeline using the following command: From 4cf402ec2abdce8f0fd5a22394208a3545ebffcb Mon Sep 17 00:00:00 2001 From: Artem Astafev Date: Mon, 12 May 2025 13:59:41 +0700 Subject: [PATCH 6/6] Fix typo in README.md for AMD ROCm Signed-off-by: Artem Astafev --- MultimodalQnA/docker_compose/amd/gpu/rocm/README.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/MultimodalQnA/docker_compose/amd/gpu/rocm/README.md b/MultimodalQnA/docker_compose/amd/gpu/rocm/README.md index 2ac68657ec..2986e88745 100644 --- a/MultimodalQnA/docker_compose/amd/gpu/rocm/README.md +++ b/MultimodalQnA/docker_compose/amd/gpu/rocm/README.md @@ -13,7 +13,7 @@ Note: The default LLM is `Xkev/Llama-3.2V-11B-co`. Before deploying the applicat ## MultimodalQnA Quick Start Deployment -This section describes how to quickly deploy and test the MultimodalQnAservice manually on an AMD ROCm GPU. The basic steps are: +This section describes how to quickly deploy and test the MultimodalQnA Service manually on an AMD ROCm GPU. The basic steps are: 1. [Access the Code](#access-the-code) 2. [Configure the Deployment Environment](#configure-the-deployment-environment)