ChatQnA/docker_compose/intel/hpu/gaudi/README.md (8 additions, 8 deletions)
@@ -158,7 +158,7 @@ The default deployment utilizes Gaudi devices primarily for the `vllm-service`,
### compose_tgi.yaml - TGI Deployment
- The TGI (Text Generation Inference) deployment and the default deployment differ primarily in their service configurations and specific focus on handling large language models (LLMs). The TGI deployment includes a unique `tgi-service`, which utilizes the `ghcr.io/huggingface/tgi-gaudi:2.0.6` image and is specifically configured to run on Gaudi hardware. This service is designed to handle LLM tasks with optimizations such as `ENABLE_HPU_GRAPH` and `USE_FLASH_ATTENTION`. The `chatqna-gaudi-backend-server` in the TGI deployment depends on the `tgi-service`, whereas in the default deployment, it relies on the `vllm-service`.
+ The TGI (Text Generation Inference) deployment and the default deployment differ primarily in their service configurations and specific focus on handling large language models (LLMs). The TGI deployment includes a unique `tgi-service`, which utilizes the `ghcr.io/huggingface/tgi-gaudi:2.3.1` image and is specifically configured to run on Gaudi hardware. This service is designed to handle LLM tasks with optimizations such as `ENABLE_HPU_GRAPH` and `USE_FLASH_ATTENTION`. The `chatqna-gaudi-backend-server` in the TGI deployment depends on the `tgi-service`, whereas in the default deployment, it relies on the `vllm-service`.
| chatqna-gaudi-backend-server | opea/chatqna:latest | No |
| chatqna-gaudi-ui-server | opea/chatqna-ui:latest | No |
| chatqna-gaudi-nginx-server | opea/nginx:latest | No |
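To make the TGI variant concrete, below is a minimal, hypothetical compose sketch of the `tgi-service` and backend dependency described above. The `runtime`, device, and token settings are illustrative assumptions, not the repository's actual `compose_tgi.yaml`.

```yaml
# Hypothetical sketch -- not the repository's actual compose_tgi.yaml.
services:
  tgi-service:
    image: ghcr.io/huggingface/tgi-gaudi:2.3.1
    runtime: habana # assumption: Gaudi exposed via the Habana container runtime
    environment:
      HABANA_VISIBLE_DEVICES: all # assumption: device selection
      ENABLE_HPU_GRAPH: "true" # HPU graph optimization named in the text
      USE_FLASH_ATTENTION: "true" # flash-attention optimization named in the text
      HF_TOKEN: ${HF_TOKEN} # assumption: token for gated models
    command: --model-id ${LLM_MODEL_ID} # assumption: model chosen via host env var
  chatqna-gaudi-backend-server:
    image: opea/chatqna:latest
    depends_on:
      - tgi-service # replaces the vllm-service dependency of the default deployment
```

Assuming the compose files live in this directory, the TGI variant would then be started with `docker compose -f compose_tgi.yaml up -d`.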
@@ -178,7 +178,7 @@ This deployment may allocate more Gaudi resources to the tgi-service to optimize
The FAQ (frequently asked questions and answers) generation deployment generates FAQs instead of performing normal text generation. It adds a new microservice called `llm-faqgen`, which interacts with the TGI/vLLM LLM server to generate FAQs from input text (see the sketch below).
- The TGI (Text Generation Inference) deployment and the default deployment differ primarily in their service configurations and specific focus on handling large language models (LLMs). The TGI deployment includes a unique `tgi-service`, which utilizes the `ghcr.io/huggingface/tgi-gaudi:2.0.6` image and is specifically configured to run on Gaudi hardware. This service is designed to handle LLM tasks with optimizations such as `ENABLE_HPU_GRAPH` and `USE_FLASH_ATTENTION`. The `chatqna-gaudi-backend-server` in the TGI deployment depends on the `tgi-service`, whereas in the default deployment, it relies on the `vllm-service`.
+ The TGI (Text Generation Inference) deployment and the default deployment differ primarily in their service configurations and specific focus on handling large language models (LLMs). The TGI deployment includes a unique `tgi-service`, which utilizes the `ghcr.io/huggingface/tgi-gaudi:2.3.1` image and is specifically configured to run on Gaudi hardware. This service is designed to handle LLM tasks with optimizations such as `ENABLE_HPU_GRAPH` and `USE_FLASH_ATTENTION`. The `chatqna-gaudi-backend-server` in the TGI deployment depends on the `tgi-service`, whereas in the default deployment, it relies on the `vllm-service`.
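As referenced above, here is a rough sketch of how `llm-faqgen` could be wired to the LLM server; the image name and the `LLM_ENDPOINT`/`LLM_MODEL_ID` variable names are assumptions for illustration only.

```yaml
# Hypothetical sketch -- names and values are illustrative assumptions.
services:
  llm-faqgen:
    image: opea/llm-faqgen:latest # assumption: image name
    environment:
      LLM_ENDPOINT: http://tgi-service:80 # assumption: the TGI/vLLM server it prompts
      LLM_MODEL_ID: ${LLM_MODEL_ID}
    depends_on:
      - tgi-service # FAQs are generated by prompting the LLM server with input text
```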
- The _compose_guardrails.yaml_ Docker Compose file introduces enhancements over the default deployment by incorporating additional services focused on safety and ChatQnA response control. Notably, it includes the `tgi-guardrails-service` and `guardrails` services. The `tgi-guardrails-service` uses the `ghcr.io/huggingface/tgi-gaudi:2.0.6` image and is configured to run on Gaudi hardware, providing functionality to manage input constraints and ensure safe operations within defined limits. The guardrails service, using the `opea/guardrails:latest` image, acts as a safety layer that interfaces with the `tgi-guardrails-service` to enforce safety protocols and manage interactions with the large language model (LLM). This backend server now depends on the `tgi-guardrails-service` and `guardrails`, alongside existing dependencies like `redis-vector-db`, `tei-embedding-service`, `retriever`, `tei-reranking-service`, and `vllm-service`. The environment configurations for the backend are also updated to include settings for the guardrail services.
+ The _compose_guardrails.yaml_ Docker Compose file introduces enhancements over the default deployment by incorporating additional services focused on safety and ChatQnA response control. Notably, it includes the `tgi-guardrails-service` and `guardrails` services. The `tgi-guardrails-service` uses the `ghcr.io/huggingface/tgi-gaudi:2.3.1` image and is configured to run on Gaudi hardware, providing functionality to manage input constraints and ensure safe operations within defined limits. The guardrails service, using the `opea/guardrails:latest` image, acts as a safety layer that interfaces with the `tgi-guardrails-service` to enforce safety protocols and manage interactions with the large language model (LLM). This backend server now depends on the `tgi-guardrails-service` and `guardrails`, alongside existing dependencies like `redis-vector-db`, `tei-embedding-service`, `retriever`, `tei-reranking-service`, and `vllm-service`. The environment configurations for the backend are also updated to include settings for the guardrail services.
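The dependency graph that paragraph describes might look roughly as follows; the `SAFETY_GUARD_ENDPOINT` variable name is an assumption, and the `depends_on` list simply mirrors the dependencies named in the text.

```yaml
# Hypothetical sketch -- illustrative, not the actual compose_guardrails.yaml.
services:
  guardrails:
    image: opea/guardrails:latest
    environment:
      SAFETY_GUARD_ENDPOINT: http://tgi-guardrails-service:80 # assumption: variable name
    depends_on:
      - tgi-guardrails-service # the Gaudi-backed safety model it calls
  chatqna-gaudi-backend-server:
    image: opea/chatqna:latest
    depends_on: # dependencies as listed in the paragraph above
      - redis-vector-db
      - tei-embedding-service
      - retriever
      - tei-reranking-service
      - vllm-service
      - tgi-guardrails-service
      - guardrails
```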
| Service Name | Image Name | Gaudi Specific | Uses LLM |
| --- | --- | --- | --- |
| tei-embedding-service | ghcr.io/huggingface/text-embeddings-inference:cpu-1.6 | No | No |
| retriever | opea/retriever:latest | No | No |
@@ -262,8 +262,8 @@ The table provides a comprehensive overview of the ChatQnA services utilized acr
| retriever | opea/retriever:latest | No | Retrieves data from the Redis database and interacts with embedding services. |
| tei-reranking-service | ghcr.io/huggingface/tei-gaudi:1.5.0 | Yes | Reranks text embeddings, typically using Gaudi hardware for enhanced performance. |
| vllm-service | opea/vllm-gaudi:latest | No | Handles large language model (LLM) tasks, utilizing Gaudi hardware. |
- | tgi-service | ghcr.io/huggingface/tgi-gaudi:2.0.6 | Yes | Specific to the TGI deployment, focuses on text generation inference using Gaudi hardware. |
- | tgi-guardrails-service | ghcr.io/huggingface/tgi-gaudi:2.0.6 | Yes | Provides guardrails functionality, ensuring safe operations within defined limits. |
+ | tgi-service | ghcr.io/huggingface/tgi-gaudi:2.3.1 | Yes | Specific to the TGI deployment, focuses on text generation inference using Gaudi hardware. |
+ | tgi-guardrails-service | ghcr.io/huggingface/tgi-gaudi:2.3.1 | Yes | Provides guardrails functionality, ensuring safe operations within defined limits. |
| guardrails | opea/guardrails:latest | Yes | Acts as a safety layer, interfacing with the `tgi-guardrails-service` to enforce safety protocols. |
| chatqna-gaudi-backend-server | opea/chatqna:latest | No | Serves as the backend for the ChatQnA application, with variations depending on the deployment. |
| chatqna-gaudi-ui-server | opea/chatqna-ui:latest | No | Provides the user interface for the ChatQnA application. |
@@ -288,7 +288,7 @@ The `ghcr.io/huggingface/text-embeddings-inference:cpu-1.6` image supporting `te
### tgi-guardrails-service
- The `tgi-guardrails-service` uses the `GUARDRAILS_MODEL_ID` parameter to select a [supported model](https://github.com/huggingface/tgi-gaudi?tab=readme-ov-file#tested-models-and-configurations) for the associated `ghcr.io/huggingface/tgi-gaudi:2.0.6` image. Like the `tei-embedding-service` and `tei-reranking-service` services, it doesn't use the `NUM_CARDS` parameter.
+ The `tgi-guardrails-service` uses the `GUARDRAILS_MODEL_ID` parameter to select a [supported model](https://github.com/huggingface/tgi-gaudi?tab=readme-ov-file#tested-models-and-configurations) for the associated `ghcr.io/huggingface/tgi-gaudi:2.3.1` image. Like the `tei-embedding-service` and `tei-reranking-service` services, it doesn't use the `NUM_CARDS` parameter.
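For illustration, here is a hedged sketch of how `GUARDRAILS_MODEL_ID` might reach the service; the model ID shown is only an example, and any model from the tested-models list linked above should work.

```yaml
# Hypothetical sketch; GUARDRAILS_MODEL_ID is interpolated from the host
# environment, e.g. export GUARDRAILS_MODEL_ID="meta-llama/Meta-Llama-Guard-2-8B"
# (assumption: example model, not a verified default).
services:
  tgi-guardrails-service:
    image: ghcr.io/huggingface/tgi-gaudi:2.3.1
    command: --model-id ${GUARDRAILS_MODEL_ID}
    # note: no NUM_CARDS parameter, matching tei-embedding-service and tei-reranking-service
```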