
Commit c930e9c

Merge branch 'main' into dev/kpietkun/sleep_mode
2 parents: 2314cce + 1f1b075


59 files changed (+2093, -591 lines)

.cd/README.md

Lines changed: 4 additions & 4 deletions
@@ -7,9 +7,9 @@ images and Docker Compose, with support for custom runtime parameters and
 benchmarking.
 
 Detailed Quick Start procedures are available in the `docs` folder:
-- [Basic Quick Start Guide](../docs/getting_started/quickstart/quickstart.md)
-- [Advanced Configuration Options](../docs/getting_started/quickstart/quickstart_configuration.md)
-- [Executing Inference](../docs/getting_started/quickstart/quickstart_inference.md)
+- [Basic Quick Start Guide](https://vllm-gaudi.readthedocs.io/en/latest/getting_started/quickstart/quickstart.html)
+- [Advanced Configuration Options](https://vllm-gaudi.readthedocs.io/en/latest/getting_started/quickstart/quickstart_configuration.html)
+- [Executing Inference](https://vllm-gaudi.readthedocs.io/en/latest/getting_started/quickstart/quickstart_inference.html)
 
 If you prefer to build vLLM Hardware Plugin for Intel Gaudi from source or with a custom
-Dockerfile, refer to the [Installation](../docs/getting_started/installation.md) guide.
+Dockerfile, refer to the [Installation](https://vllm-gaudi.readthedocs.io/en/latest/getting_started/installation.html) guide.

.cd/benchmark/benchmark_defaults.yaml

Lines changed: 1 addition & 0 deletions
@@ -14,6 +14,7 @@ model_text:
   - Qwen/Qwen2.5-32B-Instruct
   - Qwen/Qwen2.5-72B-Instruct
   - Qwen/Qwen2.5-7B-Instruct
+  - Qwen/Qwen3-0.6B
   - ibm-granite/granite-8b-code-instruct-4k
   - ibm-granite/granite-20b-code-instruct-8k
 DATASET: /workspace/vllm-project/benchmarks/sonnet.txt

.cd/benchmark/benchmark_scenarios_text.yaml

Lines changed: 3 additions & 0 deletions
@@ -41,6 +41,9 @@ qwen25_72b_instruct:
 qwen25_7b_instruct:
   MODEL: Qwen/Qwen2.5-7B-Instruct
 
+Qwen/Qwen3-0.6B:
+  MODEL: Qwen/Qwen3-0.6B
+
 granite_8b_code_instruct_4k:
   MODEL: ibm-granite/granite-8b-code-instruct-4k
 
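The new scenario key matches the model name, so it can be selected through the compose service's --config-name flag (see the docker-compose change below). A minimal sketch, assuming the stack is launched from the .cd/ directory and that --config-name picks a top-level key from the scenarios file:

# Hypothetical invocation of the benchmark service with the new scenario
export VLLM_BENCHMARK_CONFIG_FILE=benchmark/benchmark_scenarios_text.yaml
export VLLM_BENCHMARK_CONFIG_NAME=Qwen/Qwen3-0.6B
docker compose up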

.cd/docker-compose.yml

Lines changed: 1 addition & 1 deletion
@@ -43,5 +43,5 @@ services:
     env_file:
       - ./benchmark/benchmark_user.env
     volumes:
-      - ./logs:/root/scripts/logs
+      - /tmp/logs:/root/scripts/logs
     command: ["benchmark", "--config-file", "${VLLM_BENCHMARK_CONFIG_FILE}", "--config-name", "${VLLM_BENCHMARK_CONFIG_NAME}"]

.cd/server/settings_vllm.csv

Lines changed: 1 addition & 0 deletions
@@ -16,3 +16,4 @@ Qwen/Qwen2.5-7B-Instruct,1,4352,128,2,15231233024,2,2,14.18519115,0,10,5,128,1,3
 ibm-granite/granite-8b-code-instruct-4k,1,4096,128,2,21474836480,2,2,20,0,10,8,128,1,32,1,32,128,256,1,128,256,1,36,4096,8,32,2,32768,1,FALSE,FALSE,2048,FALSE,TRUE,TRUE,1,0
 ibm-granite/granite-20b-code-instruct-8k,1,4352,128,2,40133986304,2,2,37.37,0,10,4,128,1,32,1,32,128,256,1,128,256,1,52,6144,1,48,2,65536,1,FALSE,FALSE,2048,FALSE,TRUE,TRUE,1,0
 Qwen/Qwen2.5-VL-7B-Instruct,1,8448,128,2,15231233024,2,2,14.18519115,0,12,4,128,1,32,1,32,128,256,1,128,256,1,28,3584,4,28,2,32768,1,FALSE,FALSE,2048,FALSE,FALSE,FALSE,1,0
+Qwen/Qwen3-0.6B,1,4352,128,2,1.61E+09,2,2,1.5,0,10,5,128,1,32,1,32,128,256,1,128,256,1,28,1024,8,16,2,32768,1,FALSE,FALSE,2048,FALSE,TRUE,TRUE,1,0

.cd/templates/template_vllm_benchmark.sh

Lines changed: 3 additions & 1 deletion
@@ -35,4 +35,6 @@ vllm bench serve \
   --metric-percentiles 90 \
   --ignore-eos \
   --trust-remote-code \
-  2>&1 | tee -a logs/perftest_inp${INPUT_TOK}_out${OUTPUT_TOK}_user${CONCURRENT_REQ}.log
+  --save-result \
+  --result-dir logs \
+  --result-filename summary_inp${INPUT_TOK}_out${OUTPUT_TOK}_user${CONCURRENT_REQ}.json 2>&1 | tee -a logs/summary_inp${INPUT_TOK}_out${OUTPUT_TOK}_user${CONCURRENT_REQ}.log #save results to logs on a host
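Each run now writes both a machine-readable JSON summary and the streamed console log under logs/, which the compose volume above surfaces on the host. A hypothetical post-run inspection, assuming jq is installed and example values of 128 input tokens, 128 output tokens, and 1 concurrent user; the JSON layout is whatever `vllm bench serve --save-result` emits:

# Peek at the top-level keys of one saved benchmark summary
jq 'keys' logs/summary_inp128_out128_user1.json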

.github/workflows/create-release-branch.yaml

Lines changed: 1 addition & 1 deletion
@@ -154,7 +154,7 @@ jobs:
   --build-arg VLLM_COMMIT_ARG=${{ needs.prepare-release-branch.outputs.commit_id }} \
   -t hpu-plugin-v1-${{ needs.prepare-release-branch.outputs.tag_name }} \
   -f - . <<EOF
-FROM vault.habana.ai/gaudi-docker/1.22.0/ubuntu24.04/habanalabs/pytorch-installer-2.7.1:latest
+FROM vault.habana.ai/gaudi-docker/1.22.2/ubuntu24.04/habanalabs/pytorch-installer-2.7.1:latest
 
 COPY ./ /workspace/vllm-gaudi
 WORKDIR /workspace

.github/workflows/hourly-ci.yaml

Lines changed: 1 addition & 1 deletion
@@ -89,7 +89,7 @@ jobs:
   run: |
     echo "Attempting to build Docker image..."
     docker build --no-cache -t hpu-plugin-v1-test-env-hourly-ci -f - . <<EOF
-FROM vault.habana.ai/gaudi-docker/1.22.0/ubuntu24.04/habanalabs/pytorch-installer-2.7.1:latest
+FROM vault.habana.ai/gaudi-docker/1.22.2/ubuntu24.04/habanalabs/pytorch-installer-2.7.1:latest
 
 COPY ./ /workspace/vllm-gaudi
 WORKDIR /workspace

.github/workflows/pre-merge.yaml

Lines changed: 1 addition & 1 deletion
@@ -297,7 +297,7 @@ jobs:
   --build-arg VLLM_COMMIT_ARG=${{ env.TEST_VLLM_COMMIT }} \
   -t hpu-plugin-v1-test-env-pre-merge-${{ github.event.pull_request.head.sha }} \
   -f - . <<EOF
-FROM vault.habana.ai/gaudi-docker/1.22.0/ubuntu24.04/habanalabs/pytorch-installer-2.7.1:latest
+FROM vault.habana.ai/gaudi-docker/1.22.2/ubuntu24.04/habanalabs/pytorch-installer-2.7.1:latest
 
 ARG VLLM_COMMIT_ARG
 
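All three workflows bump the Gaudi base image from 1.22.0 to 1.22.2. The same heredoc-style build can be reproduced locally from the repository root; a sketch, where the image tag is arbitrary and the later build steps from the workflows are omitted:

# Build a local image on the bumped base, feeding the Dockerfile via stdin
docker build --no-cache -t hpu-plugin-v1-local -f - . <<EOF
FROM vault.habana.ai/gaudi-docker/1.22.2/ubuntu24.04/habanalabs/pytorch-installer-2.7.1:latest
COPY ./ /workspace/vllm-gaudi
WORKDIR /workspace
EOF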

README.md

Lines changed: 1 addition & 4 deletions
@@ -15,7 +15,7 @@ vLLM Hardware Plugin for Intel® Gaudi®
 ---
 *Latest News* 🔥
 
-- [2025/11] The 0.10.2 release introduces the production-ready version of the vLLM Hardware Plugin for Intel® Gaudi® v1.23.0. The plugin is an alternative to the [vLLM fork](https://github.com/HabanaAI/vllm-fork), which reaches end of life with this release and will be deprecated in v1.24.0, remaining functional only for legacy use cases. We strongly encourage all fork users to begin planning their migration to the plugin. For more information about this release, see the [Release Notes](docs/release_notes.md).
+- [2025/11] The 0.11.2 release introduces the production-ready version of the vLLM Hardware Plugin for Intel® Gaudi® v1.22.2. The plugin is an alternative to the [vLLM fork](https://github.com/HabanaAI/vllm-fork), which reaches end of life with this release and will be deprecated in v1.24.0, remaining functional only for legacy use cases. We strongly encourage all fork users to begin planning their migration to the plugin. For more information about this release, see the [Release Notes](docs/release_notes.md).
 - [2025/06] We introduced an early developer preview of the vLLM Hardware Plugin for Intel® Gaudi®, which is not yet intended for general use.
 
 ---

@@ -67,7 +67,4 @@ We welcome and value any contributions and collaborations.
 
 <!-- --8<-- [start:contact-us] -->
 - For technical questions and feature requests, please use GitHub [Issues](https://github.com/vllm-project/vllm-gaudi/issues).
-- For discussing with fellow users, please use the [vLLM Forum](https://discuss.vllm.ai).
-- For coordinating contributions and development, please use [Slack](https://slack.vllm.ai).
-- For security disclosures, please use GitHub's [Security Advisories](https://github.com/vllm-project/vllm/security/advisories) feature.
 <!-- --8<-- [end:contact-us] -->
