
Commit 31d6f29

OneZero-Y authored and rootfs committed
feat:support for two long-context embedding models (Qwen3-Embedding-0.6B and EmbeddingGemma-300M) (vllm-project#453)
Signed-off-by: OneZero-Y <[email protected]>
Signed-off-by: Huamin Chen <[email protected]>
1 parent 56bc39a commit 31d6f29


46 files changed: +35,921 −108 lines

candle-binding/Cargo.toml

Lines changed: 8 additions & 1 deletion
@@ -10,14 +10,21 @@ name = "candle_semantic_router"
 crate-type = ["staticlib", "cdylib"]
 
 [features]
-default = ["cuda"]
+default = []
 cuda = ["candle-core/cuda", "candle-nn/cuda", "candle-transformers/cuda"]
+# Flash Attention 2 support (requires CUDA and compatible GPU)
+# Enable with: cargo build --features flash-attn
+# Note: Requires CUDA Compute Capability >= 8.0 (Ampere or newer)
+flash-attn = ["candle-flash-attn"]
 
 [dependencies]
 anyhow = { version = "1", features = ["backtrace"] }
 candle-core = "0.8.4"
 candle-nn = "0.8.4"
 candle-transformers = "0.8.4"
+# Flash Attention 2 (optional, requires CUDA)
+# Reference: https://github.com/huggingface/candle/tree/main/candle-flash-attn
+candle-flash-attn = { version = "0.8.4", optional = true }
 tokenizers = { version = "0.21.0", features = ["http"] }
 hf-hub = "0.4.1"
 safetensors = "0.4.1"
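With this change the `flash-attn` feature is off by default (and `cuda` is no longer in `default`), so callers opt in at build time with `cargo build --features flash-attn`. Below is a minimal sketch of how such an optional Cargo feature is typically consumed from Rust code; the `attention_backend` helper is hypothetical and not part of this commit, and it assumes a crate whose Cargo.toml declares the `flash-attn` feature as in the diff above.

// Hypothetical sketch: compile-time selection driven by the optional
// `flash-attn` Cargo feature declared in candle-binding/Cargo.toml.
// Not code from the commit; for illustration only.

#[cfg(feature = "flash-attn")]
pub fn attention_backend() -> &'static str {
    // Compiled only when built with `cargo build --features flash-attn`,
    // which also pulls in the optional `candle-flash-attn` dependency.
    "Flash Attention 2 (CUDA, compute capability >= 8.0)"
}

#[cfg(not(feature = "flash-attn"))]
pub fn attention_backend() -> &'static str {
    // Default build (`default = []`): no CUDA or flash-attn features enabled.
    "standard attention"
}

fn main() {
    println!("attention backend: {}", attention_backend());
}

Gating on an item-level `#[cfg(...)]` keeps the non-CUDA build free of any flash-attention code path, which matches the intent of making `candle-flash-attn` an optional dependency.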

0 commit comments
