fix:Implement Comprehensive Rayon Parallelization for LoRA Classifiers #464

OneZero-Y · 2025-10-17T09:21:57Z

What type of PR is this?

What's Changed

1. Simplified `parallel_engine.rs` Thread Handling

File: candle-binding/src/classifiers/lora/parallel_engine.rs

Before (Complex manual threading):

let intent_results = Arc::new(Mutex::new(Vec::new()));
let pii_results = Arc::new(Mutex::new(Vec::new()));
let security_results = Arc::new(Mutex::new(Vec::new()));

let handles = vec![
    self.spawn_intent_task(texts_owned.clone(), Arc::clone(&intent_results)),
    self.spawn_pii_task(texts_owned.clone(), Arc::clone(&pii_results)),
    self.spawn_security_task(texts_owned.clone(), Arc::clone(&security_results)),
];

for handle in handles {
    handle.join().unwrap();
}

After (Clean rayon parallelism):

use rayon::prelude::*;

let ((intent_results, pii_results), security_results) = rayon::join(
    || rayon::join(
        || self.intent_classifier.batch_classify(texts),
        || self.pii_classifier.batch_detect(texts),
    ),
    || self.security_classifier.batch_detect(texts),
);

2. Parallelized Batch Processing in LoRA Classifiers

Files:

candle-binding/src/classifiers/lora/pii_lora.rs
candle-binding/src/classifiers/lora/intent_lora.rs
candle-binding/src/classifiers/lora/security_lora.rs

Change (1 line per file):

// Before
texts.iter().map(|text| self.detect(text)).collect()

// After  
texts.par_iter().map(|text| self.detect(text)).collect()

3. Multi-Task Batch Classification Parallelization

File: candle-binding/src/model_architectures/lora/bert_lora.rs

Function: classify_batch_multi_task()

Before:

pub fn classify_batch_multi_task(&self, texts: &[&str]) -> Result<Vec<LoRAMultiTaskResult>> {
    // For now, process sequentially. In future, implement true batch processing
    texts.iter().map(|text| self.classify_multi_task(text)).collect()
}

After:

pub fn classify_batch_multi_task(&self, texts: &[&str]) -> Result<Vec<LoRAMultiTaskResult>> {
    texts.par_iter().map(|text| self.classify_multi_task(text)).collect()
}

4. Traditional Model Batch Forward Pass Parallelization

File: candle-binding/src/model_architectures/traditional/base_model.rs

Function: forward_batch()

Before:

pub fn forward_batch(&self, input_batch: &[Tensor], attention_batch: &[Tensor]) -> Result<Vec<Tensor>> {
    let mut results = Vec::with_capacity(input_batch.len());
    for (input_ids, attention_mask) in input_batch.iter().zip(attention_batch.iter()) {
        let output = self.forward(input_ids, attention_mask)?;
        results.push(output);
    }
    Ok(results)
}

After:

pub fn forward_batch(&self, input_batch: &[Tensor], attention_batch: &[Tensor]) -> Result<Vec<Tensor>> {
    input_batch
        .par_iter()
        .zip(attention_batch.par_iter())
        .map(|(input_ids, attention_mask)| self.forward(input_ids, attention_mask))
        .collect()
}

Testing

New Test

Added: candle-binding/src/classifiers/lora/parallel_engine_test.rs

Which issue(s) this PR fixes:

part of #266

Release Notes: Yes/No

Signed-off-by: OneZero-Y <[email protected]>

github-actions · 2025-10-17T09:22:08Z

👥 vLLM Semantic Team Notification

The following members have been identified for the changed files in this PR and have been automatically assigned:

📁 `candle-binding`

Owners: @rootfs
Files changed:

candle-binding/src/classifiers/lora/parallel_engine_test.rs
candle-binding/Cargo.toml
candle-binding/src/classifiers/lora/intent_lora.rs
candle-binding/src/classifiers/lora/mod.rs
candle-binding/src/classifiers/lora/parallel_engine.rs
candle-binding/src/classifiers/lora/pii_lora.rs
candle-binding/src/classifiers/lora/security_lora.rs
candle-binding/src/model_architectures/embedding/qwen3_embedding_test.rs
candle-binding/src/model_architectures/lora/bert_lora.rs
candle-binding/src/model_architectures/traditional/base_model.rs

🎉 Thanks for your contributions!

This comment was automatically generated based on the OWNER files in the repository.

rootfs · 2025-10-17T12:16:19Z

@OneZero-Y thanks! I'll test this branch soonish.

vllm-project#464) Signed-off-by: OneZero-Y <[email protected]>

#464) Signed-off-by: OneZero-Y <[email protected]>

vllm-project#464) Signed-off-by: OneZero-Y <[email protected]> Signed-off-by: Huamin Chen <[email protected]>

#464) Signed-off-by: OneZero-Y <[email protected]> Signed-off-by: Huamin Chen <[email protected]>

* refactor: Implement modular candle-binding architecture (#254) - Restructure codebase into modular layers (core/, ffi/, model_architectures/, classifiers/) - Add unified error handling and configuration loading systems - Implement dual-path architecture for traditional and LoRA models - Add comprehensive FFI layer with memory safety Maintains backward compatibility while enabling future model integrations. refactor: Implement modular candle-binding architecture - Restructure codebase into modular layers (core/, ffi/, model_architectures/, classifiers/) - Add unified error handling and configuration loading systems - Implement dual-path architecture for traditional and LoRA models - Add comprehensive FFI layer with memory safety Maintains backward compatibility while enabling future model integrations. Signed-off-by: OneZero-Y <[email protected]> * feat:unit tests for candle refactoring (#296) feat:unit tests for candle refactoring feat:unit tests for candle refactoring Signed-off-by: OneZero-Y <[email protected]> Signed-off-by: Huamin Chen <[email protected]> * feat:support for two long-context embedding models (Qwen3-Embedding-0.6B and EmbeddingGemma-300M) (#453) feat:support for two long-context embedding models (Qwen3-Embedding-0.6B and EmbeddingGemma-300M) Signed-off-by: OneZero-Y <[email protected]> Signed-off-by: Huamin Chen <[email protected]> * fix:Implement Comprehensive Rayon Parallelization for LoRA Classifiers (#464) Signed-off-by: OneZero-Y <[email protected]> Signed-off-by: Huamin Chen <[email protected]> * fix:Improve rust unit test and optimize concurrent tests with rayon (#471) - Add 6 new unit test files - Replace std::thread::spawn with rayon::par_iter Signed-off-by: OneZero-Y <[email protected]> Signed-off-by: Huamin Chen <[email protected]> * fix: resolve syntax errors after rebase Signed-off-by: Huamin Chen <[email protected]> * add additional update Signed-off-by: Huamin Chen <[email protected]> * Change label count params to c_int (#494) Signed-off-by: carlory <[email protected]> * update embedding setting in config (#489) Signed-off-by: Huamin Chen <[email protected]> * make CUDA and Flash Attention 2 optional features (#511) Signed-off-by: OneZero-Y <[email protected]> * fix: Fix duplicate UNIFIED_CLASSIFIER definition and optimize lock contention (#516) - Remove duplicate UNIFIED_CLASSIFIER global state - Optimize PARALLEL_LORA_ENGINE lock contention by using Arc clone Signed-off-by: OneZero-Y <[email protected]> * Merge main to candle refactoring (#523) * Update test description from Math to General (#483) Signed-off-by: carlory <[email protected]> * feat: add HuggingChat support (#477) * add chat ui to dashboard and docker compose & refactor dashboard/backend/ Signed-off-by: JaredforReal <[email protected]> * try fix network error Signed-off-by: JaredforReal <[email protected]> * more --------- Signed-off-by: JaredforReal <[email protected]> Co-authored-by: bitliu <[email protected]> * project: 2025 Q4 roadmap (#487) * project: q4 roadmap * project: q4 roadmap * project: q4 roadmap * more * more * more * more * feat: add shelleck precommit hook (#488) * feat: add shelleck precommit hook Signed-off-by: yuluo-yx <[email protected]> * feat: add shelleck precommit hook Signed-off-by: yuluo-yx <[email protected]> * feat: add shelleck precommit hook Signed-off-by: yuluo-yx <[email protected]> --------- Signed-off-by: yuluo-yx <[email protected]> * project: add q4 roadmap news (#495) * fix missing shellcheck in pre-commit image (#497) Signed-off-by: carlory <[email protected]> * infra: update tools (#501) Signed-off-by: yuluo-yx <[email protected]> * feat(demo): enhance OpenShift demo scripts with improved UX (#478) - Reduce model selection test to 4 categories (2×Model-A, 2×Model-B) - Add new "Classification Examples" option calling curl-examples.sh - Update reasoning examples to avoid cache hits from previous tests - Remove benign examples from PII and Jailbreak tests (show only attacks) - Enhance live-semantic-router-logs.sh with better color visibility: - Fix duplicate "WITH SCORE" text in classification output - Fix CACHE HIT background color extending over timestamp - Distinguish reasoning enabled vs disabled messages - Remove redundant "(standard routing)" text - Add background colors for Model-A/Model-B routing display These improvements make the live demo clearer and more impactful for presentations and demonstrations. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Signed-off-by: Yossi Ovadia <[email protected]> Co-authored-by: Claude <[email protected]> * fix: fix precommit Argument list too long error (#502) Signed-off-by: yuluo-yx <[email protected]> * feat: enforce milvus dial timeout if set (#503) Signed-off-by: cryo <[email protected]> * Add IETF draft publication: Multi-Provider Extensions for Agentic AI Inference APIs (#506) * Initial plan * Add new IETF draft publication for Multi-Provider Extensions for Agentic AI Inference APIs Co-authored-by: rootfs <[email protected]> --------- Co-authored-by: copilot-swe-agent[bot] <[email protected]> Co-authored-by: rootfs <[email protected]> * Allow semantic cache similarity threshold to be set at the category level (#493) * Initial plan * Add category-level cache settings: enabled and similarity_threshold Co-authored-by: rootfs <[email protected]> * Add comprehensive tests for category-level cache settings Co-authored-by: rootfs <[email protected]> * Update config files and documentation for category-level cache settings - Updated 7 config YAML files (development, production, testing, e2e, and 3 recipes) with commented examples of category-level cache settings - Added comprehensive documentation section explaining category-level cache configuration - Updated semantic cache overview and in-memory cache docs with category-level examples - Added best practices for threshold selection and privacy considerations Co-authored-by: rootfs <[email protected]> * Remove duplicate code in FindSimilar functions Refactored FindSimilar() to delegate to FindSimilarWithThreshold() with default threshold instead of duplicating the entire implementation. This eliminates 226 lines of duplicate code across inmemory_cache.go and milvus_cache.go. Co-authored-by: rootfs <[email protected]> * Update src/semantic-router/pkg/extproc/request_handler.go Co-authored-by: Copilot <[email protected]> * Revert changes from unsigned commit ae39fe2 Restored the classificationText empty check that was removed in the previous commit. Co-authored-by: rootfs <[email protected]> --------- Co-authored-by: copilot-swe-agent[bot] <[email protected]> Co-authored-by: rootfs <[email protected]> Co-authored-by: Huamin Chen <[email protected]> Co-authored-by: Copilot <[email protected]> * Allow jailbreak detection and threshold to be configured at the category level (#508) * Initial plan * Add category-level jailbreak detection configuration Co-authored-by: Xunzhuo <[email protected]> * Add documentation for category-level jailbreak settings Co-authored-by: Xunzhuo <[email protected]> * Update documentation for category-level jailbreak detection - Add category-level jailbreak configuration to jailbreak-protection.md - Update category configuration docs with jailbreak_enabled parameter - Add security-focused configuration example - Update global configuration docs with category override notes - Update README to mention fine-grained security control Co-authored-by: Xunzhuo <[email protected]> * Add category-level jailbreak threshold configuration - Add JailbreakThreshold field to Category struct - Add GetJailbreakThresholdForCategory helper method - Create CheckForJailbreakWithThreshold and AnalyzeContentForJailbreakWithThreshold methods - Update performSecurityChecks to use category-specific threshold - Add 5 comprehensive tests for threshold configuration - Update example configs with threshold tuning examples - Update documentation with threshold configuration and tuning guidelines - Add threshold tuning guide with recommendations for different category types Co-authored-by: Xunzhuo <[email protected]> --------- Co-authored-by: copilot-swe-agent[bot] <[email protected]> Co-authored-by: Xunzhuo <[email protected]> * Allow PII detection threshold to be set at the category level (#510) * Initial plan * Add category-level PII threshold support Co-authored-by: Xunzhuo <[email protected]> * Update documentation with API integration notes Co-authored-by: Xunzhuo <[email protected]> * Fix markdown linting issues Co-authored-by: Xunzhuo <[email protected]> --------- Co-authored-by: copilot-swe-agent[bot] <[email protected]> Co-authored-by: Xunzhuo <[email protected]> * Fix: The caller information points to the wrapper function instead of the actual call location (#518) Signed-off-by: carlory <[email protected]> * feat: Implement hybrid cache that use in-memory index and milvus based doc store (#504) * feat: add HNSW index to inmemory semantic cache and implement hybrid cache that use in-memory index and milvus based doc store Signed-off-by: Huamin Chen <[email protected]> * chore: run go mod tidy to clean up module dependencies Signed-off-by: Huamin Chen <[email protected]> * conditionally build candle cuda support Signed-off-by: Huamin Chen <[email protected]> * rebuild index upon restart Signed-off-by: Huamin Chen <[email protected]> * precommit fix Signed-off-by: Huamin Chen <[email protected]> * fix precommit Signed-off-by: Huamin Chen <[email protected]> * fix precommit Signed-off-by: Huamin Chen <[email protected]> * fix precommit Signed-off-by: Huamin Chen <[email protected]> * disable cuda build on ci Signed-off-by: Huamin Chen <[email protected]> * review feedback Signed-off-by: Huamin Chen <[email protected]> * review feedback Signed-off-by: Huamin Chen <[email protected]> * review feedback Signed-off-by: Huamin Chen <[email protected]> * review feedback Signed-off-by: Huamin Chen <[email protected]> --------- Signed-off-by: Huamin Chen <[email protected]> --------- Signed-off-by: carlory <[email protected]> Signed-off-by: JaredforReal <[email protected]> Signed-off-by: yuluo-yx <[email protected]> Signed-off-by: Yossi Ovadia <[email protected]> Signed-off-by: cryo <[email protected]> Signed-off-by: Huamin Chen <[email protected]> Co-authored-by: 杨朱 · Kiki <[email protected]> Co-authored-by: Jared <[email protected]> Co-authored-by: bitliu <[email protected]> Co-authored-by: shown <[email protected]> Co-authored-by: Yossi Ovadia <[email protected]> Co-authored-by: Claude <[email protected]> Co-authored-by: cryo <[email protected]> Co-authored-by: Copilot <[email protected]> Co-authored-by: rootfs <[email protected]> Co-authored-by: Copilot <[email protected]> Co-authored-by: Xunzhuo <[email protected]> * Candle refactoring to main (#524) * Update test description from Math to General (#483) Signed-off-by: carlory <[email protected]> * feat: add HuggingChat support (#477) * add chat ui to dashboard and docker compose & refactor dashboard/backend/ Signed-off-by: JaredforReal <[email protected]> * try fix network error Signed-off-by: JaredforReal <[email protected]> * more --------- Signed-off-by: JaredforReal <[email protected]> Co-authored-by: bitliu <[email protected]> * project: 2025 Q4 roadmap (#487) * project: q4 roadmap * project: q4 roadmap * project: q4 roadmap * more * more * more * more * feat: add shelleck precommit hook (#488) * feat: add shelleck precommit hook Signed-off-by: yuluo-yx <[email protected]> * feat: add shelleck precommit hook Signed-off-by: yuluo-yx <[email protected]> * feat: add shelleck precommit hook Signed-off-by: yuluo-yx <[email protected]> --------- Signed-off-by: yuluo-yx <[email protected]> * project: add q4 roadmap news (#495) * fix missing shellcheck in pre-commit image (#497) Signed-off-by: carlory <[email protected]> * infra: update tools (#501) Signed-off-by: yuluo-yx <[email protected]> * feat(demo): enhance OpenShift demo scripts with improved UX (#478) - Reduce model selection test to 4 categories (2×Model-A, 2×Model-B) - Add new "Classification Examples" option calling curl-examples.sh - Update reasoning examples to avoid cache hits from previous tests - Remove benign examples from PII and Jailbreak tests (show only attacks) - Enhance live-semantic-router-logs.sh with better color visibility: - Fix duplicate "WITH SCORE" text in classification output - Fix CACHE HIT background color extending over timestamp - Distinguish reasoning enabled vs disabled messages - Remove redundant "(standard routing)" text - Add background colors for Model-A/Model-B routing display These improvements make the live demo clearer and more impactful for presentations and demonstrations. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Signed-off-by: Yossi Ovadia <[email protected]> Co-authored-by: Claude <[email protected]> * fix: fix precommit Argument list too long error (#502) Signed-off-by: yuluo-yx <[email protected]> * feat: enforce milvus dial timeout if set (#503) Signed-off-by: cryo <[email protected]> * Add IETF draft publication: Multi-Provider Extensions for Agentic AI Inference APIs (#506) * Initial plan * Add new IETF draft publication for Multi-Provider Extensions for Agentic AI Inference APIs Co-authored-by: rootfs <[email protected]> --------- Co-authored-by: copilot-swe-agent[bot] <[email protected]> Co-authored-by: rootfs <[email protected]> * Allow semantic cache similarity threshold to be set at the category level (#493) * Initial plan * Add category-level cache settings: enabled and similarity_threshold Co-authored-by: rootfs <[email protected]> * Add comprehensive tests for category-level cache settings Co-authored-by: rootfs <[email protected]> * Update config files and documentation for category-level cache settings - Updated 7 config YAML files (development, production, testing, e2e, and 3 recipes) with commented examples of category-level cache settings - Added comprehensive documentation section explaining category-level cache configuration - Updated semantic cache overview and in-memory cache docs with category-level examples - Added best practices for threshold selection and privacy considerations Co-authored-by: rootfs <[email protected]> * Remove duplicate code in FindSimilar functions Refactored FindSimilar() to delegate to FindSimilarWithThreshold() with default threshold instead of duplicating the entire implementation. This eliminates 226 lines of duplicate code across inmemory_cache.go and milvus_cache.go. Co-authored-by: rootfs <[email protected]> * Update src/semantic-router/pkg/extproc/request_handler.go Co-authored-by: Copilot <[email protected]> * Revert changes from unsigned commit ae39fe2 Restored the classificationText empty check that was removed in the previous commit. Co-authored-by: rootfs <[email protected]> --------- Co-authored-by: copilot-swe-agent[bot] <[email protected]> Co-authored-by: rootfs <[email protected]> Co-authored-by: Huamin Chen <[email protected]> Co-authored-by: Copilot <[email protected]> * Allow jailbreak detection and threshold to be configured at the category level (#508) * Initial plan * Add category-level jailbreak detection configuration Co-authored-by: Xunzhuo <[email protected]> * Add documentation for category-level jailbreak settings Co-authored-by: Xunzhuo <[email protected]> * Update documentation for category-level jailbreak detection - Add category-level jailbreak configuration to jailbreak-protection.md - Update category configuration docs with jailbreak_enabled parameter - Add security-focused configuration example - Update global configuration docs with category override notes - Update README to mention fine-grained security control Co-authored-by: Xunzhuo <[email protected]> * Add category-level jailbreak threshold configuration - Add JailbreakThreshold field to Category struct - Add GetJailbreakThresholdForCategory helper method - Create CheckForJailbreakWithThreshold and AnalyzeContentForJailbreakWithThreshold methods - Update performSecurityChecks to use category-specific threshold - Add 5 comprehensive tests for threshold configuration - Update example configs with threshold tuning examples - Update documentation with threshold configuration and tuning guidelines - Add threshold tuning guide with recommendations for different category types Co-authored-by: Xunzhuo <[email protected]> --------- Co-authored-by: copilot-swe-agent[bot] <[email protected]> Co-authored-by: Xunzhuo <[email protected]> * Allow PII detection threshold to be set at the category level (#510) * Initial plan * Add category-level PII threshold support Co-authored-by: Xunzhuo <[email protected]> * Update documentation with API integration notes Co-authored-by: Xunzhuo <[email protected]> * Fix markdown linting issues Co-authored-by: Xunzhuo <[email protected]> --------- Co-authored-by: copilot-swe-agent[bot] <[email protected]> Co-authored-by: Xunzhuo <[email protected]> * Fix: The caller information points to the wrapper function instead of the actual call location (#518) Signed-off-by: carlory <[email protected]> * feat: Implement hybrid cache that use in-memory index and milvus based doc store (#504) * feat: add HNSW index to inmemory semantic cache and implement hybrid cache that use in-memory index and milvus based doc store Signed-off-by: Huamin Chen <[email protected]> * chore: run go mod tidy to clean up module dependencies Signed-off-by: Huamin Chen <[email protected]> * conditionally build candle cuda support Signed-off-by: Huamin Chen <[email protected]> * rebuild index upon restart Signed-off-by: Huamin Chen <[email protected]> * precommit fix Signed-off-by: Huamin Chen <[email protected]> * fix precommit Signed-off-by: Huamin Chen <[email protected]> * fix precommit Signed-off-by: Huamin Chen <[email protected]> * fix precommit Signed-off-by: Huamin Chen <[email protected]> * disable cuda build on ci Signed-off-by: Huamin Chen <[email protected]> * review feedback Signed-off-by: Huamin Chen <[email protected]> * review feedback Signed-off-by: Huamin Chen <[email protected]> * review feedback Signed-off-by: Huamin Chen <[email protected]> * review feedback Signed-off-by: Huamin Chen <[email protected]> --------- Signed-off-by: Huamin Chen <[email protected]> --------- Signed-off-by: carlory <[email protected]> Signed-off-by: JaredforReal <[email protected]> Signed-off-by: yuluo-yx <[email protected]> Signed-off-by: Yossi Ovadia <[email protected]> Signed-off-by: cryo <[email protected]> Signed-off-by: Huamin Chen <[email protected]> Co-authored-by: 杨朱 · Kiki <[email protected]> Co-authored-by: Jared <[email protected]> Co-authored-by: bitliu <[email protected]> Co-authored-by: shown <[email protected]> Co-authored-by: Yossi Ovadia <[email protected]> Co-authored-by: Claude <[email protected]> Co-authored-by: cryo <[email protected]> Co-authored-by: Copilot <[email protected]> Co-authored-by: rootfs <[email protected]> Co-authored-by: Copilot <[email protected]> Co-authored-by: Xunzhuo <[email protected]> * Merge candle refactoring 3 (#525) * Update test description from Math to General (#483) Signed-off-by: carlory <[email protected]> * feat: add HuggingChat support (#477) * add chat ui to dashboard and docker compose & refactor dashboard/backend/ Signed-off-by: JaredforReal <[email protected]> * try fix network error Signed-off-by: JaredforReal <[email protected]> * more --------- Signed-off-by: JaredforReal <[email protected]> Co-authored-by: bitliu <[email protected]> * project: 2025 Q4 roadmap (#487) * project: q4 roadmap * project: q4 roadmap * project: q4 roadmap * more * more * more * more * feat: add shelleck precommit hook (#488) * feat: add shelleck precommit hook Signed-off-by: yuluo-yx <[email protected]> * feat: add shelleck precommit hook Signed-off-by: yuluo-yx <[email protected]> * feat: add shelleck precommit hook Signed-off-by: yuluo-yx <[email protected]> --------- Signed-off-by: yuluo-yx <[email protected]> * project: add q4 roadmap news (#495) * fix missing shellcheck in pre-commit image (#497) Signed-off-by: carlory <[email protected]> * infra: update tools (#501) Signed-off-by: yuluo-yx <[email protected]> * feat(demo): enhance OpenShift demo scripts with improved UX (#478) - Reduce model selection test to 4 categories (2×Model-A, 2×Model-B) - Add new "Classification Examples" option calling curl-examples.sh - Update reasoning examples to avoid cache hits from previous tests - Remove benign examples from PII and Jailbreak tests (show only attacks) - Enhance live-semantic-router-logs.sh with better color visibility: - Fix duplicate "WITH SCORE" text in classification output - Fix CACHE HIT background color extending over timestamp - Distinguish reasoning enabled vs disabled messages - Remove redundant "(standard routing)" text - Add background colors for Model-A/Model-B routing display These improvements make the live demo clearer and more impactful for presentations and demonstrations. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Signed-off-by: Yossi Ovadia <[email protected]> Co-authored-by: Claude <[email protected]> * fix: fix precommit Argument list too long error (#502) Signed-off-by: yuluo-yx <[email protected]> * feat: enforce milvus dial timeout if set (#503) Signed-off-by: cryo <[email protected]> * Add IETF draft publication: Multi-Provider Extensions for Agentic AI Inference APIs (#506) * Initial plan * Add new IETF draft publication for Multi-Provider Extensions for Agentic AI Inference APIs Co-authored-by: rootfs <[email protected]> --------- Co-authored-by: copilot-swe-agent[bot] <[email protected]> Co-authored-by: rootfs <[email protected]> * Allow semantic cache similarity threshold to be set at the category level (#493) * Initial plan * Add category-level cache settings: enabled and similarity_threshold Co-authored-by: rootfs <[email protected]> * Add comprehensive tests for category-level cache settings Co-authored-by: rootfs <[email protected]> * Update config files and documentation for category-level cache settings - Updated 7 config YAML files (development, production, testing, e2e, and 3 recipes) with commented examples of category-level cache settings - Added comprehensive documentation section explaining category-level cache configuration - Updated semantic cache overview and in-memory cache docs with category-level examples - Added best practices for threshold selection and privacy considerations Co-authored-by: rootfs <[email protected]> * Remove duplicate code in FindSimilar functions Refactored FindSimilar() to delegate to FindSimilarWithThreshold() with default threshold instead of duplicating the entire implementation. This eliminates 226 lines of duplicate code across inmemory_cache.go and milvus_cache.go. Co-authored-by: rootfs <[email protected]> * Update src/semantic-router/pkg/extproc/request_handler.go Co-authored-by: Copilot <[email protected]> * Revert changes from unsigned commit ae39fe2 Restored the classificationText empty check that was removed in the previous commit. Co-authored-by: rootfs <[email protected]> --------- Co-authored-by: copilot-swe-agent[bot] <[email protected]> Co-authored-by: rootfs <[email protected]> Co-authored-by: Huamin Chen <[email protected]> Co-authored-by: Copilot <[email protected]> * Allow jailbreak detection and threshold to be configured at the category level (#508) * Initial plan * Add category-level jailbreak detection configuration Co-authored-by: Xunzhuo <[email protected]> * Add documentation for category-level jailbreak settings Co-authored-by: Xunzhuo <[email protected]> * Update documentation for category-level jailbreak detection - Add category-level jailbreak configuration to jailbreak-protection.md - Update category configuration docs with jailbreak_enabled parameter - Add security-focused configuration example - Update global configuration docs with category override notes - Update README to mention fine-grained security control Co-authored-by: Xunzhuo <[email protected]> * Add category-level jailbreak threshold configuration - Add JailbreakThreshold field to Category struct - Add GetJailbreakThresholdForCategory helper method - Create CheckForJailbreakWithThreshold and AnalyzeContentForJailbreakWithThreshold methods - Update performSecurityChecks to use category-specific threshold - Add 5 comprehensive tests for threshold configuration - Update example configs with threshold tuning examples - Update documentation with threshold configuration and tuning guidelines - Add threshold tuning guide with recommendations for different category types Co-authored-by: Xunzhuo <[email protected]> --------- Co-authored-by: copilot-swe-agent[bot] <[email protected]> Co-authored-by: Xunzhuo <[email protected]> * Allow PII detection threshold to be set at the category level (#510) * Initial plan * Add category-level PII threshold support Co-authored-by: Xunzhuo <[email protected]> * Update documentation with API integration notes Co-authored-by: Xunzhuo <[email protected]> * Fix markdown linting issues Co-authored-by: Xunzhuo <[email protected]> --------- Co-authored-by: copilot-swe-agent[bot] <[email protected]> Co-authored-by: Xunzhuo <[email protected]> * Fix: The caller information points to the wrapper function instead of the actual call location (#518) Signed-off-by: carlory <[email protected]> * feat: Implement hybrid cache that use in-memory index and milvus based doc store (#504) * feat: add HNSW index to inmemory semantic cache and implement hybrid cache that use in-memory index and milvus based doc store Signed-off-by: Huamin Chen <[email protected]> * chore: run go mod tidy to clean up module dependencies Signed-off-by: Huamin Chen <[email protected]> * conditionally build candle cuda support Signed-off-by: Huamin Chen <[email protected]> * rebuild index upon restart Signed-off-by: Huamin Chen <[email protected]> * precommit fix Signed-off-by: Huamin Chen <[email protected]> * fix precommit Signed-off-by: Huamin Chen <[email protected]> * fix precommit Signed-off-by: Huamin Chen <[email protected]> * fix precommit Signed-off-by: Huamin Chen <[email protected]> * disable cuda build on ci Signed-off-by: Huamin Chen <[email protected]> * review feedback Signed-off-by: Huamin Chen <[email protected]> * review feedback Signed-off-by: Huamin Chen <[email protected]> * review feedback Signed-off-by: Huamin Chen <[email protected]> * review feedback Signed-off-by: Huamin Chen <[email protected]> --------- Signed-off-by: Huamin Chen <[email protected]> * merge main to feat branch Signed-off-by: Huamin Chen <[email protected]> --------- Signed-off-by: carlory <[email protected]> Signed-off-by: JaredforReal <[email protected]> Signed-off-by: yuluo-yx <[email protected]> Signed-off-by: Yossi Ovadia <[email protected]> Signed-off-by: cryo <[email protected]> Signed-off-by: Huamin Chen <[email protected]> Co-authored-by: 杨朱 · Kiki <[email protected]> Co-authored-by: Jared <[email protected]> Co-authored-by: bitliu <[email protected]> Co-authored-by: shown <[email protected]> Co-authored-by: Yossi Ovadia <[email protected]> Co-authored-by: Claude <[email protected]> Co-authored-by: cryo <[email protected]> Co-authored-by: Copilot <[email protected]> Co-authored-by: rootfs <[email protected]> Co-authored-by: Copilot <[email protected]> Co-authored-by: Xunzhuo <[email protected]> * chore: fix unit test (#527) * chore: fix unit test Signed-off-by: Huamin Chen <[email protected]> * fix go vet Signed-off-by: Huamin Chen <[email protected]> * fix ci Signed-off-by: Huamin Chen <[email protected]> * fix ci Signed-off-by: Huamin Chen <[email protected]> * split test-binding to two stages on ci Signed-off-by: Huamin Chen <[email protected]> * ignore test failure due to embeddinggemma restriction Signed-off-by: Huamin Chen <[email protected]> * reorder ci test sequences to avoid missing models Signed-off-by: Huamin Chen <[email protected]> --------- Signed-off-by: Huamin Chen <[email protected]> * refactor: Replace lazy_static with OnceLock for zero-cost concurrent reads based on review (#528) * refactor: Replace lazy_static with OnceLock for zero-cost concurrent reads based on review #266 (comment) Signed-off-by: Huamin Chen <[email protected]> * update tests Signed-off-by: Huamin Chen <[email protected]> --------- Signed-off-by: Huamin Chen <[email protected]> * chore: fix lint error (#530) Signed-off-by: Huamin Chen <[email protected]> * Fix lint error2 (#531) * chore: fix lint error Signed-off-by: Huamin Chen <[email protected]> * chore: fix lint error Signed-off-by: Huamin Chen <[email protected]> --------- Signed-off-by: Huamin Chen <[email protected]> --------- Signed-off-by: OneZero-Y <[email protected]> Signed-off-by: Huamin Chen <[email protected]> Signed-off-by: carlory <[email protected]> Signed-off-by: JaredforReal <[email protected]> Signed-off-by: yuluo-yx <[email protected]> Signed-off-by: Yossi Ovadia <[email protected]> Signed-off-by: cryo <[email protected]> Co-authored-by: OneZero-Y <[email protected]> Co-authored-by: 杨朱 · Kiki <[email protected]> Co-authored-by: Jared <[email protected]> Co-authored-by: bitliu <[email protected]> Co-authored-by: shown <[email protected]> Co-authored-by: Yossi Ovadia <[email protected]> Co-authored-by: Claude <[email protected]> Co-authored-by: cryo <[email protected]> Co-authored-by: Copilot <[email protected]> Co-authored-by: rootfs <[email protected]> Co-authored-by: Copilot <[email protected]> Co-authored-by: Xunzhuo <[email protected]>

fix:Implement Comprehensive Rayon Parallelization for LoRA Classifiers

caea0e9

Signed-off-by: OneZero-Y <[email protected]>

OneZero-Y requested review from Xunzhuo and rootfs as code owners October 17, 2025 09:21

github-actions bot assigned rootfs Oct 17, 2025

rootfs approved these changes Oct 17, 2025

View reviewed changes

rootfs merged commit e34204c into vllm-project:feat-candle-refactoring Oct 17, 2025
3 of 4 checks passed

OneZero-Y deleted the feat/testing-for-candle-refactoring branch October 18, 2025 06:56

rootfs pushed a commit to rootfs/semantic-router.bak that referenced this pull request Oct 19, 2025

fix:Implement Comprehensive Rayon Parallelization for LoRA Classifiers (

9798e4e

vllm-project#464) Signed-off-by: OneZero-Y <[email protected]>

rootfs pushed a commit that referenced this pull request Oct 19, 2025

fix:Implement Comprehensive Rayon Parallelization for LoRA Classifiers (

325f6e8

#464) Signed-off-by: OneZero-Y <[email protected]>

rootfs pushed a commit to rootfs/semantic-router.bak that referenced this pull request Oct 19, 2025

fix:Implement Comprehensive Rayon Parallelization for LoRA Classifiers (

2645350

vllm-project#464) Signed-off-by: OneZero-Y <[email protected]> Signed-off-by: Huamin Chen <[email protected]>

rootfs pushed a commit to rootfs/semantic-router.bak that referenced this pull request Oct 19, 2025

fix:Implement Comprehensive Rayon Parallelization for LoRA Classifiers (

e1ade49

vllm-project#464) Signed-off-by: OneZero-Y <[email protected]> Signed-off-by: Huamin Chen <[email protected]>

rootfs pushed a commit to rootfs/semantic-router.bak that referenced this pull request Oct 19, 2025

fix:Implement Comprehensive Rayon Parallelization for LoRA Classifiers (

87c1030

vllm-project#464) Signed-off-by: OneZero-Y <[email protected]> Signed-off-by: Huamin Chen <[email protected]>

rootfs pushed a commit to rootfs/semantic-router.bak that referenced this pull request Oct 23, 2025

fix:Implement Comprehensive Rayon Parallelization for LoRA Classifiers (

cf3bbf8

vllm-project#464) Signed-off-by: OneZero-Y <[email protected]> Signed-off-by: Huamin Chen <[email protected]>

rootfs pushed a commit to rootfs/semantic-router.bak that referenced this pull request Oct 23, 2025

fix:Implement Comprehensive Rayon Parallelization for LoRA Classifiers (

c48f51e

vllm-project#464) Signed-off-by: OneZero-Y <[email protected]> Signed-off-by: Huamin Chen <[email protected]>

rootfs pushed a commit to rootfs/semantic-router.bak that referenced this pull request Oct 23, 2025

fix:Implement Comprehensive Rayon Parallelization for LoRA Classifiers (

ec44f9a

vllm-project#464) Signed-off-by: OneZero-Y <[email protected]> Signed-off-by: Huamin Chen <[email protected]>

rootfs pushed a commit that referenced this pull request Oct 23, 2025

fix:Implement Comprehensive Rayon Parallelization for LoRA Classifiers (

550f093

#464) Signed-off-by: OneZero-Y <[email protected]> Signed-off-by: Huamin Chen <[email protected]>

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

fix:Implement Comprehensive Rayon Parallelization for LoRA Classifiers #464

fix:Implement Comprehensive Rayon Parallelization for LoRA Classifiers #464

Uh oh!

OneZero-Y commented Oct 17, 2025

Uh oh!

github-actions bot commented Oct 17, 2025

Uh oh!

rootfs commented Oct 17, 2025

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

fix:Implement Comprehensive Rayon Parallelization for LoRA Classifiers #464

fix:Implement Comprehensive Rayon Parallelization for LoRA Classifiers #464

Uh oh!

Conversation

OneZero-Y commented Oct 17, 2025

What's Changed

1. Simplified parallel_engine.rs Thread Handling

2. Parallelized Batch Processing in LoRA Classifiers

3. Multi-Task Batch Classification Parallelization

4. Traditional Model Batch Forward Pass Parallelization

Testing

New Test

Uh oh!

github-actions bot commented Oct 17, 2025

👥 vLLM Semantic Team Notification

📁 candle-binding

🎉 Thanks for your contributions!

Uh oh!

rootfs commented Oct 17, 2025

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

1. Simplified `parallel_engine.rs` Thread Handling

📁 `candle-binding`