1.3.1: Intel® AI for Enterprise RAG - patch release

@kkurzacz-intel released this 18 Jul 14:11
· 20 commits to main since this release
c08d914

Release Notes

Highlights:

  • Enhanced model support with six additional LLMs, including Meta-Llama-3.1, Qwen3, and Mistral variants
  • Upgraded vLLM version to 0.9.2
  • Expanded testing capabilities with PubMed dataset support and fixes for e2e performance tests

Detailed Changes

AI/Development

  • Added support for the following models:

    • hugging-quants/Meta-Llama-3.1-8B-Instruct-AWQ-INT4
    • meta-llama/Llama-3.1-8B-Instruct
    • Qwen/Qwen3-14B-AWQ
    • Qwen/Qwen3-14B
    • solidrust/Mistral-7B-Instruct-v0.3-AWQ
    • mistralai/Mistral-7B-Instruct-v0.3
  • Upgraded vLLM version to 0.9.2

  • Updated default resources for the standard Redis and text-splitter microservices to avoid OOM errors

  • Added support for custom templates in resources-model-cpu.yaml

  • Added support for the PubMed dataset and fixed input token length in e2e performance tests

  • Added "Performance Tuning Guide" for Xeon deployment
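
The newly supported models listed above are served through vLLM, which exposes an OpenAI-compatible API. A minimal sketch of building a chat-completion request for one of them; the endpoint URL and helper name are hypothetical, and the actual service address depends on your deployment:

```python
import json

# Hypothetical endpoint of a deployed vLLM 0.9.2 instance exposing the
# standard OpenAI-compatible /v1/chat/completions route.
VLLM_URL = "http://localhost:8000/v1/chat/completions"


def build_chat_request(model: str, prompt: str) -> str:
    """Build the JSON body for an OpenAI-style chat completion request."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": 256,
    }
    return json.dumps(payload)


# One of the models added in this release.
body = build_chat_request("meta-llama/Llama-3.1-8B-Instruct", "What is RAG?")
```

The resulting `body` can then be POSTed to the deployed endpoint with any HTTP client.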

Known Issues

  • For Qwen models, artifacts may occasionally appear in the response.
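
The release notes do not specify the form of these artifacts; one plausible source is stray reasoning tags (e.g. `<think>…</think>` blocks) that Qwen3 models can emit. A minimal post-processing sketch under that assumption; the function name is hypothetical:

```python
import re


def strip_think_blocks(text: str) -> str:
    # Remove <think>...</think> reasoning blocks sometimes emitted by Qwen3
    # models, along with any whitespace immediately following them.
    # Assumption: the observed artifacts take this tagged form.
    return re.sub(r"<think>.*?</think>\s*", "", text, flags=re.DOTALL)


cleaned = strip_think_blocks("<think>reasoning steps...</think>The answer is 42.")
```

Text without such tags passes through unchanged, so the filter is safe to apply to all model outputs.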