Releases: opea-project/Enterprise-RAG
2.1.1: Intel® AI for Enterprise RAG - patch release
Getting Started
To deploy the Intel® AI for Enterprise RAG application, follow the instructions.
Highlights
- Enhanced Reliability of NetApp Trident Integration: Improved installation and cleanup processes for ONTAP deployment, ensuring consistent creation of backup objects and better stability when managing Trident drivers
- New Reverse Proxy Support for File Uploads: Introduced the reverse_proxy_storage option in config.yaml, with automation that enables file uploads directly through the web interface for ONTAP deployments.
- TDX Validation Complete for All Enterprise RAG 2.1 Solutions: Validated TDX compatibility for ChatQnA, AudioQnA, and DocSum, with documentation updates included
- Improved Nutanix Documentation: Enhanced clarity and completeness of guidance under docs/nutanix.
- Critical Vulnerabilities Resolved
Publications
- Give Your RAG a Voice: Building an Audio Q&A Experience with Intel® AI for Enterprise RAG
- Accelerate AI Value Creation with Nutanix and Intel® AI for Enterprise RAG
- Converging Paradigms: Architecting a Hybrid and Open Platform for Unified HPC and AI Workloads
- Starting With the End in Mind: Intel and Nutanix’s Blueprint for an Enterprise-Grade RAG Chatbot
Detailed changes
Deployment:
- fixed creation of the backup object when installing the Trident driver
- added tests that verify connectivity from worker nodes to the ONTAP data and management LIFs
- corrected cleanup when removing NetApp Trident drivers
- added the reverse_proxy_storage option in config.yaml and automation that allows uploading files via a web browser (see the sketch after this list)
- validated TDX with Enterprise RAG 2.1 for all the solutions (ChatQnA, AudioQnA, DocSum) and updated the documentation
- improved the documentation in docs/nutanix
- aligned the default embedding model server to vLLM in pipelines with an external endpoint
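For illustration, the new upload option might look like this in config.yaml (a minimal sketch; only the option name comes from this release, while its placement and boolean form are assumptions):

```yaml
# config.yaml - hedged sketch of the new reverse proxy upload option.
# Only the key name reverse_proxy_storage is from the release notes;
# its nesting and value type are assumptions.
reverse_proxy_storage: true   # route web-UI file uploads through the reverse proxy (ONTAP deployments)
```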
2.1.0: Intel® AI for Enterprise RAG
Getting Started
To deploy the Intel® AI for Enterprise RAG application, follow the instructions.
Highlights:
- New solution integrated! You can use the AudioQnA pipeline to transcribe audio prompts, have the chatbot's output read aloud, ingest audio data, and use a dedicated UI.
- New default embedding model server: vLLM.
- New storage layer: MinIO was replaced with SeaweedFS as the primary file‑storage backend.
- Extended Upgradability: full version tracking, automatic upgrade detection, and unified version metadata.
- Improved UI: chat pinning & search, bulk ingestion actions, DocSum strategy selection, and reduced bundle size via validation/markdown library refactors.
- Text Extractor upgrades: better PDF parsing, deeper PPTX/DOCX extraction, and audio file support.
- Document processing performance boosts: faster embedding with Celery parallelization and faster uploads with new upload‑optimized mode.
Detailed changes
AI / Development
New AudioQnA Solution
- ASR microservice using vLLM model server for transcription.
- TTS microservice built with FastAPI, enabling audio responses.
- Namespace Status Watcher microservice for AudioQnA health validation.
- Text Extractor extended to parse MP3/WAV for transcription.
New embedding model server: vLLM
- vLLM Embedding is now the default embedding backend.
- Removed LLM_OPENAI_FORMAT_STREAMING & LLM_CONNECTOR; only OpenAI‑style streaming is supported now.
Enhanced Dataprep Pipeline
- PDF text quality boosted via pymupdf4llm.
- Improved data extraction from PPT/PPTX/DOC/DOCX: full extraction of comments, SmartArt, notes, diagrams, and embedded Excel sheets.
- Added MP3/WAV ingestion (AudioQnA only).
- [preview] MS SQL Server 2025 added as a new alternative Vector Database
Upgradability & Versioning
- Full deployment lifecycle tracked via ConfigMap.
- Automatic detection of upgrade vs install vs refresh.
- Prevents unsupported downgrades or mismatched deployments.
- Unified version source at deployment/version.yaml (see the sketch after this list).
- Improved update_charts.py for automated chart and pyproject.toml updates.
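As a rough illustration of the unified version source, deployment/version.yaml might hold entries like the following (field names are assumptions for illustration, not the actual schema; consult the file in the release):

```yaml
# deployment/version.yaml - hypothetical sketch; the real schema and
# component list may differ.
appVersion: "2.1.0"
components:
  ui: "2.1.0"
  edp: "2.1.0"
  gmc: "2.1.0"
```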
Additional Features
- Cancellation wrapper in microservices (TTS, DocSum) to stop processing when user aborts.
- Accuracy Evaluator
- Added query‑type filtering.
- Bucket‑based filtering for simulating accuracy across bucket distributions.
Deployment
Storage Layer Update
- Replaced MinIO with SeaweedFS as the primary file‑storage backend.
- Added a fully managed Ansible deployment workflow for SeaweedFS.
- Updated EDP/UI logic to support SeaweedFS Advanced IAM using Bearer Token authentication, provided as an alternative to MinIO’s authentication model.
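A hedged sketch of what such a storage section could express (all key names are hypothetical; only the SeaweedFS backend and Bearer Token authentication come from the notes):

```yaml
# Hypothetical EDP storage configuration - illustration only.
storage:
  backend: seaweedfs
  auth:
    type: bearer   # SeaweedFS Advanced IAM token instead of MinIO-style access keys
```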
Document Processing Pipeline Enhancements
- Optimized the document embedding and ingestion workflow by replacing sequential batch processing with a high‑throughput parallel pipeline architecture.
- Improved overall performance and increased utilization of embedding/ingestion services.
[preview] Document Upload Mode
- This release introduces a new mechanism allowing the system to switch between two pipeline modes:
- ChatQnA Mode
- Standard operational mode enabling chat interactions.
- Full access to chat UI and EDP pipeline with default resource allocation.
- Document Upload Mode (Enhanced EDP Resource Allocation)
- Pipeline switches to a document‑upload–optimized configuration, allocating more resources to EDP components.
- Chat UI is automatically disabled to ensure system capacity and stability.
- Admin Area remains fully accessible, allowing operations, monitoring, and management tasks.
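A minimal sketch of how such a mode switch could be expressed in config.yaml (the key name pipeline_mode and its values are hypothetical; the release does not document the exact option here):

```yaml
# Hypothetical config.yaml toggle - illustration only.
pipeline_mode: document-upload   # shifts resources to EDP and disables the chat UI
# pipeline_mode: chatqna         # default mode with full chat access
```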
Additional Features
- Updated infrastructure automation scripts to support new Kubernetes versions
- Tested Kubernetes versions: 1.32.9 and 1.33.5
- Updated Gaudi stack to: 1.22.2-32
- NRI Balloons Controller - A Kubernetes mutating webhook was added that ensures selected pods wait for the NRI balloons DaemonSet to be ready.
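For context, a mutating admission webhook registers for pod creation and rewrites matching pods before they are scheduled. A hedged sketch of what such a registration could look like (all names, labels, and service wiring are assumptions, not the actual ERAG manifests):

```yaml
# Hypothetical sketch of a mutating webhook registration; the real chart
# may use different names, selectors, and namespaces.
apiVersion: admissionregistration.k8s.io/v1
kind: MutatingWebhookConfiguration
metadata:
  name: nri-balloons-guard
webhooks:
  - name: nri-balloons-guard.erag.local
    rules:
      - apiGroups: [""]
        apiVersions: ["v1"]
        operations: ["CREATE"]
        resources: ["pods"]
    clientConfig:
      service:
        name: nri-balloons-guard   # hypothetical webhook service name
        namespace: kube-system
        path: /mutate
    admissionReviewVersions: ["v1"]
    sideEffects: None
    objectSelector:
      matchLabels:
        wait-for-balloons: "true"  # only selected pods are mutated to wait for the DaemonSet
```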
User Interface
AudioQnA Application
- Introduced a standalone UI application for the AudioQnA pipeline.
- Users can record messages using microphone input.
- Users can play back a single response message (playback only; no pause functionality).
- Control Plane view now displays additional statuses for the Automatic Speech Recognition and Text‑to‑Speech microservices.
- Added support for uploading audio files via Data Ingestion (MP3 and WAV formats).
Additional Features
- Chat History now allows users to pin selected items to the top of the list.
- A search bar has been added to Chat History, enabling users to quickly find specific items.
- The Control Plane side panel can now be shown or hidden using a dedicated toggle button.
- Users can now perform Retry or Delete actions on multiple selected files and links within the Data Ingestion view.
- In the DocSum UI application, users can select a specific strategy before generating a summary.
Refactors
- Replaced the `yup` library with `zod` for input validation.
- Replaced the `react-markdown` library with `marked` for Markdown parsing.
- UI image build process has been optimized by removing redundant steps.
Telemetry
- vLLM dashboard in Grafana improved
- Dashboard for AudioQnA solution added
Known issues
- The default embedding model server has been updated to vLLM. However, late chunking is currently supported only when using TorchServe. vLLM does not support late chunking at this time.
- Late Chunking with similarity_search_with_siblings may exceed context. Using late chunking with search_type="similarity_search_with_siblings" may cause context overflow. It is recommended to use late chunking with the default search type, which does not include neighboring chunks.
- When playing back audio via Text‑to‑Speech (TTS) in the UI, the Stop action is not functional. Playback can only be interrupted by refreshing the page.
- Processing large TTS requests may cause the service to crash. A permanent fix is planned for a future release.
2.0.1: Intel® AI for Enterprise RAG
Getting Started
To deploy the Intel® AI for Enterprise RAG application, follow the instructions.
Highlights:
- Late Chunking Enhancements: Improved document ingestion performance in late chunking mode by up to 6×, ensured chunk alignment with original text for better accuracy, and added full telemetry for the late chunking microservice and its logs in Grafana.
- Two new publications on Enterprise RAG:
- vLLM CPU updated to v0.11.2
Detailed changes
AI / Development
- Improved late chunking ingestion performance by reducing TorchServe serialization overhead, resulting in up to 6× faster document ingestion.
- Updated chunk extraction in late chunking mode to pull text directly from the original document, improving accuracy and consistency.
- Upgraded vLLM CPU to v0.11.2 and added high-concurrency handling by increasing connection limits and keep-alive duration for long-running requests
Deployment
- Improved file upload speed by:
- Dynamically calculating Celery BATCH_SIZE.
- Increasing resources for the extractor pod.
- Upgradability Improvements:
- Implemented version tracking for all ERAG components and enabled UI to display deployment version.
- Added a post-upgrade integrity check to verify data retention. (Currently requires manual execution).
- Added a pre-upgrade health check to ensure upgrades occur on healthy deployments. (Currently requires manual execution).
User Interface
ChatQnA
Chat
- Enhanced chat conversation feed UX:
- Chat feed no longer auto-scrolls when a historical chat item is selected; the conversation now starts from the beginning. A Scroll to Bottom button is available for quick navigation.
- When a user sends a new message, the chat scrolls down instantly, ensuring the message is visible at the top of the feed. Remaining space is preserved for streamed responses.
- Chat feed no longer scrolls automatically during response streaming.
- These changes also resolve issues where users could not scroll up during long streamed answers.
- Fixed an issue with Chat History streaming across different chats. All history items and related data (messages, sources, user input, etc.) are now stored separately, preventing previous conflicts.
Admin Panel
- Fixed an issue where the vLLM service node in the Admin Panel’s Control Plane tab was sporadically marked red. StatefulSet state is now interpreted correctly.
- Fixed an issue where email addresses and URLs enclosed in angle brackets (e.g., `<email@example.com>`) were removed from UI output.
Telemetry
- Extended Redis telemetry probe timeouts to improve stability.
- Added late chunking microservice metrics and logs, visible in Grafana.
- Extended the Accuracy Evaluator with configurable paths for setup configuration and cluster credentials, allowing non-default locations.
Known issues
- Late Chunking with similarity_search_with_siblings may exceed context. Using late chunking with search_type="similarity_search_with_siblings" may cause context overflow. It is recommended to use late chunking with the default search type, which does not include neighboring chunks.
- Empty references and source indexes. The chatbot randomly provides answers with empty references and source indexes when the casperhansen/llama-3-8b-instruct-awq LLM model is used.
2.0.0: Intel® AI for Enterprise RAG
Getting Started
To deploy your Intel® AI for Enterprise RAG application, please follow the instructions.
Highlights:
- New use case added! You can now use Intel® AI for Enterprise RAG Document Summarization, with a separate pipeline and UI for text- and file-based summaries.
- Replaced Bitnami images with custom Helm charts for Redis, MongoDB, Postgres, Apisix, and Keycloak to limit third-party dependencies.
- Added automated balloon sizing and reboot-survivability features (Istio streamlining, RAG refresh CronJob) to maximize hardware utilization and improve automatic recovery.
- Added Active Directory support for enterprise authentication.
- Enabled external inference endpoint support for flexible hybrid deployments with remote LLM services.
- Introduced PLLuM models with Polish prompt templates.
Detailed changes
AI / Development
- Document Summarization pipeline integrated
- Added Active Directory support for seamless integration with enterprise applications
- Added support for an external inference endpoint for vLLM
- PLLuM models were integrated into the pipeline, together with automatic support for Polish prompt templates
- [preview] Introduced Late Chunking as a preview feature, an advanced text-processing technique that improves embedding quality by preserving more semantic context across chunk boundaries
- Added a fallback option for generating presigned URLs if the storage endpoint is not configured or not capable of token credential validation
- Parallelized LoadPdf in Text Extractor
- Made HF_TOKEN optional – if the model is not gated/restricted, you no longer need to pass HF_TOKEN
- Aligned the LLM microservice with the OpenAI API – it can now be easily used in third-party chains and pipelines
- vLLM HPU updated to v0.9.0.1+Gaudi-1.22.0
- Added `docs/accuracy_tuning_tips.md` with guidance for tuning accuracy with Late Chunking and other techniques
- Added `src/comps/vectorstores/CONTRIBUTING.md` with instructions on how to enable a new vector database in the pipeline
Deployment
- Replaced Bitnami images and Helm charts with self-created solutions for:
- Redis (vdb)
- Fingerprint and Chat history (MongoDB)
- EDP (PostgreSQL)
- Apisix
- Keycloak
- Added automated calculation of balloon sizes.
- Added balloons for torchserve-embedding component
- TDX with the One TD approach has been promoted to a production-ready feature
- [preview] Created an `installer.sh` script that allows deploying the entire solution on pre-configured software
- A series of features has been added for the pipeline to survive a cluster reboot:
- Istio streamlined – Istio is now applied at the beginning of deployment
- Added a `rag-watcher` CronJob to refresh RAG services after node reboot, ensuring clean startup and operation
- Upgradability:
- Metadata pre-upgrade verification implemented – compares metadata in the deployed pipeline with the metadata shipped with an upgrade
- Data consistency report added – reports the volume of user data in components of the deployed pipeline
User Interface
- Document Summarization UI added
- Users can summarize plain text or content from a document file (supported file extensions: DOC, DOCX, PDF, MD).
- Generated summaries are stored in client-side history (retained until the page is refreshed or the session ends).
- Admin Panel Tabs:
- Control Plane – Displays pipeline status.
- Telemetry & Authentication – Provides links to Grafana and Keycloak.
- ChatQnA UI - Admin Panel: Added support for filtering and sorting columns in data tables within the Data Ingestion tab.
Telemetry
- Introduced a new `enabled` flag for telemetry traces, allowing users to control whether traces are deployed (default: false; see the sketch after this list)
- Migrated the OpenTelemetry Collector base image from Ubuntu to Debian
- Upgraded telemetry components, including Grafana and associated Helm charts
- Updated instructions and behavior for accessing logs in Grafana's Explore view, reflecting changes in newer Grafana versions
- Added a new monitor for the DocSum pipeline
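For the new traces flag, the toggle might look roughly like this (the exact chart path to the value is an assumption; only the `enabled` flag and its false default come from the notes):

```yaml
# Hedged sketch - the chart key layout is assumed, not verified.
telemetry:
  traces:
    enabled: false   # default; set to true to deploy tracing components
```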
Known issues
- A performance regression was observed during data ingestion in the Enhanced Dataprep Pipeline. The pipeline is currently optimized for chat, which can slow down file uploads. If you have a lot to upload, consider a workaround: install the pipeline with balloons.enabled: False so that HPA scales the embedding services; after uploading the files, install-on-install with balloons.enabled: True for best chat performance (see the sketch after this list).
- It was observed that telemetry tracing might fail sporadically during deployment; tracing is therefore disabled for the moment.
- When telemetry tracing is enabled, only one component's spans are visible in Tempo. Expected behavior is to see spans for all eRAG microservices in the distributed trace.
- During late chunking, text decoding performed by the tokenizer introduces formatting changes compared to the original source (e.g., lowercase conversion, added separators). As a result, retrieved chunks may not fully match the original document.
- For the ChatQnA pipeline, the vLLM service node in the Admin Panel's Control Plane tab may sporadically be colored red, as a Not Ready state is read from the API for its StatefulSet
- Document Summarization drag and drop file upload doesn't work. Please use Browse Files.
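A sketch of the balloons workaround described in the first item above (the key comes from the note; its exact nesting in config.yaml may differ):

```yaml
# Before a bulk upload - let HPA scale the embedding services:
balloons:
  enabled: false
# After uploading, reinstall (install-on-install) with:
# balloons:
#   enabled: true   # restores best chat performance
```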
1.5.0: Intel® AI for Enterprise RAG
Getting Started
To deploy your Chat Q&A RAG application, please follow the instructions.
Highlights:
- Added EDP PostgreSQL migration strategy (default-enabled) for smoother upgrades
- Included source chunk text in guardrail / LLM output payloads for better traceability
- Simplified guardrails: system prompt template removed; only user prompt validated by default
- Implemented automatic MinIO–Keycloak OIDC self-healing cron job
- Added TorchServe balloon policies and Gaudi performance optimizations (incl. reranker pinning & auto vLLM scaling)
- Replaced TEI reranker with TorchServe reranker for improved efficiency
- Added Terraform scripts for AWS deployment plus configurable vector DB type & dimensions
- Enhanced Chat UI: source chunk dialog, stable history saving, Firefox interrupt fix
Detailed changes
AI / Development
- Implemented EDP database (PostgreSQL) migration strategy (enabled by default) to simplify upgrades
- Included chunk text in source metadata (LLM / output guard responses now return chunk content)
- Removed system prompt template from guardrails (only user prompt checked; reranked_docs and past answers still optional via Dataprep / output guardrails when enabled)
- Implemented cron job to auto-verify and reconfigure MinIO OIDC linkage with Keycloak (fixes stale presigned URL issues without admin action)
- Integrated latest GenAIComps core changes to accelerate microservice prototyping
Deployment
- Implemented balloon policies for TorchServe on Gaudi
- Implemented performance optimizations:
- Replaced TEI reranker with TorchServe reranker
- Added CPU pinning for TorchServe reranker
- Enabled automatic scaling of vLLM instances
- Added Terraform scripts to deploy ERAG on AWS
- Added configuration options for vector database type and vector dimensions to streamline embedding / reranker model changes
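A hedged sketch of how the vector database options in the last item might look (key names are illustrative assumptions, not the actual schema):

```yaml
# Hypothetical configuration - illustration only.
vectorStore:
  type: redis        # vector database type
  dimensions: 1024   # must match the embedding model's output size
```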
User Interface
Chat
- Added clickable source buttons that open a dialog showing retrieved chunks used to generate the answer
- Moved file download / external link actions to dialog footer (contextual buttons)
- Fixed Firefox error handling when interrupting streamed responses
- Set chat rename character limit to 250 (aligned with API constraint)
- Refactored chat history saving: background /save call now avoids unnecessary UI refresh and screen blinking unless a non-guardrails error occurs
Admin Panel
Control Plane
- Fixed sentiment scanner threshold argument range
- Added input validation and tooltip for Code Scanner supported languages
- Removed "Edit Service Arguments" button; "Confirm Changes" and "Cancel" now remain disabled until a modification is made
Data Ingestion
- Updated Processing Time column to display "N/A" for Uploaded state or zero start time
- Added UI performance optimizations to reduce unnecessary re-renders and screen blinking on data refetch
Telemetry
- Renamed GMC router metrics prefix from "llm" to "router" for clarity
- Added Grafana dashboards: E2E Time to First Token, E2E Pipeline Latency, Pre-LLM Pipeline Latency
- Fixed log visibility issue in Grafana when deploying pipeline via Kubespray
Known issues
- User can ask a question exceeding the word limit, resulting in a general error
- Random issue of the chatbot not providing a context-sensitive answer to a specific prompt although relevant content was provided
- Post-install Gaudi operator installation fails in slow network conditions
- Grafana Logs Drilldown fails with `grafana-lokiexplore-app` plugin version 1.0.27: opening "Explore → Logs → Show Logs" in Grafana may crash with the error `Error: Minified React error #130 ...`. This occurs with grafana-lokiexplore-app v1.0.27 (released 2025-09-17). As a workaround, downgrade the plugin to v1.0.26: edit the `telemetry-grafana` ConfigMap to pin version 1.0.26 (see the sketch below), then restart the `monitoring/telemetry-grafana-xxx-xxx` Kubernetes pod for the change to take effect. To verify, go to Grafana → Administration → Plugins, search for "Grafana Logs Drilldown", and confirm that the installed version is 1.0.26.
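A hypothetical sketch of the ConfigMap edit (the data key carrying the plugin pin may be named differently in the deployed telemetry-grafana ConfigMap; verify against your cluster):

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: telemetry-grafana
  namespace: monitoring
data:
  # hypothetical key - pin the plugin to the known-good version
  plugins: "grafana-lokiexplore-app 1.0.26"
```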
1.4.0: Intel® AI for Enterprise RAG
Getting Started
To deploy your Chat Q&A RAG application, please follow the instructions.
Highlights:
Major new features and improvements:
- Chat History: Users can now save, rename, export, and delete chats.
- Source Attribution in UI: RAG sources used in responses are now visible and downloadable.
- Accuracy Evaluation: Integrated GenAIEvals scripts for RAG performance testing.
- Multi-node Deployment Support: Includes node discovery and NUMA-aware vLLM sizing.
- Velero Backup Integration: Automated backup/restore is now an optional part of the cluster lifecycle (if enabled in config.yaml).
- Detailed Ingestion Timing: Users can inspect time breakdowns for each ingestion stage.
- Large File Deletion Bug Fixed: Files with >10,000 chunks now fully deleted.
Detailed changes
AI/Development
- Introduced Chat History: Endpoint details in src/comps/chat_history.
- Ported Accuracy Evaluation scripts from OPEA's GenAIEvals to Enterprise RAG (src/tests/e2e/evals/evaluation/rag_eval).
- RAG Source Attribution: UI now displays which ingested documents contributed to answers; files are downloadable.
- Detailed EDP Timing: Clicking ingestion time reveals breakdown (text extraction, splitting, etc.).
- Translation Pipeline (Preview): API-accessible, not yet in UI. Details in deployment/README.md#additional-pipelines.
- Large File Deletion Fix: Files with >10,000 chunks now properly deleted.
Deployment
- Added multi-node deployment support.
- Introduced node discovery mechanism.
- Created balloons policy and HPA support for torchserve-reranker.
- Enabled NUMA-aware vLLM sizing and inventory-based configuration.
- Moved PCV section and model definitions to inventory.
- Automated NFS server installation in infrastructure.yaml post-install tasks.
- Added automated backup/restore playbooks.
- Moved Velero installation to infrastructure.
- Added Terraform deployment for Gaudi 3 node on IBM Cloud.
- Changed default to use HPA with balloons policy.
User Interface
Chat
- Chats saved in left panel; users can rename, export (JSON), or delete.
- If ingested data was used, sources appear below responses:
- Links open in new tab.
- Files are downloaded directly.
Admin Panel
Control Plane
- Configurable services marked with cog icon; only these are clickable.
Data Ingestion
- Clicking Processing Time shows stage durations:
- Standard: 00:00:06.239
- Compact: 6s 239ms
- Auto-refresh every 10s until final status (Error, Ingested, etc.); toggleable in settings.
- Bulk ingestion via .txt file: URLs separated by commas, spaces, or new lines (see the example after this list).
- Bucket Synchronization Dialog: Review and sync S3 discrepancies via UI.
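For example, a links file for bulk ingestion could look like this (URLs are placeholders):

```
https://example.com/docs/guide.html, https://example.com/docs/faq.html
https://example.com/whitepaper.pdf
```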
Known issues
- [API-only] Deleting >70 documents at once may result in incomplete deletion.
- [input guards] After enabling input guards, using a forbidden word will cause the next three consecutive user queries to be blocked due to chat history enforcement (N+3)
- [vllm-gaudi] When running Enterprise RAG on Gaudi with the default Mixtral 8x7B model, only a single HPU device will be utilized
1.3.2: Intel® AI for Enterprise RAG - patch release
Release Notes
Detailed Changes
AI/Development
- Fix for Header/Footer stripper in TextCompressor microservice
- Enhanced documentation for Performance Tuning Tips
Known issues
- For Qwen models, it's possible to see artifacts in the response.
1.3.1: Intel® AI for Enterprise RAG - patch release
Release Notes
Highlights:
- Enhanced model support with six additional LLMs including Meta-Llama-3.1, Qwen3, and Mistral variants
- Upgraded vLLM version to 0.9.2
- Expanded testing capabilities with PubMed dataset support and fixes for e2e performance tests
Publications:
Detailed Changes
AI/Development
- Added support for the following models:
  - hugging-quants/Meta-Llama-3.1-8B-Instruct-AWQ-INT4
  - meta-llama/Llama-3.1-8B-Instruct
  - Qwen/Qwen3-14B-AWQ
  - Qwen/Qwen3-14B
  - solidrust/Mistral-7B-Instruct-v0.3-AWQ
  - mistralai/Mistral-7B-Instruct-v0.3
- Upgraded vLLM version to 0.9.2
- Updated default resources for the standard redis and text-splitter microservices to avoid OOM errors
- Added support for custom templates in resources-model-cpu.yaml
- Added support for the PubMed dataset and fixed input token length in e2e performance tests
- Added a "Performance Tuning Guide" for Xeon deployment
Known issues
- For Qwen models, it's possible to see artifacts in the response.
1.3.0: Intel® AI for Enterprise RAG
Getting Started
To deploy your Chat Q&A RAG application, please follow the instructions.
Highlights:
- Retriever RBAC support: Document filtering based on user's access privileges to underlying S3 storage, enhancing security and data access control.
- Enhanced text extraction: Improved extraction for PDF, DOC, DOCX, and images including better hyperlink, table, and image text processing.
- Microservice architecture improvements: Split Dataprep into separate TextExtractor and TextSplitter services with new TextCompression microservice for cleaner document processing.
- Advanced retrieval algorithms: Added similarity_search_with_siblings algorithm to improve response accuracy by including adjacent chunks.
- Improved Redis implementation: Migrated to standalone namespace with Helm chart support for both single node and cluster setups for better performance.
- Backup/restore functionality: Added Velero-based backup and restore capabilities for Keycloak, EDP, and vector store database.
- UI Accessibility: Enhanced accessibility with React ARIA components and added syntax highlighting for code snippets.
Detailed changes
AI/Development
- Added Retriever RBAC support - document filtering based on user's access privileges to underlying S3 storage.
- Enhanced text extraction for PDF, DOC, DOCX, and images - improved hyperlink extraction, table text extraction, and image text extraction.
- Migrated text extraction from custom loader classes to Markitdown for ADOC, TXT, JSON, JSONL, CSV, XLSX, XLS, HTML, MD, XML, and YAML file formats.
- Introduced MarkdownSplitter for ADOC, MD, and HTML files to split text by sections and add this information to metadata.
- Added filename/URL and Section information to prompt template, improving responses to questions about document names.
- Split Dataprep microservice into separate TextExtractor and TextSplitter services.
- Introduced TextCompression microservice between TextExtractor and TextSplitter to clean and compress document text. More details here.
- Added similarity_search_with_siblings algorithm to retriever, configurable in Admin Panel, which improves response accuracy by including adjacent chunks.
- Enabled semantic chunking in Ansible and in the debug feature, with fixes for large files.
- Introduced Hierarchical Indexing for PDF files as an experimental feature, configurable via `config.yaml`. Learn more here.
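A hedged sketch of enabling the experimental feature (the actual key name in config.yaml is covered in the linked guide; this one is hypothetical):

```yaml
# Hypothetical config.yaml entry - illustration only.
hierarchical_indexing:
  enabled: true   # experimental; applies to PDF files
```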
User Interface
- Improved accessibility by refactoring UI components with React ARIA.
- Added syntax highlighting for code snippets in Chat.
- Implemented automatic scaling of ChatQnA pipeline graph size in Admin Panel - Control Plane.
Deployment
- Migrated Redis vector database from ChatQnA pipeline to standalone namespace.
- Deployed Redis via Helm chart - supporting both single node Redis and Redis-cluster for improved performance.
- Implemented balloons policy as an alternative method of pinning vLLM resources.
- Created backup/restore functionality using Velero for Keycloak, EDP, and the vector store database. Installation steps and the update and restore procedures are described in the documentation.
- Added support for deployment under user-defined domain names.
- Created Ansible scripts for simplified Kubernetes deployment.
- Added Ansible scripts for deploying Gaudi via operator.
Security
- Removed non-functional scanners from guardrails.
- Enabled remaining input guardrails in UI.
- Fixed and enhanced guardrails end-to-end tests.
- Enabled fingerprint capability for dataprep guardrail.
- Upgraded LLM Guard package to version 3.16.
Known issues
- When using Redis as a vector database, the default resource settings are not optimized, causing Redis to start with configurations that are unsuitable for production environments or intensive testing. To address this, remove the existing resource and persistence node configurations from here. Update it with the following settings:
```yaml
redis:
  (...)
  master:
    persistence:
      enabled: true
      size: "10Gi"
    resources:
      requests:
        cpu: 2
        memory: 4Gi
      limits:
        cpu: 16
        memory: 16Gi
  replica:
    persistence:
      enabled: true
      size: "10Gi"
    resources:
      requests:
        cpu: 2
        memory: 4Gi
      limits:
        cpu: 16
        memory: 16Gi
```
Note: The resource configuration for redis-cluster is not affected and is correctly set up by default.
1.2.1: Intel® AI for Enterprise RAG - patch release
Release Notes
Highlights:
- Enhanced Performance: Improved hardware support with Habana Gaudi 1.21.0 and implemented core pinning for vLLM pods, resulting in better inference performance.
- Optimized Model Deployment: Added pre-configured optimizations for LLM models and set a default quantized model (`llama-3-8b-instruct-awq`) for efficient CPU inference.
- Improved Infrastructure Flexibility: Added support for user-defined domain names and S3-compatible storage backends, with smarter resource management that prevents unnecessary MinIO service activation.
- Enhanced Data Processing: Improved Dataprep capabilities with extended link parsing for supported file types and added safeguards to prevent service hangs.
- Extended Hardware Support: Added TDX support in deployment scripts and fixed installation paths for Gaudi-based deployments.
Detailed Changes
AI/Development
- Updated Habana Gaudi to 1.21.0
- Dataprep – enabled parsing of links that target files (only extensions that are already supported), not only HTML
- Fixed parsing of the no_proxy parameter in EDP
- Added a timeout to Dataprep microservices to avoid indefinite hangs
- Fixed sticky sessions for the generic connector in the LLM microservice to enable load balancing across multiple replicas
Deployment
- Created a file with optimized configurations for running LLM models
- Set `casperhansen/llama-3-8b-instruct-awq` as the default quantized model for CPU inference
- Implemented a core pinning mechanism for vLLM pods to improve performance
- Enabled user-defined domain name configuration
- Added support for TDX in Ansible deployment scripts
- Documentation update - added detailed instructions on setting up S3 or S3-compatible storage as a backend in EDP
- MinIO service is no longer started when a different storage backend (e.g., S3 or S3-compatible) is configured in EDP, preventing unnecessary resource usage
- Resolved an issue with incorrect file paths in `install_chatqna.sh` for Gaudi-based installations – the script now uses "hpu" as expected
Known issues
- GMC can update variables passed in ConfigMaps or as environment variables. The scripts cannot propagate changes that do not apply to these objects.