feat(mcp): add Prompts API and ragflow_list_datasets tool for intelligent retrieval routing#13440

Open
Louisym wants to merge 4 commits into infiniflow:main from Louisym:my-branch

Conversation


Louisym commented Mar 6, 2026

Summary

This PR makes the RAGFlow MCP Server "white-box" to connected LLM clients by implementing
the MCP Prompts specification and adding a dataset discovery tool.

Before: clients saw a single opaque ragflow_retrieval tool with no guidance on which
knowledge bases exist, how to tune parameters, or how to recover from failures.

After: clients can request a live SOP prompt (ragflow_retrieval_skill) that injects:

  • A real-time snapshot of all available knowledge bases (id, name, description, doc/chunk counts)
  • Parameter tuning guide for RAGFlow's hybrid retrieval (Dense + BM25 + RRF)
  • Routing decision rules (precise KB targeting vs. global fallback)
  • Self-healing rules (empty results → lower threshold; 401 → check API key; 500 → retry once)
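The four modules above can be sketched as a single prompt-assembly function. This is an illustrative sketch only, not the PR's actual implementation; the function name, module text, and intent-to-threshold mapping are assumptions:

```python
def build_retrieval_skill_prompt(datasets: list[dict], intent: str = "auto") -> str:
    """Assemble a 4-module SOP prompt with a live KB snapshot (illustrative)."""
    # Module 1: real-time snapshot of available knowledge bases
    snapshot = "\n".join(
        f"- {d['id']} | {d['name']} | {d.get('description', '')} "
        f"({d.get('document_count', 0)} docs, {d.get('chunk_count', 0)} chunks)"
        for d in datasets
    )
    # Module 2: parameter tuning guide for hybrid retrieval (Dense + BM25 + RRF)
    tuning = (
        "Tune similarity_threshold and top_k to the question; hybrid retrieval "
        "fuses dense vectors and BM25 rankings via RRF."
    )
    # Module 3: routing rules -- the intent argument changes the threshold guidance
    threshold = {"precise": 0.4, "broad": 0.1}.get(intent, 0.2)  # assumed values
    routing = (
        "Route to a single KB when the question clearly names its domain; "
        f"otherwise search all KBs. Start with threshold {threshold}."
    )
    # Module 4: self-healing rules
    healing = (
        "Empty results -> lower the threshold; "
        "HTTP 401 -> check the API key; HTTP 500 -> retry once."
    )
    return "\n\n".join([
        "## Available knowledge bases\n" + snapshot,
        "## Parameter tuning\n" + tuning,
        "## Routing rules\n" + routing,
        "## Self-healing\n" + healing,
    ])
```

Because the snapshot is rebuilt from the dataset list on every call, the prompt stays current as knowledge bases are added or removed.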

Changes

mcp/server/server.py

  • New tool: ragflow_list_datasets — lets the LLM proactively refresh the KB list and
    get full metadata (name, document_count, chunk_count, embedding_model, language)
  • New: @app.list_prompts() — exposes ragflow_retrieval_skill prompt with optional
    intent argument (precise / broad / auto)
  • New: @app.get_prompt() — dynamically assembles a 4-module SOP prompt with a live
    KB snapshot on every call
  • Refactor: list_datasets() — extended return fields and extracted shared helpers
    to eliminate code duplication; also populates the dataset metadata cache as a side effect
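A minimal sketch of what the tool's metadata normalization might look like. The helper names mirror those mentioned in the commit message, but the field handling and serialization shown here are assumptions, not the PR's actual code:

```python
import json


def _normalize_dataset_item(raw: dict) -> dict:
    """Project a raw RAGFlow dataset record onto the metadata fields that
    ragflow_list_datasets returns (field defaults are illustrative)."""
    return {
        "id": raw.get("id"),
        "name": raw.get("name"),
        "description": raw.get("description") or "",
        "document_count": raw.get("document_count", 0),
        "chunk_count": raw.get("chunk_count", 0),
        "embedding_model": raw.get("embedding_model", ""),
        "language": raw.get("language", ""),
    }


def list_datasets_payload(raw_items: list[dict]) -> str:
    """Serialize the normalized datasets as the JSON array the tool returns."""
    return json.dumps([_normalize_dataset_item(r) for r in raw_items],
                      ensure_ascii=False)
```

Sharing one normalizer between the tool and the prompt snapshot is what lets `list_datasets()` populate the metadata cache as a side effect without duplicating field logic.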

test/benchmark/mcp_skill_eval.py (new)

  • A/B benchmark script comparing global search (baseline) vs. precise KB routing (with Skill)
  • Integrates ragas (context_precision, context_recall) with OpenAI gpt-4o as judge LLM
  • Reuses existing test/benchmark/ infrastructure (HttpClient, dataset helpers, metrics)
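The shape of the A/B comparison can be sketched as a small aggregation step over per-question metric scores (the function name and report shape are illustrative; the actual script delegates scoring to ragas):

```python
from statistics import mean


def ab_delta(baseline_scores: list[float], routed_scores: list[float]) -> dict:
    """Aggregate per-question metric scores for baseline vs routed retrieval
    and report the relative improvement, as in the results table below."""
    b, r = mean(baseline_scores), mean(routed_scores)
    return {
        "baseline": round(b, 3),
        "routing": round(r, 3),
        "delta_pct": round((r - b) / b * 100, 1),
    }
```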

test/benchmark/wixqa_eval.py (new)

  • WixQA-based benchmark using Wix/WixQA
    (200 expert-written + 200 simulated customer support Q&A pairs)
  • Two-KB routing strategy: questions routed to the KB owning their source articles
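The two-KB routing strategy reduces to a lookup with a global fallback. A hedged sketch (names are illustrative; the mapping from source article to KB is built by the benchmark setup):

```python
def route_to_kbs(source_article: str, article_to_kb: dict[str, str],
                 all_kbs: list[str]) -> list[str]:
    """Search only the KB that owns the question's source article; fall back
    to searching every KB (the baseline behavior) when the owner is unknown."""
    kb = article_to_kb.get(source_article)
    return [kb] if kb else list(all_kbs)
```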

Benchmark Results

Evaluated on 100 questions from Wix/WixQA
across 2 knowledge bases. Judged by gpt-4o via ragas.

| Metric | Baseline (Both KBs) | With Routing | Delta |
|---|---|---|---|
| Context Precision | 0.543 | 0.554 | +2.0% |
| Context Recall | 0.287 | 0.300 | +4.5% |
| Avg Latency (ms) | 403.1 | 366.2 | -9.2% |
| Errors | 0 | 0 | |

Latency percentiles:

| Stat | Baseline | Routing |
|---|---|---|
| p50 | 357.0 ms | 338.4 ms |
| p90 | 516.4 ms | 479.1 ms |
| p95 | 632.9 ms | 531.2 ms |
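For reference, percentiles like the ones above can be computed with the nearest-rank convention; this is one common approach, and the benchmark script may use a different interpolation:

```python
import math


def percentile(samples: list[float], p: float) -> float:
    """Nearest-rank percentile: the smallest sample such that at least
    p percent of all samples are <= it."""
    xs = sorted(samples)
    k = max(1, math.ceil(p / 100 * len(xs)))
    return xs[k - 1]
```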

Routing consistently reduces noise (precision ↑), improves information coverage (recall ↑),
and cuts latency at every percentile by eliminating unnecessary KB searches.

Note: gains are intentionally conservative — the two WixQA KBs share the same domain
(Wix support articles), making routing harder than typical multi-domain deployments.
In heterogeneous KB setups (e.g., HR + Legal + Engineering), the precision delta is
expected to be significantly larger.

Test Plan

  • Start MCP Server in self-host mode, connect via MCP Inspector
  • Verify list_prompts returns ragflow_retrieval_skill with intent argument
  • Verify get_prompt returns a prompt containing current KB list
  • Verify intent=precise vs intent=broad produces different threshold guidance
  • Verify ragflow_list_datasets tool returns full metadata JSON array
  • Run benchmark: OPENAI_API_KEY=xxx uv run python test/benchmark/wixqa_eval.py --base-url http://127.0.0.1:9380 --api-key ragflow-xxx

…gent retrieval routing

Add MCP Prompts support to the RAGFlow MCP server, giving connected LLM
clients white-box visibility into the retrieval process via the standardized
list_prompts and get_prompt handlers.

Changes:
- mcp/server/server.py: implement list_prompts and get_prompt handlers
  exposing a ragflow_retrieval_skill prompt that assembles a 4-step SOP
  (dataset listing -> intent analysis -> retrieval -> answer synthesis)
  with dynamic dataset injection via ragflow_list_datasets tool call;
  add ragflow_list_datasets tool returning full KB metadata
- test/__init__.py: make test/ a proper Python package
- test/benchmark/mcp_skill_eval.py: A/B benchmark comparing global
  search (baseline) vs MCP-skill-guided routing using ragas metrics
- test/benchmark/wixqa_eval.py: WixQA-based benchmark (Wix/WixQA on
  HuggingFace, 100 questions across 2 KBs); routing improves context
  precision +2.0%, recall +4.5%, and reduces avg latency by 9.2%
- wixqa_benchmark_report.md: benchmark results report
@dosubot dosubot bot added the size:XXL (This PR changes 1000+ lines, ignoring generated files), 🌈 python (Pull requests that update Python code), and 💞 feature (Feature request, pull request that fulfills a new feature) labels on Mar 6, 2026
Louisym added 3 commits March 10, 2026 18:34
…gent retrieval routing

- Implement list_prompts / get_prompt handlers exposing ragflow_retrieval_skill prompt
- Dynamically inject live KB snapshot, parameter tuning guide, routing rules, and self-healing rules
- Add ragflow_list_datasets tool for LLM-driven KB discovery
- Refactor list_datasets() with shared _fetch_datasets_raw() + _normalize_dataset_item() helpers
- Add ragas>=0.2.0 and openai>=1.0.0 to test dependency group
- Add test/benchmark/mcp_skill_eval.py: A/B benchmark (global search vs precise routing)
  evaluated on WixQA dataset with gpt-4o as ragas judge
