Replies: 3 comments
-
Do you mean parsing speed in terms of tokens per second?
-
Great question on benchmarking! Here are some pointers from our experience at RevolutionAI (https://revolutionai.io).

Key metrics to track:
Tools we have used:
Performance tip: if you are seeing slow TTFT (time to first token), look into caching strategies. A recent paper called RAGCache shows a 4x TTFT improvement by caching intermediate embedding states; the key insight is storing retrieved knowledge in a tree structure across GPU and host memory.

For document size impact, we typically see:
Happy to share more details if helpful!
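The TTFT tip above can be checked with a minimal harness. This is a sketch, assuming a streaming generate call that yields tokens as they arrive; `stream_fn` is a hypothetical interface, not a RAGFlow API:

```python
import time

def measure_ttft(stream_fn, prompt):
    """Measure time-to-first-token for a streaming generate call.

    stream_fn is assumed to take a prompt and return an iterator of
    tokens (a hypothetical interface for illustration).
    """
    start = time.time()
    for _token in stream_fn(prompt):
        # Return as soon as the first token arrives.
        return time.time() - start
    return None  # the stream produced no tokens
```

Running this before and after enabling a cache gives a direct before/after TTFT comparison.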
-
Performance benchmarks are essential! At RevolutionAI (https://revolutionai.io) we benchmark RAG systems extensively.

Key metrics:
Benchmark setup:

```python
import time

import numpy as np

def benchmark_query(ragflow, queries, k=10):
    # Collect per-query search latencies.
    latencies = []
    for q in queries:
        start = time.time()
        results = ragflow.search(q, top_k=k)
        latencies.append(time.time() - start)
    return {
        "p50": np.percentile(latencies, 50),
        "p99": np.percentile(latencies, 99),
        "qps": len(queries) / sum(latencies),
    }
```

Variables to test:
Would love to see official benchmarks!
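To tie this back to the original tokens/sec question, a throughput variant of the same harness could look like the sketch below. It assumes `search_fn` returns a list of text chunks for each query, and uses whitespace splitting as a crude stand-in for a real tokenizer (both assumptions, not RAGFlow APIs):

```python
import time

def benchmark_throughput(search_fn, queries):
    """Rough tokens/sec estimate: whitespace tokens returned per second.

    search_fn is assumed to take a query string and return a list of
    text chunks (hypothetical interface); swap in a real tokenizer for
    accurate token counts.
    """
    tokens, elapsed = 0, 0.0
    for q in queries:
        start = time.time()
        chunks = search_fn(q)
        elapsed += time.time() - start
        # Count tokens in all returned chunks via whitespace split.
        tokens += sum(len(c.split()) for c in chunks)
    return tokens / elapsed if elapsed > 0 else 0.0
```

Running this against corpora built from documents of different sizes would give the tokens/sec-per-document-size numbers the question asks about.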
-
Hi! I was wondering how I can benchmark the performance of RAGFlow for a given deployment in terms of tokens/sec for given document sizes. Has anyone done something like this? Any pointers or help will be much appreciated.