refactor(rag): update rag params #765

Merged
merged 1 commit into from
Mar 4, 2025

Conversation

ch-liuzhide
Contributor

No description provided.

vercel bot commented Mar 4, 2025

The latest updates on your projects. Learn more about Vercel for Git ↗︎

Name Status Preview Comments Updated (UTC)
petercat 🔄 Building (Inspect) Visit Preview 💬 Add feedback Mar 4, 2025 10:41am

Walkthrough

This pull request refactors the RAG (Retrieval-Augmented Generation) parameters by adjusting the similarity threshold and modifying the chunk size and overlap in the configuration. These changes aim to optimize the retrieval and processing of data.

Changes

File Summary
server/agent/tools/knowledge.py Reduced the similarity threshold from 0.65 to 0.6.
server/rag/router.py Increased chunk size from 500 to 1000 and chunk overlap from 100 to 200.

@@ -32,7 +32,7 @@ async def search_knowledge(
  space_id_list=[bot_id, repo_name],
  question=query,
  embedding_model_name=EmbeddingModelEnum.OPENAI,
- similarity_threshold=0.65,
+ similarity_threshold=0.6,

Lowering the similarity threshold from 0.65 to 0.6 may increase the number of results returned, potentially including less relevant matches. Ensure this change aligns with the intended retrieval quality.
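To make the concern above concrete, here is a minimal sketch of how a lower cutoff admits more matches. The `filter_by_threshold` helper and the scores are hypothetical and purely illustrative; the sketch also assumes the retrieval backend keeps results whose score is greater than or equal to `similarity_threshold`, which this PR does not confirm.

```python
def filter_by_threshold(scored_chunks, similarity_threshold):
    """Keep only (text, score) pairs whose score meets the cutoff.

    Hypothetical helper: the real filtering happens inside the retrieval
    backend, and whether it uses >= or > is an assumption here.
    """
    return [
        (text, score)
        for text, score in scored_chunks
        if score >= similarity_threshold
    ]


# Made-up similarity scores, for illustration only.
scored = [
    ("chunk A", 0.82),
    ("chunk B", 0.66),
    ("chunk C", 0.62),
    ("chunk D", 0.55),
]

old_hits = filter_by_threshold(scored, 0.65)  # cutoff before this PR
new_hits = filter_by_threshold(scored, 0.6)   # cutoff after this PR

print(len(old_hits), len(new_hits))
```

With these sample scores, "chunk C" (0.62) is returned only under the new cutoff. Whether such borderline matches help or hurt answer quality would need to be checked against real queries.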

ch-liuzhide merged commit dd2d8cb into main on Mar 4, 2025
3 of 4 checks passed
@@ -60,8 +60,8 @@ async def reload_repo(
  repo_name=request.repo_name, auth_token=user.access_token
  ),
  split_config=KnowledgeSplitConfig(
- chunk_size=500,
- chunk_overlap=100,
+ chunk_size=1000,

Increasing the chunk size and overlap may affect memory usage and processing time. Ensure that the system can handle these changes without performance degradation.
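As a rough way to reason about that impact, the sketch below compares the old and new settings with a naive character-based sliding-window splitter. `split_text` is a hypothetical stand-in: the splitter behind `KnowledgeSplitConfig` may count tokens or respect separators, so treat the numbers as indicative only.

```python
def split_text(text, chunk_size, chunk_overlap):
    """Naive sliding-window splitter (hypothetical, character-based).

    Each chunk starts chunk_size - chunk_overlap characters after the
    previous one, so consecutive chunks share chunk_overlap characters.
    """
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this chunk already reaches the end of the text
    return chunks


doc = "x" * 10_000  # placeholder document of 10,000 characters

old_chunks = split_text(doc, chunk_size=500, chunk_overlap=100)
new_chunks = split_text(doc, chunk_size=1000, chunk_overlap=200)

print(len(old_chunks), len(new_chunks))
print(sum(map(len, old_chunks)), sum(map(len, new_chunks)))
```

Under these assumptions the overlap ratio stays at 20%, so the total amount of stored text is unchanged; the new settings produce roughly half as many chunks, each about twice as long. The practical cost therefore shifts toward larger individual embedding inputs and longer retrieved passages rather than more stored data.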

ch-liuzhide deleted the whisker branch on March 4, 2025, 10:41

codecov bot commented Mar 4, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

Files with missing lines Coverage Δ
server/agent/tools/knowledge.py 92.00% <ø> (ø)
server/rag/router.py 43.85% <ø> (ø)
