-
Notifications
You must be signed in to change notification settings - Fork 13
ECHO-167 Delete conversation from lightrag #122
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
…delete conversation functionality in stateless API - Upgraded lightrag-dembrane dependency in pyproject.toml and lock files for improved features. - Added a new endpoint to delete conversations by their IDs, including validation for UUID format and enhanced error handling. - Introduced utility functions for deleting transcripts and segments from Directus, improving data management capabilities.
""" WalkthroughThis change introduces a new API endpoint for deleting conversations by their IDs in the stateless API router. It adds a Pydantic request model for input validation, and implements logic to validate UUIDs, fetch related document IDs, and orchestrate the deletion of documents from the RAG index, transcripts from the PostgreSQL database, and segments from Directus. Supporting utility functions for transcript and segment deletion are added. Additionally, a DELETE endpoint for single conversation deletion is added in the conversation API, which calls the stateless deletion logic. The lightrag-dembrane dependency is bumped to version 1.2.7.7. Changes
Sequence Diagram(s)sequenceDiagram
participant Client
participant API as StatelessRouter
participant RAG
participant DB as PostgreSQLDB
participant Directus
Client->>API: POST /rag/delete_conversation (conversation_ids)
API->>API: Validate UUIDs
API->>RAG: Ensure initialized, get instance
API->>DB: Get doc_ids for conversation_ids
loop For each doc_id
API->>RAG: adelete_by_doc_id
API->>DB: delete_transcript_by_doc_id
API->>Directus: delete_segment_from_directus
end
API-->>Client: 200 OK
sequenceDiagram
participant Client
participant API as ConversationRouter
participant StatelessAPI as StatelessRouter
participant Directus
Client->>API: DELETE /conversation/{conversation_id}
API->>API: Validate existence & authorization
API->>StatelessAPI: POST /rag/delete_conversation (single conversation_id)
StatelessAPI->>RAG: Delete documents & transcripts & segments
API->>Directus: Delete conversation item
API-->>Client: 200 OK
Assessment against linked issues
Suggested reviewers
Poem
📜 Recent review detailsConfiguration used: CodeRabbit UI 📒 Files selected for processing (1)
🚧 Files skipped from review as they are similar to previous changes (1)
⏰ Context from checks skipped due to timeout of 90000ms (1)
✨ Finishing Touches
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. 🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
SupportNeed help? Create a ticket on our support page for assistance with any issues or questions. Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
CodeRabbit Configuration File (
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 3
🔭 Outside diff range comments (1)
echo/server/dembrane/api/stateless.py (1)
241-268
:⚠️ Potential issueMissing transaction safety and error handling
The deletion process isn't atomic - if a step fails midway, you'll have partially deleted data. Also, synchronous Directus call in async function is an anti-pattern.
- Use transaction context
- Add comprehensive error handling
- Make Directus call async
- Return success response
@StatelessRouter.post("/rag/delete_conversation") async def delete_conversation(payload: DeleteConversationRequest, session: DependencyDirectusSession #Needed for fake auth - ) -> None: + ) -> dict[str, str]: session = session - conversation_ids = payload.conversation_id + conversation_ids = payload.conversation_ids for id in conversation_ids: if not is_valid_uuid(id): raise HTTPException(status_code=400, detail="Invalid conversation ID") # Initialize RAG if not RAGManager.is_initialized(): await RAGManager.initialize() rag = get_rag() await initialize_pipeline_status() postgres_db = await PostgresDBManager.get_initialized_db() try: lightrag_doc_ids = await get_segment_from_conversation_ids(postgres_db, conversation_ids) except Exception as e: logger.exception("Failed to get segment from conversation ids. Check PGSQ") raise HTTPException(status_code=500, detail=str(e)) from e + deletion_errors = [] for doc_id in lightrag_doc_ids: + try: await rag.adelete_by_doc_id(str(doc_id)) await delete_transcript_by_doc_id(postgres_db, str(doc_id)) - delete_segment_from_directus(str(doc_id)) + await delete_segment_from_directus(str(doc_id)) + except Exception as e: + logger.error(f"Failed to delete document {doc_id}: {e}") + deletion_errors.append(str(doc_id)) + + if deletion_errors: + logger.warning(f"Failed to delete {len(deletion_errors)} documents: {deletion_errors}") + logger.info(f"Deleted {len(lightrag_doc_ids)} document(s) from RAG") + return {"status": "success", "message": f"Deleted {len(lightrag_doc_ids) - len(deletion_errors)} documents"}
🧹 Nitpick comments (1)
echo/server/dembrane/api/stateless.py (1)
238-240
: Clean model definitionGood use of Pydantic for request validation, but suggest making the field name plural to match its type.
-class DeleteConversationRequest(BaseModel): - conversation_id: list[str] +class DeleteConversationRequest(BaseModel): + conversation_ids: list[str]
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (2)
echo/server/requirements-dev.lock
is excluded by!**/*.lock
echo/server/requirements.lock
is excluded by!**/*.lock
📒 Files selected for processing (3)
echo/server/dembrane/api/stateless.py
(2 hunks)echo/server/dembrane/audio_lightrag/utils/lightrag_utils.py
(2 hunks)echo/server/pyproject.toml
(1 hunks)
⏰ Context from checks skipped due to timeout of 90000ms (1)
- GitHub Check: ci-check-server
🔇 Additional comments (1)
echo/server/pyproject.toml (1)
50-50
: Neat dependency bump!Updating lightrag-dembrane to 1.2.7.7 - solid version bump to support the new conversation deletion functionality.
"DELETE_TRANSCRIPT_BY_DOC_ID": | ||
""" | ||
DELETE FROM LIGHTRAG_VDB_TRANSCRIPT | ||
WHERE document_id = '{doc_id}' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fix SQL Injection vector in template
Need to update SQL template to use parameterized queries instead of string formatting.
- "DELETE FROM LIGHTRAG_VDB_TRANSCRIPT
- WHERE document_id = '{doc_id}'
+ "DELETE FROM LIGHTRAG_VDB_TRANSCRIPT
+ WHERE document_id = $1
Committable suggestion skipped: line range outside the PR's diff.
async def delete_transcript_by_doc_id(db: PostgreSQLDB, doc_id: str) -> None: | ||
sql = SQL_TEMPLATES["DELETE_TRANSCRIPT_BY_DOC_ID"].format(doc_id=doc_id) | ||
await db.execute(sql) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
SQL Injection vulnerability alert!
The SQL template uses string formatting with doc_id directly, which is a potential SQL injection vector. This is a major security concern.
Refactor to use parameterized queries:
-async def delete_transcript_by_doc_id(db: PostgreSQLDB, doc_id: str) -> None:
- sql = SQL_TEMPLATES["DELETE_TRANSCRIPT_BY_DOC_ID"].format(doc_id=doc_id)
- await db.execute(sql)
+async def delete_transcript_by_doc_id(db: PostgreSQLDB, doc_id: str) -> None:
+ sql = SQL_TEMPLATES["DELETE_TRANSCRIPT_BY_DOC_ID"]
+ await db.execute(sql, data={"doc_id": doc_id})
Also update the SQL template:
- "DELETE FROM LIGHTRAG_VDB_TRANSCRIPT
- WHERE document_id = '{doc_id}'
+ "DELETE FROM LIGHTRAG_VDB_TRANSCRIPT
+ WHERE document_id = $1
Committable suggestion skipped: line range outside the PR's diff.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Goes thru validation before. No risk
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ArindamRoy23 Got it! If the doc_id
is properly validated before reaching this function, that definitely reduces the risk.
Just a pro tip: parameterized queries are still considered a 100x dev best practice even with validation because they handle escaping automatically and make your code more maintainable. But LGTM if you're confident in your validation pipeline!
def delete_segment_from_directus(segment_id: str) -> None: | ||
directus.delete_item("conversation_segment", segment_id) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🛠️ Refactor suggestion
Missing error handling and async pattern
This function should be async for consistency with delete_transcript_by_doc_id and include error handling. Function should also have docstring because 10x engineers document their code.
-def delete_segment_from_directus(segment_id: str) -> None:
- directus.delete_item("conversation_segment", segment_id)
+async def delete_segment_from_directus(segment_id: str) -> None:
+ """
+ Asynchronously delete a conversation segment from Directus.
+
+ Args:
+ segment_id: The ID of the segment to delete
+
+ Raises:
+ Exception: If deletion fails
+ """
+ try:
+ directus.delete_item("conversation_segment", segment_id)
+ except Exception as e:
+ logger.error(f"Failed to delete segment {segment_id} from Directus: {e}")
+ raise
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
def delete_segment_from_directus(segment_id: str) -> None: | |
directus.delete_item("conversation_segment", segment_id) | |
async def delete_segment_from_directus(segment_id: str) -> None: | |
""" | |
Asynchronously delete a conversation segment from Directus. | |
Args: | |
segment_id: The ID of the segment to delete | |
Raises: | |
Exception: If deletion fails | |
""" | |
try: | |
directus.delete_item("conversation_segment", segment_id) | |
except Exception as e: | |
logger.error(f"Failed to delete segment {segment_id} from Directus: {e}") | |
raise |
@ArindamRoy23 pls link the pr with linear |
ECHO-166 |
@ArindamRoy23 u need to it to the add pr title. and why is it not in /conversation like we discussed |
@spashii, No changes made so far. Will do and let you know |
- Added a new endpoint to delete conversations by ID, enhancing data management capabilities. - Updated the DeleteConversationRequest model to accept a list of conversation IDs for improved flexibility. - Enhanced error handling to ensure proper validation and authorization checks before deletion. - Refactored imports in conversation.py for better organization and clarity.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 2
🧹 Nitpick comments (1)
echo/server/dembrane/api/conversation.py (1)
578-578
: Return a standardized response format.Consider returning a standardized response format like other endpoints instead of directly returning the Directus deletion result.
- return(directus.delete_item("conversation", conversation_id)) + directus.delete_item("conversation", conversation_id) + return { + "status": "success", + "message": "Conversation deleted successfully", + "conversation_id": conversation_id + }
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
echo/server/dembrane/api/conversation.py
(2 hunks)echo/server/dembrane/api/stateless.py
(2 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- echo/server/dembrane/api/stateless.py
🧰 Additional context used
🧬 Code Graph Analysis (1)
echo/server/dembrane/api/conversation.py (1)
echo/server/dembrane/api/stateless.py (3)
DeleteConversationRequest
(238-239)generate_summary
(31-62)delete_conversation
(242-267)
⏰ Context from checks skipped due to timeout of 90000ms (1)
- GitHub Check: ci-check-server
🔇 Additional comments (2)
echo/server/dembrane/api/conversation.py (2)
25-29
: Updated imports are clean and focused.Nice job importing just what you need from the stateless module. Clean dependency management ftw.
573-576
: Solid implementation using the stateless module.Well-structured approach to reuse the stateless delete conversation logic. Composition over reimplementation.
- Updated the delete_conversation endpoint to include detailed error handling and logging for deletion failures. - Added a docstring to clarify the function's purpose and parameters. - Ensured proper execution of RAG and Directus deletions with exception management to provide meaningful feedback on errors.
@ussaama can you please also add the frontend changes here so it is all in one commit and linked? |
Update lightrag-dembrane dependency to version 1.2.7.7 and implement delete conversation functionality in stateless API
Linked Issue
ECHO-166
Summary by CodeRabbit
Summary by CodeRabbit