feat: astra db chunks deletion based on metadata field #5537
base: main
… document management
- Introduced a new 'deletion_field' input to specify a metadata field for deleting documents before loading new data.
- Enhanced the _add_documents_to_vector_store method to handle document deletion based on the specified field, improving data management capabilities.
CodSpeed Performance Report: Merging #5537 will degrade performances by 62.5%.
…ove readability.
- Optimized the deletion logic by using a set comprehension to eliminate duplicates when gathering delete values from documents.
@@ -607,6 +616,18 @@ def _add_documents_to_vector_store(self, vector_store) -> None:
                msg = "Vector Store Inputs must be Data objects."
                raise TypeError(msg)

        if documents and self.deletion_field:
            self.log(f"Deleting documents where {self.deletion_field}")
nit: should we remove this log line?
                self.log(f"Deleting documents where {self.deletion_field} matches {delete_values}.")
                collection.delete_many({f"metadata.{self.deletion_field}": {"$in": delete_values}})
            except Exception as e:
                msg = f"Error deleting documents from AstraDBVectorStore: {e}"
Suggested change:
- msg = f"Error deleting documents from AstraDBVectorStore: {e}"
+ msg = f"Error deleting documents from AstraDBVectorStore based on '{self.deletion_field}': {e}"
Purpose
This PR addresses the need to reload specific documents without affecting others. To achieve this, a new option, "deletion_field", has been introduced.
Functionality
When "deletion_field" is set (e.g., "file_path"), the component first deletes every document in the target collection whose metadata["file_path"] matches a value found in the incoming documents, and only then adds the new documents.
This removes the existing chunks of a file before that file is reloaded, preventing duplicates or conflicts, while chunks from other files are left untouched.
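As a rough illustration of the behavior described above, the sketch below builds the kind of filter that the diff passes to collection.delete_many. It is not the component's actual code: the helper name build_deletion_filter, the plain-dict document shape, and the sample file paths are assumptions made for this example. Only the filter shape ({"metadata.<field>": {"$in": [...]}}) and the use of a set comprehension to drop duplicates come from the PR itself.

```python
from __future__ import annotations

from typing import Any


def build_deletion_filter(
    documents: list[dict[str, Any]], deletion_field: str
) -> dict[str, Any] | None:
    """Hypothetical helper: build a delete filter from incoming documents.

    Returns None when no incoming document carries the deletion field,
    so the caller can skip the delete step entirely.
    """
    # A set comprehension removes duplicates: many chunks of one file
    # share the same metadata value (e.g. the same file_path).
    delete_values = {
        doc["metadata"][deletion_field]
        for doc in documents
        if deletion_field in doc.get("metadata", {})
    }
    if not delete_values:
        return None
    # "$in" takes a list; sorting only makes the output deterministic.
    return {f"metadata.{deletion_field}": {"$in": sorted(delete_values)}}


# Assumed sample input: three chunks coming from two files.
incoming = [
    {"text": "chunk 1", "metadata": {"file_path": "docs/a.md"}},
    {"text": "chunk 2", "metadata": {"file_path": "docs/a.md"}},
    {"text": "chunk 3", "metadata": {"file_path": "docs/b.md"}},
]

deletion_filter = build_deletion_filter(incoming, "file_path")
print(deletion_filter)
# {'metadata.file_path': {'$in': ['docs/a.md', 'docs/b.md']}}
```

In the component, a filter of this shape would be handed to collection.delete_many before the new chunks are written, so stored chunks whose file_path is neither docs/a.md nor docs/b.md would not be affected.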