-
Notifications
You must be signed in to change notification settings - Fork 204
feat!: expose refresh parameter in OpenSearchDocumentStore
#2623
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat!: expose refresh parameter in OpenSearchDocumentStore
#2623
Conversation
Add configurable refresh parameter to write, delete, and update methods in OpenSearchDocumentStore. This allows users to control when index changes become visible to search operations, enabling read-your-writes consistency without relying on time.sleep() workarounds. Methods updated: - write_documents / write_documents_async - delete_documents / delete_documents_async - delete_all_documents / delete_all_documents_async - delete_by_filter / delete_by_filter_async - update_by_filter / update_by_filter_async The refresh parameter accepts: - True: Force immediate refresh - False: No refresh (best for bulk performance) - 'wait_for': Wait for next refresh cycle (default) Additional fix: - Fixed delete_all_documents_async to set wait_for_completion=True, ensuring the async delete operation completes before returning Closes deepset-ai#2065
|
Thanks for opening a separate PR. Let's put this on hold and first agree on #2622 |
|
@GunaPalanivel when you have time, feel free to adapt this PR based on #2622 🙂 |
|
I'm working on it 🙂 |
…ings to match ES pattern (deepset-ai#2623)
|
@anakin87 Based on #2622 , I've adapted this PR for OpenSearch: Main changes:
Let me know if I missed anything or got something wrong, happy to fix! 🙂 |
|
Hi @anakin87, just a gentle follow-up on this PR when you get a moment. All checks are passing, and I’m happy to make any changes if needed. A review would really help me move forward and pick up the next issues. |
|
@GunaPalanivel coordinate with @davidsbatista as he's doing work on OpenSearchDocumentStore, there is a merge conflict most likely due to that |
|
@GunaPalanivel, thanks for the contribution! Please sync with the latest main and fix the conflicts; they should be easy. In any case, let me know if you need help. |
|
I'll review it soon (once conflicts are fixed). |
integrations/opensearch/src/haystack_integrations/document_stores/opensearch/document_store.py
Outdated
Show resolved
Hide resolved
- Sync calls: always wait_for_completion=True (to get result dict with deleted count) - Async calls: wait_for_completion=True only if refresh=True - Logic: wait_for_completion = (not is_async) or refresh
OpenSearchDocumentStore
Align with ElasticSearch behavior as suggested by @davidsbatista: - Set wait_for_completion = not is_async (True only for sync calls) - This ensures sync calls get the result dict for logging deleted count - Async calls do not wait, improving performance - Centralizes simple logic in _prepare_delete_all_request
Both sync and async delete_all_documents need to wait for completion so documents are actually deleted before the function returns
The is_async parameter is no longer needed since wait_for_completion is always True for delete_all_documents operations
anakin87
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks OK to me, aligned with Elasticsearch.
@davidsbatista could you take a final look?
davidsbatista
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM! Thanks again @GunaPalanivel for one more contribution! We appreciated it!
|
@davidsbatista in case you merge and release a new version, let's use 5.0.0 since this is technically a breaking change. |
OpenSearchDocumentStoreOpenSearchDocumentStore
Related Issues
Part of #2605
Description
This PR exposes the
refreshparameter to relevant methods inOpenSearchDocumentStore, allowing users to control when index changes become visible to search operations.Motivation
OpenSearch is a "near real-time" search engine where changes are not immediately visible after write/update/delete operations. Previously, users had to use
time.sleep()workarounds to ensure consistency. This change exposes the underlyingrefreshparameter, giving users explicit control over this behavior.Changes
New
RefreshTypeparameter added to the following methods:write_documents/write_documents_async"wait_for"delete_documents/delete_documents_async"wait_for"delete_all_documents/delete_all_documents_asyncTruedelete_by_filter/delete_by_filter_async"wait_for"update_by_filter/update_by_filter_async"wait_for"Parameter values:
True: Force refresh immediately after the operationFalse: Do not refresh (better performance for bulk operations)"wait_for": Wait for the next refresh cycle (default, ensures read-your-writes consistency)Test Updates
Updated integration tests to use
refresh=Trueinstead oftime.sleep(), making tests more reliable and faster.Additional Fixes
delete_all_documents_asyncto setwait_for_completion=True, ensuring the async delete operation completes before returningHow was this tested?
refreshparameter