feat: adding count with filtering operations to `OpenSearchDocumentStore` #2653

davidsbatista · 2026-01-05T17:55:41Z

Related Issues

Fixes #2635 (add the following operations to OpenSearchDocumentStore)

Proposed Changes:

count_documents_by_filter() - count documents matching filter criteria
count_distinct_values_by_filter() - get distinct value counts for metadata fields with optional filtering
get_fields_info() - retrieve field type information from index mapping
get_field_min_max() - get min/max values for numeric metadata fields
get_field_unique_values() - get unique values for a field with pagination and content-based filtering
query_sql() - execute SQL queries against OpenSearch with support for multiple response formats (JSON, CSV, JDBC, RAW)
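
To make the count and min/max operations concrete, here is a minimal sketch of the OpenSearch request bodies that methods like these could send. The helper names `count_body` and `min_max_body` are hypothetical and are not taken from this PR. The sketch also assumes the Haystack filter has already been translated into OpenSearch query DSL and that the metadata field is addressable by its plain name.

```python
from typing import Any, Optional


def count_body(query: Optional[dict[str, Any]] = None) -> dict[str, Any]:
    """Body for the _count API; falls back to match_all when no query is given."""
    return {"query": query or {"match_all": {}}}


def min_max_body(field: str) -> dict[str, Any]:
    """Body for a search that returns only min/max aggregations for one field."""
    return {
        "size": 0,  # no hits are needed, only the aggregation results
        "aggs": {
            "field_min": {"min": {"field": field}},
            "field_max": {"max": {"field": field}},
        },
    }


if __name__ == "__main__":
    # Hypothetical field names, used only for illustration.
    print(count_body({"term": {"category": "A"}}))
    print(min_max_body("priority"))
```

Setting `size` to 0 keeps the response small, since only the aggregation values are of interest.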

How did you test it?

Added integration tests covering the new methods, in both sync and async versions.

Notes for the reviewer

added httpx>=0.28.1 dependency
the query_sql() method performs a raw HTTP request (using httpx) if the requested response format is not JSON
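
As a rough illustration of the raw request mentioned above, the sketch below assembles the pieces of a call to the OpenSearch SQL plugin endpoint (`_plugins/_sql` with a `format` parameter). The function name `build_sql_request` and its parameters are hypothetical, and authentication and TLS settings are left out. It is not the code from this PR.

```python
from typing import Any

# Response formats listed in the PR description.
VALID_FORMATS = {"json", "csv", "jdbc", "raw"}


def build_sql_request(base_url: str, query: str, response_format: str = "json") -> dict[str, Any]:
    """Return keyword arguments describing a POST to the SQL plugin endpoint."""
    fmt = response_format.lower()
    if fmt not in VALID_FORMATS:
        raise ValueError(f"Unsupported response format: {response_format!r}")
    return {
        "url": f"{base_url.rstrip('/')}/_plugins/_sql",
        "params": {"format": fmt},  # sent as ?format=csv etc.
        "json": {"query": query},  # request body with the SQL statement
    }


if __name__ == "__main__":
    # Hypothetical host and index name, used only for illustration.
    print(build_sql_request("http://localhost:9200", "SELECT * FROM my_index LIMIT 5", "csv"))
```

The returned keys match the `url`, `params` and `json` keyword arguments of `httpx.post`, so under these assumptions the request could be sent with `httpx.post(**build_sql_request(...))` and the non-JSON payload read from `response.text`.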

Checklist

I have read the contributors guidelines and the code of conduct
I have updated the related issue with new insights and changes
I added unit tests and updated the docstrings
I've used one of the conventional commit types for my PR title: fix:, feat:, build:, chore:, ci:, docs:, style:, refactor:, perf:, test:.


sjrl · 2026-01-07T08:36:38Z

Hey @tstadel, I'd also appreciate your review on this, since we want to make sure it will work in the platform as well.


integrations/opensearch/pyproject.toml

sjrl · 2026-01-08T08:35:48Z

integrations/opensearch/src/haystack_integrations/document_stores/opensearch/document_store.py

+        # Fields that are not metadata (should stay at top level)
+        non_meta_fields = {"id", "content", "embedding", "blob", "sparse_embedding", "score"}
+
        for hit in hits:
-            data = hit["_source"]
+            data = hit["_source"].copy()
+
+            # Reconstruct metadata dict from flattened fields
+            meta = {}
+            fields_to_remove = []
+            for key, value in data.items():
+                if key not in non_meta_fields:
+                    meta[key] = value
+                    fields_to_remove.append(key)
+
+            # Remove metadata fields from top level and add them to meta
+            for key in fields_to_remove:
+                data.pop(key, None)
+
+            if meta:
+                data["meta"] = meta
+
            if "highlight" in hit:
-                data["metadata"]["highlighted"] = hit["highlight"]
+                if "meta" not in data:
+                    data["meta"] = {}
+                data["meta"]["highlighted"] = hit["highlight"]


Could you explain what was happening before making these changes? Before this were we throwing away all meta information when reconstructing the Document?

Also could we add some integration tests in test_bm25_retriever.py and test_embedding_retriever.py to do a full check of all fields of a returned Document? It seems we are missing some tests to confirm that returned Docs are reconstructed properly.

Also it seems like there is another function called _deserialize_document that contained the same logic here but doesn't seem to be used anywhere. Could we remove it?

This is not needed and was over-engineered.

I've added extensive tests to ensure that both BM25 and Embedding retrievers can store and retrieve documents with "complex" metadata. It's working with and without these changes. I will revert it.

Thanks for spotting this!
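
For reference, the loop in the diff above behaves like the following standalone function. This is a sketch for discussion only: the name `rebuild_meta` is hypothetical, the sample hit is made up, and the change itself was reverted as noted in the reply.

```python
from typing import Any

# Top-level Document fields, copied from the set in the diff.
NON_META_FIELDS = {"id", "content", "embedding", "blob", "sparse_embedding", "score"}


def rebuild_meta(hit: dict[str, Any]) -> dict[str, Any]:
    """Move every non-special field of hit["_source"] under a "meta" key."""
    source = hit["_source"]
    data = {k: v for k, v in source.items() if k in NON_META_FIELDS}
    meta = {k: v for k, v in source.items() if k not in NON_META_FIELDS}
    if "highlight" in hit:
        # Highlights are attached to the metadata, as in the diff.
        meta["highlighted"] = hit["highlight"]
    if meta:
        data["meta"] = meta
    return data


if __name__ == "__main__":
    hit = {
        "_source": {"id": "1", "content": "hello", "category": "A"},
        "highlight": {"content": ["<em>hello</em>"]},
    }
    print(rebuild_meta(hit))
```

A hit whose source holds only special fields and has no highlight comes back without a "meta" key, which matches the `if meta:` guard in the diff.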

sjrl · 2026-01-08T08:44:31Z

integrations/opensearch/src/haystack_integrations/document_stores/opensearch/document_store.py

+        """
+        Builds cardinality aggregations for all metadata fields in the index mapping.
+        """
+        special_fields = {"content", "embedding", "id", "score", "blob", "sparse_embedding"}


Seems like this set of fields is reused a few times. Perhaps we could make it a global variable at the top of this file (or a class attribute) so we can have one source of truth?

sjrl · 2026-01-08T08:46:02Z

integrations/opensearch/src/haystack_integrations/document_stores/opensearch/document_store.py

+    @staticmethod
+    def _build_cardinality_aggregations(index_mapping: dict[str, Any]) -> dict[str, Any]:
+        """
+        Builds cardinality aggregations for all metadata fields in the index mapping.


I think it could be helpful to link to the OpenSearch docs on cardinality aggregations https://docs.opensearch.org/latest/aggregations/metric/cardinality/ in the docstring
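
The two suggestions above (a single shared set of special fields, and a docstring link to the cardinality docs) could look roughly like this. It is a sketch under assumptions: the constant name `SPECIAL_FIELDS`, the `_cardinality` suffix, and the input shape (a mapping of field names to their definitions) are hypothetical and may differ from the code in this PR.

```python
from typing import Any

# Single source of truth for the fields that are not metadata.
SPECIAL_FIELDS = frozenset({"content", "embedding", "id", "score", "blob", "sparse_embedding"})


def build_cardinality_aggregations(properties: dict[str, Any]) -> dict[str, Any]:
    """
    Builds one cardinality aggregation per metadata field.

    See https://docs.opensearch.org/latest/aggregations/metric/cardinality/
    """
    return {
        f"{name}_cardinality": {"cardinality": {"field": name}}
        for name in properties
        if name not in SPECIAL_FIELDS
    }


if __name__ == "__main__":
    # Hypothetical mapping, used only for illustration.
    props = {
        "content": {"type": "text"},
        "category": {"type": "keyword"},
        "priority": {"type": "long"},
    }
    print(build_cardinality_aggregations(props))
```

Using a `frozenset` at module level keeps the set immutable, so every caller sees the same field list.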


davidsbatista added 9 commits January 5, 2026 15:17

fixed metadata merging to properly update the meta key

172f897

formmatting

842da6a

adding count distinct metadata values

a28bb2a

refactoring to reduce duplicated code

b0b594c

adding get metadata info

b23274f

adding get_field_max_min

22e160d

fixing get_field_max_min

310846d

adding get_field_unique_values

e0be21f

adding get_field_unique_values async

e6932b0

github-actions bot added integration:opensearch type:documentation Improvements or additions to documentation labels Jan 5, 2026

davidsbatista changed the title ~~Feat/add count filtering to open search document store~~ feat: adding count with filtering operations to open search document store Jan 5, 2026

davidsbatista changed the title ~~feat: adding count with filtering operations to open search document store~~ feat: adding count with filtering operations to OpenSearchDocumentStore Jan 5, 2026

davidsbatista added 7 commits January 5, 2026 17:58

Merge branch 'main' into feat/add-count-filtering-to-OpenSearchDocume…

511c421

…ntStore

formmatting

5e7cd90

updating tests

0c0f31c

formmatting

2010261

cleaning up

873a4dc

adding httpx as a dependency

1f3347b

fixing pyproject.toml

3622168

davidsbatista marked this pull request as ready for review January 6, 2026 11:16

davidsbatista requested a review from a team as a code owner January 6, 2026 11:16

davidsbatista requested review from sjrl and removed request for a team January 6, 2026 11:16

sjrl requested a review from tstadel January 7, 2026 08:36

davidsbatista added 2 commits January 7, 2026 15:17

Merge branch 'main' into feat/add-count-filtering-to-OpenSearchDocume…

b3e99bd

…ntStore

updating tests: making use of the new refresh feature

d96cc4c

sjrl reviewed Jan 8, 2026

integrations/opensearch/pyproject.toml (resolved)

sjrl reviewed Jan 8, 2026

davidsbatista added 6 commits January 8, 2026 10:15

Merge branch 'main' into feat/add-count-filtering-to-OpenSearchDocume…

11b6d88

…ntStore

dealing with special fields

69863d0

docstring update

3a3df4c

Merge branch 'main' into feat/add-count-filtering-to-OpenSearchDocume…

98ddcf3

…ntStore

adding roundtrip tests to assert documents metadata is correctly writ…

6b2081b

…ten and retrieved

Merge branch 'main' into feat/add-count-filtering-to-OpenSearchDocume…

4f8ab78

…ntStore

davidsbatista requested a review from sjrl January 9, 2026 11:21

davidsbatista added 2 commits January 9, 2026 11:21

Merge branch 'main' into feat/add-count-filtering-to-OpenSearchDocume…

f400fd8

…ntStore

Merge branch 'main' into feat/add-count-filtering-to-OpenSearchDocume…

535897f

…ntStore
