Feature/valkey document store #2589

daric93 · 2025-12-05T19:54:13Z

Related Issues

Valkey Document Store
#2574

Proposed Changes:

Implemented ValkeyDocumentStore as a built-in integration for Haystack. The component implements Haystack’s DocumentStore interface and uses Valkey’s search module for vector similarity search and metadata-based filtering.

Valkey supports all the core capabilities that the DocumentStore interface requires.

How did you test it?

unit tests
integration tests
manual test (ran the example with the Haystack integration)

Checklist

I have read the contributors' guidelines and the code of conduct
I have created the related issue
I added tests and updated the docstrings

CLAassistant · 2025-12-05T19:56:15Z

All committers have signed the CLA.

sjrl · 2025-12-19T11:41:10Z

integrations/valkey/LICENSE.txt

+
+To apply the Apache License to your work, attach the following boilerplate notice, with the fields enclosed by brackets "[]" replaced with your own identifying information. (Don't include the brackets!)  The text should be enclosed in the appropriate comment syntax for the file format. We also recommend that a file or class name and description of purpose be included on the same "printed page" as the copyright notice for easier identification within third-party archives.
+
+Copyright [yyyy] [name of copyright owner]


Suggested change

Copyright [yyyy] [name of copyright owner]

Copyright 2025 deepset GmbH

sjrl · 2025-12-19T11:43:41Z

integrations/valkey/README.md

@@ -0,0 +1,109 @@
+# valkey-haystack


We keep the READMEs in this repo bare-bones (e.g. look at the opensearch example). The content you have here will go into our haystack-integrations repo instead (e.g. look at the opensearch integration page).

sjrl · 2025-12-19T11:46:39Z

@daric93 thanks for the contribution!

Some feedback:

Update the labeler.yml file to include integration:valkey
Remove the file integrations/__init__.py. It is intentionally omitted because it breaks how namespacing works for our core integration packages.
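For context on the namespacing point, here is a minimal, self-contained sketch of PEP 420 implicit namespace packages. It assumes the integrations share the `haystack_integrations` namespace, as the `src/haystack_integrations/...` paths in this PR suggest, and illustrates the general mechanism rather than this repo's exact layout; the module names `valkey_part` and `opensearch_part` are hypothetical. An `__init__.py` inside one portion's `haystack_integrations/` directory turns it into a regular package, so Python stops merging directories from other `sys.path` entries and the other integration can no longer be imported.

```python
import importlib
import sys
import tempfile
from pathlib import Path


def make_portion(root: Path, module: str, with_init: bool) -> None:
    """Create root/haystack_integrations/<module>.py, optionally with an __init__.py."""
    pkg = root / "haystack_integrations"
    pkg.mkdir()
    (pkg / f"{module}.py").write_text(f"NAME = {module!r}\n")
    if with_init:
        # This empty file makes the directory a *regular* package,
        # which ends the namespace-portion search at this sys.path entry.
        (pkg / "__init__.py").write_text("")


def importable(with_init: bool) -> dict[str, bool]:
    """Install two portions on separate sys.path entries and try importing both."""
    results: dict[str, bool] = {}
    with tempfile.TemporaryDirectory() as a, tempfile.TemporaryDirectory() as b:
        make_portion(Path(a), "valkey_part", with_init)
        make_portion(Path(b), "opensearch_part", with_init=False)
        sys.path[:0] = [a, b]
        importlib.invalidate_caches()
        try:
            for module in ("valkey_part", "opensearch_part"):
                try:
                    importlib.import_module(f"haystack_integrations.{module}")
                    results[module] = True
                except ModuleNotFoundError:
                    results[module] = False
        finally:
            # Undo sys.path and module-cache changes so repeated runs don't interfere.
            del sys.path[:2]
            for name in [n for n in sys.modules if n.split(".")[0] == "haystack_integrations"]:
                del sys.modules[name]
    return results


print(importable(with_init=False))  # {'valkey_part': True, 'opensearch_part': True}
print(importable(with_init=True))   # {'valkey_part': True, 'opensearch_part': False}
```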

I'll continue with a more in-depth review, but we are quite busy, so this might take some time.

sjrl · 2025-12-19T11:57:57Z

.github/workflows/valkey.yml

+        os: [ubuntu-latest] # Valkey service container only available on Linux
+        python-version: ["3.9", "3.13"]


We are in the process of deprecating Python 3.9 support. Following PR #2616, could you update the python-version tested here to 3.10?

sjrl · 2025-12-19T11:59:04Z

integrations/valkey/pyproject.toml

+dynamic = ["version"]
+description = ''
+readme = "README.md"
+requires-python = ">=3.9,<3.13"


I noticed that the GitHub workflow tests on Python 3.13, but here you are saying valkey doesn't support 3.13. If it does, we should update this to requires-python = ">=3.9"
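To make the mismatch concrete, here is a simplified sketch that checks the workflow's matrix versions against the `requires-python` range. It handles only the `>=` and `<` clauses used in this file; real tools such as pip apply the full PEP 440 specifier rules through the `packaging` library. The helpers `parse_version` and `satisfies` are hypothetical names for illustration.

```python
import operator


def parse_version(version: str) -> tuple[int, ...]:
    """Turn "3.10" into (3, 10) so versions compare numerically, not as strings."""
    return tuple(int(part) for part in version.split("."))


def satisfies(version: str, requires_python: str) -> bool:
    """Check a version against comma-separated (AND-ed) >= and < clauses only."""
    v = parse_version(version)
    for clause in requires_python.split(","):
        clause = clause.strip()
        if clause.startswith(">="):
            op, bound = operator.ge, clause[2:]
        elif clause.startswith("<") and not clause.startswith("<="):
            op, bound = operator.lt, clause[1:]
        else:
            raise ValueError(f"unsupported clause in this sketch: {clause!r}")
        if not op(v, parse_version(bound)):
            return False
    return True


matrix = ["3.9", "3.13"]  # python-version values from .github/workflows/valkey.yml
for version in matrix:
    print(version, satisfies(version, ">=3.9,<3.13"))
# 3.9 True
# 3.13 False
```

Because `<3.13` is an exclusive upper bound, the 3.13 job in the matrix falls outside the declared range.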

sjrl · 2025-12-19T11:59:20Z

integrations/valkey/pyproject.toml

+requires-python = ">=3.9,<3.13"
+license = "Apache-2.0"
+keywords = []
+authors = [{ name = "John Doe", email = "[email protected]" }]


Suggested change

authors = [{ name = "John Doe", email = "[email protected]" }]

authors = [{ name = "deepset", email = "[email protected]" }]

sjrl · 2025-12-19T12:03:03Z

integrations/valkey/pyproject.toml

+test = [
+    "pytest>=7.4",
+    "pytest-asyncio>=0.23",
+    "pytest-cov",
+    "pytest-rerunfailures",
+    "mypy",
+]


This isn't needed since it's already defined under [tool.hatch.envs.test], so let's remove it.

sjrl · 2025-12-19T12:03:57Z

integrations/valkey/pyproject.toml

+examples = [
+    "markdown-it-py",
+    "mdit_plain",
+    "haystack-ai>=2.11.0",
+    "numpy<2.0.0",
+    "torch==2.2.2; sys_platform == 'darwin' and platform_machine == 'x86_64'",
+    "torch>=2.0.0; sys_platform != 'darwin' or platform_machine != 'x86_64'",
+    "sentence-transformers>=5.0.0",
+]


We don't add dependencies for examples to pyproject.toml, so let's remove this section. If the examples need extra dependencies, you can note them in a comment in the example files.
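One way to follow this suggestion is a comment at the top of each example file listing its extra packages, optionally paired with a runtime check that prints a readable install hint. The packages below are a subset of the `examples` group in the diff; the helper `missing_modules` and the distribution-to-module mapping are illustrative assumptions, not part of the PR.

```python
# Extra dependencies for this example (not declared in pyproject.toml):
#     pip install markdown-it-py mdit_plain sentence-transformers
import importlib.util

# Map each pip distribution name to the top-level module it provides.
EXTRA_DEPS = {
    "markdown-it-py": "markdown_it",
    "mdit_plain": "mdit_plain",
    "sentence-transformers": "sentence_transformers",
}


def missing_modules(deps: dict[str, str]) -> list[str]:
    """Return the pip names whose top-level module cannot be found."""
    return [
        pip_name
        for pip_name, module in deps.items()
        if importlib.util.find_spec(module) is None
    ]


missing = missing_modules(EXTRA_DEPS)
if missing:
    print("Missing extra dependencies, run: pip install " + " ".join(missing))
```

Checking with `find_spec` avoids actually importing heavy packages such as sentence-transformers just to see whether they are installed.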

sjrl · 2025-12-19T12:08:09Z

...rations/valkey/src/haystack_integrations/components/retrievers/valkey/embedding_retriever.py

+        # Pipelines serialized with old versions of the component might not
+        # have the filter_policy field.
+        if filter_policy := data["init_parameters"].get("filter_policy"):
+            data["init_parameters"]["filter_policy"] = FilterPolicy.from_str(filter_policy)


This dev comment was copied from an existing integration. It's probably fine to keep the if check, but I'd drop the dev comment, since no older version of this component exists yet.
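A minimal sketch of this `from_dict` step with the dev comment dropped. To stay self-contained it uses a stand-in `FilterPolicy` enum mirroring Haystack's `REPLACE`/`MERGE` values instead of importing Haystack, and `restore_filter_policy` is a hypothetical helper name. The `if` check still earns its place: a hand-written dict or YAML without a `filter_policy` key passes through unchanged, so the component's default applies.

```python
from enum import Enum
from typing import Any


class FilterPolicy(Enum):
    """Stand-in mirroring Haystack's FilterPolicy enum (not the real import)."""

    REPLACE = "replace"
    MERGE = "merge"

    @staticmethod
    def from_str(filter_policy: str) -> "FilterPolicy":
        try:
            return FilterPolicy(filter_policy.lower())
        except ValueError as e:
            raise ValueError(f"Unknown FilterPolicy: {filter_policy!r}") from e


def restore_filter_policy(data: dict[str, Any]) -> dict[str, Any]:
    """Convert a serialized filter_policy string back into the enum, if present."""
    init_params = data["init_parameters"]
    if filter_policy := init_params.get("filter_policy"):
        init_params["filter_policy"] = FilterPolicy.from_str(filter_policy)
    return data


print(restore_filter_policy({"init_parameters": {"filter_policy": "merge"}}))
# {'init_parameters': {'filter_policy': <FilterPolicy.MERGE: 'merge'>}}
```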

sjrl · 2025-12-19T12:15:23Z