45 changes: 45 additions & 0 deletions README.md
@@ -11,6 +11,12 @@ Choose the deployment mode that fits your needs.
pip install sochdb
```

Optional framework integration:

```bash
pip install "sochdb[crewai]"
```

Or from source:
```bash
cd sochdb-python-sdk
@@ -24,6 +30,45 @@ pip install -e .
## Architecture: Flexible Deployment

```

### CrewAI Integration

The Python SDK includes an optional CrewAI integration layer for SochDB-backed
knowledge search and memory writes.

Available helpers:

- `SochDBKnowledgeStore`
- `create_crewai_tools(...)`
- `SochDBKnowledgeStore.from_collection(...)` for embedded mode
- `SochDBKnowledgeStore.from_client(...)` for gRPC / hosted mode

Example:

```python
from sochdb import Database, Namespace, SochDBKnowledgeStore, create_crewai_tools

def embed(texts):
    ...

db = Database.open("./crewai_demo")
ns = Namespace(db, "crew")
collection = ns.create_collection("knowledge", dimension=384)

store = SochDBKnowledgeStore.from_collection(collection, embedder=embed)
store.add_texts(
    ["SochDB supports embedded and gRPC modes."],
    metadatas=[{"topic": "architecture"}],
    ids=["arch-1"],
)

search_tool, remember_tool = create_crewai_tools(store, top_k=3)
```

See `examples/28_crewai_knowledge_tools.py` for a complete example.
See `examples/29_crewai_remote_tools.py` for the hosted/gRPC variant.
The remote example also supports `SOCHDB_CREWAI_SKIP_KICKOFF=1` to smoke-test
remote storage and retrieval without LLM credentials.
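
The `embed` stub in the example above is left for the reader to fill in. As a minimal sketch, assuming the embedder contract is "a sequence of strings in, one list of floats per string out" (which is what the bundled examples suggest), the hash-based embedder from `examples/28_crewai_knowledge_tools.py` can be adapted to the 384-dimensional collection used here. The name `hash_embed` is hypothetical and not part of the SDK, and the vectors carry no semantic meaning:

```python
import hashlib
import math
from typing import Sequence


def hash_embed(texts: Sequence[str], dim: int = 384) -> list[list[float]]:
    """Hypothetical stand-in embedder: deterministic and unit-length, not semantic."""
    vectors: list[list[float]] = []
    for text in texts:
        digest = hashlib.sha256(text.encode("utf-8")).digest()
        # Cycle through the 32 digest bytes and map each one to [-1.0, 1.0].
        values = [(digest[i % len(digest)] / 255.0) * 2.0 - 1.0 for i in range(dim)]
        # L2-normalize; fall back to 1.0 so a zero vector cannot divide by zero.
        norm = math.sqrt(sum(v * v for v in values)) or 1.0
        vectors.append([v / norm for v in values])
    return vectors


vectors = hash_embed(["SochDB supports embedded and gRPC modes.", "second text"])
print(len(vectors), len(vectors[0]))
```

This is enough to exercise storage and retrieval plumbing offline. For real retrieval quality, a proper embedding model whose output dimension matches the collection would likely be needed instead.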
┌─────────────────────────────────────────────────────────────┐
│ DEPLOYMENT OPTIONS │
├─────────────────────────────────────────────────────────────┤
104 changes: 104 additions & 0 deletions examples/28_crewai_knowledge_tools.py
@@ -0,0 +1,104 @@
#!/usr/bin/env python3
"""
CrewAI + SochDB knowledge tool example.

This example shows the supported integration shape in the Python SDK:

- SochDB stores searchable project knowledge
- a user-supplied embedder converts text into vectors
- CrewAI agents use SochDB-backed search and memory tools

Install:
    pip install -e ".[crewai]"

Optional environment:
    OPENAI_API_KEY=<your key>
    CREWAI_MODEL=gpt-4o-mini
"""

from __future__ import annotations

import hashlib
import math
import os
import tempfile
from typing import Sequence

from sochdb import Database, Namespace, SochDBKnowledgeStore, create_crewai_tools


def deterministic_embed(texts: Sequence[str], dim: int = 32) -> list[list[float]]:
    """
    Tiny local embedder for demos and tests.

    The vectors are not semantically meaningful the way OpenAI or
    sentence-transformers embeddings are, but this keeps the example runnable
    without another service dependency.
    """

    vectors: list[list[float]] = []
    for text in texts:
        digest = hashlib.sha256(text.encode("utf-8")).digest()
        # Map each digest byte to [-1.0, 1.0], cycling if dim exceeds the digest length.
        values = [((digest[i % len(digest)] / 255.0) * 2.0) - 1.0 for i in range(dim)]
        # L2-normalize the vector.
        norm = math.sqrt(sum(v * v for v in values)) or 1.0
        vectors.append([v / norm for v in values])
    return vectors


def build_knowledge_store() -> SochDBKnowledgeStore:
    tempdir = tempfile.mkdtemp(prefix="sochdb-crewai-")
    db = Database.open(tempdir)
    namespace = Namespace(db, "crewai_demo")
    collection = namespace.create_collection("knowledge", dimension=32)

    store = SochDBKnowledgeStore.from_collection(collection, embedder=deterministic_embed)
    store.add_texts(
        [
            "SochDB supports both embedded and gRPC deployment modes.",
            "The hosted SochDB demo endpoint listens on studio.agentslab.host:50053.",
            "The corrected 10GB benchmark showed about 506 QPS after one-time index load.",
        ],
        metadatas=[
            {"topic": "architecture"},
            {"topic": "deployment"},
            {"topic": "benchmark"},
        ],
        ids=["arch-1", "deploy-1", "bench-1"],
    )
    return store


def main() -> None:
    from crewai import Agent, Crew, Task

    store = build_knowledge_store()
    search_tool, remember_tool = create_crewai_tools(store, top_k=3)

    model = os.environ.get("CREWAI_MODEL", "gpt-4o-mini")

    researcher = Agent(
        role="SochDB Researcher",
        goal="Answer questions using the SochDB knowledge base",
        backstory="You ground answers in the project knowledge store before responding.",
        llm=model,
        tools=[search_tool, remember_tool],
        verbose=True,
    )

    task = Task(
        description=(
            "Find the current 10GB benchmark takeaway and summarize it in 2-3 sentences. "
            "Use the SochDB tools instead of guessing."
        ),
        expected_output="A short grounded summary of the latest 10GB benchmark result.",
        agent=researcher,
    )

    crew = Crew(agents=[researcher], tasks=[task], verbose=True)
    result = crew.kickoff()

    print("\n=== Crew Result ===\n")
    print(result)


if __name__ == "__main__":
    main()
135 changes: 135 additions & 0 deletions examples/29_crewai_remote_tools.py
@@ -0,0 +1,135 @@
#!/usr/bin/env python3
"""
CrewAI + SochDB hosted remote tool example.

This example shows the same CrewAI integration surface as the embedded example,
but points it at a remote SochDB collection over gRPC.

Environment variables:
    SOCHDB_GRPC_ADDRESS           default: studio.agentslab.host:50053
    SOCHDB_NAMESPACE              default: default
    CREWAI_MODEL                  default: gpt-4o-mini
    OPENAI_API_KEY                required by CrewAI for the default LLM provider
    SOCHDB_CREWAI_SKIP_KICKOFF=1  to only validate remote storage/search setup

Install:
    pip install -e ".[crewai]"
"""

from __future__ import annotations

import hashlib
import math
import os
import time
from pathlib import Path
from typing import Sequence

import sys

sys.path.insert(0, str(Path(__file__).resolve().parents[1] / "src"))

from sochdb import SochDBClient, SochDBKnowledgeStore, create_crewai_tools


DEFAULT_GRPC_ADDRESS = "studio.agentslab.host:50053"
DEFAULT_NAMESPACE = "default"


def deterministic_embed(texts: Sequence[str], dim: int = 32) -> list[list[float]]:
    """Small local embedder so the demo does not require a second model service."""

    vectors: list[list[float]] = []
    for text in texts:
        digest = hashlib.sha256(text.encode("utf-8")).digest()
        values = [((digest[i % len(digest)] / 255.0) * 2.0) - 1.0 for i in range(dim)]
        norm = math.sqrt(sum(v * v for v in values)) or 1.0
        vectors.append([v / norm for v in values])
    return vectors


def build_remote_store(client: SochDBClient, namespace: str) -> tuple[str, SochDBKnowledgeStore]:
    run_id = f"crewai-remote-{int(time.time())}"
    collection_name = f"sdk_crewai_remote_{run_id}"
    client.create_collection(collection_name, dimension=32, namespace=namespace, metric="cosine")

    store = SochDBKnowledgeStore.from_client(
        client,
        collection_name=collection_name,
        namespace=namespace,
        embedder=deterministic_embed,
    )
    store.add_texts(
        [
            "The hosted SochDB demo endpoint listens on studio.agentslab.host:50053.",
            "The corrected 10GB benchmark showed about 506 QPS after one-time index load.",
            "BAAI/bge-base-en-v1.5 is the best published SciFact quality result so far.",
        ],
        metadatas=[
            {"topic": "deployment"},
            {"topic": "benchmark"},
            {"topic": "quality"},
        ],
        ids=[f"{run_id}-deploy", f"{run_id}-bench", f"{run_id}-quality"],
    )
    return collection_name, store


def main() -> None:
    grpc_address = os.environ.get("SOCHDB_GRPC_ADDRESS", DEFAULT_GRPC_ADDRESS)
    namespace = os.environ.get("SOCHDB_NAMESPACE", DEFAULT_NAMESPACE)
    model = os.environ.get("CREWAI_MODEL", "gpt-4o-mini")
    skip_kickoff = os.environ.get("SOCHDB_CREWAI_SKIP_KICKOFF", "").lower() in {
        "1",
        "true",
        "yes",
    }

    client = SochDBClient(grpc_address)
    collection_name, store = build_remote_store(client, namespace)
    print(f"Using remote collection: {collection_name} in namespace={namespace}")
    try:
        if skip_kickoff:
            hits = store.search("What is the 10GB benchmark takeaway?", top_k=2)
            print("\n=== Remote Store Smoke ===\n")
            print(store.format_hits(hits))
            return

        if not os.environ.get("OPENAI_API_KEY"):
            raise SystemExit(
                "OPENAI_API_KEY is required for the CrewAI kickoff. "
                "Set SOCHDB_CREWAI_SKIP_KICKOFF=1 to validate the remote SochDB path without an LLM."
            )

        from crewai import Agent, Crew, Task

        search_tool, remember_tool = create_crewai_tools(store, top_k=3)

        researcher = Agent(
            role="SochDB Remote Researcher",
            goal="Answer questions using the hosted SochDB knowledge base.",
            backstory="You always search the remote collection before making a claim.",
            llm=model,
            tools=[search_tool, remember_tool],
            verbose=True,
        )

        task = Task(
            description=(
                "Find the current 10GB benchmark takeaway and summarize it in 2-3 sentences. "
                "Use the SochDB tools and mention that the knowledge came from the remote store."
            ),
            expected_output="A short grounded summary of the latest 10GB benchmark result.",
            agent=researcher,
        )

        crew = Crew(agents=[researcher], tasks=[task], verbose=True)
        result = crew.kickoff()
        print("\n=== Crew Result ===\n")
        print(result)
    finally:
        client.close()


if __name__ == "__main__":
    main()
37 changes: 37 additions & 0 deletions examples/README.md
@@ -28,6 +28,8 @@ cargo build --release
| `08_ipc_client.py` | IPC | Multi-process access via IPC |
| `26_hosted_studio_ingest.py` | gRPC + Studio | Remote write plus hosted Studio event ingestion |
| `27_hosted_remote_smoke.py` | gRPC | Minimal hosted remote smoke test for SDK parity |
| `28_crewai_knowledge_tools.py` | Embedded + CrewAI | CrewAI tools backed by SochDB knowledge search |
| `29_crewai_remote_tools.py` | gRPC + CrewAI | CrewAI tools backed by a remote SochDB collection |

## Running Examples

@@ -66,6 +68,39 @@ There is also a matching manual GitHub Actions workflow at
`.github/workflows/hosted-smoke.yml` for running the same hosted smoke path on
demand.

Latest hosted validation:

- GitHub-hosted workflow passed on May 5, 2026:
  `https://github.com/SaiSandeepKantareddy/sochdb-python-sdk/actions/runs/25357489415`

### CrewAI Knowledge Tools (28)

Runs a CrewAI agent with SochDB-backed `search` and `remember` tools:

```bash
pip install -e ".[crewai]"
OPENAI_API_KEY=... python examples/28_crewai_knowledge_tools.py
```

### CrewAI Remote Knowledge Tools (29)

Runs the same CrewAI tool surface against a hosted SochDB collection over gRPC:

```bash
pip install -e ".[crewai]"
OPENAI_API_KEY=... \
SOCHDB_GRPC_ADDRESS=studio.agentslab.host:50053 \
python examples/29_crewai_remote_tools.py
```

To validate only the remote SochDB side before wiring up an LLM provider, run
the storage/search smoke test instead:

```bash
SOCHDB_GRPC_ADDRESS=studio.agentslab.host:50053 \
SOCHDB_CREWAI_SKIP_KICKOFF=1 \
python examples/29_crewai_remote_tools.py
```
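
The smoke-test switch is read as a case-insensitive flag: per `examples/29_crewai_remote_tools.py`, the values `1`, `true`, and `yes` enable it, and anything else (including leaving it unset) keeps the full kickoff. A minimal sketch of that check follows; the helper name `env_flag` is hypothetical and not part of the SDK:

```python
import os
from typing import Mapping

# Values treated as "on", matching the set used in the remote example.
TRUTHY = {"1", "true", "yes"}


def env_flag(name: str, environ: Mapping[str, str] = os.environ) -> bool:
    """Return True when the variable is set to 1, true, or yes (any casing)."""
    return environ.get(name, "").lower() in TRUTHY


skip_kickoff = env_flag("SOCHDB_CREWAI_SKIP_KICKOFF")
print("smoke test only" if skip_kickoff else "full CrewAI kickoff")
```

Note that values such as `on` or `0` are not in the truthy set, so they leave the full kickoff enabled.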
## Directory Structure

```
@@ -81,6 +116,8 @@ examples/
├── 08_ipc_client.py # IPC client examples
├── 26_hosted_studio_ingest.py # Remote SochDB + hosted Studio example
├── 27_hosted_remote_smoke.py # Minimal hosted gRPC smoke test
├── 28_crewai_knowledge_tools.py # CrewAI tools backed by SochDB knowledge
├── 29_crewai_remote_tools.py # CrewAI tools backed by remote SochDB
└── shared/
    └── mock_server.py # Mock server for testing
```
4 changes: 4 additions & 0 deletions pyproject.toml
@@ -45,13 +45,17 @@ dependencies = [
analytics = [
    "posthog>=3.0.0",
]
crewai = [
    "crewai",
]
dev = [
    "pytest>=7.0",
    "pytest-cov>=4.0",
    "faker>=18.0",
]
all = [
    "posthog>=3.0.0",
    "crewai",
]

[project.urls]