From 387b31af9e69a8079c543444a754d70f684488b2 Mon Sep 17 00:00:00 2001
From: Update Docs Agent
Date: Mon, 16 Feb 2026 20:18:32 +0000
Subject: [PATCH] docs: Add semantic search documentation for QMD integration

- Document qmd-setup and qmd-reindex skills
- Add comprehensive semantic-search.md guide
- Update README with semantic search features
- Enhance cli-usage.md with search method comparison
- Move semantic search from 'Planned' to 'Added' in CHANGELOG

Skills qmd-setup and qmd-reindex were added in commit 5a9732a but lacked
user-facing documentation. This adds complete setup instructions, usage
examples, troubleshooting, and integration workflows.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
---
 CHANGELOG.md            |  11 +-
 README.md               |  60 ++++++-
 docs/cli-usage.md       |  77 ++++++++-
 docs/semantic-search.md | 341 ++++++++++++++++++++++++++++++++++++++++
 4 files changed, 485 insertions(+), 4 deletions(-)
 create mode 100644 docs/semantic-search.md

diff --git a/CHANGELOG.md b/CHANGELOG.md
index a199d2b..039e6b5 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -7,9 +7,18 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
 
 ## [Unreleased]
 
+### Added
+
+- **[Semantic Search]**: QMD integration for vector and hybrid search
+  - New skill `/mnemonic:qmd-setup` for automated setup
+  - New skill `/mnemonic:qmd-reindex` for re-indexing after captures
+  - Supports BM25 keyword, vector semantic, and hybrid search modes
+  - Auto-discovers memory roots from config
+  - Registered collections: org-level, default, and project-level memories
+  - See [docs/semantic-search.md](docs/semantic-search.md) for details
+
 ### Planned
 
-- Semantic search with embeddings
 - Export/import functionality
 - Web UI for memory browsing
 
diff --git a/README.md b/README.md
index 030e462..e16146d 100644
--- a/README.md
+++ b/README.md
@@ -20,6 +20,7 @@ A pure filesystem-based memory system for Claude Code. No external dependencies
 - **Skill-First Architecture**: Skills work standalone without hooks or libraries
 - **Cognitive Memory Types**: Semantic, episodic, and procedural memories
 - **Custom Ontologies**: Extend with domain-specific entity types and relationships
+- **Semantic Search**: Optional vector search via qmd integration
 - **Bi-Temporal Tracking**: Valid time vs. recorded time
 - **Git Versioned**: All changes tracked with git
 - **Cross-Session Coordination**: Blackboard for session handoffs
@@ -255,7 +256,11 @@ After running `/mnemonic:setup`, Claude will:
 2. **Auto-Capture**: Automatically save decisions, learnings, and patterns
 3. **Silent Operation**: Memory operations happen in the background
 
-## Search Examples
+## Search
+
+Mnemonic provides both traditional keyword search and semantic vector search.
+
+### Keyword Search (ripgrep)
 
 ```bash
 # Full-text search
@@ -274,6 +279,37 @@ rg "^type: episodic" ${MNEMONIC_ROOT}/ --glob "*.memory.md" -l
 find ${MNEMONIC_ROOT} -name "*.memory.md" -mtime -7
 ```
 
+### Semantic Search (qmd)
+
+For semantic/vector search capabilities, use the integrated `@tobilu/qmd` support:
+
+```bash
+# One-time setup
+/mnemonic:qmd-setup
+
+# Keyword search (BM25)
+qmd search "authentication patterns"
+
+# Semantic vector search
+qmd vsearch "how do we handle user sessions"
+
+# Hybrid search (BM25 + vector)
+qmd query "database migration strategy"
+
+# Scope to specific collections
+qmd search "auth" -c mnemonic-zircote  # org memories only
+qmd search "auth" -c mnemonic-project  # this repo only
+
+# Re-index after adding new memories
+/mnemonic:qmd-reindex
+```
+
+**Requirements:**
+- Node.js >= 22
+- `npm i -g @tobilu/qmd`
+
+See [skills/qmd-setup/SKILL.md](skills/qmd-setup/SKILL.md) for detailed setup instructions.
+
 ## Hooks
 
 Hooks provide proactive automation via `hookSpecificOutput.additionalContext`:
@@ -325,6 +361,7 @@ mnemonic/
 │   └── *.md                    # Slash commands
 ├── docs/
 │   ├── architecture.md         # System architecture
+│   ├── semantic-search.md      # QMD semantic search guide
 │   ├── validation.md           # Memory validation guide
 │   ├── agent-coordination.md   # Multi-agent patterns
 │   ├── ontologies.md           # Custom ontology guide
@@ -358,12 +395,19 @@ mnemonic/
 
 ## Requirements
 
+### Core Dependencies
+
 - Claude Code CLI
 - Git
 - ripgrep (recommended for search)
 - yq (required for structured queries)
 - Python 3.8+ (for hooks and tools)
 
+### Optional: Semantic Search
+
+- Node.js >= 22
+- `@tobilu/qmd` (`npm i -g @tobilu/qmd`)
+
 ### Installing Dependencies
 
 ```bash
@@ -374,10 +418,24 @@ brew install ripgrep yq
 apt install ripgrep
 snap install yq
 
+# Optional: semantic search
+npm i -g @tobilu/qmd
+
 # Check installation
 make check-deps
 ```
 
+## Documentation
+
+- **[Semantic Search Guide](docs/semantic-search.md)** - Set up and use QMD for vector search
+- **[CLI Usage](docs/cli-usage.md)** - Command-line operations and search patterns
+- **[Architecture](docs/architecture.md)** - System design and components
+- **[Validation](docs/validation.md)** - Memory validation and MIF compliance
+- **[Custom Ontologies](docs/ontologies.md)** - Extend with domain-specific types
+- **[Agent Coordination](docs/agent-coordination.md)** - Multi-agent workflows
+- **[ADRs](docs/adrs/)** - Architecture decision records
+- **[Enterprise Guides](docs/enterprise/)** - Deployment and governance
+
 ## Related Projects
 
 - **[MIF (Memory Interchange Format)](https://mif-spec.dev)** - The specification this plugin implements. An open standard for portable AI memory storage. Schemas: https://mif-spec.dev/schema/
diff --git a/docs/cli-usage.md b/docs/cli-usage.md
index f4cdf53..4e86edb 100644
--- a/docs/cli-usage.md
+++ b/docs/cli-usage.md
@@ -22,7 +22,51 @@ ${MNEMONIC_ROOT}/{namespace}/{scope}/*.memory.md
 
 ## Searching Memories
 
-### Full-Text Search with ripgrep
+Mnemonic provides multiple search methods optimized for different use cases.
+
+### Method 1: Semantic Vector Search (qmd)
+
+For natural language queries and semantic understanding:
+
+```bash
+# One-time setup
+/mnemonic:qmd-setup
+
+# BM25 keyword ranking
+qmd search "authentication middleware"
+
+# Vector semantic search
+qmd vsearch "how do we manage user permissions"
+
+# Hybrid search (combines both)
+qmd query "error handling best practices"
+
+# Scope to specific collections
+qmd search "api design" -c mnemonic-zircote  # org memories
+qmd search "api design" -c mnemonic-project  # project memories
+qmd search "api design"                      # all collections
+
+# Limit results
+qmd search "docker" -n 5
+
+# Re-index after adding memories
+/mnemonic:qmd-reindex
+# or manually:
+qmd update && qmd embed
+```
+
+**When to use:**
+- Natural language queries
+- Conceptual/semantic similarity
+- Finding related memories across namespaces
+- "What do we know about X" questions
+
+**Requirements:**
+- Node.js >= 22
+- `npm i -g @tobilu/qmd`
+- Initial setup with `/mnemonic:qmd-setup`
+
+### Method 2: Full-Text Search with ripgrep
 
 ```bash
 # Search all memories for a keyword
@@ -63,7 +107,13 @@ rg "confidence: 0.9" $MNEMONIC_ROOT --glob "*.memory.md" -l
 rg "^title:.*PostgreSQL" $MNEMONIC_ROOT --glob "*.memory.md" -l
 ```
 
-### Using find for File Operations
+**When to use:**
+- Exact phrase matching
+- Known keywords
+- Regular expressions
+- Precise control over matching
+
+### Method 3: Using find for File Operations
 
 ```bash
 # List all memories
@@ -79,6 +129,29 @@ find ${MNEMONIC_ROOT} -name "*.memory.md" -mtime +90
 find ${MNEMONIC_ROOT} -name "*.memory.md" | grep -o '/[^/]*/project\|/[^/]*/user' | sort | uniq -c
 ```
 
+**When to use:**
+- Time-based filtering
+- File system operations
+- Batch processing
+- Directory traversal
+
+### Search Method Comparison
+
+| Feature | qmd (Semantic) | ripgrep | find |
+|---------|---------------|---------|------|
+| Natural language | ✅ Best | ❌ No | ❌ No |
+| Exact matching | ⚠️ Good | ✅ Best | ❌ No |
+| Speed | ⚠️ Moderate | ✅ Fast | ✅ Fast |
+| Ranking | ✅ Relevance | ❌ No | ❌ No |
+| Setup required | ⚠️ Yes | ✅ None | ✅ None |
+| Regex support | ❌ No | ✅ Yes | ⚠️ Limited |
+| Time filtering | ❌ No | ❌ No | ✅ Yes |
+
+**Recommendation:**
+- **Complex questions**: Use `qmd query`
+- **Known keywords**: Use `rg`
+- **Time-based**: Use `find` + `rg`
+
 ## Reading Memories
 
 ### View a Memory
diff --git a/docs/semantic-search.md b/docs/semantic-search.md
new file mode 100644
index 0000000..1fc6aa0
--- /dev/null
+++ b/docs/semantic-search.md
@@ -0,0 +1,341 @@
+# Semantic Search with QMD
+
+Mnemonic integrates with [@tobilu/qmd](https://github.com/tobil4sk/qmd) for semantic vector search over your memories. This enables natural language queries and conceptual similarity matching beyond simple keyword search.
+
+## Overview
+
+QMD provides three search methods:
+
+- **BM25 Search** (`qmd search`): Keyword-based ranking algorithm
+- **Vector Search** (`qmd vsearch`): Semantic similarity using embeddings
+- **Hybrid Search** (`qmd query`): Combines both for best results
+
+## Setup
+
+### Prerequisites
+
+- Node.js >= 22
+- npm (comes with Node.js)
+
+### Installation
+
+```bash
+# Install qmd globally
+npm i -g @tobilu/qmd
+
+# Run automated setup
+/mnemonic:qmd-setup
+```
+
+The setup skill will:
+1. Verify prerequisites
+2. Discover memory roots from `~/.config/mnemonic/config.json`
+3. Register collections for each memory root:
+   - `${MNEMONIC_ROOT}/{org}/` → `mnemonic-{org}`
+   - `${MNEMONIC_ROOT}/default/` → `mnemonic-default`
+   - `.claude/mnemonic/` → `mnemonic-project` (if exists)
+4. Build search index (`qmd update`)
+5. Generate embeddings (`qmd embed`)
+6. Validate with test search
+
+**Note:** The first `qmd embed` run downloads ~2 GB of GGUF models.
+
+### Manual Setup
+
+If you prefer manual configuration:
+
+```bash
+# Resolve MNEMONIC_ROOT from config
+MNEMONIC_ROOT=$(python3 -c "import json; print(json.load(open('$HOME/.config/mnemonic/config.json')).get('memory_store_path', '$HOME/.local/share/mnemonic'))" 2>/dev/null || echo "$HOME/.local/share/mnemonic")
+
+# Register collections
+qmd collection add "${MNEMONIC_ROOT}/zircote/" --name mnemonic-zircote
+qmd collection add "${MNEMONIC_ROOT}/default/" --name mnemonic-default
+qmd collection add ".claude/mnemonic/" --name mnemonic-project  # if in a repo
+
+# Build index and embeddings
+qmd update
+qmd embed
+
+# Verify
+qmd status
+```
+
+## Usage
+
+### Basic Search
+
+```bash
+# BM25 keyword search
+qmd search "authentication middleware"
+
+# Semantic vector search
+qmd vsearch "how do we handle user sessions"
+
+# Hybrid search (recommended)
+qmd query "database migration strategy"
+```
+
+### Scoped Search
+
+Search within specific memory collections:
+
+```bash
+# Organization-level memories only
+qmd search "api design" -c mnemonic-zircote
+
+# Project-level memories only
+qmd search "api design" -c mnemonic-project
+
+# All collections (default)
+qmd search "api design"
+```
+
+### Result Limiting
+
+```bash
+# Top 5 results
+qmd search "authentication" -n 5
+
+# Top 10 results (default)
+qmd search "authentication"
+
+# Custom limit
+qmd query "error handling" -n 3
+```
+
+### Advanced Queries
+
+```bash
+# Multi-term queries
+qmd query "user authentication AND session management"
+
+# Natural language
+qmd vsearch "What are our conventions for REST API error codes?"
+
+# Technical concepts
+qmd query "dependency injection patterns in Python"
+```
+
+## Re-indexing
+
+QMD indexes are **not** automatically updated. Re-index after:
+
+- Capturing new memories
+- Bulk imports
+- Direct file edits
+- Hook-based captures
+
+### Via Skill
+
+```bash
+/mnemonic:qmd-reindex
+```
+
+### Manual Re-index
+
+```bash
+# Full re-index
+qmd update && qmd embed
+
+# Index only (skip embeddings for speed)
+qmd update
+
+# Specific collection only
+qmd update -c mnemonic-project && qmd embed -c mnemonic-project
+```
+
+## Collection Management
+
+### List Collections
+
+```bash
+qmd collection list
+```
+
+### Add Collection
+
+```bash
+qmd collection add /path/to/memories --name my-collection
+```
+
+### Remove Collection
+
+```bash
+qmd collection remove my-collection
+```
+
+### Check Status
+
+```bash
+# Overview of all collections
+qmd status
+
+# Detailed info
+qmd collection info mnemonic-zircote
+```
+
+## Search Workflow Examples
+
+### Example 1: Find Related Decisions
+
+```bash
+# Question: "What decisions have we made about authentication?"
+qmd query "authentication decisions" -n 10
+
+# Review results, then drill down with ripgrep
+rg "authentication" ${MNEMONIC_ROOT} --glob "*decisions*.memory.md"
+```
+
+### Example 2: Discover Patterns
+
+```bash
+# Question: "What patterns do we use for error handling?"
+qmd vsearch "error handling patterns"
+
+# Get specific files
+qmd search "error" -c mnemonic-project | grep "\.memory\.md" | xargs cat
+```
+
+### Example 3: Cross-Namespace Search
+
+```bash
+# Question: "Everything related to PostgreSQL"
+qmd query "postgresql" | grep "\.memory\.md" | while read -r file; do
+  echo "=== $file ==="
+  cat "$file"
+done
+```
+
+## Performance Tuning
+
+### Initial Index Size
+
+| Memory Count | Index Time | Embedding Time | Disk Space |
+|--------------|------------|----------------|------------|
+| 100 | ~5s | ~30s | ~5 MB |
+| 1,000 | ~30s | ~5 min | ~50 MB |
+| 10,000 | ~5 min | ~45 min | ~500 MB |
+
+### Incremental Re-indexing
+
+QMD rebuilds the entire index on `update`. For large collections:
+
+```bash
+# Index only changed files (manual approach)
+find ${MNEMONIC_ROOT} -name "*.memory.md" -mtime -1 | xargs qmd update --files
+```
+
+### Embedding Cache
+
+Embeddings are cached. Deleting files from memory roots doesn't remove their embeddings until re-index:
+
+```bash
+# Force full rebuild
+rm -rf ~/.qmd/cache
+qmd update && qmd embed
+```
+
+## Integration with Mnemonic Commands
+
+### Capture + Re-index
+
+```bash
+/mnemonic:capture decisions "Use PostgreSQL for main database"
+/mnemonic:qmd-reindex
+```
+
+### Search + Recall
+
+```bash
+# Semantic search for relevant memories
+qmd query "database schema patterns"
+
+# Recall specific memory by ID (from search results)
+/mnemonic:recall --id 550e8400-e29b-41d4-a716-446655440000
+```
+
+### Enhanced Search Skill
+
+The `/mnemonic:search-enhanced` skill can optionally use qmd for initial search:
+
+```bash
+/mnemonic:search-enhanced "comprehensive guide to our authentication system"
+```
+
+## Troubleshooting
+
+### qmd command not found
+
+```bash
+# Install globally
+npm i -g @tobilu/qmd
+
+# Or use npx
+npx @tobilu/qmd search "test"
+```
+
+### No results returned
+
+```bash
+# Check collections are registered
+qmd collection list
+
+# Check index exists
+qmd status
+
+# Re-index
+qmd update && qmd embed
+```
+
+### Embeddings not generated
+
+```bash
+# Check qmd embed completed successfully
+qmd embed
+
+# First run downloads models (~2 GB)
+# Ensure sufficient disk space and network connection
+```
+
+### Out-of-date results
+
+```bash
+# Re-index to include new memories
+qmd update && qmd embed
+```
+
+### Performance issues
+
+```bash
+# Use keyword search instead of vector search
+qmd search "term"  # instead of: qmd vsearch "term"
+
+# Limit results
+qmd query "term" -n 5
+
+# Search specific collections
+qmd search "term" -c mnemonic-project
+```
+
+## Command Reference
+
+| Command | Purpose | Requires Index | Requires Embeddings |
+|---------|---------|---------------|---------------------|
+| `qmd collection add <path>` | Register directory | No | No |
+| `qmd collection list` | Show collections | No | No |
+| `qmd collection remove <name>` | Unregister collection | No | No |
+| `qmd update` | Build BM25 index | No | No |
+| `qmd embed` | Generate embeddings | Yes (update) | No |
+| `qmd search <query>` | BM25 keyword search | Yes (update) | No |
+| `qmd vsearch <query>` | Vector semantic search | Yes (update) | Yes (embed) |
+| `qmd query <query>` | Hybrid search | Yes (update) | Yes (embed) |
+| `qmd status` | Show index status | No | No |
+
+## Further Reading
+
+- [@tobilu/qmd Documentation](https://github.com/tobil4sk/qmd)
+- [BM25 Algorithm](https://en.wikipedia.org/wiki/Okapi_BM25)
+- [Vector Embeddings Explained](https://platform.openai.com/docs/guides/embeddings)
+- [Mnemonic Search Skill](../skills/mnemonic-search/SKILL.md)
+- [Enhanced Search Skill](../skills/mnemonic-search-enhanced/SKILL.md)
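
Reviewer sketch (not part of the patch): the new guide states that QMD indexes are not updated automatically and that the manual re-index is `qmd update && qmd embed`. One way to avoid re-indexing when nothing changed might look like the following. The stamp file location, the `STAMP_FILE` variable, and both function names are assumptions made for illustration; they are not part of mnemonic or qmd. Only `qmd update` and `qmd embed` come from the documentation above.

```bash
#!/usr/bin/env bash
# Hypothetical helper: run the documented re-index only when a memory file
# changed since the last successful run. STAMP_FILE is an assumed location.
STAMP_FILE="${STAMP_FILE:-${HOME}/.cache/mnemonic-qmd.stamp}"

# Succeeds (exit 0) when the stamp is missing or when any *.memory.md file
# under the given root is newer than the stamp.
needs_reindex() {
  local root="$1"
  [ -f "$STAMP_FILE" ] || return 0
  [ -n "$(find "$root" -name '*.memory.md' -newer "$STAMP_FILE" -print 2>/dev/null | head -n 1)" ]
}

# Runs the manual re-index from the guide, then records the time of this run.
# The stamp is only written when both qmd commands succeed.
reindex_if_needed() {
  local root="$1"
  if needs_reindex "$root"; then
    qmd update && qmd embed \
      && mkdir -p "$(dirname "$STAMP_FILE")" \
      && touch "$STAMP_FILE"
  fi
}
```

Calling `reindex_if_needed "${MNEMONIC_ROOT}"` after a capture would then skip the embedding step when no memory file changed. Because the stamp is written after indexing finishes, a file edited while indexing is running may not be picked up until it changes again.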