Semantic search for your markdown files. A local-first CLI tool that sits alongside find and rg — find files by name, match text by pattern, or match by meaning.
find → locate files
rg → match exact text
sem → match meaning
Requires Rust:
cargo install --path .

On first run, sem downloads the all-MiniLM-L6-v2 embedding model (~80MB, cached permanently).
# Index your markdown files
sem index
# Search by meaning
sem "how does authentication work"
# Multiple queries at once
sem "error handling" "logging strategy"
# Pipe queries from stdin
printf '%s\n' "auth flow" "rate limiting" | sem

# Human-readable (default)
sem "query"
# JSON array
sem --json "query"
# JSONL (one object per line) — best for piping
sem --jsonl "query"
# File paths only (deduplicated)
sem --paths-only "query"
# Limit results per query (default: 5)
sem -n 3 "query"

# Find relevant files, then search for a specific pattern
sem --paths-only "database migrations" | xargs rg "ALTER TABLE"
# Feed results into fzf for interactive selection
sem --paths-only "API design" | fzf --preview 'cat {}'
# Extract scores with jq
sem --jsonl "query" | jq 'select(.score > 0.3) | .path'
# Agent-style batch retrieval
printf '%s\n' "constraints" "related work" "open questions" | sem --jsonl -n 3

Build or update the search index. Scans the current directory recursively for .md files.
sem index # Incremental — only re-embeds new/modified files
sem index --full # Full reindex (required after config changes)

The index is stored in .sem/ at the project root (similar to .git/).
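To make the incremental behaviour concrete, here is a minimal Python sketch of change detection based on modification times. It is not sem's actual implementation: the function names `scan_markdown` and `plan_update` are hypothetical, and skipping the `.sem/` directory and dropping deleted files are assumptions about what a tool like this would reasonably do.

```python
from pathlib import Path


def scan_markdown(root: Path) -> dict[str, float]:
    """Map each .md file under root (as a relative path) to its mtime."""
    found = {}
    for path in root.rglob("*.md"):
        rel = path.relative_to(root)
        # Assumption: the index directory itself is never indexed.
        if ".sem" in rel.parts or not path.is_file():
            continue
        found[str(rel)] = path.stat().st_mtime
    return found


def plan_update(previous: dict[str, float], current: dict[str, float]):
    """Compare recorded mtimes with current ones.

    Returns (to_embed, to_drop): files that are new or modified,
    and files that were indexed before but no longer exist.
    """
    to_embed = sorted(p for p, mtime in current.items() if previous.get(p) != mtime)
    to_drop = sorted(p for p in previous if p not in current)
    return to_embed, to_drop
```

A full reindex corresponds to calling `plan_update({}, current)`, which marks every file for embedding.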
Show index health: file count, chunk count, index age, model info.
$ sem status
Model: all-MiniLM-L6-v2
Schema version: 1
Chunk limit: 512 tokens
Files indexed: 42
Total chunks: 67
Index size: 128.4 KB
Index age: 2 hours
Index location: /home/user/notes/.sem

Search the index. This is the default command; no subcommand is needed.
If the index is more than 3 days old, a warning is printed to stderr (results still returned).
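For scripted use, the JSONL output can be consumed from Python as well as from jq. The sketch below is illustrative only: the helper names are hypothetical, the only record fields it relies on are `path` and `score` (the two used in the jq example above), and it assumes sem exits non-zero only on failure.

```python
import json
import subprocess


def parse_jsonl(text: str) -> list[dict]:
    """Parse JSONL: one JSON object per non-empty line."""
    return [json.loads(line) for line in text.splitlines() if line.strip()]


def paths_above(records: list[dict], min_score: float) -> list[str]:
    """Paths of records scoring above min_score, deduplicated, order kept."""
    seen: dict[str, None] = {}
    for rec in records:
        if rec["score"] > min_score:
            seen.setdefault(rec["path"], None)
    return list(seen)


def search(queries: list[str], n: int = 3) -> tuple[list[dict], str]:
    """Run `sem --jsonl -n N` with queries on stdin (requires sem on PATH)."""
    proc = subprocess.run(
        ["sem", "--jsonl", "-n", str(n)],
        input="\n".join(queries) + "\n",
        capture_output=True,
        text=True,
        check=True,  # assumption: non-zero exit means a real failure
    )
    # Results arrive on stdout; diagnostics such as the stale-index
    # warning arrive on stderr, so they never corrupt the JSONL stream.
    return parse_jsonl(proc.stdout), proc.stderr
```

Keeping stdout and stderr separate means a stale-index warning can be logged or shown to the user while the results are still parsed normally.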
- Indexing: Markdown files are split into chunks (by headings for large files, whole-file for small ones). Each chunk is embedded using all-MiniLM-L6-v2 into a 384-dimensional vector. Vectors are stored in .sem/index.bin.
- Search: Your query is embedded with the same model. Results are ranked by cosine similarity against all indexed chunks.
- Incremental updates: sem index tracks file modification times and only re-processes changed files.
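The chunking and ranking steps can be sketched in a few lines of Python. This is a simplified illustration, not sem's code: `chunk_markdown`, `cosine`, and `rank` are hypothetical names, the word count stands in for a real token count, and the toy vectors in the usage note replace actual 384-dimensional model embeddings.

```python
import math
import re


def chunk_markdown(text: str, max_words: int = 512) -> list[str]:
    """Return the whole file if it is small, else split before each heading.

    Assumption: word count approximates the token limit. Lines starting
    with '#' inside fenced code blocks are not treated specially here.
    """
    if len(text.split()) <= max_words:
        return [text]
    # Zero-width split at the start of any line beginning with 1-6 '#'.
    parts = re.split(r"(?m)^(?=#{1,6} )", text)
    return [p for p in parts if p.strip()]


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity: dot product divided by the product of the norms."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def rank(query_vec: list[float], chunk_vecs: dict[str, list[float]], n: int = 5):
    """Score every chunk against the query and return the top n."""
    scored = [(cosine(query_vec, vec), name) for name, vec in chunk_vecs.items()]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:n]
```

For example, `rank([1.0, 0.0, 0.0], {"a": [1.0, 0.0, 0.0], "b": [0.0, 1.0, 0.0]}, n=1)` puts chunk `a` first, because it points in the same direction as the query. Scoring every chunk like this is an exhaustive scan, which fits the stated v1 scope of no ANN search.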
- Local-first — no API calls, no external services. The embedding model runs locally.
- Unix-native — stdin/stdout composability, predictable flags, structured output.
- Explicit over magic — no background indexing, no silent updates. You control when the index is built.
- Minimal surface area — few commands, few flags, obvious behavior.
Add the following to ~/.claude/CLAUDE.md to make sem available as a tool in Claude Code:
# Tools
## sem — semantic search over markdown
`sem` is installed and available on PATH. Use it to find notes by meaning
when `grep`/`rg` would require knowing the exact wording.
When to use:
- User asks about a topic and you need to find relevant notes/docs
- You need context that might be spread across multiple files
- Exact keyword search (`grep`) isn't finding what you need
Usage:
```sh
sem "query" # top 5 results, human-readable
sem --paths-only "query" # just file paths (good for piping)
sem --jsonl "query" # structured output
sem -n 3 "query1" "query2" # multiple queries, 3 results each
sem --paths-only "query" | xargs cat # find then read
```
Important:
- Requires a `.sem/` index in the target directory. Run `sem index` first if none exists.
- `sem status` shows index health.
- Results go to stdout, diagnostics to stderr.
- Use `grep`/`rg` first for exact matches. Reach for `sem` when you need meaning-based search.

Claude will then use sem via Bash calls when semantic search would be more effective than exact text matching.
sem indexes .md files only. It is a retrieval primitive, not an application platform.
Not in scope (v1): PDF/DOCX support, TUI, background daemon, ANN search, graph relationships, real-time indexing.
MIT