sem

Semantic search for your Markdown files. sem is a local-first CLI tool that sits alongside find and rg: find locates files by name, rg matches text by pattern, and sem matches by meaning.

find  → locate files
rg    → match exact text
sem   → match meaning

Install

Requires Rust:

cargo install --path .

On first run, sem downloads the all-MiniLM-L6-v2 embedding model (~80MB, cached permanently).

Quick start

# Index your markdown files
sem index

# Search by meaning
sem "how does authentication work"

# Multiple queries at once
sem "error handling" "logging strategy"

# Pipe queries from stdin
printf '%s\n' "auth flow" "rate limiting" | sem

Output modes

# Human-readable (default)
sem "query"

# JSON array
sem --json "query"

# JSONL (one object per line) — best for piping
sem --jsonl "query"

# File paths only (deduplicated)
sem --paths-only "query"

# Limit results per query (default: 5)
sem -n 3 "query"

Pipeline examples

# Find relevant files, then search for a specific pattern
sem --paths-only "database migrations" | xargs rg "ALTER TABLE"

# Feed results into fzf for interactive selection
sem --paths-only "API design" | fzf --preview 'cat {}'

# Extract scores with jq
sem --jsonl "query" | jq 'select(.score > 0.3) | .path'

# Agent-style batch retrieval
printf '%s\n' "constraints" "related work" "open questions" | sem --jsonl -n 3
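If jq is not available, the same score filter can be done in a few lines of Python. This is a minimal sketch that assumes each JSONL record carries the `score` and `path` fields used in the jq example; the function name `paths_above` is hypothetical and not part of sem.

```python
import json
from typing import Iterable, Iterator


def paths_above(lines: Iterable[str], min_score: float = 0.3) -> Iterator[str]:
    """Yield each distinct path whose score exceeds min_score, in input order."""
    seen = set()
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines between records
        record = json.loads(line)
        # Assumption: "score" and "path" are present, as in the jq example.
        # Any other fields in the record are ignored.
        if record["score"] > min_score and record["path"] not in seen:
            seen.add(record["path"])
            yield record["path"]
```

To use it in a pipeline, iterate over `sys.stdin` (for example `for p in paths_above(sys.stdin): print(p)`) and pipe `sem --jsonl "query"` into the script.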

Commands

`sem index`

Build or update the search index. Scans the current directory recursively for .md files.

sem index        # Incremental — only re-embeds new/modified files
sem index --full # Full reindex (required after config changes)

The index is stored in .sem/ at the project root (similar to .git/).
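To illustrate what an incremental run has to decide, here is a rough Python sketch of modification-time comparison. It is not sem's actual implementation (sem is written in Rust and its on-disk format is not described here); `scan_markdown` and `diff_index` are hypothetical names, and the "previous" mapping stands in for whatever sem records in .sem/.

```python
from pathlib import Path
from typing import Dict, List, Tuple


def scan_markdown(root: Path) -> Dict[str, float]:
    """Map each .md file under root (recursively) to its modification time."""
    return {str(p): p.stat().st_mtime for p in root.rglob("*.md") if p.is_file()}


def diff_index(
    previous: Dict[str, float], current: Dict[str, float]
) -> Tuple[List[str], List[str]]:
    """Compare recorded mtimes with current ones.

    Returns (to_embed, to_drop): files that are new or modified and need
    re-embedding, and files that were indexed but no longer exist.
    """
    to_embed = sorted(p for p, mtime in current.items() if previous.get(p) != mtime)
    to_drop = sorted(p for p in previous if p not in current)
    return to_embed, to_drop
```

Under this scheme an unchanged file costs only a `stat` call, which is why an incremental run can be much cheaper than `--full`.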

`sem status`

Show index health: file count, chunk count, index age, model info.

$ sem status
Model:          all-MiniLM-L6-v2
Schema version: 1
Chunk limit:    512 tokens
Files indexed:  42
Total chunks:   67
Index size:     128.4 KB
Index age:      2 hours
Index location: /home/user/notes/.sem

`sem <queries...>`

Search the index. This is the default command — no subcommand needed.

If the index is more than 3 days old, a warning is printed to stderr (results still returned).
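The staleness rule can be sketched as a simple age check. This is an illustration only: the 3-day threshold comes from the text above, while the function name and the warning wording are assumptions, not sem's actual output.

```python
import time
from typing import Optional

STALE_AFTER_SECONDS = 3 * 24 * 60 * 60  # three days, per the rule above


def stale_warning(built_at: float, now: Optional[float] = None) -> Optional[str]:
    """Return a warning string if the index is older than three days, else None.

    built_at and now are Unix timestamps in seconds.
    """
    now = time.time() if now is None else now
    age = now - built_at
    if age > STALE_AFTER_SECONDS:
        # Hypothetical wording; sem's real message may differ.
        return f"warning: index is {age / 86400:.1f} days old; run `sem index` to refresh"
    return None
```

A caller would print the returned message to stderr and still return results on stdout, so pipelines that read stdout are unaffected.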

How it works

Indexing — Markdown files are split into chunks (by headings for large files, whole-file for small ones). Each chunk is embedded using all-MiniLM-L6-v2 into a 384-dimensional vector. Vectors are stored in .sem/index.bin.
Search — Your query is embedded with the same model. Results are ranked by cosine similarity against all indexed chunks.
Incremental updates — sem index tracks file modification times and only re-processes changed files.
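The ranking step can be sketched in plain Python. This assumes the embeddings have already been computed; the short toy vectors in the usage note stand in for the 384-dimensional vectors the model produces, and `cosine` and `top_n` are illustrative names rather than sem's internals.

```python
import math
from typing import List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_n(
    query: Sequence[float],
    chunks: List[Tuple[str, Sequence[float]]],
    n: int = 5,
) -> List[Tuple[str, float]]:
    """Score every chunk against the query and return the n best, highest first."""
    scored = [(chunk_id, cosine(query, vec)) for chunk_id, vec in chunks]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:n]
```

For example, `top_n([1, 0, 0], [("a", [1, 0, 0]), ("b", [0, 1, 0]), ("c", [1, 1, 0])], n=2)` ranks "a" first and "c" second. Every chunk is scored on each query, which is an exhaustive scan rather than an approximate nearest-neighbor lookup.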

Design principles

Local-first — no API calls, no external services. The embedding model runs locally.
Unix-native — stdin/stdout composability, predictable flags, structured output.
Explicit over magic — no background indexing, no silent updates. You control when the index is built.
Minimal surface area — few commands, few flags, obvious behavior.

Using with Claude Code

Add the following to ~/.claude/CLAUDE.md to make sem available as a tool in Claude Code:

# Tools

## sem — semantic search over markdown

`sem` is installed and available on PATH. Use it to find notes by meaning
when `grep`/`rg` would require knowing the exact wording.

When to use:
- User asks about a topic and you need to find relevant notes/docs
- You need context that might be spread across multiple files
- Exact keyword search (`grep`) isn't finding what you need

Usage:
```sh
sem "query"                          # top 5 results, human-readable
sem --paths-only "query"             # just file paths (good for piping)
sem --jsonl "query"                  # structured output
sem -n 3 "query1" "query2"           # multiple queries, 3 results each
sem --paths-only "query" | xargs cat # find then read
```

Important:
- Requires a `.sem/` index in the target directory. Run `sem index` first if none exists.
- `sem status` shows index health.
- Results go to stdout, diagnostics to stderr.
- Use `grep`/`rg` first for exact matches. Reach for `sem` when you need meaning-based search.

Claude will then use sem via Bash calls when semantic search would be more effective than exact text matching.

Scope

sem indexes .md files only. It is a retrieval primitive, not an application platform.

Not in scope (v1): PDF/DOCX support, TUI, background daemon, ANN search, graph relationships, real-time indexing.

License

MIT
