derrik/sem

sem

Semantic search for your markdown files. A local-first CLI tool that sits alongside find and rg — find files by name, match text by pattern, or match by meaning.

find  → locate files
rg    → match exact text
sem   → match meaning

Install

Requires Rust:

cargo install --path .

On first run, sem downloads the all-MiniLM-L6-v2 embedding model (~80MB, cached permanently).

Quick start

# Index your markdown files
sem index

# Search by meaning
sem "how does authentication work"

# Multiple queries at once
sem "error handling" "logging strategy"

# Pipe queries from stdin
printf '%s\n' "auth flow" "rate limiting" | sem

Output modes

# Human-readable (default)
sem "query"

# JSON array
sem --json "query"

# JSONL (one object per line) — best for piping
sem --jsonl "query"

# File paths only (deduplicated)
sem --paths-only "query"

# Limit results per query (default: 5)
sem -n 3 "query"

Pipeline examples

# Find relevant files, then search for a specific pattern
sem --paths-only "database migrations" | xargs rg "ALTER TABLE"

# Feed results into fzf for interactive selection
sem --paths-only "API design" | fzf --preview 'cat {}'

# Keep results scoring above 0.3 and print their paths with jq
sem --jsonl "query" | jq 'select(.score > 0.3) | .path'

# Agent-style batch retrieval
printf '%s\n' "constraints" "related work" "open questions" | sem --jsonl -n 3
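For consumers that would rather not shell out to jq, the same filtering can be done in a few lines of Python. This is a rough sketch: the `path` and `score` fields are taken from the jq example above, while the sample lines, the `filter_results` helper, and the 0.3 threshold are illustrative assumptions rather than documented output.

```python
import json

def filter_results(jsonl_text, min_score=0.3):
    """Return unique paths whose score exceeds min_score, in first-seen order."""
    seen = set()
    paths = []
    for line in jsonl_text.splitlines():
        line = line.strip()
        if not line:
            continue  # tolerate blank lines between records
        record = json.loads(line)
        # Only `path` and `score` are assumed to exist; other fields are ignored.
        if record["score"] > min_score and record["path"] not in seen:
            seen.add(record["path"])
            paths.append(record["path"])
    return paths

# Hypothetical sample standing in for `sem --jsonl "query"` output.
sample = "\n".join([
    '{"path": "notes/auth.md", "score": 0.62}',
    '{"path": "notes/auth.md", "score": 0.41}',
    '{"path": "notes/logging.md", "score": 0.12}',
])

print(filter_results(sample))
```

In a real pipeline the text would come from `sys.stdin.read()` instead of the hard-coded sample.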

Commands

sem index

Build or update the search index. Scans the current directory recursively for .md files.

sem index        # Incremental — only re-embeds new/modified files
sem index --full # Full reindex (required after config changes)

The index is stored in .sem/ at the project root (similar to .git/).
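Incremental mode can be pictured as a comparison between the modification times recorded at the last index run and those on disk now. The Python sketch below illustrates that idea only: the `find_changed` helper and the in-memory manifest are hypothetical, and the actual on-disk format under .sem/ is not described here.

```python
import os
import tempfile
from pathlib import Path

def find_changed(root, manifest):
    """Return .md files under root that are new or whose mtime differs.

    manifest maps path strings to the mtime (ns) recorded at the previous run.
    """
    changed = []
    for path in sorted(Path(root).rglob("*.md")):
        mtime = path.stat().st_mtime_ns
        if manifest.get(str(path)) != mtime:
            changed.append(str(path))
    return changed

# Demonstration in a throwaway directory.
with tempfile.TemporaryDirectory() as root:
    a = Path(root, "a.md")
    b = Path(root, "sub", "b.md")
    b.parent.mkdir()
    a.write_text("# A\n")
    b.write_text("# B\n")

    # First run: empty manifest, so every file counts as new.
    first = find_changed(root, {})
    manifest = {p: Path(p).stat().st_mtime_ns for p in first}

    # Give one file an explicitly later timestamp, then rescan.
    later = manifest[str(a)] + 10_000_000_000  # ten seconds later
    os.utime(a, ns=(later, later))
    second = find_changed(root, manifest)

    print(len(first), [Path(p).name for p in second])
```

This sketch does not handle deleted files; a fuller version would also drop manifest entries whose paths no longer exist.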

sem status

Show index health: file count, chunk count, index age, model info.

$ sem status
Model:          all-MiniLM-L6-v2
Schema version: 1
Chunk limit:    512 tokens
Files indexed:  42
Total chunks:   67
Index size:     128.4 KB
Index age:      2 hours
Index location: /home/user/notes/.sem

sem <queries...>

Search the index. This is the default command — no subcommand needed.

If the index is more than 3 days old, a warning is printed to stderr (results still returned).
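A staleness check of this kind amounts to comparing the index's age with a threshold and writing to stderr so that stdout stays clean for pipelines. A minimal Python sketch, assuming age is measured from a last-indexed Unix timestamp (how sem determines index age is not specified here, and `warn_if_stale` and the message wording are hypothetical):

```python
import sys
import time

STALE_AFTER_SECONDS = 3 * 24 * 60 * 60  # the 3-day threshold described above

def warn_if_stale(indexed_at, now=None):
    """Print a warning to stderr if the index is older than the threshold.

    Both arguments are Unix timestamps. Returns True when a warning was emitted.
    """
    now = time.time() if now is None else now
    age = now - indexed_at
    if age > STALE_AFTER_SECONDS:
        days = age / 86400
        # Illustrative wording only; not sem's actual message.
        print(f"warning: index is {days:.1f} days old; consider running `sem index`",
              file=sys.stderr)
        return True
    return False

now = 1_700_000_000  # fixed timestamp so the example is deterministic
fresh = warn_if_stale(now - 2 * 3600, now=now)   # 2 hours old
stale = warn_if_stale(now - 4 * 86400, now=now)  # 4 days old
print(fresh, stale)
```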

How it works

  1. Indexing — Markdown files are split into chunks (by headings for large files, whole-file for small ones). Each chunk is embedded using all-MiniLM-L6-v2 into a 384-dimensional vector. Vectors are stored in .sem/index.bin.

  2. Search — Your query is embedded with the same model. Results are ranked by cosine similarity against all indexed chunks.

  3. Incremental updates — sem index tracks file modification times and only re-processes changed files.
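The chunking and ranking steps above can be sketched in Python without the model. This is an illustration under stated assumptions, not sem's implementation: `split_by_headings`, `cosine`, and `rank` are hypothetical helpers, the character-based small-file cutoff is invented for the demo, and the tiny 3-dimensional vectors stand in for the 384-dimensional embeddings.

```python
import math
import re

def split_by_headings(text, small_file_chars=200):
    """Return the whole text for small files, else one chunk per heading section."""
    if len(text) <= small_file_chars:
        return [text]
    # Split just before each line that starts with a Markdown ATX heading marker.
    parts = re.split(r"(?m)^(?=#{1,6} )", text)
    return [p for p in parts if p.strip()]

def cosine(a, b):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def rank(query_vec, chunks, n=5):
    """Return the n (score, chunk_id) pairs most similar to query_vec."""
    scored = [(cosine(query_vec, vec), chunk_id) for chunk_id, vec in chunks.items()]
    return sorted(scored, reverse=True)[:n]

doc = "# Auth\nTokens expire hourly.\n## Logging\nErrors go to stderr.\n"
chunks = split_by_headings(doc, small_file_chars=0)  # force heading-based splitting

# Toy vectors standing in for real embeddings.
index = {
    "auth.md#0": [0.9, 0.1, 0.0],
    "logging.md#0": [0.1, 0.9, 0.0],
    "misc.md#0": [0.0, 0.0, 1.0],
}
top = rank([1.0, 0.0, 0.0], index, n=2)
print(len(chunks), [chunk_id for _, chunk_id in top])
```

Scoring every chunk and sorting is a brute-force scan, which is consistent with ANN search being listed as out of scope for v1.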

Design principles

  • Local-first — no API calls, no external services. The embedding model runs locally.
  • Unix-native — stdin/stdout composability, predictable flags, structured output.
  • Explicit over magic — no background indexing, no silent updates. You control when the index is built.
  • Minimal surface area — few commands, few flags, obvious behavior.

Using with Claude Code

Add the following to ~/.claude/CLAUDE.md to make sem available as a tool in Claude Code:

# Tools

## sem — semantic search over markdown

`sem` is installed and available on PATH. Use it to find notes by meaning
when `grep`/`rg` would require knowing the exact wording.

When to use:
- User asks about a topic and you need to find relevant notes/docs
- You need context that might be spread across multiple files
- Exact keyword search (`grep`) isn't finding what you need

Usage:
```sh
sem "query"                          # top 5 results, human-readable
sem --paths-only "query"             # just file paths (good for piping)
sem --jsonl "query"                  # structured output
sem -n 3 "query1" "query2"           # multiple queries, 3 results each
sem --paths-only "query" | xargs cat # find then read
```

Important:
- Requires a `.sem/` index in the target directory. Run `sem index` first if none exists.
- `sem status` shows index health.
- Results go to stdout, diagnostics to stderr.
- Use `grep`/`rg` first for exact matches. Reach for `sem` when you need meaning-based search.

Claude will then use sem via Bash calls when semantic search would be more effective than exact text matching.

Scope

sem indexes .md files only. It is a retrieval primitive, not an application platform.

Not in scope (v1): PDF/DOCX support, TUI, background daemon, ANN search, graph relationships, real-time indexing.

License

MIT
