ory · readwrightexecute · May 23, 2026
diff --git a/README.md b/README.md
@@ -275,17 +275,38 @@ for all 10 per-language benchmark deep dives.
 
 ## Configuration
 
-All configuration is via environment variables:
+Lumen supports persistent YAML configuration and environment variable overrides.
+For full details, see [docs/CONFIGURATION.md](docs/CONFIGURATION.md).
 
-| Variable                 | Default                  | Description                                                   |
-| ------------------------ | ------------------------ | ------------------------------------------------------------- |
+By default, Lumen reads YAML config from:
+
+- `$XDG_CONFIG_HOME/lumen/config.yaml`, or
+- `~/.config/lumen/config.yaml` when `XDG_CONFIG_HOME` is unset.
+
+A minimal Ollama config looks like this:
+
+```yaml
+servers:
+  - backend: ollama
+    host: http://localhost:11434
+    model: ordis/jina-embeddings-v2-base-code
+```
+
+Environment variables override YAML values and are useful for one-off changes.
+Server-specific variables only affect the first configured server (`servers[0]`).
+
+| Variable                 | Default                  | Description                                                      |
+| ------------------------ | ------------------------ | ---------------------------------------------------------------- |
 | `LUMEN_EMBED_MODEL`      | see note ¹               | Embedding model; use with `LUMEN_EMBED_DIMS` for unlisted models |
-| `LUMEN_BACKEND`          | `ollama`                 | Embedding backend (`ollama` or `lmstudio`)                    |
-| `OLLAMA_HOST`            | `http://localhost:11434` | Ollama server URL                                             |
-| `LM_STUDIO_HOST`         | `http://localhost:1234`  | LM Studio server URL                                          |
-| `LUMEN_MAX_CHUNK_TOKENS` | `512`                    | Max tokens per chunk before splitting                         |
-| `LUMEN_EMBED_DIMS`       | —                        | Override embedding dimensions (required for unlisted models)  |
-| `LUMEN_EMBED_CTX`        | `8192` (unlisted models) | Override context window length                                |
+| `LUMEN_BACKEND`          | `ollama`                 | Embedding backend (`ollama` or `lmstudio`)                       |
+| `OLLAMA_HOST`            | `http://localhost:11434` | Ollama server URL                                                |
+| `LM_STUDIO_HOST`         | `http://localhost:1234`  | LM Studio server URL                                             |
+| `LUMEN_MAX_CHUNK_TOKENS` | `512`                    | Max tokens per chunk before splitting                            |
+| `LUMEN_FRESHNESS_TTL`    | `60s`                    | How long a fresh index is trusted before rechecking              |
+| `LUMEN_REINDEX_TIMEOUT`  | `0s`                     | Config-level reindex timeout; `0s` means no timeout              |
+| `LUMEN_LOG_LEVEL`        | `info`                   | Logging verbosity                                                |
+| `LUMEN_EMBED_DIMS`       | —                        | Override embedding dimensions (required for unlisted models)     |
+| `LUMEN_EMBED_CTX`        | `8192` (unlisted models) | Override context window length                                   |
 
 ¹ `ordis/jina-embeddings-v2-base-code` (Ollama),
 `nomic-ai/nomic-embed-code-GGUF` (LM Studio)
@@ -303,6 +324,7 @@ Dimensions and context length are configured automatically per model:
 | `nomic-embed-text`                   | Ollama    | 768  | 8192    | Untested                                                              |
 | `qwen3-embedding:0.6b`               | Ollama    | 1024 | 32768   | Untested                                                              |
 | `all-minilm`                         | Ollama    | 384  | 512     | Untested                                                              |
+| `manutic/nomic-embed-code:7b`        | Ollama    | 3584 | 32768   | Untested                                                              |
 
 Switching models creates a separate index automatically. The model name is part
 of the database path hash, so different models never collide.
@@ -312,47 +334,9 @@ of the database path hash, so different models never collide.
 > Studio entry both named `foo`), they share the same index — use distinct
 > model names per backend to avoid collisions.
 
-### Selecting a server per invocation
-
-`lumen index` and `lumen search` accept `--model`/`-m` and `--backend`/`-b`
-to pick from a multi-server `config.yaml`. The selection filters the
-configured servers to those matching both fields; failover still works
-within the filtered subset.
-
-```sh
-# Index with the Ollama server matching this model name.
-lumen index --model ordis/jina-embeddings-v2-base-code .
-
-# Same model name hosted on LM Studio (present in YAML, not in the
-# static registry) — accepted because the name is configured.
-lumen index --model text-embedding-jina-embeddings-v2-base-code .
-
-# Disambiguate when the same model is configured on two backends.
-lumen index --model my-embed --backend lmstudio .
-
-# Pick the first configured Ollama server regardless of model.
-lumen search --backend ollama "…"
-```
-
-If `--model` is not configured in YAML but is a known registry model (and
-`--backend` is unset), Lumen falls back to mutating the default server's
-model — preserving `lumen index --model all-minilm .` for users with no YAML.
-
-### Using a custom or unlisted model
-
-If your model is not in the registry above, set `LUMEN_EMBED_DIMS` to bypass the
-registry check. `LUMEN_EMBED_CTX` is optional and defaults to `8192`.
-
-Both variables can also override values for _known_ models — useful when running
-a model variant with a longer context window or different output dimensions.
-
-```sh
-LUMEN_BACKEND=lmstudio
-LM_STUDIO_HOST=http://localhost:8801
-LUMEN_EMBED_MODEL=mlx-community/Qwen3-Embedding-8B-4bit-DWQ
-LUMEN_EMBED_DIMS=4096
-LUMEN_EMBED_CTX=40960   # optional, defaults to 8192
-```
+See [docs/CONFIGURATION.md](docs/CONFIGURATION.md) for multi-server setups,
+LM Studio examples, custom models, validation rules, environment-variable
+precedence, and CLI server selection.
 
 ## Controlling what gets indexed
 

diff --git a/docs/CONFIGURATION.md b/docs/CONFIGURATION.md
@@ -0,0 +1,212 @@
+# Lumen Configuration
+
+Lumen can be configured with a YAML file and environment variables. The YAML
+file is the best choice for persistent multi-server setups; environment
+variables are useful for one-off overrides and backwards compatibility.
+
+## Config file location
+
+Lumen reads YAML config from:
+
+- `$XDG_CONFIG_HOME/lumen/config.yaml` when `XDG_CONFIG_HOME` is set
+- otherwise `~/.config/lumen/config.yaml`
+
+The MCP server watches the config directory for changes where supported by the
+underlying filesystem watcher and reloads when `config.yaml` is written or
+created.
+
+## Precedence
+
+Configuration is applied in this order, with later layers overriding earlier
+layers:
+
+1. built-in defaults
+2. YAML config file
+3. environment variables
+4. command/programmatic model overrides
+5. `--model` / `--backend` server selection filters
+
+## Minimal config
+
+Ollama:
+
+```yaml
+servers:
+  - backend: ollama
+    host: http://localhost:11434
+    model: ordis/jina-embeddings-v2-base-code
+```
+
+LM Studio:
+
+```yaml
+servers:
+  - backend: lmstudio
+    host: http://localhost:1234
+    model: nomic-ai/nomic-embed-code-GGUF
+```
+
+## Full example
+
+```yaml
+log_level: info
+max_chunk_tokens: 512
+freshness_ttl: 60s
+reindex_timeout: 0s
+
+servers:
+  - backend: ollama
+    host: http://localhost:11434
+    model: ordis/jina-embeddings-v2-base-code
+    dims: 768
+    ctx_length: 8192
+    min_score: 0.35
+```
+
+## Top-level fields
+
+| Field | Type | Default | Description |
+| --- | --- | --- | --- |
+| `log_level` | string | `info` | Logging verbosity. |
+| `max_chunk_tokens` | integer | `512` | Maximum estimated tokens per chunk before splitting. |
+| `freshness_ttl` | duration | `60s` | How long a confirmed-fresh index is trusted before rechecking. |
+| `reindex_timeout` | duration | `0s` | Reindex timeout from config. `0s` means no config-level timeout; command/server code may still apply its own operational safeguards. |
+| `servers` | list | one default Ollama server | Embedding backend configurations. |
+
+Durations use Go duration syntax such as `30s`, `5m`, or `1h`.
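+
+As an illustration, the duration fields could be set as follows. The values
+are arbitrary, and the compound form `1h30m` is assumed to be accepted because
+it is valid Go duration syntax:
+
+```yaml
+freshness_ttl: 5m       # trust a confirmed-fresh index for five minutes
+reindex_timeout: 1h30m  # compound Go duration: one hour and thirty minutes
+```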
+
+## Server fields
+
+| Field | Type | Required | Description |
+| --- | --- | --- | --- |
+| `backend` | string | yes | `ollama` or `lmstudio`. |
+| `host` | URL | yes | HTTP(S) base URL for the embedding backend. |
+| `model` | string | yes | Embedding model name. |
+| `dims` | integer | for unknown models | Embedding vector dimension. Optional for known models. |
+| `ctx_length` | integer | no | Embedding model context length. Set automatically for known models. |
+| `min_score` | float | no | Default minimum cosine similarity threshold. |
+
+## Known embedding models
+
+Dimensions, context length, and default minimum score are configured
+automatically for known models:
+
+| Model | Backend | Dims | Context | Min score |
+| --- | --- | ---: | ---: | ---: |
+| `ordis/jina-embeddings-v2-base-code` | `ollama` | 768 | 8192 | 0.35 |
+| `nomic-embed-text` | `ollama` | 768 | 8192 | 0.30 |
+| `nomic-ai/nomic-embed-code-GGUF` | `lmstudio` | 3584 | 8192 | 0.15 |
+| `qwen3-embedding:8b` | `ollama` | 4096 | 40960 | 0.30 |
+| `qwen3-embedding:4b` | `ollama` | 2560 | 40960 | 0.30 |
+| `qwen3-embedding:0.6b` | `ollama` | 1024 | 32768 | 0.30 |
+| `all-minilm` | `ollama` | 384 | 512 | 0.20 |
+| `manutic/nomic-embed-code:7b` | `ollama` | 3584 | 32768 | 0.15 |
+
+`text-embedding-nomic-embed-code` is treated as an alias for
+`nomic-ai/nomic-embed-code-GGUF`.
+
+Switching models creates a separate index automatically because the model name
+is part of the database path hash. The backend is not part of that hash, so use
+distinct model names if the same model is served from multiple backends with
+incompatible embeddings.
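+
+A sketch of that advice, assuming the same Jina model is served by both
+backends under different names. The LM Studio name is not in the known model
+table, so `dims` is set explicitly; the value `768` is an assumption based on
+the Ollama entry above:
+
+```yaml
+servers:
+  - backend: ollama
+    host: http://localhost:11434
+    model: ordis/jina-embeddings-v2-base-code
+  - backend: lmstudio
+    host: http://localhost:1234
+    model: text-embedding-jina-embeddings-v2-base-code  # distinct name, separate index
+    dims: 768                                           # assumed; required for unknown models
+```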
+
+## Environment variable overrides
+
+Environment variables are applied after the YAML config file and before command
+or server-selection overrides.
+
+| Environment variable | Overrides |
+| --- | --- |
+| `LUMEN_MAX_CHUNK_TOKENS` | `max_chunk_tokens` |
+| `LUMEN_FRESHNESS_TTL` | `freshness_ttl` |
+| `LUMEN_REINDEX_TIMEOUT` | `reindex_timeout` |
+| `LUMEN_LOG_LEVEL` | `log_level` |
+| `LUMEN_BACKEND` | `servers[0].backend`; resets server 0 to backend defaults first |
+| `LUMEN_EMBED_MODEL` | `servers[0].model` |
+| `LUMEN_EMBED_DIMS` | `servers[0].dims` |
+| `LUMEN_EMBED_CTX` | `servers[0].ctx_length` |
+| `OLLAMA_HOST` | `servers[0].host` when server 0 backend is `ollama` |
+| `LM_STUDIO_HOST` | `servers[0].host` when server 0 backend is `lmstudio` |
+
+The server-related variables only modify `servers[0]`. They do not rewrite
+every server in a multi-server config.
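+
+A sketch of that behavior, assuming the two-server config below and
+`LUMEN_EMBED_MODEL=all-minilm` set in the environment. The backup host is
+illustrative, and the file on disk is not modified:
+
+```yaml
+servers:
+  - backend: ollama                            # servers[0]
+    host: http://localhost:11434
+    model: ordis/jina-embeddings-v2-base-code  # effective model becomes all-minilm
+  - backend: ollama                            # servers[1]
+    host: http://backup-ollama.example:11434
+    model: ordis/jina-embeddings-v2-base-code  # not changed by the environment
+```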
+
+## Selecting a server with CLI flags
+
+`lumen index` and `lumen search` accept `--model` / `-m` and `--backend` / `-b`
+to select from the configured server list:
+
+```bash
+lumen index --model ordis/jina-embeddings-v2-base-code .
+lumen search --backend ollama "authentication flow"
+lumen index --model my-embed --backend lmstudio .
+```
+
+`--model` and `--backend` filter the configured server list. If multiple servers
+match, order is preserved for failover. If no servers match, Lumen returns a
+descriptive error that includes the configured `(backend, model)` pairs.
+
+If `--model` is not configured in YAML but is a known registry model and
+`--backend` is unset, Lumen falls back to overriding the default server's model.
+That preserves legacy commands such as:
+
+```bash
+lumen index --model all-minilm .
+```
+
+## Multi-server and failover examples
+
+```yaml
+servers:
+  - backend: ollama
+    host: http://localhost:11434
+    model: ordis/jina-embeddings-v2-base-code
+  - backend: ollama
+    host: http://backup-ollama.example:11434
+    model: ordis/jina-embeddings-v2-base-code
+  - backend: lmstudio
+    host: http://localhost:1234
+    model: nomic-ai/nomic-embed-code-GGUF
+```
+
+When more than one configured server matches the selected backend/model, Lumen
+keeps the configured order and can fail over within that filtered subset.
+
+## Unknown/custom models
+
+If a model is not in the known model table or alias map, set `dims` explicitly:
+
+```yaml
+servers:
+  - backend: ollama
+    host: http://localhost:11434
+    model: my-custom-embedding-model
+    dims: 1024
+    ctx_length: 8192
+    min_score: 0.20
+```
+
+`ctx_length` and `min_score` are optional for custom models. If `min_score` is
+omitted, Lumen derives a dimension-aware default from `dims`.
+
+## Validation errors
+
+Lumen validates configuration before using it. Common invalid configs include:
+
+- empty `servers`
+- missing `backend`
+- unknown `backend`
+- missing `host`
+- invalid `host` URL, or a URL that is not `http://` or `https://`
+- missing `model`
+- unknown model with no explicit `dims`
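+
+As a sketch, the config below would be expected to fail validation on two of
+the rules above. The model name is a placeholder, and the exact error text
+Lumen reports is not shown here:
+
+```yaml
+servers:
+  - backend: ollama
+    host: localhost:11434             # invalid: not an http:// or https:// URL
+    model: my-custom-embedding-model  # invalid: unknown model with no explicit dims
+```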
+
+## MCP/server behavior
+
+The stdio MCP server loads the same configuration and uses the same precedence
+rules. When watching is available, it watches the config directory and reloads
+when the configured `config.yaml` file is written or created.
+
+Agent hosts and plugin wrappers may add their own environment variables before
+starting Lumen. Prefer YAML for stable multi-server setups, and use environment
+variables for per-session overrides.