A drop-in client for any OpenAI-compatible chat endpoint — easyai-server, llama-server, vLLM, OpenAI itself, anything that speaks
/v1/chat/completions. Renders responses with reasoning streams, registers tools client-side, dispatches their handlers in-process, and pushes the results back. Single binary, REPL or one-shot, no model loaded — pure protocol.
- Quick start
- Connection — endpoint, model, auth
- Modes — REPL, one-shot, piped, management
- Command-line flags
- Configuration file (
easyai-cli.ini) - Tool registration
- System prompt + injected blocks
- Sampling and penalty knobs
- Reasoning streams
- The raw transaction log
- Session persistence
- memory — persistent memory
- External tools
- Management subcommands
- Worked examples
- Cross-references
# 1) Point it at any OpenAI-compatible endpoint.
easyai-cli --url http://ai.local:8080 -p "what time is it?"
# 2) REPL — drop the prompt, type interactively.
easyai-cli --url http://ai.local:8080
# 3) Coding agent — sandbox + bash + plan, all auto-wired.
easyai-cli --url http://ai.local:8080 \
--allow-bash --sandbox ~/projects/foo \
"implement a tetris in C++ with SOLID design"
# 4) Pipe a prompt in.
echo "summarise this" | easyai-cli --url http://ai.local:8080Connection details are remembered via env vars so the per-command line stays short:
export EASYAI_URL=http://ai.local:8080
export EASYAI_API_KEY=... # if the server is auth-on
easyai-cli "what's new on hacker news today?"The transport layer is plain HTTP(S) POST /v1/chat/completions. The
client streams the SSE response, parses delta.{content,reasoning,tool_calls},
dispatches any tool calls in-process, and posts the next turn.
| Flag | Env var | Default | Notes |
|---|---|---|---|
--url <URL> |
EASYAI_URL |
(none — required) | Base URL of the server. /v1/chat/completions is appended automatically. https:// works if the binary was built with OpenSSL. |
--api-key <KEY> |
EASYAI_API_KEY |
(empty) | Bearer token sent as Authorization: Bearer <KEY> on every request. |
--model <NAME> |
EASYAI_MODEL |
EasyAi |
The model field of the request body. easyai-server returns whatever it has loaded under any name; other servers may match strictly. |
--timeout <SEC> |
EASYAI_TIMEOUT |
1800 (30 min) |
Read/write timeout on the streaming connection. Bumped from the usual 60 s to accommodate long thinking turns. |
--http-retries <N> |
EASYAI_HTTP_RETRIES |
5 |
Extra attempts on transient HTTP failures (connect refused, read timeout, 5xx). 4xx never retries. Each retry logs to stderr. 0 disables. |
--insecure-tls |
— | off | Skip peer cert verification (https only). Dev / self-signed only. |
--ca-cert <PATH> |
— | (system) | Trust the PEM bundle at <PATH> for https. |
If --url is omitted and EASYAI_URL is unset, the binary errors out
at startup with a usage hint.
Connection lifecycle (since 2026-05-08): the cli holds a single
persistent httplib::Client for the entire session — every agentic
hop (chat completion + tool dispatch + chat completion + …) reuses
the same TCP connection thanks to HTTP keep-alive. This was a real
bug before that date: the cli rebuilt the Client per request, so
each hop opened a fresh connection that piled up in TIME_WAIT for
~60 s on the client. A 50-tool-call session opened 50 sockets and
on long sessions exhausted the ephemeral-port range, surfacing as
Connection timed out retry storms. The fix is purely on the cli
side and transparent to anything connecting to easyai-cli's
upstream. To confirm keep-alive is working in production, point the
cli at an easyai-server with [SERVER] verbose = on and watch the
http: in_flight=... field of the periodic METRICS line plus the
per-request → / ← log: a healthy session shows steady
reqs=N increments with in_flight=0..1 between hops, and the
system-wide tcp: time_wait count stays low. Before the fix, every
hop bumped tcp: time_wait and eventually drove the
TIME_WAIT N/M ephemeral ports (X.X% …) indicator into the
elevated / HIGH / CRITICAL bands.
The same binary covers four operating modes; they're selected by what's on the command line and stdin.
| Mode | Trigger | Behaviour |
|---|---|---|
| REPL | No -p, no positional prompt, stdin is a TTY |
Interactive prompt loop. Green ● prompt. Ctrl-C stops generation and returns to prompt. /exit or Ctrl-D to quit. |
| Shell | --shell |
Hybrid AI shell. Normal commands via $SHELL, lines prefixed with > go to the AI. cd/export/unset persist. See §3a. |
| One-shot | -p <text> OR a positional argument |
Send the single prompt, stream the reply, exit. |
| Piped | stdin is a pipe (anything redirected in) | Reads stdin into the prompt and runs once. Same as one-shot. |
| Management | --list-models, --list-tools, --list-remote-tools, --health, --props, --metrics, --set-preset, --show-system-prompt |
Hits the named endpoint (or, for --show-system-prompt, just resolves locally), prints the result, exits. No chat. See §14. |
The modes are mutually exclusive: passing -p AND a management flag is
an error.
--shell starts a hybrid AI shell. The user's $SHELL executes
normal commands; lines prefixed with > are sent to the AI model.
easyai-cli --url http://ai.local:8080 --shell
~/project $ ls -la # executed via zsh/bash
~/project $ cd src # persists (handled in-process)
~/project/src $ > explain main.cpp # AI takes over
~/project/src $ /exit # quitThe prompt shows the current directory (abbreviated with ~).
--shell implies --allow-bash.
Builtins — run in-process so state persists across commands:
| Builtin | Behaviour |
|---|---|
cd [dir] |
Supports ~, - (OLDPWD), relative and absolute paths. |
export KEY=VALUE |
Sets env var (quotes stripped). |
unset VAR |
Removes env var. |
Slash commands — same as the REPL: /exit, /quit, /clear,
/reset, /compress, /plan, /tools, /help.
Shell-like single-Ctrl-C — no escalation, no multi-step dance.
| Context | First Ctrl-C | Triple rapid Ctrl-C |
|---|---|---|
| Mid-generation (REPL or shell) | Stops generation, prints <stopped.>, returns to prompt. |
Force-exit (_exit(130)). |
| At the prompt (REPL or shell) | Clears the line and shows a new prompt (like bash). Does not exit. | Force-exit. |
Shell command running (--shell) |
Kills the child process (SIGINT delivered to its process group). Returns to prompt. | Force-exit. |
--quiet (batch) |
Hard cancel + exit immediately (rc=130). |
Force-exit. |
Exit via /exit, /quit, or Ctrl-D (EOF). The triple-rapid
force-exit is the escape hatch for stuck streams or deadlocked tool
handlers.
Full reference, grouped the way --help shows them. Env-var fallbacks
appear next to the matching flag.
| Flag | Env | Notes |
|---|---|---|
--url URL |
EASYAI_URL |
Required (or set via env). |
--api-key KEY |
EASYAI_API_KEY |
Bearer auth. |
--model NAME |
EASYAI_MODEL |
Default EasyAi. |
--timeout SEC |
EASYAI_TIMEOUT |
Default 1800. |
--http-retries N |
EASYAI_HTTP_RETRIES |
Default 5. |
--insecure-tls |
— | https only — DEV ONLY. |
--ca-cert PATH |
— | PEM bundle for custom CAs. |
| Flag | Notes |
|---|---|
--system TEXT |
Inline system prompt. |
--system-file PATH |
System prompt loaded from a file. Beats --system if both are given (but you'd usually use one). |
When neither is passed, the server's default persona handles the system message. Either flag still gets the [environment] + [guidance] injection prepended (see §7).
| Flag | Range | Notes |
|---|---|---|
--temperature F |
typically 0–2 | OpenAI standard. |
--top-p F |
0–1 | Nucleus top-p. |
--top-k N |
int ≥0 | Top-k cutoff. |
--min-p F |
0–1 | llama.cpp / easyai min-p. |
--repeat-penalty F |
≥ 0 | Default 1.04 — anti-loop safety net for thinking models. Pass 1.0 to disable. |
--frequency-penalty F |
-2..2 | OpenAI standard. |
--presence-penalty F |
-2..2 | OpenAI standard. |
--seed N |
int | Deterministic sampling. |
--max-tokens N |
int | Cap reply length. |
--stop SEQ |
repeatable | Add a stop string. |
--extra-json '{...}' |
JSON | Free-form object merged into the request body — escape hatch for server-specific fields. |
| Flag | Notes |
|---|---|
--tools LIST |
Comma list, overrides the default catalog. See §6 for valid names. |
--sandbox DIR |
Working root for fs / bash / python3. Auto-registers the unified fs tool (action=read / write / list / glob / grep / check_path / cwd / sandbox). bash and python3 still require their respective --allow-* flags. |
--allow-bash |
Register bash. Implies fs (bash subsumes it). cwd = --sandbox if given, else the binary's CWD. WARNING: not a hardened sandbox. |
--no-python |
Drop the auto-registered compute tool (model-facing name evaluate, runtime python3; renamed 2026-05-26 with python3 retained as a back-compat alias). By default ON whenever --sandbox or --allow-bash is set. Stdlib-only interpreter (no PYTHON* env, no site-packages, no cwd on sys.path). READ-ONLY disk surface: any path outside the sandbox AND any write-mode open() regardless of path is rejected. The model is told to send writes through the filesystem write tool registered this session (it discovers the exact callable name from its AVAILABLE TOOLS list). WARNING: defense-in-depth, not a hardened sandbox — import os / import socket / import subprocess / import ctypes still work at the Python layer (closure-cell introspection also bypasses — SECURITY_AUDIT §23.2). |
--use-google |
Enable engine="google" inside the unified web tool (Google Custom Search JSON API), and let the default engine="auto" cascade try google as its first hop. Requires GOOGLE_API_KEY and GOOGLE_CSE_ID env vars. Without this flag (or env vars), the auto cascade silently falls through to brave → ddg-lite → bing → ddg. |
--memory DIR |
Enable persistent knowledge rooted at DIR — a passive RAG technique. Registers seven split knowledge_* tools (knowledge_save, knowledge_append, knowledge_search, knowledge_load, knowledge_list, knowledge_delete, knowledge_keywords), AND appends a compact # MEMORY VOCABULARY block to the system prompt prefix so the remote model sees the current keyword index without having to call knowledge_keywords. --RAG is still accepted as a back-compat alias. See RAG.md §5 "Automatic vocabulary injection". |
--external-tools DIR |
Load every EASYAI-*.tools manifest in DIR. See EXTERNAL_TOOLS.md. |
--no-plan |
Don't auto-register the plan tool. |
| Flag | Notes |
|---|---|
--shell |
Hybrid AI shell — starts $SHELL, > prefix for AI prompts. cd/export/unset persist. Implies --allow-bash. INI: [cli] shell = true. See §3a. |
-p TEXT, --prompt TEXT |
One-shot prompt. (You can also pass it as a positional arg or pipe via stdin.) |
--no-reasoning, --hide-reasoning |
Hide delta.reasoning_content (default: shown inline in dim grey). |
--max-reasoning N |
Abort the SSE stream when this turn's reasoning exceeds N chars. 0 = unlimited (default). Useful for thinking models that fall into long deliberation loops. |
--no-retry-on-incomplete |
Disable the auto-retry-with-nudge for incomplete turns (default: ON). |
--retry-on-incomplete |
Legacy alias for the now-default behaviour. No-op. |
--verbose, -v |
Log HTTP+SSE diagnostics to stderr (timestamps + per-piece traces). Also logs every per-batch easyai.prompt_progress event with full metrics. Stderr-only — does NOT create a /tmp log file (use --log-file for that). |
-q, --quiet |
Disable the spinner glyph + context-fill gauge. Use for batch / scripted runs. Also changes Ctrl-C / SIGTERM semantics: first signal hard-cancels and exits (rc=130). See Ctrl-C and SIGTERM. |
--no-prompt-progress |
Ask the server to skip per-batch easyai.prompt_progress SSE events for this session. The spinner loses its live thinking N% · ctx M% gauge during prompt eval (falls back to a static "thinking" word); in return the wire goes quiet during eval. The final easyai.prompt_eval summary still fires and is always logged to stderr + the --log-file file, regardless of --verbose. INI: [cli] prompt_progress = on|off. |
--log-file PATH |
Opt in to a raw transaction log at PATH (request body + every SSE chunk + every tool dispatch input/output, mode 0600). Default OFF — no log file is written without this flag. Implies --verbose. |
--tools-mode MODE |
How fs / web are exposed to the model. MODE is one of split (default — one focused tool per action: fs_read, fs_edit, web_search, web_fetch, …; small models dispatch more reliably here), unified (single dispatcher per family with action=; this is where the fs(action="ops") batch lives — up to 50 ops / 20 files per call), or both (register both surfaces side-by-side). Same handlers under the hood; only the registration shape differs. INI: [cli] tools_mode = unified|split|both. |
--continue |
Load .easyai_session from cwd before the first prompt. Default OFF (since 2026-05-13) — any existing session file is ignored and overwritten on the first turn unless this flag is set. INI: [cli] auto_continue = true|false. See §11. |
--no-continue |
Explicit form of the default — ignore any existing .easyai_session and overwrite on the first turn. Useful to override [cli] auto_continue = on set in INI. |
--compress |
After loading, ask the model for one lossless recap of the conversation and replace the history with that recap. Also reachable mid-REPL via /compress. No-op without --continue (nothing in memory to recap). INI: [cli] auto_compress = true|false. |
See §14 for the full picture.
| Flag | Result |
|---|---|
--list-tools |
Local tools (registered in this CLI), with full descriptions. |
--list-remote-tools |
GET /v1/tools — server-side tools (easyai-server extension). |
--list-models |
GET /v1/models. |
--health |
GET /health. |
--props |
GET /props. |
--metrics |
GET /metrics (Prometheus text). |
--set-preset NAME |
POST /v1/preset {preset:NAME}. |
--show-system-prompt |
Print the resolved system prompt (built-in [environment] + [guidance] injection PLUS --system / --system-file content) and exit. Does NOT contact the server — useful for confirming what the model would see, including without a working --url. |
| Flag | Notes |
|---|---|
-h, --help |
Print the full help and exit. |
Every command-line knob also has an INI equivalent so an operator can bake their connection details, sampling defaults, and tool catalog into a file once and stop typing flags. Precedence is:
command-line flag > INI value > hardcoded default
When --config <path> is not given, the CLI looks for an INI file
in layers and uses the first one it finds:
| Order | Path | Use case |
|---|---|---|
| 1 | $HOME/.easyai/easyai-cli.ini |
Per-user — the common case. The CLI runs as your user, not as a service, so this is where most settings belong. |
| 2 | /etc/easyai/easyai-cli.ini |
System-wide fallback — useful for a shared box where every user should hit the same server with the same defaults. |
| 3 | (none) | No INI loaded; the CLI runs on hardcoded defaults + env vars + whatever you pass on the command line. |
--config <path> bypasses the layered lookup and pins one file. If the
explicit path doesn't exist, the CLI prints a warning and falls through
to defaults (it doesn't silently search elsewhere). A missing
layered-default path is silent — it just means you haven't created a
config yet.
Run with --verbose (or [cli] verbose = true) to see which path the
CLI resolved to at startup.
A pristine reference file lives at
resources/easyai-cli.ini.example —
every key documented, every line commented out. Activate by copying it
to one of the lookup locations and uncommenting what you want:
mkdir -p ~/.easyai
cp resources/easyai-cli.ini.example ~/.easyai/easyai-cli.ini
$EDITOR ~/.easyai/easyai-cli.ini # uncomment url, api_key, tools, …
easyai-cli "what's new on hacker news today?"Minimal ~/.easyai/easyai-cli.ini for a workstation talking to a single
AI box:
[cli]
url = http://ai.local:8080
api_key = REPLACE-WITH-OPENSSL-RAND-HEX-32
model = EasyAi
verbose = false
quiet = false
tools = datetime, plan, web
tools_mode = split
auto_continue = falseEverything lives under a single [cli] section. Unknown keys are
ignored silently; values that fail to parse fall back to the hardcoded
default and print a one-line warning at startup. Booleans accept
true / false, on / off, yes / no, 1 / 0. List values are
comma-separated.
| Key | Type | CLI flag | Default | Notes |
|---|---|---|---|---|
url |
string | --url |
(env EASYAI_URL) |
Full URL of the OpenAI-compatible endpoint. |
api_key |
string | --api-key |
(env EASYAI_API_KEY) |
Bearer token. |
model |
string | --model |
EasyAi |
Model id in the request body. |
timeout |
int | --timeout |
86400 |
Read/write timeout, seconds. SSE deltas reset the timer. |
http_retries |
int | --http-retries |
5 |
Extra retries on transient HTTP failures. |
max_tool_hops |
int | --max-tool-hops |
99999 (unlimited) |
Per-turn ceiling on tool calls. |
insecure_tls |
bool | --insecure-tls |
false |
Skip TLS peer-cert verification. https only. DEV ONLY. |
ca_cert |
path | --ca-cert |
(system trust store) | PEM CA bundle to trust for https. |
| Key | Type | CLI flag | Default | Notes |
|---|---|---|---|---|
system |
string | --system |
(empty) | Inline system prompt. |
system_file |
path | --system-file |
(empty) | System prompt from a file. Wins over system when both are set. |
Unset / sentinel = the field is omitted from the request body, so the server's preset drives sampling. Set explicitly to override.
| Key | Type | CLI flag | Default | Notes |
|---|---|---|---|---|
temperature |
float | --temperature |
server default | |
top_p |
float | --top-p |
server default | |
top_k |
int | --top-k |
server default | |
min_p |
float | --min-p |
server default | |
repeat_penalty |
float | --repeat-penalty |
1.04 |
Anti-loop multiplicative penalty. Set 1.0 to disable. |
frequency_penalty |
float | --frequency-penalty |
server default | OpenAI semantics. |
presence_penalty |
float | --presence-penalty |
server default | OpenAI semantics. |
seed |
int64 | --seed |
random | -1 = random. |
max_tokens |
int | --max-tokens |
-1 |
-1 = unlimited until EOS / ctx full. |
stop |
list | --stop (repeatable) |
(empty) | Comma-separated stop sequences. |
extra_json |
json | --extra-json |
(empty) | Single-line JSON object literal merged into the request body. |
| Key | Type | CLI flag | Default | Notes |
|---|---|---|---|---|
tools |
list | --tools |
(built-in catalog) | Comma-separated tool names. Empty = default catalog. |
tools_mode |
enum | --tools-mode |
split |
unified / split / both. |
sandbox |
path | --sandbox |
(empty) | Sandbox root for fs / python3 / bash. |
allow_bash |
bool | --allow-bash |
false |
Register the bash tool. NOT a hardened sandbox. |
allow_python |
bool | --no-python (off) |
true |
Register the compute tool (model name evaluate, runtime python3). Flip false to opt out. |
use_google |
bool | --use-google |
false |
Enable engine="google" in the web tool. |
external_tools |
path | --external-tools |
(empty) | Dir of EASYAI-*.tools manifests. |
memory |
path | --memory / --RAG |
(empty) | RAG persistent-registry directory. |
no_plan |
bool | --no-plan |
false |
Skip auto-registering the plan tool. |
show_bash |
bool | --show-bash / --no-show-bash |
true |
Mirror bash subprocess output to the operator's terminal. |
show_python |
bool | --show-python / --no-show-python |
true |
Same mirror for python3. |
| Key | Type | CLI flag | Default | Notes |
|---|---|---|---|---|
show_reasoning |
bool | --no-reasoning (off) |
true |
Print streaming reasoning_content to stderr. |
max_reasoning |
int | --max-reasoning |
0 |
0 = unlimited. Hard cap on reasoning tokens before nudging. |
retry_on_incomplete |
bool | --no-retry-on-incomplete (off) |
true |
Retry when the turn finishes with no tool_call and only an "announce" snippet. |
| Key | Type | CLI flag | Default | Notes |
|---|---|---|---|---|
verbose |
bool | -v / --verbose |
false |
Prints resolved INI path + raw HTTP bodies + tool dispatch traces. |
quiet |
bool | -q / --quiet |
false |
Disable spinner + ctx-% gauge (batch / scripted use). |
log_file |
path | --log-file |
(empty) | Raw transaction log path. Empty = no log file. |
auto_log |
bool | (no CLI flag) | false |
Legacy /tmp auto-log; the log_file key is the recommended replacement. |
unattended |
bool | --unattended |
(auto) | Tell the model no human is at the terminal. Auto-set when a prompt is passed on the command line. |
| Key | Type | CLI flag | Default | Notes |
|---|---|---|---|---|
auto_continue |
bool | --continue / --no-continue |
false |
Load .easyai_session from cwd before the first prompt. |
auto_compress |
bool | --compress |
false |
Run lossless recap on every load. Implies auto_continue. |
session_file |
path | --session-file |
.easyai_session (cwd) |
Override the default filename. Implies auto_continue. |
no_local_session |
bool | --no-local-session |
false |
Read-only mode: load the session but never write back. |
A workstation talking to a sandboxed coding agent on the AI box, with persistent session and a custom log:
[cli]
; ---- connection ----
url = http://ai.local:8080
api_key = 9f3c…hex… ; openssl rand -hex 32
model = EasyAi
timeout = 86400
; ---- tools / sandbox ----
tools_mode = split
sandbox = /Users/gustavo/projects
allow_bash = true
memory = ~/.easyai/rag
; ---- reasoning ----
show_reasoning = true
max_reasoning = 0
; ---- session ----
auto_continue = true
auto_compress = false
log_file = ~/.easyai/cli.log
; ---- display ----
verbose = false
quiet = falseThen on the command line:
easyai-cli "refactor this module for SOLID"…and every flag above is implicit. Override one-off with the matching
--flag (e.g. easyai-cli --no-continue "fresh chat").
The CLI registers tools client-side: their handlers run in the binary's own process, not on the server. The server is told what tools exist (their names + JSON schemas) and asks for them when needed; the client dispatches and posts the result back as a tool message.
When --tools is not given, the CLI auto-registers:
datetime, plan, web,
system_meminfo, system_loadavg, system_cpu_usage, system_swaps
…plus, conditionally:
| Trigger | Adds |
|---|---|
--sandbox DIR OR --allow-bash |
The unified fs tool AND python3 (the latter unless --no-python) |
--allow-bash |
bash (and bumps the agentic loop's max_tool_hops to 99999) |
--no-python |
drops the auto-on python3 tool (otherwise on whenever fs is on) |
--use-google (+ env vars set) |
Enables engine="google" inside the unified web tool, and lets the default engine="auto" cascade try google first (otherwise auto starts at bing) |
--memory DIR (alias --RAG) |
knowledge_save, knowledge_append, knowledge_search, knowledge_load, knowledge_list, knowledge_delete, knowledge_keywords |
--external-tools DIR |
every tool from each loaded EASYAI-*.tools manifest |
Bash is strictly more permissive than the unified fs tool — if the
operator trusts the model with bash, they trust it with
fs(action="read") etc. by construction. Requiring an extra
--allow-fs flag for the narrower surface produced sessions where
the model had bash but no fs and fell back to cat > file /
cat <<EOF / sed -i for ordinary file work. The new defaults
eliminate that trap: any flag that says "the model can touch files"
registers fs automatically.
fs(action="sandbox") is one of the unified tool's sub-actions, so
the model can always resolve the real on-disk path of where its work
is landing — distinct from fs(action="cwd"), which reports the live
process cwd and can drift.
Pass --tools LIST to override the auto-catalog. Valid names:
datetime, plan, web, fs, bash,
system_meminfo, system_loadavg, system_cpu_usage, system_swaps,
knowledge_save, knowledge_append, knowledge_search, knowledge_load,
knowledge_list, knowledge_delete, knowledge_keywords
(rag and memory are still accepted as back-compat aliases that register all seven knowledge tools.)
bash / knowledge tools still require their respective opt-in flags even when
explicitly listed; engine="google" inside web likewise depends on
--use-google plus the env vars.
easyai-cli --url ... --sandbox ~/foo --allow-bash --list-toolsPrints every registered tool's name + full description, in the same order they're sent to the server. Useful for debugging "why didn't the model use my tool?".
See AI_TOOLS.md for the deep dive on what a tool is, and
manual.md §3.2 / §3.2.1 for how to author your own.
When the agent has any create/mutate affordance (fs_* / bash / plan), the CLI prepends two small in-binary blocks to the user's system prompt:
[environment]
sandbox root: /Users/.../projects/foo
fs_* tools' virtual `/` maps here; bash runs with this as its cwd.
[guidance]
When asked to create something, pick one viable implementation and
carry it through to a working end state. Do not enumerate options,
branch on hypotheticals, or stop at a draft. Choose, build, verify
it runs, then report. The user can ask for refinements after they
see it working.
Why:
[environment]— without it, the first move of any coding agent is "where am I?" (fs(action="cwd")/pwd). Injecting the resolved absolute path saves that hop on every task.[guidance]— smaller models otherwise enumerate options, ask permission for every choice, or stop at a draft. The assertive framing shifts them toward a working result.
The user's --system / --system-file content (if any) appears
after these blocks, so user intent has the last word.
When the agent has no fs/bash/plan tool — pure chat with web search and nothing else — neither block is injected.
All knobs are server-side parameters; the CLI just forwards what you pass. Omitting any leaves the server's default in place.
The penalties (--repeat-penalty, --frequency-penalty,
--presence-penalty) all bias generation against tokens that have
already been produced — but they bite differently:
| Flag | Form | Bites on |
|---|---|---|
--repeat-penalty F |
multiplicative on recent logits | tight literal repetition ("I'll write X / Let me write X / OK, creating X") |
--frequency-penalty F |
additive, scales with token count | over-use of common tokens ("the the the") |
--presence-penalty F |
additive, fixed cost per token-already-seen | topic stickiness without per-occurrence ramp-up |
--repeat-penalty 1.15 (the CLI's only non-obvious default) is the
anti-loop safety net. Pass 1.0 to disable when you want the model
to repeat itself — for example when calling the same tool many times
in an agentic flow and you don't want the model paraphrasing tool
names after the third call.
--presence-penalty F (OpenAI standard, range [-2.0, 2.0],
default 0.0) is the gentler companion. Reach for it when:
- You're running long agentic flows where
repeat_penalty=1.15starts making the model invent tool-name synonyms. - The model has correct content but keeps rehearsing the same topic instead of moving on.
- You want "introduce new vocabulary" pressure without the
per-occurrence cost ramp of
repeat_penalty.
Typical pairings:
| Workload | repeat_penalty |
presence_penalty |
|---|---|---|
| Short chat / single-tool turns | 1.15 (default) |
0.0 |
| Long agentic flows (10+ hops) | 1.0 (off) |
1.0 to 1.5 |
| Brainstorm / creative writing | 1.15 |
0.6 to 1.0 |
| Code generation, structured output | 1.15 |
0.0 |
See design.md §4b
for the full rationale on why the two penalties exist and when to
pick which.
--extra-json is the escape hatch for fields the CLI doesn't know
about. Whatever JSON object you pass is merged shallowly into the
request body before send, so server-specific extensions (vendor sampling
modes, custom routing hints) work without recompiling.
For models that emit reasoning_content (Qwen-thinking, GPT-o1-class,
Claude 4.x extended thinking), the CLI prints the reasoning stream
inline in dim grey, separate from the visible content. This is on by
default. --no-reasoning (or --hide-reasoning) suppresses it.
--max-reasoning N is a defensive cap: if the accumulated
reasoning_content for a single turn exceeds N characters, the SSE
stream is aborted and the turn is treated as incomplete (which then
triggers the auto-retry-with-nudge unless that's disabled). Default 0
(unlimited). Useful when a thinking model falls into a deliberation
loop on a niche question.
Incomplete-turn handling: when the server flags a turn as
timings.incomplete=true (model produced no tool_call AND only a tiny
reply, e.g. "I'll search…"), the CLI by default drops that turn,
appends a corrective user nudge, and re-issues ONCE.
--no-retry-on-incomplete opts out — useful when you want to see the
raw incomplete signal for debugging.
The raw transaction log is opt-in via --log-file PATH. Without
that flag, no log file is created — neither the binary nor the
library writes to /tmp by default.
easyai-cli --url http://ai.local --log-file /tmp/run.log "your prompt"The log at PATH is a verbatim record of:
- The HTTP request body (every turn — including the resolved system prompt with injected blocks, the full tools array, the message history).
- Every SSE chunk byte-for-byte.
- Every tool call dispatched: input arguments, output content, duration.
- Connection-level events (retries, timeouts, status codes).
Mode 0600. --log-file implies --verbose (so the file carries
CLI-side diagnostics alongside the raw wire bytes). Suitable for
replaying / diffing / grepping.
For one-off debugging without a persistent file, --verbose alone
streams the same diagnostics to stderr.
What changed (2026-05-12): prior versions auto-opened
/tmp/easyai-cli-{pid}-{epoch}.logwhenever--verbosewas set, AND the library-sideeasyai::Clientopened a separate/tmp/easyai-client-{pid}-{epoch}.logon every construction unlessEASYAI_NO_AUTO_LOG=1was in the env. Both auto-opens are now disabled by the cli binary so a default invocation leaves no artifacts behind. To restore the library auto-open behaviour, setEASYAI_NO_AUTO_LOG=0explicitly in the environment.
Every easyai-cli invocation writes a .easyai_session file in the
current working directory after each chat turn (atomic tempfile +
rename(2), mode 0600, O_NOFOLLOW). The file is the OpenAI-shape
message array — same format the CLI sends on the wire — so it's
plain-text greppable, diffable, and re-loadable in a future
invocation.
Loading is default-OFF since 2026-05-13. Even when a
.easyai_session already exists in the current directory,
easyai-cli starts fresh silently — and overwrites the file on the
first turn. Pass --continue (or set [cli] auto_continue = on in
INI) to resume from the existing file before the first prompt.
Saving on every turn is unchanged.
$ cd ~/project
$ easyai-cli --url http://ai.local
> fix the build error in src/main.cpp
[turn completes; .easyai_session updated]
> /exit
# Tomorrow, same project — resume requires --continue:
$ cd ~/project
$ easyai-cli --url http://ai.local --continue
[easyai-cli-remote] continued from .easyai_session in /Users/x/project
> what was the build error again?
[model has the prior context]
# Without --continue the existing file is overwritten on the first turn:
$ cd ~/project
$ easyai-cli --url http://ai.local
> hello
[turn completes; .easyai_session overwritten with this fresh history]Four control points:
| Surface | What it does |
|---|---|
| (no flag) | Default: ignore any .easyai_session and overwrite it on the first turn. Save on every turn. |
--continue |
Load the existing .easyai_session (if any) before the first prompt; otherwise start fresh. Overrides [cli] auto_continue = off. |
--no-continue |
Explicit form of the default — useful to override an operator's [cli] auto_continue = on for this invocation. |
--compress |
After loading, ask the model for one lossless recap of the conversation and replace history with the recap. No-op without --continue (nothing in memory to recap). |
/compress (in the REPL) |
Same compress flow, fired mid-session when context gets long. |
The .easyai_session is checkpointed at three layers, in this order:
- After every tool dispatch during a turn — written from the
on_toolcallback, so even mid-turn progress hits disk before the model continues reasoning. - After every
chat()return inrun_one()— covers graceful completion and stage-1 cancel (Ctrl-C once, model lets the SSE close). - After every history-mutating slash command (
/clear,/reset,/compress).
The first layer is what makes a force-exit (Ctrl-C 3×, stage 3 →
_exit(130) from the signal handler) still leave a useful session
behind: the file on disk reflects the conversation up to the last
completed tool round-trip, only the in-flight partial reply is lost.
Stages 1 and 2 (graceful + cancel) also work because their chat()
returns normally and layer 2 fires.
The compress prompt instructs the model to preserve verbatim: every
file path, every decision made, every code change, every error with
its cause, every tool result still relevant, every user-stated
constraint or preference. And to strip: pleasantries, abandoned
exploratory branches, retries of the same query. The output replaces
history as a synthetic two-message pair
({user: "Previous conversation summarised below; continue from here."} {assistant: "<recap>"}) so the chat template sees a normal turn
shape.
History-mutating slash commands (/clear, /reset, /compress)
also save .easyai_session so a later resume picks up the
post-command state.
Every session-related knob is also reachable via [cli] keys in the
INI file ($HOME/.easyai/easyai-cli.ini by default; full lookup order
and the complete table are in
§5. Configuration file).
Precedence: CLI flag > INI > hardcoded default. The session-relevant
subset:
INI key ([cli]) |
Default | CLI flag(s) | Effect |
|---|---|---|---|
auto_continue |
false |
--continue / --no-continue |
Load .easyai_session from cwd before the first prompt. |
auto_compress |
false |
--compress |
Run the compress flow on every load (rare; usually you want /compress on demand). |
session_file |
(empty) | --session-file |
Override the default filename. Implies auto_continue. |
no_local_session |
false |
--no-local-session |
Read-only mode: load the session but never write back. |
Operators who don't want session files in cwd at all: leave
auto_continue = false (the default) so existing files are
overwritten rather than read, and rm .easyai_session if it leaks
past — there's no --no-session flag today. The file is local to
cwd, not ~, so the unit of persistence is naturally the
project directory you're working in: two projects in two different
dirs have two independent sessions.
--memory <dir> mounts a directory as the agent's long-term knowledge.
It registers seven split tools — knowledge_save, knowledge_append,
knowledge_search, knowledge_load, knowledge_list, knowledge_delete,
knowledge_keywords; under the hood it's a passive RAG technique — each
entry is a single keyword-indexed Markdown file in <dir> that the
operator can hand-edit. Keywords are the identifier: sorted and joined
by _ they become the filename. The legacy flag --RAG is still
accepted as a back-compat alias.
Vocabulary auto-injection. --memory also appends a compact
# MEMORY VOCABULARY block to the system prompt prefix (the same
prefix that carries [environment], [guidance], the tools_block,
and the cite-sources rule). The block lists every distinct keyword
in the store + its count, sorted count desc / name asc, capped at
top 40. The remote model now sees what it has tagged on every
turn — knowledge_search(keywords=[...]) becomes
actionable without first calling knowledge_keywords.
Empty store → block omitted, no wasted tokens.
The builder is shared with easyai-server and easyai-local
(see easyai::preamble::build in include/easyai/preamble.hpp);
change the renderer once and every binary updates.
Tools discipline rule (2026-05-26). The cli's prefix carries a
[tool-discipline] paragraph stating the closed-set rule and
pointing at the server's AVAILABLE TOOLS block as the authoritative
catalogue. It deliberately does NOT re-enumerate the tools — the
server's own system prompt (rendered by
easyai::preamble::build_session_info(tools) server-side) already
lists them, and duplicating that catalogue in the cli prefix would
waste tokens and risk drift.
Entries whose keywords resolve to a fix- prefix are immutable: save /
append / delete refuse them. Pass fix=true (knowledge_save) to
mint one.
See RAG.md for the full guide, including §5 "Automatic
vocabulary injection".
--external-tools <dir> loads every EASYAI-<name>.tools JSON manifest
in <dir> as an operator-defined tool pack. Per-file fault isolation —
a broken manifest doesn't take down the others. Tools spawn via
fork+execve, never a shell, so a manifest is the supported way to
give the model focused powers without flipping --allow-bash.
See EXTERNAL_TOOLS.md for the manifest schema and
worked examples.
Each one hits a known endpoint, prints the result, and exits. They're
mutually exclusive with chat; if you pass any of them with -p or a
positional prompt, the chat is dropped and only the management call
runs.
| Flag | What it does |
|---|---|
--list-tools |
Print every LOCAL tool (the catalog the CLI sends to the server in tools[]) with name + full description. The fastest way to confirm what the model will see. |
--list-remote-tools |
GET /v1/tools. easyai-server extension — lists tools the server registered (its built-ins + the knowledge_* tools + external + MCP-fetched). May 404 against other OpenAI-compat servers. |
--list-models |
GET /v1/models. Standard. |
--health |
GET /health. Prints ok / unhealthy: <reason>. |
--props |
GET /props. Server-side configuration dump. |
--metrics |
GET /metrics. Prometheus exposition. |
--set-preset NAME |
POST /v1/preset {preset:NAME}. Switches the server's ambient sampling preset (easyai-server extension). |
--show-system-prompt |
Resolve and print the system prompt the CLI would send on the next turn — built-in [environment] + [guidance] injection plus any --system / --system-file content. Does NOT contact the server, so it works without a reachable --url. The fastest way to verify "is the model actually seeing my persona / sandbox / guidance?". |
The connection flags (--url, --api-key, --insecure-tls,
--ca-cert) apply to management subcommands the same way they apply to
chat. --show-system-prompt is the one exception — it never makes a
network call and works without --url.
easyai-cli --url http://ai.local:8080 -p "what's the capital of Mongolia?"easyai-cli --url http://ai.local:8080 \
--allow-bash --sandbox ~/projects/tetris \
"implement a tetris in C++ with SOLID design, write tests, and document"What this gives the model:
bashrooted at~/projects/tetris- the unified
fstool (action=read / write / list / glob / grep / check_path / cwd / sandbox), all rooted there too fs(action="sandbox")returning~/projects/tetrisplanfor a visible step checklist[environment]block with the resolved absolute path[guidance]block with the assertiveness rule
easyai-cli --url http://ai.local:8080 -p "summarise transformers in 5 lines"No sandbox, no --allow-bash → the model has only datetime, plan,
web, and system_*. No [environment] / [guidance] injection
because there's no file / shell affordance.
easyai-cli --url http://ai.local:8080 --tools datetime,web \
"find the latest CVE for libcurl"--tools overrides the auto-catalog completely. --allow-bash /
--sandbox / --memory are still respected for their specific tools but
the rest of the catalog is whatever's in the explicit list.
easyai-cli --sandbox /tmp/foo --allow-bash --show-system-promptPrints exactly what the model would receive on the next turn,
including the resolved absolute path in [environment] and the
[guidance] block. Doesn't contact the server — works without
--url. Use this whenever you tweak --system / --system-file /
--sandbox / --allow-bash and want to confirm the result before
the chat starts.
Equivalent --system overlay:
easyai-cli --sandbox /tmp/foo --allow-bash \
--system "You are a senior C++ engineer." \
--show-system-promptOutput: [environment] block, [guidance] block, blank line, then
your You are a senior C++ engineer. — same order the model sees them
in the next request.
cat README.md | easyai-cli --url http://ai.local:8080 \
-p "summarise this in 3 bullets"Stdin overrides any positional prompt and is appended to the prompt text.
ANSWER=$(easyai-cli --url $URL --quiet -p "is 17 prime? answer y/n only")
[ "$ANSWER" = "y" ] && echo "prime"--quiet drops the spinner so stdout is clean.
easyai-cli --url $URL --set-preset deterministic
easyai-cli --url $URL --set-preset balancedAffects every subsequent request to the server until changed again. Server-side feature.
easyai-cli --url https://api.openai.com --api-key $OPENAI_API_KEY \
--model gpt-4o-mini -p "hi"Works against any OpenAI-compat endpoint; reasoning streams pass through cleanly for models that emit them.
README.md— sales overview + quickstart for the whole project.easyai-server.md— the matching server: tool gating, MCP surface, INI config, the Deep persona.manual.md— embeddingeasyai::Clientin your own binaries, authoring tools, the agentic-loop walkthrough.design.md— architecture and "why" decisions.AI_TOOLS.md— what a tool is, JSON-schema, the loop.EXTERNAL_TOOLS.md— operator-defined external tools (EASYAI-*.toolsmanifests).RAG.md— persistent registry / long-term memory.MCP.md— Model Context Protocol surface.