chat-rs

A multi-provider LLM framework for Rust. Build type-safe chat clients with tool calling, structured output, streaming, and embeddings — swap providers with a single line change.

Features

Multi-provider — Gemini, Claude, OpenAI, DeepSeek, Ollama, Hugging Face, Cerebras, OpenRouter, mistral.rs (local), generic OpenAI-compatible servers, generic Responses API servers, and a Router are supported today; more are planned (see Roadmap)
Router — route requests across multiple providers with fallback and custom strategies (keyword, embedding, capability-based)
Type-safe builder — compile-time enforcement of valid configurations via type-state pattern
Tool calling — define tools with #[tool] in Rust, or load @tool-decorated Python scripts at runtime; the framework handles the call loop automatically
Structured output — deserialize model responses directly into your Rust types via schemars
Streaming — real-time token-by-token output with tool call support
Human in the loop — pause mid-turn on sensitive tool calls, let a human approve or reject, then resume the stream
Embeddings — generate vector embeddings through the same unified API
Retry & callbacks — configurable retry strategies with before/after hooks
Native tools — provider-specific features like Google Search, code execution, web search
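
To make the type-state idea concrete, here is a minimal standalone sketch of the pattern. It does not use chat-rs: `SketchBuilder`, `SketchChat`, `NoModel`, and `HasModel` are hypothetical names chosen for illustration, and the real builder's states and methods may differ.

```rust
use std::marker::PhantomData;

// Marker types for the builder's state. All names here are illustrative.
pub struct NoModel;
pub struct HasModel;

pub struct SketchBuilder<State> {
    model: Option<String>,
    max_steps: u32,
    _state: PhantomData<State>,
}

pub struct SketchChat {
    pub model: String,
    pub max_steps: u32,
}

impl SketchBuilder<NoModel> {
    pub fn new() -> Self {
        SketchBuilder { model: None, max_steps: 1, _state: PhantomData }
    }

    // Supplying a model moves the builder into the `HasModel` state.
    pub fn with_model(self, model: &str) -> SketchBuilder<HasModel> {
        SketchBuilder {
            model: Some(model.to_string()),
            max_steps: self.max_steps,
            _state: PhantomData,
        }
    }
}

impl<State> SketchBuilder<State> {
    // Options that are valid in every state.
    pub fn with_max_steps(mut self, steps: u32) -> Self {
        self.max_steps = steps;
        self
    }
}

impl SketchBuilder<HasModel> {
    // `build` exists only in the `HasModel` state, so calling it without a
    // model is a compile error rather than a runtime failure.
    pub fn build(self) -> SketchChat {
        SketchChat {
            model: self.model.expect("HasModel state always carries a model"),
            max_steps: self.max_steps,
        }
    }
}
```

With this shape, `SketchBuilder::new().with_model("demo-model").build()` compiles, while `SketchBuilder::new().build()` is rejected by the compiler because `build` is not defined for the `NoModel` state.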

Quick Start

Add to your Cargo.toml:

[dependencies]
chat-rs = { version = "0.5.0", features = ["openai"] }
tokio = { version = "1", features = ["macros", "rt-multi-thread"] }

use chat_rs::{ChatBuilder, openai::OpenAIBuilder, types::messages};

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error + Send + Sync>> {
    let client = OpenAIBuilder::new().with_model("gpt-4o-mini").build();
    let mut chat = ChatBuilder::new().with_model(client).build();

    let mut messages = messages::from_user(vec!["Hey there!"]);
    let res = chat.complete(&mut messages).await?;
    println!("{:?}", res.content);

    Ok(())
}

Set your API key via environment variable (OPENAI_API_KEY, GEMINI_API_KEY, or CLAUDE_API_KEY), or pass it explicitly with .with_api_key().
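
The two ways of supplying a key raise the question of precedence. The sketch below assumes that an explicit key wins over the environment variable, which is a common convention but is not confirmed by this README. `resolve_api_key`, `openai_api_key`, and `KeyError` are hypothetical helpers, not part of chat-rs.

```rust
#[derive(Debug, PartialEq)]
pub enum KeyError {
    // Carries the name of the environment variable that was consulted.
    Missing(&'static str),
}

// Treats blank strings the same as an absent value.
fn non_empty(value: Option<&str>) -> Option<&str> {
    value.map(str::trim).filter(|s| !s.is_empty())
}

// Assumed precedence: explicit key first, then the environment value.
pub fn resolve_api_key(
    explicit: Option<&str>,
    env_value: Option<&str>,
    env_name: &'static str,
) -> Result<String, KeyError> {
    non_empty(explicit)
        .or(non_empty(env_value))
        .map(str::to_string)
        .ok_or(KeyError::Missing(env_name))
}

// Convenience wrapper that reads the real environment.
pub fn openai_api_key(explicit: Option<&str>) -> Result<String, KeyError> {
    let from_env = std::env::var("OPENAI_API_KEY").ok();
    resolve_api_key(explicit, from_env.as_deref(), "OPENAI_API_KEY")
}
```

Passing the environment value in as a parameter keeps the precedence logic testable without touching process-wide state.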

Providers

Enable providers via feature flags:

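# Pick one entry (a Cargo.toml may contain only one chat-rs key).
# To enable several providers, list their features together as in the last two lines.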
chat-rs = { version = "0.5.0", features = ["gemini"] }
chat-rs = { version = "0.5.0", features = ["claude"] }
chat-rs = { version = "0.5.0", features = ["openai"] }
chat-rs = { version = "0.5.0", features = ["ollama"] }
chat-rs = { version = "0.5.0", features = ["huggingface"] }
chat-rs = { version = "0.5.0", features = ["cerebras"] }
chat-rs = { version = "0.5.0", features = ["completions"] }
chat-rs = { version = "0.5.0", features = ["router", "gemini", "claude"] }
chat-rs = { version = "0.5.0", features = ["gemini", "claude", "openai", "stream"] }

| Provider | Feature | API Key Env Var | Builder |
| --- | --- | --- | --- |
| Google Gemini | `gemini` | `GEMINI_API_KEY` | `GeminiBuilder` |
| Anthropic Claude | `claude` | `CLAUDE_API_KEY` | `ClaudeBuilder` |
| OpenAI | `openai` | `OPENAI_API_KEY` | `OpenAIBuilder` |
| DeepSeek | `deepseek` | `DEEPSEEK_API_KEY` | `DeepSeekBuilder` |
| Ollama (local) | `ollama` | none (optional) | `OllamaBuilder` |
| Hugging Face Router | `huggingface` | `HF_TOKEN` | `HuggingFaceBuilder` |
| Cerebras | `cerebras` | `CEREBRAS_API_KEY` | `CerebrasBuilder` |
| OpenRouter | `openrouter` | `OPENROUTER_API_KEY` | `OpenRouterBuilder` |
| mistral.rs (local in-process) | `mistralrs` | none | `MistralRsBuilder` |
| Generic Chat Completions | `completions` | depends on server | `ChatCompletionsBuilder` |
| Generic Responses API | `responses` | depends on server | `ResponsesBuilder` |
| Router | `router` | none | `RouterBuilder` |

The ollama, huggingface, cerebras, deepseek, and completions providers all share the same Chat Completions wire spec, factored into the chat-completions crate. The openai provider is a thin wrapper over chat-responses (the Responses API wire crate). Bring-your-own server: use ChatCompletionsBuilder for /v1/chat/completions servers (vLLM, llama.cpp, LiteLLM, etc.) or ResponsesBuilder for /responses servers.

For fully local in-process inference (no HTTP, no daemon), use the mistralrs provider — weights load into your process via mistral.rs.

Swapping providers is a one-line change — replace the builder, everything else stays the same:

// Gemini
let client = GeminiBuilder::new()
    .with_model("gemini-2.5-flash".to_string())
    .build();

// Claude
let client = ClaudeBuilder::new()
    .with_model("claude-sonnet-4-20250514".to_string())
    .build();

// OpenAI
let client = OpenAIBuilder::new()
    .with_model("gpt-4o")
    .build();

// Ollama (local) — pulls the model if missing, then builds
let client = OllamaBuilder::new()
    .with_model("llama3.2")
    .pull().await?
    .build();

// Hugging Face Inference Providers
let client = HuggingFaceBuilder::new()
    .with_model("openai/gpt-oss-120b:fastest")
    .build();

// Cerebras
let client = CerebrasBuilder::new()
    .with_model("llama-3.3-70b")
    .build();

// DeepSeek
let client = DeepSeekBuilder::new()
    .with_model("deepseek-v4-pro")
    .build();

// mistral.rs (local, in-process — no HTTP)
let client = MistralRsBuilder::new()
    .with_model("Qwen/Qwen2.5-3B-Instruct-GGUF")
    .with_gguf_file("qwen2.5-3b-instruct-q4_k_m.gguf")
    .build().await?;

// Bring-your-own Chat Completions server (vLLM, llama.cpp, LiteLLM, ...)
let client = ChatCompletionsBuilder::new()
    .with_base_url("http://localhost:8000/v1")
    .with_model("my-model")
    .with_api_key("sk-...")
    .build();

// Bring-your-own Responses API server
let client = ResponsesBuilder::new()
    .with_base_url("https://your-gateway/v1")
    .with_model("my-model")
    .with_api_key("sk-...")
    .build();

// Same from here on
let mut chat = ChatBuilder::new().with_model(client).build();

Tool Calling

Define tools with the #[tool] macro from tools-rs and register them with collect_tools(). The framework automatically loops through tool calls until the model is done.

// `messages` itself must be in scope for `messages::Messages` below.
use chat_rs::{ChatBuilder, gemini::GeminiBuilder, types::messages::{self, content}};
use tools_rs::{collect_tools, tool};

#[tool]
/// Looks up the current weather for a given city.
async fn get_weather(city: String) -> String {
    format!("The weather in {} is sunny, 22°C", city)
}

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let client = GeminiBuilder::new()
        .with_model("gemini-2.5-flash".to_string())
        .build();

    let tools = collect_tools();

    let mut chat = ChatBuilder::new()
        .with_tools(tools)
        .with_model(client)
        .with_max_steps(5)
        .build();

    let mut messages = messages::Messages::default();
    messages.push(content::from_user(vec!["What's the weather in Tokyo?"]));

    let response = chat.complete(&mut messages).await.map_err(|e| e.err)?;
    println!("{:?}", response.content);

    Ok(())
}
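
The call loop that `with_max_steps` bounds can be pictured with the standalone sketch below. It is not the chat-rs engine: `ModelTurn`, `run_tool_loop`, and `scripted_model` are hypothetical, the transcript is a plain list of strings, and the "model" is a scripted function so the example runs without a network.

```rust
use std::collections::HashMap;

// What a model turn can produce in this sketch (hypothetical types).
pub enum ModelTurn {
    ToolCall { name: String, arg: String },
    Final(String),
}

pub type Tool = fn(&str) -> String;

// Runs the call loop: ask the model, execute any requested tool, append the
// result to the transcript, and repeat until a final answer or `max_steps`.
pub fn run_tool_loop(
    mut model: impl FnMut(&[String]) -> ModelTurn,
    tools: &HashMap<String, Tool>,
    transcript: &mut Vec<String>,
    max_steps: usize,
) -> Result<String, String> {
    for _ in 0..max_steps {
        match model(transcript.as_slice()) {
            ModelTurn::Final(text) => return Ok(text),
            ModelTurn::ToolCall { name, arg } => {
                let tool = tools
                    .get(&name)
                    .copied()
                    .ok_or_else(|| format!("unknown tool: {name}"))?;
                transcript.push(format!("tool:{name} -> {}", tool(arg.as_str())));
            }
        }
    }
    Err(format!("no final answer after {max_steps} steps"))
}

fn get_weather(city: &str) -> String {
    format!("The weather in {city} is sunny")
}

// A scripted stand-in for a model: requests the tool once, then answers.
pub fn scripted_model(transcript: &[String]) -> ModelTurn {
    match transcript.iter().find(|m| m.starts_with("tool:")) {
        None => ModelTurn::ToolCall { name: "get_weather".into(), arg: "Tokyo".into() },
        Some(result) => ModelTurn::Final(format!("Answer based on [{result}]")),
    }
}

pub fn demo_tools() -> HashMap<String, Tool> {
    let mut tools: HashMap<String, Tool> = HashMap::new();
    tools.insert("get_weather".to_string(), get_weather as Tool);
    tools
}
```

Calling `run_tool_loop(scripted_model, &demo_tools(), &mut transcript, 5)` returns the final answer after one tool round trip. With a step limit of 1 the same call returns an error, which is the kind of runaway-loop guard a step limit provides.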

Python Tools

Load tools from Python scripts at runtime via the python feature (powered by tools-rs 0.3 + PyO3). Decorate functions with @tool() and point ToolsBuilder at a directory of .py files — they register alongside any native #[tool]s.

chat-rs = { version = "0.5.0", features = ["gemini", "python"] }

# scripts/weather.py
from tools_rs import tool

@tool()
def get_weather(city: str) -> str:
    """Get the current weather in a city.

    Args:
        city: The city to look up.
    """
    return {"London": "rainy, 12C", "Tokyo": "sunny, 22C"}.get(city, "unknown")

use tools_rs::{Language, ToolsBuilder};

let tools = ToolsBuilder::new()
    .with_language(Language::Python)
    .from_path("scripts")
    .collect()?;

let mut chat = ChatBuilder::new()
    .with_tools(tools)
    .with_model(client)
    .build();

PyO3 builds against the system Python; if your interpreter is newer than PyO3's max supported version, set PYO3_USE_ABI3_FORWARD_COMPATIBILITY=1 when building.

Structured Output

Deserialize model responses directly into typed Rust structs. Your type must derive JsonSchema and Deserialize.

use schemars::JsonSchema;
use serde::Deserialize;

#[derive(JsonSchema, Deserialize, Clone, Debug)]
struct User {
    pub name: String,
    pub likes: Vec<String>,
}

let mut chat = ChatBuilder::new()
    .with_structured_output::<User>()
    .with_model(client)
    .build();

let response = chat.complete(&mut messages).await?;
println!("Name: {}, Likes: {:?}", response.content.name, response.content.likes);

Streaming

Enable the stream feature flag:

chat-rs = { version = "0.5.0", features = ["gemini", "stream"] }

use chat_rs::StreamEvent;
use futures::StreamExt;

let mut chat = ChatBuilder::new()
    .with_model(client)
    .build();

let mut stream = chat.stream(&mut messages).await?;

while let Some(chunk) = stream.next().await {
    match chunk? {
        StreamEvent::TextChunk(text) => print!("{}", text),
        StreamEvent::ReasoningChunk(thought) => print!("[thinking] {}", thought),
        StreamEvent::ToolCall(fc) => println!("[calling {}]", fc.name),
        StreamEvent::ToolResult(_fr) => println!("[tool returned]"),
        StreamEvent::Structured(value) => println!("[structured] {value}"),
        StreamEvent::Done(_) => break,
        // Other events (for example StreamEvent::Paused, see Human in the Loop).
        _ => {}
    }
}

StreamEvent::Structured(Value) is the streaming counterpart to with_structured_output::<T>() — providers can yield complete typed objects mid-stream (each event is a full serde_json::Value, not a fragment). The engine accumulates them into the final ChatResponse.content.parts so non-streaming consumers see them as PartEnum::Structured entries.

Input Streaming (bidirectional)

Push input into the chat while the model is producing output — typed text, audio chunks (as PartEnum::File), tool results, anything that fits a PartEnum. Useful for robotics, voice assistants, or any consumer where new context arrives during generation.

Transition the builder into InputStreamed via .with_input_stream(). chat.stream(&mut messages) then returns a ChatStream: it is the output stream you iterate with .next(), and it carries an input side you push to with .send(). .send() is the inverse of .next() — one verb for every input. Pushed input merges into Messages (coalescing into the trailing user turn), drops the current provider stream, and re-enters with the updated state. For HTTP/SSE providers this is interrupt-and-restart; native-WS providers (planned OpenAI Realtime, Gemini Live) can hold their session open — the surface is identical either way.

Completed work is never lost on an interrupt: every finished tool call and result stays in Messages and is re-sent; only the in-flight partial generation is discarded (tools execute between steps, never mid-stream, so an interrupt can't sever a running tool).
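
The merge rule described above (discard only the in-flight partial generation, keep completed turns, coalesce input into the trailing user turn) can be sketched in isolation. `Turn`, `Role`, and `apply_interrupt` are hypothetical simplifications; the real `Messages` type carries richer parts than plain strings.

```rust
#[derive(Debug, Clone, PartialEq)]
pub enum Role {
    User,
    Assistant,
    Tool,
}

#[derive(Debug, Clone, PartialEq)]
pub struct Turn {
    pub role: Role,
    pub parts: Vec<String>,
    // False only for an assistant turn that is still being generated.
    pub complete: bool,
}

// Applies one pushed input to the history before the provider is re-entered.
pub fn apply_interrupt(history: &mut Vec<Turn>, input: &str) {
    // Only an unfinished assistant turn is discarded; finished tool calls and
    // results stay in place.
    let partial = matches!(history.last(), Some(t) if t.role == Role::Assistant && !t.complete);
    if partial {
        history.pop();
    }

    // Coalesce into the trailing user turn when there is one.
    let trailing_user = matches!(history.last(), Some(t) if t.role == Role::User);
    if trailing_user {
        if let Some(last) = history.last_mut() {
            last.parts.push(input.to_string());
        }
    } else {
        history.push(Turn {
            role: Role::User,
            parts: vec![input.to_string()],
            complete: true,
        });
    }
}
```

After `apply_interrupt`, the caller would restart generation from the updated history, which corresponds to the interrupt-and-restart path for HTTP/SSE providers.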

use chat_rs::{ChatBuilder, StreamEvent, openai::OpenAIBuilder, types::messages};
use futures::StreamExt;

let client = OpenAIBuilder::new().with_model("gpt-4o").build();

let mut chat = ChatBuilder::new()
    .with_model(client)
    .with_input_stream()
    .build();

let mut messages = messages::from_user(vec!["Tell me a long story about a rust crab."]);

let mut stream = chat.stream(&mut messages).await?;

// A producer handle — Clone + Send + 'static, so it drops into a task.
// Interrupt mid-generation with a follow-up:
let input = stream.input();
tokio::spawn(async move {
    tokio::time::sleep(std::time::Duration::from_secs(2)).await;
    let _ = input.send("Wait — make the crab wear a hat.");
});

while let Some(event) = stream.next().await {
    if let StreamEvent::TextChunk(t) = event? { print!("{t}"); }
}

For concurrent input and output, split() the stream into independent (InputStream, OutputStream) halves; cancel() tears the exchange down. See examples/openai/input_stream.rs for the minimal form and examples/openai/interactive.rs for an interactive CLI with mid-reply barge-in.

Human in the Loop

Mark tools that need human approval via #[tool] metadata and supply a strategy closure. When the model calls such a tool, chat.stream() yields StreamEvent::Paused(PauseReason) and terminates. Resolve the pending tools on messages (approve or reject), then call stream() again — the core loop picks up where it left off.

use chat_rs::{Action, ChatBuilder, ScopedCollection, StreamEvent, PauseReason};
use tools_rs::{FunctionCall, ToolCollection, tool};
use futures::StreamExt; // provides stream.next()
use serde::Deserialize;

#[derive(Debug, Default, Clone, Deserialize)]
#[serde(default)]
struct ApprovalMeta { requires_approval: bool }

#[tool(requires_approval = true)]
/// Sends an email.
async fn send_email(to: String, subject: String) -> String {
    format!("sent to {to}: {subject}")
}

fn strategy(_call: &FunctionCall, meta: &ApprovalMeta) -> Action {
    if meta.requires_approval { Action::RequireApproval } else { Action::Execute }
}

let tools: ToolCollection<ApprovalMeta> = ToolCollection::collect_tools()?;
let scoped = ScopedCollection::new(tools, strategy);

let mut chat = ChatBuilder::new()
    .with_model(client)
    .with_scoped_tools(scoped)
    .build();

let mut stream = chat.stream(&mut messages).await?;
while let Some(evt) = stream.next().await {
    match evt? {
        StreamEvent::TextChunk(t) => print!("{t}"),
        StreamEvent::Paused(PauseReason::AwaitingApproval { tool_ids }) => {
            for id in tool_ids {
                if let Some(tool) = messages.find_tool_mut(&id) {
                    tool.approve(None); // or tool.reject(Some("denied".into()))
                }
            }
            break;
        }
        _ => {}
    }
}
// Call chat.stream(&mut messages) again to resume the same turn.

See examples/claude/hitl.rs, examples/openai/hitl.rs, and examples/gemini/hitl.rs for full interactive REPLs.
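
The pause-and-resume flow can be modelled as a small state machine over the tool calls recorded on the conversation. The sketch below is a simplification under stated assumptions: `ToolStatus`, `LoopOutcome`, and `step` are hypothetical, and it assumes that nothing in a batch executes while any call is still pending, which the real loop may handle differently.

```rust
#[derive(Debug, Clone, PartialEq)]
pub enum ToolStatus {
    Pending,
    Approved,
    Rejected(String),
    Done(String),
}

#[derive(Debug, PartialEq)]
pub enum LoopOutcome {
    // Indices of the calls that still need a human decision.
    Paused(Vec<usize>),
    Continued,
}

// One pass of the loop. Pending calls pause the turn; approved calls run;
// rejected calls are left in place so the model can see the reason.
pub fn step(calls: &mut [ToolStatus], run: impl Fn(usize) -> String) -> LoopOutcome {
    let pending: Vec<usize> = calls
        .iter()
        .enumerate()
        .filter(|(_, status)| **status == ToolStatus::Pending)
        .map(|(index, _)| index)
        .collect();
    if !pending.is_empty() {
        return LoopOutcome::Paused(pending);
    }
    for (index, status) in calls.iter_mut().enumerate() {
        if *status == ToolStatus::Approved {
            *status = ToolStatus::Done(run(index));
        }
    }
    LoopOutcome::Continued
}
```

Because the decision is stored on the call itself, calling `step` a second time after approving or rejecting is enough to resume; no separate resume handle is needed in this model.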

Embeddings

let client = GeminiBuilder::new()
    .with_model("gemini-embedding-001".to_string())
    .with_embeddings(Some(768))
    .build();

let mut chat = ChatBuilder::new()
    .with_model(client)
    .with_embeddings()
    .build();

let response = chat.embed(&mut messages).await?;
println!("{:?}", response.embeddings);

Native Tools

Provider-specific capabilities beyond standard tool calling:

// Gemini: Google Search and Code Execution (Google Maps is also supported; see the gemini-google-maps example)
let client = GeminiBuilder::new()
    .with_model("gemini-2.5-flash".to_string())
    .with_google_search()
    .with_code_execution()
    .build();

// OpenAI: Web Search
let client = OpenAIBuilder::new()
    .with_model("gpt-4o")
    .with_web_search(Some(SearchContextSizeEnum::High), None)
    .build();

// OpenAI: Image Generation
let client = OpenAIBuilder::new()
    .with_model("gpt-4o")
    .with_image_generation(ImageGenerationTool::default())
    .build();

OpenAI-Compatible Endpoints

For any server speaking the OpenAI Chat Completions wire spec (vLLM, llama.cpp's llama-server, LiteLLM, etc.), use ChatCompletionsBuilder directly:

use chat_rs::completions::ChatCompletionsBuilder;

let client = ChatCompletionsBuilder::new()
    .with_base_url("http://localhost:8000/v1")
    .with_model("my-model")
    .with_api_key("sk-...")  // optional — omit for servers that don't require auth
    .build();

Dedicated wrappers preset URL/env-var/auth for popular targets:

Ollama — OllamaBuilder defaults to http://localhost:11434/v1, honors OLLAMA_HOST, supports .pull() to fetch a model via Ollama's native API.
Hugging Face Router — HuggingFaceBuilder defaults to https://router.huggingface.co/v1, reads HF_TOKEN.
Cerebras — CerebrasBuilder defaults to https://api.cerebras.ai/v1, reads CEREBRAS_API_KEY.
DeepSeek — DeepSeekBuilder defaults to https://api.deepseek.com/v1, reads DEEPSEEK_API_KEY.

For endpoints implementing the OpenAI Responses API (POST /responses, a different wire format from Chat Completions), use ResponsesBuilder from the chat-responses crate, or OpenAIBuilder::with_custom_url() if you want to keep the OpenAI-specific defaults and native tools.

Router

Route requests across multiple providers with automatic fallback on retryable errors. Add a custom RoutingStrategy to control provider selection based on keywords, embeddings, capabilities, or any logic you need.

use chat_rs::{
    ChatBuilder,
    router::RouterBuilder,
    gemini::GeminiBuilder,
    claude::ClaudeBuilder,
    types::messages,
};

let gemini = GeminiBuilder::new()
    .with_model("gemini-2.5-flash".to_string())
    .build();

let claude = ClaudeBuilder::new()
    .with_model("claude-sonnet-4-20250514".to_string())
    .build();

let router = RouterBuilder::new()
    .add_provider(gemini)
    .add_provider(claude)
    // .with_strategy(my_strategy)  // optional custom routing
    // .circuit_breaker(CircuitBreakerConfig::default())  // optional circuit breaker
    .build();

let mut chat = ChatBuilder::new().with_model(router).build();

let mut msgs = messages::from_user(vec!["Hello!"]);
let res = chat.complete(&mut msgs).await?;

Without a custom strategy, the router tries providers in order and falls back on retryable errors (rate limits, network issues). Non-retryable errors are returned immediately.
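
The default behaviour described above can be sketched without the crate. In this illustration `ProviderError`, `Call`, `Provider`, and `route` are hypothetical, and providers are plain functions rather than real clients.

```rust
#[derive(Debug, Clone, PartialEq)]
pub enum ProviderError {
    Retryable(String),
    Fatal(String),
}

pub type Call = fn(&str) -> Result<String, ProviderError>;
pub type Provider = (&'static str, Call);

// Tries providers in order. A retryable error moves on to the next provider;
// a fatal error is returned immediately.
pub fn route(providers: &[Provider], prompt: &str) -> Result<String, ProviderError> {
    let mut last_err = ProviderError::Fatal("no providers configured".to_string());
    for &(name, call) in providers {
        match call(prompt) {
            Ok(text) => return Ok(format!("[{name}] {text}")),
            Err(ProviderError::Retryable(msg)) => last_err = ProviderError::Retryable(msg),
            Err(fatal) => return Err(fatal),
        }
    }
    // Every provider failed with a retryable error (or none were configured).
    Err(last_err)
}

// Stand-in providers for the example.
pub fn rate_limited(_prompt: &str) -> Result<String, ProviderError> {
    Err(ProviderError::Retryable("429 rate limited".to_string()))
}

pub fn bad_request(_prompt: &str) -> Result<String, ProviderError> {
    Err(ProviderError::Fatal("400 bad request".to_string()))
}

pub fn echo(prompt: &str) -> Result<String, ProviderError> {
    Ok(format!("echo: {prompt}"))
}
```

With `[("flaky", rate_limited), ("backup", echo)]` the request falls through to `backup`. With `bad_request` first, the error is returned without trying `backup`.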

Enable the optional circuit breaker to automatically skip providers that have failed repeatedly, and probe them again after a configurable recovery timeout:

use chat_rs::router::CircuitBreakerConfig;

let router = RouterBuilder::new()
    .add_provider(gemini)
    .add_provider(claude)
    .circuit_breaker(CircuitBreakerConfig {
        failure_threshold: 3,
        recovery_timeout: std::time::Duration::from_secs(30),
    })
    .build();
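
For readers unfamiliar with the pattern, the sketch below shows how the two settings interact. It is a generic circuit breaker, not the router's implementation: `Breaker` and its methods are hypothetical, and it assumes that failures are counted consecutively and that a success closes the breaker.

```rust
use std::time::{Duration, Instant};

pub struct Breaker {
    failure_threshold: u32,
    recovery_timeout: Duration,
    consecutive_failures: u32,
    // Set when the breaker opens; `None` means the breaker is closed.
    opened_at: Option<Instant>,
}

impl Breaker {
    pub fn new(failure_threshold: u32, recovery_timeout: Duration) -> Self {
        Breaker {
            failure_threshold,
            recovery_timeout,
            consecutive_failures: 0,
            opened_at: None,
        }
    }

    // Whether a request may be sent at time `now`. An open breaker admits a
    // probe once `recovery_timeout` has elapsed.
    pub fn allows(&self, now: Instant) -> bool {
        match self.opened_at {
            None => true,
            Some(opened) => now.duration_since(opened) >= self.recovery_timeout,
        }
    }

    pub fn record_success(&mut self) {
        self.consecutive_failures = 0;
        self.opened_at = None;
    }

    pub fn record_failure(&mut self, now: Instant) {
        self.consecutive_failures += 1;
        if self.consecutive_failures >= self.failure_threshold {
            self.opened_at = Some(now);
        }
    }
}
```

Taking `now` as a parameter instead of calling `Instant::now()` internally keeps the timing logic deterministic in tests.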

Streaming is also supported via StreamRouterBuilder — enable the stream feature flag and use providers that implement ChatProvider.

Transport Layer

Providers are generic over a pluggable Transport trait. The default transport is ReqwestTransport (HTTP via reqwest) — it's used automatically when you call .build() on any builder.

To share an HTTP client across providers:

use chat_rs::openai::{OpenAIBuilder, ReqwestTransport};

let http = ReqwestTransport::from(my_reqwest_client);
let client = OpenAIBuilder::new()
    .with_model("gpt-4o")
    .with_transport(http.clone()) // Clone shares the connection pool
    .build();

To use WebSocket transport (e.g. for OpenAI's Responses API over WS):

chat-rs = { version = "0.5.0", features = ["openai", "stream", "tokio-tungstenite"] }

use chat_rs::{openai::OpenAIBuilder, transport::AsyncWsTransport};

let ws = AsyncWsTransport::new()
    .with_message_type("response.create"); // OpenAI WS envelope

let client = OpenAIBuilder::new()
    .with_model("gpt-4o")
    .with_transport(ws)
    .build();

Two WebSocket transports are available, feature-gated:

| Transport | Feature | Crate | Notes |
| --- | --- | --- | --- |
| `AsyncWsTransport` | `tokio-tungstenite` | tokio-tungstenite | Fully async, recommended with tokio |
| `WsTransport` | `tungstenite` | tungstenite | Sync WS bridged via `spawn_blocking` |

To use a fully custom transport (tower, hyper, WASM, etc.):

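use chat_rs::{Transport, openai::OpenAIBuilder};

struct MyTransport { /* ... */ }
impl MyTransport {
    // Constructor used below; fill in your own fields.
    fn new() -> Self { Self { /* ... */ } }
}
impl Transport for MyTransport { /* ... */ }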

let client = OpenAIBuilder::new()
    .with_model("gpt-4o")
    .with_transport(MyTransport::new())
    .build();

Transport implementations live in core/src/transport/impls/. See core/AGENTS.md for the Transport trait definition.
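
The benefit of making providers generic over a transport is easiest to see with a canned transport in a test. The sketch below is deliberately not the real `Transport` trait (which is defined in core and is not reproduced here): `SketchTransport`, `SketchProvider`, and `CannedTransport` are hypothetical, synchronous stand-ins, and the `/responses` path is used only as an example.

```rust
// Hypothetical, synchronous stand-in for a transport abstraction.
pub trait SketchTransport {
    fn send(&self, url: &str, body: &str) -> Result<String, String>;
}

// A provider that knows its endpoint but delegates the actual I/O.
pub struct SketchProvider<T: SketchTransport> {
    base_url: String,
    transport: T,
}

impl<T: SketchTransport> SketchProvider<T> {
    pub fn new(base_url: &str, transport: T) -> Self {
        SketchProvider {
            base_url: base_url.trim_end_matches('/').to_string(),
            transport,
        }
    }

    pub fn complete(&self, prompt: &str) -> Result<String, String> {
        let url = format!("{}/responses", self.base_url);
        self.transport.send(&url, prompt)
    }
}

// A canned transport for tests: no network involved.
pub struct CannedTransport {
    pub reply: String,
}

impl SketchTransport for CannedTransport {
    fn send(&self, url: &str, body: &str) -> Result<String, String> {
        Ok(format!("{} <- {} ({})", self.reply, body, url))
    }
}
```

Swapping `CannedTransport` for an HTTP or WebSocket implementation changes only the type parameter; the provider logic is untouched.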

Architecture

chat-rs (root)              ← Re-exports + feature flags
├── core/                   ← Traits, types, Chat engine, builder, Transport trait + impls
├── providers/
│   ├── completions/        ← Generic OpenAI Chat Completions wire (`ChatCompletionsBuilder`)
│   ├── responses/          ← Generic OpenAI Responses API wire (`ResponsesBuilder`)
│   ├── gemini/             ← Google Gemini provider
│   ├── claude/             ← Anthropic Claude provider
│   ├── openai/             ← OpenAI (thin wrapper over `chat-responses` + embeddings + native tools)
│   ├── ollama/             ← Ollama wrapper (local daemon, pull/ping)
│   ├── huggingface/        ← Hugging Face Inference Providers (Router)
│   ├── cerebras/           ← Cerebras Inference
│   ├── deepseek/           ← DeepSeek
│   ├── mistralrs/          ← Local in-process inference (mistral.rs)
│   └── router/             ← Multi-provider router
└── examples/
    ├── completions/        ← Generic OAI-compat examples
    ├── gemini/             ← Gemini examples
    ├── claude/             ← Claude examples
    ├── openai/             ← OpenAI examples
    ├── ollama/             ← Ollama examples
    ├── huggingface/        ← Hugging Face examples
    ├── cerebras/           ← Cerebras examples
    ├── deepseek/           ← DeepSeek examples
    ├── openrouter/         ← OpenRouter examples
    ├── mistralrs/          ← mistral.rs (local) examples
    └── router/             ← Router strategy examples

See core/AGENTS.md and providers/AGENTS.md for detailed architecture documentation.

Examples

Run examples with the appropriate feature flags:

# Gemini
cargo run --example gemini-tools --features gemini
cargo run --example gemini-structured --features gemini
cargo run --example gemini-stream --features gemini,stream
cargo run --example gemini-embeddings --features gemini
cargo run --example gemini-code-execution --features gemini
cargo run --example gemini-google-maps --features gemini
cargo run --example gemini-image-understanding --features gemini
cargo run --example gemini-hitl --features gemini,stream
PYO3_USE_ABI3_FORWARD_COMPATIBILITY=1 cargo run --example gemini-python-tools --features gemini,python

# Claude
cargo run --example claude-completion --features claude
cargo run --example claude-stream --features claude,stream
cargo run --example claude-hitl --features claude,stream

# OpenAI
cargo run --example openai-completion --features openai
cargo run --example openai-stream --features openai,stream
cargo run --example openai-structured --features openai
cargo run --example openai-embeddings --features openai
cargo run --example openai-hitl --features openai,stream
cargo run --example openai-websocket --features openai,stream,tokio-tungstenite

# Router
cargo run --example router-keyword --features router,gemini,claude
cargo run --example router-embeddings --features router,gemini,claude
cargo run --example router-capability --features router,gemini,claude
cargo run --example router-stream --features router,gemini,claude,stream

# Ollama (local)
cargo run --example ollama-completion --features ollama
cargo run --example ollama-stream --features ollama,stream
cargo run --example ollama-tools --features ollama
cargo run --example ollama-structured --features ollama
cargo run --example ollama-embeddings --features ollama
cargo run --example ollama-pull --features ollama

# Hugging Face
cargo run --example huggingface-completion --features huggingface
cargo run --example huggingface-stream --features huggingface,stream

# Cerebras
cargo run --example cerebras-completion --features cerebras
cargo run --example cerebras-stream --features cerebras,stream

# DeepSeek
cargo run --example deepseek-completion --features deepseek
cargo run --example deepseek-stream --features deepseek,stream

# OpenRouter
cargo run --example openrouter-completion --features openrouter
cargo run --example openrouter-stream --features openrouter,stream

# mistral.rs (local in-process)
cargo run --example mistralrs-completion --features mistralrs
cargo run --example mistralrs-stream --features mistralrs,stream
cargo run --example mistralrs-vision --features mistralrs,stream
cargo run --example mistralrs-voice --features mistralrs,stream

# Generic OpenAI-compatible server
cargo run --example completions-completion --features completions

# Retry strategies
cargo run --example retry --features gemini

Minimum Supported Rust Version

Rust 1.94 or later (edition 2024).

License

MIT
