LLMSim

LLM Traffic Simulator - A lightweight, high-performance LLM API simulator for load testing, CI/CD, and local development.

Overview

LLMSim replicates realistic LLM API behavior without running actual models. It solves common challenges when testing LLM-integrated applications:

Cost: Real API calls during load tests are expensive
Rate Limits: Production APIs prevent realistic load testing
Reproducibility: Real models produce variable responses
Traffic Realism: LLM responses have unique characteristics (streaming, variable latency, token-based billing)

Features

Multi-Provider API Support - OpenAI Chat Completions and OpenResponses APIs
Realistic Latency Simulation - Time-to-first-token (TTFT) and inter-token delays with normal distribution
Streaming Support - Server-Sent Events (SSE) for both OpenAI and OpenResponses streaming formats
Accurate Token Counting - Uses tiktoken-rs (OpenAI's tokenizer implementation)
Error Injection - Rate limits (429), server errors (500/503), timeouts
Multiple Response Generators - Lorem ipsum, echo, fixed, random, sequence
Model-Specific Profiles - GPT-5, GPT-4, Claude, Gemini latency profiles
Real-time Stats Dashboard - TUI dashboard with live metrics (requests, tokens, latency, errors)
Stats API - JSON endpoint for programmatic access to server metrics

Installation

cargo install llmsim

# Include the optional terminal dashboard
cargo install llmsim --features tui

Demo

Usage

CLI Server

# Start with defaults (port 8080, lorem generator)
llmsim serve

# Start with real-time stats dashboard (TUI)
# Requires installing/building with `--features tui`
llmsim serve --tui

# All options
llmsim serve \
  --port 8080 \
  --host 0.0.0.0 \
  --generator lorem \
  --target-tokens 150 \
  --tui

# Using config file
llmsim serve --config config.toml

Stats Dashboard

The --tui flag launches an interactive terminal dashboard showing real-time metrics:

Requests: Total, active, streaming vs non-streaming, requests/sec
Tokens: Prompt, completion, total, tokens/sec
Latency: Average, min, max response times
Errors: Total errors, rate limits (429), server errors (5xx), timeouts
Charts: RPS and token rate sparklines, model distribution

Controls: q to quit, r to force refresh.

As a Library

use llmsim::{
    openai::{ChatCompletionRequest, Message},
    generator::LoremGenerator,
    latency::LatencyProfile,
};

// Create a latency profile
let latency = LatencyProfile::gpt5();

// Count tokens
let tokens = llmsim::count_tokens("Hello, world!", "gpt-5").unwrap();

// Generate responses
let generator = LoremGenerator::new(100);
let response = generator.generate(&request);

API Endpoints

OpenAI API (`/openai/v1/...`)

Endpoint	Method	Description
`/openai/v1/chat/completions`	POST	Chat completions (streaming & non-streaming)
`/openai/v1/models`	GET	List available models
`/openai/v1/models/{model_id}`	GET	Get specific model details
`/openai/v1/responses`	POST	Responses API (streaming & non-streaming)

When using OpenAI SDKs, set the base URL to http://localhost:8080/openai/v1.

OpenResponses API (`/openresponses/v1/...`)

OpenResponses is an open-source specification for building multi-provider, interoperable LLM interfaces.

Endpoint	Method	Description
`/openresponses/v1/responses`	POST	Create response (streaming & non-streaming)

LLMSim endpoints

Endpoint	Method	Description
`/health`	GET	Health check
`/llmsim/stats`	GET	Real-time server statistics (JSON)

Configuration

TOML Config File

[server]
port = 8080
host = "0.0.0.0"

[latency]
profile = "gpt5"
# Custom values (optional):
# ttft_mean_ms = 600
# ttft_stddev_ms = 150
# tbt_mean_ms = 40
# tbt_stddev_ms = 12

[response]
generator = "lorem"
target_tokens = 100

[errors]
rate_limit_rate = 0.01
server_error_rate = 0.001
timeout_rate = 0.0
timeout_after_ms = 30000

[models]
available = [
  "gpt-5",
  "gpt-5-mini",
  "gpt-4o",
  "claude-opus",
]

Note: The config file format moved from YAML to TOML in this release. To migrate an existing config.yaml, replace section headers like server: with [server], change key: value to key = value, quote strings, and convert lists. See benchmarks/config/*.toml for working examples.

Supported Models

Family	Models
GPT-5	gpt-5, gpt-5-pro, gpt-5-mini, gpt-5-nano, gpt-5-codex, gpt-5.1, gpt-5.2, gpt-5.3-codex, gpt-5.4, gpt-5.5
O-Series	o1, o1-mini, o3, o3-mini, o4-mini
GPT-4	gpt-4, gpt-4-turbo, gpt-4o, gpt-4o-mini, gpt-4.1
Claude	claude-opus, claude-sonnet, claude-haiku (with 4.x versions through Opus 4.7 and Sonnet 4.6)
Gemini	gemini-2.0-flash, gemini-2.5-pro, gemini-3 and gemini-3.1 previews

Latency Profiles

Profile	TTFT Mean	TBT Mean
gpt-5	600ms	40ms
gpt-5-mini	300ms	20ms
gpt-4	800ms	50ms
gpt-4o	400ms	25ms
o-series	2000ms	30ms
claude-opus	1000ms	60ms
claude-sonnet	500ms	30ms
claude-haiku	200ms	15ms
instant	0ms	0ms
fast	10ms	1ms

Use Cases

Load Testing - Simulate thousands of concurrent LLM requests
CI/CD Pipelines - Fast, deterministic tests for LLM integrations
Local Development - Develop without API keys or costs
Chaos Engineering - Test behavior under failure scenarios
Cost Estimation - Estimate token usage before production

Requirements

Rust 1.83+ (for building from source)
OR Docker

License

MIT License - see LICENSE for details.

Contributing

See CONTRIBUTING.md for contribution guidelines.

Name		Name	Last commit message	Last commit date
Latest commit History 51 Commits
.claude		.claude
.github		.github
benchmarks		benchmarks
docs		docs
examples		examples
scripts/lib		scripts/lib
specs		specs
src		src
tests		tests
.dockerignore		.dockerignore
.gitignore		.gitignore
AGENTS.md		AGENTS.md
CHANGELOG.md		CHANGELOG.md
CLAUDE.md		CLAUDE.md
CONTRIBUTING.md		CONTRIBUTING.md
Cargo.toml		Cargo.toml
Dockerfile		Dockerfile
LICENSE		LICENSE
README.md		README.md
deny.toml		deny.toml
docker-compose.yml		docker-compose.yml
justfile		justfile
rust-toolchain.toml		rust-toolchain.toml

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

LLMSim

Overview

Features

Installation

Demo

Usage

CLI Server

Stats Dashboard

As a Library

API Endpoints

OpenAI API (`/openai/v1/...`)

OpenResponses API (`/openresponses/v1/...`)

LLMSim endpoints

Configuration

TOML Config File

Supported Models

Latency Profiles

Use Cases

Requirements

License

Contributing

About

Uh oh!

Releases 6

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

LLMSim

Overview

Features

Installation

Demo

Usage

CLI Server

Stats Dashboard

As a Library

API Endpoints

OpenAI API (/openai/v1/...)

OpenResponses API (/openresponses/v1/...)

LLMSim endpoints

Configuration

TOML Config File

Supported Models

Latency Profiles

Use Cases

Requirements

License

Contributing

About

Resources

License

Contributing

Uh oh!

Stars

Watchers

Forks

Releases 6

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

OpenAI API (`/openai/v1/...`)

OpenResponses API (`/openresponses/v1/...`)

Packages