A prompt optimizer that cuts token usage while maintaining quality.
- Tree-based Segmentation: Recursively splits prompts into segments for fine-grained optimization
- Cut-then-Rewrite Strategy: Attempts to remove redundant content, then rewrites if cutting fails
- Quality-Aware Compression: Maintains quality thresholds during compression
- Multi-Candidate Generation: Generates multiple compression variants and chooses the best
- DSPy Integration: First-class support for DSPy programs via the DSPy adapter
```bash
pip install cutia
```

The DSPy adapter allows you to compress DSPy programs:
```python
import dspy
from cutia.adapters.dspy_adapter import CUTIA

# Configure models
# prompt_model generates rewrite candidates
prompt_model = dspy.LM(
    model="openai/gpt-4o-mini",
    max_tokens=10000,
    temperature=1,
)

# task_model runs the task/program for scoring and validation
task_model = dspy.LM(
    model="openai/gpt-4.1-nano",
    max_tokens=2000,
    temperature=1,
)

# Define your metric
def your_metric(example, prediction, trace=None):
    return example.output == prediction.output

# Create the optimizer
optimizer = CUTIA(
    prompt_model=prompt_model,
    task_model=task_model,
    metric=your_metric,
)

# Compile your program
compressed_program = optimizer.compile(
    student=your_program,
    trainset=train_examples,
    valset=val_examples,
)
```

If you're running CUTIA (or other prompt optimizers) against locally hosted LLMs, vLLM is a solid option for serving models: it supports high-throughput inference and handles concurrent requests efficiently.
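For example, a model served by vLLM's OpenAI-compatible endpoint can be plugged into the setup above. This is a sketch under assumptions: the model name, port, and dummy API key below are placeholders, not values from this project.

```python
# Start an OpenAI-compatible vLLM server first, e.g.:
#   vllm serve Qwen/Qwen2.5-7B-Instruct --port 8000
import dspy

# Point dspy.LM at the local endpoint; vLLM ignores the API key,
# but the OpenAI-compatible client requires one to be set
task_model = dspy.LM(
    model="openai/Qwen/Qwen2.5-7B-Instruct",
    api_base="http://localhost:8000/v1",
    api_key="local",
    max_tokens=2000,
)
```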
If you'd like to use a separate prompt model from the task model, llmsnap can help by enabling fast model switching via vLLM's sleep/wake mode, so you can swap models in seconds.
- Tree Building: The prompt is recursively split into segments (left, chunk, right)
- Node Processing: For each node in the tree:
  - Attempt to cut the chunk entirely
  - If cutting fails the quality check, attempt to rewrite the chunk
  - Keep the original if both fail
- Multi-Candidate: Generate multiple compression variants with different random seeds
- Selection: Evaluate candidates on the validation set and select the best
The included example demonstrates prompt compression on a character counting task using the CharBench dataset.
See src/cutia/examples/README.md for details.
For development with testing and linting tools:
```bash
# Clone the repository
git clone https://github.com/napmany/cutia.git
cd cutia

# Install with development dependencies
uv sync --extra dev
```

```bash
# Install development dependencies (if not already installed)
uv sync --extra dev

# Run tests
make test
```
The project uses Ruff for linting and formatting, and Pyright for type checking:
```bash
# Run all checks (linting, formatting, and type checking)
make check
```

- No required dependencies for the base library

Install optional dependencies:

```bash
# For testing
uv sync --extra test

# For development (includes test dependencies)
uv sync --extra dev
```

- Framework-agnostic core implementation (not tied to DSPy)
- Additional adapters for other frameworks and platforms (LangChain, MLflow, etc.)
- Standalone Python API for direct use
- Enhanced chunking strategies
> **Note**
> ⭐️ Star this project to help others discover it!