CUTIA: Quality-Aware Prompt Compressor

CUTIA: Quality-Aware Prompt Compressor

A prompt optimizer that cuts token usage while maintaining quality.

Features

Tree-based Segmentation: Recursively splits prompts into segments for fine-grained optimization
Cut-then-Rewrite Strategy: Attempts to remove redundant content, then rewrites if cutting fails
Quality-Aware Compression: Maintains quality thresholds during compression
Multi-Candidate Generation: Generates multiple compression variants and chooses the best
DSPy Integration: First-class support for DSPy programs via the DSPy adapter

Installation

pip install cutia

Usage

DSPy Adapter

The DSPy adapter allows you to compress DSPy programs:

import dspy
from cutia.adapters.dspy_adapter import CUTIA

# Configure models
# prompt_model generates rewrite candidates
prompt_model = dspy.LM(
    model="openai/gpt-4o-mini",
    max_tokens=10000,
    temperature=1,
)
# task_model runs the task/program for scoring and validation
task_model = dspy.LM(
    model="openai/gpt-4.1-nano",
    max_tokens=2000,
    temperature=1,
)

# Define your metric
def your_metric(example, prediction, trace=None):
    return example.output == prediction.output

# Create optimizer
optimizer = CUTIA(
    prompt_model=prompt_model,
    task_model=task_model,
    metric=your_metric,
)

# Compile your program
compressed_program = optimizer.compile(
    student=your_program,
    trainset=train_examples,
    valset=val_examples,
)

Local AI

If you’re running CUTIA (or other prompt optimizers) against locally hosted LLMs, vLLM is a solid option for serving models: it supports high-throughput inference and handles concurrent requests efficiently.
vLLM

If you’d like to use a separate prompt model from the task model, llmsnap can help by enabling fast model switching via vLLM’s sleep/wake mode—so you can swap models in seconds.
llmsnap

How It Works

Tree Building: The prompt is recursively split into segments (left, chunk, right)
Node Processing: For each node in the tree:
- Attempt to cut the chunk entirely
- If cutting fails quality check, attempt to rewrite the chunk
- Keep original if both fail
Multi-Candidate: Generate multiple compression variants with different random seeds
Selection: Evaluate candidates on validation set and select the best

Examples

Strawberry Problem (Letter Counting)

Demonstrates prompt compression on a character counting task using the CharBench dataset.

See src/cutia/examples/README.md for details.

Development

Development Installation

For development with testing and linting tools:

# Clone the repository
git clone https://github.com/napmany/cutia.git
cd cutia

# Install with development dependencies
uv sync --extra dev

Running Tests

# Install development dependencies (if not already installed)
uv sync --extra dev

# Run tests
make test

Code Quality

The project uses Ruff for linting and formatting, and Pyright for type checking:

# Run all checks (linting, formatting, and type checking)
make check

Dependencies

Core

No required dependencies for the base library

Install optional dependencies:

# For testing
uv sync --extra test

# For development (includes test dependencies)
uv sync --extra dev

Future Plans

Framework-agnostic core implementation (not tied to DSPy)
Additional adapters for other frameworks and platforms (LangChain, MLflow, etc.)
Standalone Python API for direct use
Enhanced chunking strategies

Star History

Note

⭐️ Star this project to help others discover it!

Name		Name	Last commit message	Last commit date
Latest commit History 31 Commits
.github/workflows		.github/workflows
codemaps		codemaps
src/cutia		src/cutia
tests		tests
.coderabbit.yaml		.coderabbit.yaml
.gitignore		.gitignore
.pre-commit-config.yaml		.pre-commit-config.yaml
LICENSE		LICENSE
Makefile		Makefile
README.md		README.md
pyproject.toml		pyproject.toml
uv.lock		uv.lock

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

CUTIA: Quality-Aware Prompt Compressor

Features

Installation

Usage

DSPy Adapter

Local AI

How It Works

Examples

Strawberry Problem (Letter Counting)

Development

Development Installation

Running Tests

Code Quality

Dependencies

Core

Future Plans

Star History

About

Uh oh!

Releases 2

Packages

Uh oh!

Contributors 1

Languages

Folders and files

Latest commit

History

Repository files navigation

CUTIA: Quality-Aware Prompt Compressor

Features

Installation

Usage

DSPy Adapter

Local AI

How It Works

Examples

Strawberry Problem (Letter Counting)

Development

Development Installation

Running Tests

Code Quality

Dependencies

Core

Future Plans

Star History

About

Topics

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases 2

Packages 0

Uh oh!

Contributors 1

Languages

Packages