tokenfit

Fit text into LLM token budgets — estimate, trim, and pack prompts with zero dependencies.

Every app that talks to an LLM eventually fights the same battle: the context window. You have retrieved documents, chat history, logs, and system rules — and they don't all fit. tokenfit is a tiny, dependency-free toolkit that helps you measure how much you have and keep only what fits, without pulling in a megabyte-sized tokenizer.

import { pack, trim, estimateTokens } from "tokenfit";

estimateTokens("How many tokens is this?"); // → 7

// Keep the highest-priority items that fit in 4000 tokens
const { text, dropped } = pack(
  [
    { text: systemRules,   priority: 10 },
    { text: retrievedDocs, priority: 5  },
    { text: chatHistory,   priority: 1  },
  ],
  4000,
);

Why tokenfit?

Zero dependencies. No native bindings, no 2 MB vocab files. Drops into edge functions, Cloudflare Workers, browsers, and serverless without a cold-start penalty.
Conservative by design. The built-in estimator aims to err slightly high relative to real BPE tokenizers, so you under-fill rather than overflow the window.
Bring your own tokenizer. Need exact counts? Pass tiktoken, Anthropic's tokenizer, or any (text) => number to every API.
Three things, done well. estimateTokens, trim, and pack — fully typed, tested, and documented.
ESM + CJS + types, with a handy CLI.

Install

npm install tokenfit
# or: pnpm add tokenfit  /  yarn add tokenfit  /  bun add tokenfit

API

`estimateTokens(text): number`

Fast, dependency-free token estimate. Computes a chars / 4 signal and a words / 0.75 signal and takes the larger of the two, so it stays conservative for both prose and code.

estimateTokens("");                  // 0
estimateTokens("hello world");       // 3

Estimates are typically within ~10–15% of tiktoken for English and common code. When you need exact counts, supply countTokens (below).
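The heuristic described above can be sketched in a few lines. This is an illustration only, not tokenfit's actual source: the name `sketchEstimate`, the whitespace-based word split, and the final round-up are assumptions.

```typescript
// Hypothetical re-creation of the documented heuristic (not the library's code).
function sketchEstimate(text: string): number {
  // Assumption: a "word" is any run of non-whitespace characters.
  const words = text.split(/\s+/).filter((w) => w.length > 0).length;

  const byChars = text.length / 4; // roughly 4 characters per token
  const byWords = words / 0.75;    // roughly 0.75 words per token

  // Take the larger signal, then round up (assumed) to stay conservative.
  return Math.ceil(Math.max(byChars, byWords));
}
```

Under these assumptions the sketch reproduces the values shown in this README: 0 for the empty string, 3 for "hello world", and 7 for "How many tokens is this?".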

`trim(text, budget, options?): string`

Trim a string so its token count never exceeds budget. The result, including the ellipsis marker, is guaranteed to fit as measured by the counter in use (the built-in estimator unless you pass countTokens).

trim(longLog, 2000, { strategy: "start" });   // keep the tail (newest log lines)
trim(bigFile, 1500, { strategy: "middle" });  // keep both ends, drop the middle
trim(article, 500);                            // strategy defaults to "end"

| Option | Type | Default | Description |
| --- | --- | --- | --- |
| `strategy` | `"end" \| "start" \| "middle"` | `"end"` | Which part of the text to drop. |
| `ellipsis` | `string` | `"…"` | Marker inserted where text was removed. |
| `countTokens` | `(text) => number` | built-in | Custom token counter. |
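One way to get the "result always fits" behaviour is to binary-search how many characters to keep, re-counting the candidate (marker included) each time. The sketch below illustrates that idea under stated assumptions; `sketchTrim` and `roughCount` are hypothetical names, and tokenfit's real implementation may work differently.

```typescript
type Strategy = "end" | "start" | "middle";
type Counter = (text: string) => number;

// Stand-in counter for this sketch only: roughly 4 characters per token.
const roughCount: Counter = (t) => Math.ceil(t.length / 4);

function sketchTrim(
  text: string,
  budget: number,
  strategy: Strategy = "end",
  ellipsis = "…",
  countTokens: Counter = roughCount,
): string {
  if (countTokens(text) <= budget) return text;  // already fits, nothing to drop
  if (countTokens(ellipsis) > budget) return ""; // not even the marker fits

  const chars = Array.from(text); // split by code point so surrogate pairs stay whole

  // Build a candidate that keeps `keep` characters and marks the removed part.
  const build = (keep: number): string => {
    if (strategy === "end") return chars.slice(0, keep).join("") + ellipsis;
    if (strategy === "start") return ellipsis + chars.slice(chars.length - keep).join("");
    const head = Math.ceil(keep / 2);
    const tail = keep - head;
    return chars.slice(0, head).join("") + ellipsis + chars.slice(chars.length - tail).join("");
  };

  // Binary search for the largest `keep` whose candidate fits. Only candidates
  // that were actually counted and found to fit are ever accepted.
  let lo = 0;
  let hi = chars.length - 1;
  let best = 0;
  while (lo <= hi) {
    const mid = (lo + hi) >> 1;
    if (countTokens(build(mid)) <= budget) {
      best = mid;
      lo = mid + 1;
    } else {
      hi = mid - 1;
    }
  }
  return build(best);
}
```

The search assumes the count does not shrink as more text is kept. If a counter violates that, the result still fits but may be shorter than the best possible one.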

`pack(items, budget, options?): PackResult`

Greedily assemble a subset of items that fits the budget, taking the highest-priority items first and accounting for the separator between items.

const result = pack(
  [
    { text: rules,  priority: 10, id: "rules" },
    { text: docA,   priority: 5,  id: "docA"  },
    { text: docB,   priority: 5,  id: "docB"  },
  ],
  3000,
  { separator: "\n\n---\n\n", trimLast: true },
);

result.text;      // assembled prompt, ≤ 3000 tokens
result.tokens;    // estimated token count of result.text
result.included;  // items that made it in (output order)
result.dropped;   // items left out

| Option | Type | Default | Description |
| --- | --- | --- | --- |
| `separator` | `string` | `"\n\n"` | Inserted between included items. |
| `trimLast` | `boolean` | `false` | Trim the first item that does not fit so it fills the leftover budget. |
| `trimStrategy` | `"end" \| "start" \| "middle"` | `"end"` | Strategy used when `trimLast` is on. |
| `countTokens` | `(text) => number` | built-in | Custom token counter. |
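The greedy selection can be illustrated with a short sketch. It is not tokenfit's source: `sketchPack` and `roughCount` are hypothetical names, `trimLast` is left out, and two behaviours are assumed rather than documented here: an item that does not fit is skipped while lower-priority items are still tried, and the output keeps the original input order.

```typescript
interface Item { text: string; priority: number; id?: string }
interface PackSketchResult { text: string; tokens: number; included: Item[]; dropped: Item[] }

// Stand-in counter for this sketch only: roughly 4 characters per token.
const roughCount = (t: string): number => Math.ceil(t.length / 4);

function sketchPack(
  items: Item[],
  budget: number,
  separator = "\n\n",
  countTokens: (text: string) => number = roughCount,
): PackSketchResult {
  // Visit highest priority first; ties keep input order because sort is stable.
  const order = items
    .map((item, index) => ({ item, index }))
    .sort((a, b) => b.item.priority - a.item.priority);

  const chosen = new Set<number>();

  // Join the chosen items in input order, so separators are counted exactly.
  const join = (): string =>
    items.filter((_, i) => chosen.has(i)).map((it) => it.text).join(separator);

  for (const { index } of order) {
    chosen.add(index);
    // Re-count the whole candidate; back out if this item overflows the budget.
    if (countTokens(join()) > budget) chosen.delete(index);
  }

  const text = join();
  return {
    text,
    tokens: countTokens(text),
    included: items.filter((_, i) => chosen.has(i)),
    dropped: items.filter((_, i) => !chosen.has(i)),
  };
}
```

Re-counting the joined text on every step is quadratic, which keeps the sketch simple; a real implementation would more likely track a running total.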

Bring your own tokenizer

For exact counts, hand any counter to any function:

import { encoding_for_model } from "tiktoken";
import { pack } from "tokenfit";

const enc = encoding_for_model("gpt-4o");
const countTokens = (t: string) => enc.encode(t).length;

pack(items, 8000, { countTokens });

CLI

tokenfit ships a small CLI for shell pipelines:

# Estimate tokens
cat big.log | tokenfit count
tokenfit count README.md

# Trim to a budget (reads stdin or a file)
cat big.log | tokenfit trim -b 2000 -s start
tokenfit trim --budget 500 --strategy middle notes.md

tokenfit count [file]                 Estimate tokens (stdin if no file)
tokenfit trim --budget <n> [file]     Trim text to fit a token budget
  --budget, -b <n>      Token budget (required)
  --strategy, -s <s>    end | start | middle   (default: end)
  --ellipsis <str>      Marker for removed text

Recipes

Keep a chat history under budget (newest wins):

const history = messages.map((m, i) => ({ text: m.content, priority: i })); // later messages get higher priority
const { text } = pack(history, 6000, { trimLast: true, trimStrategy: "start" });

Truncate a noisy log before sending it to a model:

const safe = trim(rawLog, 4000, { strategy: "start" }); // keep the most recent lines

How accurate is the estimator?

The default estimator is a heuristic, not a tokenizer. It is designed to be slightly conservative — it tends to estimate a touch high so your prompts fit on the first try. For budgeting, trimming, and packing this is exactly what you want. When you need guaranteed-exact counts (e.g. billing), plug in a real tokenizer via countTokens.
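To check how the heuristic behaves on your own data, you can compare any estimator against an exact counter over a set of sample strings. The helper below is a hypothetical sketch (`compareCounters` is not part of tokenfit); it reports the mean signed relative error and the worst underestimate, where negative values mean the estimate came in below the exact count.

```typescript
type Counter = (text: string) => number;

interface ErrorReport {
  meanSignedError: number;   // average of (estimate - exact) / exact
  worstUnderestimate: number; // most negative relative error seen, or 0 if none
}

function compareCounters(samples: string[], estimate: Counter, exact: Counter): ErrorReport {
  let sum = 0;
  let worst = 0;
  let n = 0;

  for (const sample of samples) {
    const truth = exact(sample);
    if (truth === 0) continue; // skip samples that would divide by zero

    const relative = (estimate(sample) - truth) / truth;
    sum += relative;
    worst = Math.min(worst, relative);
    n++;
  }

  return { meanSignedError: n === 0 ? 0 : sum / n, worstUnderestimate: worst };
}
```

In practice you would pass estimateTokens as `estimate` and a real tokenizer's counter as `exact`. A clearly negative worstUnderestimate on representative samples suggests leaving headroom in the budget or supplying countTokens.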

Contributors ✨

This project follows the all-contributors specification. Contributions of any kind are welcome — code, docs, bug reports, ideas, reviews! See the emoji key for how each contribution is recognized, and open a PR or issue to get involved.

Thanks goes to these wonderful people:

Tung Tran
💻 🚧
