`contextualize` is a package to quickly retrieve and format file contents for use with LLMs.
You can install the package using pip:

    pip install contextualize

or with uv to use the CLI globally:

    uv tool install contextualize

Define `FileReference` objects for specified file paths and optional ranges.
- set `range` to a tuple of line numbers to include only a portion of the file, e.g. `range=(1, 10)`
- set `format` to "md" (default) or "xml" to wrap file contents in Markdown code blocks or `<file>` tags
- set `label` to "relative" (default), "name", or "ext" to determine what label is affixed to the enclosing Markdown/XML string
  - "relative" will use the relative path from the current working directory
  - "name" will use the file name only
  - "ext" will use the file extension only
Retrieve wrapped contents from the `output` attribute.
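To make the `range`, `format`, and `label` options concrete, here is a minimal standalone sketch of the kind of wrapping described above. It does not import or reproduce the package's code: `wrap_file`, `line_range`, and `fmt` are hypothetical names standing in for `FileReference`, `range`, and `format`, and the exact output layout (label placed after the opening fence, a `path` attribute on `<file>`, 1-indexed inclusive ranges, extension without the leading dot) is an assumption.

```python
import os
from pathlib import Path

FENCE = "`" * 3  # built at runtime so the example does not nest literal fences


def wrap_file(path, line_range=None, fmt="md", label="relative"):
    """Hypothetical stand-in for the wrapping a FileReference performs."""
    p = Path(path)
    lines = p.read_text(encoding="utf-8").splitlines()
    if line_range is not None:
        start, end = line_range  # assumption: 1-indexed, inclusive on both ends
        lines = lines[start - 1:end]
    body = "\n".join(lines)

    if label == "relative":
        tag = os.path.relpath(p)  # relative to the current working directory
    elif label == "name":
        tag = p.name
    elif label == "ext":
        tag = p.suffix.lstrip(".")  # assumption: extension without the dot
    else:
        raise ValueError(f"unknown label: {label!r}")

    if fmt == "md":
        return f"{FENCE}{tag}\n{body}\n{FENCE}"
    if fmt == "xml":
        return f'<file path="{tag}">\n{body}\n</file>'
    raise ValueError(f"unknown format: {fmt!r}")
```

Calling `wrap_file("README.md", line_range=(1, 10), fmt="xml", label="name")` would return the first ten lines of the file enclosed in a `<file>` element labeled `README.md`.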
A CLI (`cli.py`) is provided to print file contents to the console from the command line.
- `cat`: Prepare and concatenate file references
  - `paths`: Positional arguments for target file(s) or directories
  - `--ignore`: File(s) to ignore (optional)
  - `--format`: Output format (`md`, `xml`, or `shell`; default is `md`):
    - `shell` mimics `cat` output in a live shell prompt
    - `xml` encloses file contents in `<file>` tags
    - `md` encloses file contents in triple backticks
  - `--label`: Label style (`relative` for relative file path, `name` for file name only, `ext` for file extension only; default is `relative`)
  - `--output`: Output target (`console` (default), `clipboard`)
  - `--output-file`: Output file path (optional, compatible with `--output clipboard`)
- `ls`: List file token counts
  - `paths`: Positional arguments for target file(s) or directories to process
  - `--openai-encoding`: OpenAI encoding to use for tokenization, e.g., `cl100k_base` (default), `p50k_base`, `r50k_base`
  - `--openai-model`: OpenAI model (e.g., `gpt-3.5-turbo`/`gpt-4` (default), `text-davinci-003`, `code-davinci-002`) to determine which encoding to use for tokenization
  - `--anthropic-model`: Anthropic model to use for token counting (e.g., `claude-3-5-sonnet-latest`)
- `map`: Generate file/repo maps with aider
  - `paths`: Positional arguments for file(s) or folder(s) to include in the repo map
  - `-t, --max-tokens`: Maximum tokens for the repo map (default: 10000)
  - `--output`: Output target (options: `console` (default), `clipboard`)
  - `--output-file`: Optional output file path
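
The `cat` options combine path expansion, ignore filtering, and a choice of output format. The sketch below illustrates one way that pipeline could look for the `shell` format. It is not the CLI's implementation: `collect_files`, `as_shell`, and `concat` are hypothetical helpers, and both the glob-style ignore matching and the `$ cat <path>` prompt layout are assumptions.

```python
import fnmatch
from pathlib import Path


def collect_files(paths, ignore=()):
    """Expand files and directories into an ordered, de-duplicated file list."""
    found = []
    for raw in paths:
        p = Path(raw)
        # A directory contributes every file beneath it; a file contributes itself.
        candidates = [p] if p.is_file() else sorted(q for q in p.rglob("*") if q.is_file())
        for c in candidates:
            # Assumption: ignore entries are glob patterns matched on name or full path.
            skipped = any(
                fnmatch.fnmatch(c.name, pat) or fnmatch.fnmatch(c.as_posix(), pat)
                for pat in ignore
            )
            if not skipped and c not in found:
                found.append(c)
    return found


def as_shell(path):
    """Render one file as a `cat` call followed by its contents (assumed layout)."""
    return f"$ cat {path.as_posix()}\n{path.read_text(encoding='utf-8')}"


def concat(paths, ignore=()):
    """Concatenate every collected file in shell format."""
    return "\n".join(as_shell(p) for p in collect_files(paths, ignore))
```

For example, `concat(["contextualize/", "README.md"], ignore=["*.pyc"])` would return one `$ cat ...` section per remaining file, in the order the paths were given.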
Examples:
- `cat`:
  - `contextualize cat README.md` will print the wrapped contents of `README.md` to the console with default settings (Markdown format, relative path label).
  - `contextualize cat README.md --format xml` will print the wrapped contents of `README.md` to the console in XML format.
  - `contextualize cat README.md --format shell` will print the contents as if a user were running `cat README.md` in a live shell prompt.
  - `contextualize cat contextualize/ dev/ README.md --format xml` will prepare file references for files in the `contextualize/` and `dev/` directories and `README.md`, and print each file's contents (wrapped in corresponding XML tags) to the console.
- `ls`:
  - `contextualize ls README.md` will count and print the number of tokens in `README.md` using the default `cl100k_base` encoding, unless `ANTHROPIC_API_KEY` is set, in which case the Anthropic token counting API will be used.
  - `contextualize ls contextualize/ --openai-model text-davinci-003` will count and print the number of tokens in each file in the `contextualize/` directory using the `p50k_base` encoding associated with the `text-davinci-003` model, then print the total tokens for all processed files.
- `map`:
  - `contextualize map .` will generate and print a repository map for the current directory.
  - `contextualize map src/ tests/ -t 15000` will generate a repository map for files in the `src/` and `tests/` directories with a maximum of 15000 tokens.
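
The `ls` examples depend on which tokenizer ends up being used. The following sketch shows how that selection and the final per-file report could be expressed. It is an illustration only: `choose_counter`, `resolve_encoding`, and `report` are hypothetical names, the rule that an explicit encoding wins over a model is an assumption, and the report layout is invented. The model-to-encoding pairs are the ones published for OpenAI's `tiktoken` library. Actual token counting is left out so the example needs no third-party packages.

```python
# Model-to-encoding pairs as published for OpenAI's tiktoken tokenizer.
MODEL_TO_ENCODING = {
    "gpt-4": "cl100k_base",
    "gpt-3.5-turbo": "cl100k_base",
    "text-davinci-003": "p50k_base",
    "code-davinci-002": "p50k_base",
}
DEFAULT_ENCODING = "cl100k_base"


def choose_counter(env):
    """Use Anthropic's counting API when a key is present, else an OpenAI encoding."""
    return "anthropic" if env.get("ANTHROPIC_API_KEY") else "openai"


def resolve_encoding(openai_encoding=None, openai_model=None):
    """Pick an encoding name from the two OpenAI options."""
    if openai_encoding:  # assumption: an explicit encoding takes precedence
        return openai_encoding
    if openai_model:
        if openai_model not in MODEL_TO_ENCODING:
            raise ValueError(f"no known encoding for model: {openai_model!r}")
        return MODEL_TO_ENCODING[openai_model]
    return DEFAULT_ENCODING


def report(counts):
    """Format per-file token counts followed by a total (layout is illustrative)."""
    lines = [f"{path}: {n} tokens" for path, n in counts.items()]
    lines.append(f"Total: {sum(counts.values())} tokens")
    return "\n".join(lines)
```

With these helpers, `resolve_encoding(openai_model="text-davinci-003")` yields `p50k_base`, matching the second `ls` example, and calling it with no arguments falls back to `cl100k_base`.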