The CLI client wraps the Python Docling command-line tool. It runs conversions locally without a server, with support for batch processing, directory watching, file validation, and automatic retry with error classification.
- Node.js >= 18
- Python 3 with Docling installed:
pip install docling
import { Docling } from "docling-sdk";

const client = new Docling({
  cli: {
    outputDir: "./output",
    verbose: true,
  },
});

Or use the convenience factory:

import { createCLIClient } from "docling-sdk";

const client = createCLIClient({
  outputDir: "./converted-docs",
  verbose: true,
  concurrency: 4,
});

The CLI client implements `DoclingClientBase`, providing the same conversion interface as the API client:

const result = await client.convert("./document.pdf", "document.pdf", {
  to_formats: ["md"],
});
console.log(result.document.md_content);
console.log(result.status);

// Markdown
const md = await client.toMarkdown("./doc.pdf", "doc.pdf");
// HTML
const html = await client.toHtml("./doc.pdf", "doc.pdf");
// Plain text
const text = await client.extractText("./doc.pdf", "doc.pdf");

`convertToFile` returns the conversion results as a ZIP archive:

import { createWriteStream } from "node:fs";

const result = await client.convertToFile("./doc.pdf", "doc.pdf", {
  to_formats: ["md", "json"],
});
if (result.success && result.fileStream) {
  // Stream the ZIP archive to disk
  result.fileStream.pipe(createWriteStream("./output.zip"));
}

Process multiple files in parallel:

const result = await client.batch(
  ["./file1.pdf", "./file2.pdf", "./file3.pdf"],
  {
    to_formats: ["md"],
    outputDir: "./batch-output",
    parallel: true,
    maxConcurrency: 4,
  }
);
console.log(`Success: ${result.success}`);
for (const r of result.results) {
  console.log(`${r.file}: ${r.success ? "OK" : r.error}`);
}

Convert all supported files in a directory:

const result = await client.processDirectory("./documents", {
  to_formats: ["md"],
});
console.log(`Processed ${result.totalFiles} files`);
for (const r of result.results) {
  console.log(r.document.filename, r.status);
}

Automatically convert files as they appear:

await client.watch("./incoming", {
  outputDir: "./converted",
  recursive: true,
  patterns: ["*.pdf", "*.docx"],
  debounce: 1000,
});

Check which files are valid for conversion:

const { valid, invalid } = await client.validateFiles([
  "./doc.pdf",
  "./readme.txt",
  "./missing.pdf",
]);
console.log("Valid:", valid);
for (const { file, reason } of invalid) {
  console.log(`Invalid: ${file} -- ${reason}`);
}

client.setOutputDir("./new-output");

import { CliError, CliTimeoutError, CliNotFoundError } from "docling-sdk";

| Class | When thrown |
|---|---|
| `CliError` | CLI process exits with non-zero code |
| `CliTimeoutError` | CLI process exceeds timeout |
| `CliNotFoundError` | Docling binary not found at configured path |

try {
  await client.convert("./doc.pdf", "doc.pdf");
} catch (error) {
  if (error instanceof CliError) {
    console.log(error.exitCode);
    console.log(error.stderr);
  }
  if (error instanceof CliTimeoutError) {
    console.log("Process timed out");
  }
  if (error instanceof CliNotFoundError) {
    console.log("Install docling: pip install docling");
  }
}

The CLI client classifies errors into five types and retries accordingly:

| Type | Retryable | Description |
|---|---|---|
| `transient` | Yes | Temporary failures, intermittent issues |
| `timeout` | Yes | Process timeouts |
| `resource` | Yes | Memory or disk pressure |
| `permanent` | No | Invalid input, corrupt files |
| `configuration` | No | Wrong binary path, missing Python |

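The classification table can be read as a simple lookup. The sketch below is illustrative only: `ErrorType`, `RETRYABLE`, and `shouldRetry` are hypothetical names rather than SDK exports, and counting attempts from 0 is an assumption.

```typescript
// Hypothetical names; the SDK's internal classifier is not shown in this document.
type ErrorType = "transient" | "timeout" | "resource" | "permanent" | "configuration";

// Retryability per error type, copied from the classification table.
const RETRYABLE: Record<ErrorType, boolean> = {
  transient: true,
  timeout: true,
  resource: true,
  permanent: false,
  configuration: false,
};

// Assumption: `attempt` counts retries already made, starting at 0.
function shouldRetry(type: ErrorType, attempt: number, maxRetries = 3): boolean {
  return RETRYABLE[type] && attempt < maxRetries;
}

console.log(shouldRetry("timeout", 0)); // true
console.log(shouldRetry("permanent", 0)); // false
console.log(shouldRetry("transient", 3)); // false: retry budget exhausted
```

Keeping the decision in a table-driven helper like this makes it easy to see that `permanent` and `configuration` failures are reported immediately instead of being retried.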
Retry uses exponential backoff:
Delay = min(baseDelay * backoffMultiplier^attempt, maxDelay)
Default retry configuration:
| Setting | Default |
|---|---|
| `maxRetries` | 3 |
| `baseDelay` | 1000 ms |
| `maxDelay` | 30000 ms |
| `backoffMultiplier` | 2 |
| `retryableErrors` | `transient`, `timeout`, `resource` |

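To make the backoff formula concrete, here is a minimal sketch that evaluates it with the default settings. `RetryConfig`, `DEFAULTS`, and `retryDelay` are hypothetical names used for illustration, and treating the first retry as attempt 0 is an assumption.

```typescript
// Hypothetical helper; field names mirror the default retry settings.
interface RetryConfig {
  maxRetries: number;
  baseDelay: number; // ms
  maxDelay: number; // ms
  backoffMultiplier: number;
}

const DEFAULTS: RetryConfig = {
  maxRetries: 3,
  baseDelay: 1000,
  maxDelay: 30000,
  backoffMultiplier: 2,
};

// Delay = min(baseDelay * backoffMultiplier^attempt, maxDelay)
// Assumption: `attempt` is 0 for the first retry.
function retryDelay(attempt: number, cfg: RetryConfig = DEFAULTS): number {
  return Math.min(cfg.baseDelay * Math.pow(cfg.backoffMultiplier, attempt), cfg.maxDelay);
}

const schedule = Array.from({ length: DEFAULTS.maxRetries }, (_, i) => retryDelay(i));
console.log(schedule); // [ 1000, 2000, 4000 ]
console.log(retryDelay(10)); // 30000 (capped by maxDelay)
```

Under these assumptions the three default retries wait 1 s, 2 s, and 4 s; the `maxDelay` cap only takes effect if `maxRetries` is raised to 5 or more.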
The CLI client exposes a progress event emitter:
client.progress.on("progress", (event) => {
  console.log(event.type); // "start" | "progress" | "complete" | "error"
  console.log(event.file); // current file
  console.log(event.percentage); // overall progress
  console.log(event.eta); // estimated time remaining (ms)
  console.log(event.currentStep); // current processing step
  console.log(event.filesCompleted); // files completed so far
  console.log(event.totalFiles); // total files to process
});

The ProgressEvent structure:

| Field | Type | Description |
|---|---|---|
| `type` | `"start" \| "progress" \| "complete" \| "error"` | Event type |
| `file` | `string` | Current file being processed |
| `format` | `string` | Current output format |
| `percentage` | `number` | Overall progress (0-100) |
| `eta` | `number` | Estimated time remaining in ms |
| `currentStep` | `string` | Current processing step |
| `filesCompleted` | `number` | Number of files completed |
| `totalFiles` | `number` | Total number of files |
| `formatsCompleted` | `number` | Formats completed for current file |
| `totalFormats` | `number` | Total formats to generate |
| `processingTime` | `number` | Time elapsed in ms |
| `averageTimePerFile` | `number` | Average time per file in ms |
| `averageTimePerFormat` | `number` | Average time per format in ms |

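As an illustration of consuming these fields, the sketch below turns an event into a one-line status string. The local `ProgressEvent` interface is reconstructed from a subset of the documented fields, and marking every field except `type` as optional is an assumption. `formatProgress` is a hypothetical helper, and the `"ocr"` step name is made up for the example.

```typescript
// Sketch of the event shape, reconstructed from a subset of the documented fields.
// Assumption: only `type` is always present; other fields are treated as optional.
interface ProgressEvent {
  type: "start" | "progress" | "complete" | "error";
  file?: string;
  percentage?: number;
  eta?: number;
  currentStep?: string;
  filesCompleted?: number;
  totalFiles?: number;
}

// Hypothetical helper: render one event as a single status line.
function formatProgress(e: ProgressEvent): string {
  const parts: string[] = [`[${e.type}]`];
  if (e.filesCompleted !== undefined && e.totalFiles !== undefined) {
    parts.push(`${e.filesCompleted}/${e.totalFiles} files`);
  }
  if (e.percentage !== undefined) parts.push(`${e.percentage.toFixed(0)}%`);
  if (e.file) parts.push(e.file);
  if (e.currentStep) parts.push(`(${e.currentStep})`);
  if (e.eta !== undefined) parts.push(`ETA ${(e.eta / 1000).toFixed(1)}s`);
  return parts.join(" ");
}

console.log(
  formatProgress({
    type: "progress",
    file: "doc.pdf",
    percentage: 50,
    eta: 2500,
    currentStep: "ocr", // made-up step name
    filesCompleted: 1,
    totalFiles: 2,
  })
);
// [progress] 1/2 files 50% doc.pdf (ocr) ETA 2.5s
```

A helper of this kind could be passed to the emitter as `client.progress.on("progress", (e) => console.log(formatProgress(e)))`.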
The CLI client accepts `CliConvertOptions`, whose fields map to Docling CLI flags:
| Option | Type | CLI flag |
|---|---|---|
| `sources` | `string[]` | positional arguments |
| `fromFormats` | `InputFormat[]` | `--from` |
| `toFormats` | `OutputFormat[]` | `--to` |
| `output` | `string` | `--output` |
| `pipeline` | `ProcessingPipeline` | `--pipeline` |
| `vlmModel` | `CliVlmModelType` | `--vlm-model` |
| `asrModel` | `AsrModelType` | `--asr-model` |
| `ocr` | `boolean` | `--ocr` |
| `forceOcr` | `boolean` | `--force-ocr` |
| `ocrEngine` | `OcrEngine` | `--ocr-engine` |
| `ocrLang` | `string[]` | `--ocr-lang` |
| `pdfBackend` | `PdfBackend` | `--pdf-backend` |
| `tableMode` | `TableMode` | `--table-mode` |
| `imageExportMode` | `ImageExportMode` | `--image-export-mode` |
| `enrichCode` | `boolean` | `--enrich-code` |
| `enrichFormula` | `boolean` | `--enrich-formula` |
| `enrichPictureClasses` | `boolean` | `--enrich-picture-classes` |
| `enrichPictureDescriptions` | `boolean` | `--enrich-picture-descriptions` |
| `abortOnError` | `boolean` | `--abort-on-error` |
| `documentTimeout` | `number` | `--doc-timeout` |
| `numThreads` | `number` | `--num-threads` |
| `device` | `AcceleratorDevice` | `--device` |
| `verbose` | `number` | `-v` (repeated) |

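To show how such a mapping might look in practice, the sketch below builds an argument list for a small subset of the options. It is not the SDK's implementation: `CliOptionsSubset` and `buildArgs` are hypothetical, and two behaviours are assumptions, namely that list options repeat their flag once per value and that a `false` boolean is sent in the negated `--no-ocr` form.

```typescript
// Hypothetical subset of CliConvertOptions; the document lists the full set.
interface CliOptionsSubset {
  sources: string[];
  toFormats?: string[];
  output?: string;
  ocr?: boolean;
  verbose?: number;
}

// Assumptions: list options repeat their flag once per value, and a false
// boolean is sent as the negated `--no-ocr` form.
function buildArgs(opts: CliOptionsSubset): string[] {
  const args: string[] = [...opts.sources]; // sources are positional arguments
  for (const fmt of opts.toFormats ?? []) args.push("--to", fmt);
  if (opts.output !== undefined) args.push("--output", opts.output);
  if (opts.ocr !== undefined) args.push(opts.ocr ? "--ocr" : "--no-ocr");
  for (let i = 0; i < (opts.verbose ?? 0); i++) args.push("-v"); // -v repeated
  return args;
}

const args = buildArgs({
  sources: ["doc.pdf"],
  toFormats: ["md", "json"],
  output: "./out",
  ocr: false,
  verbose: 2,
});
console.log(args.join(" "));
// doc.pdf --to md --to json --output ./out --no-ocr -v -v
```

Returning an array rather than a single string keeps each argument separate, which avoids shell-quoting problems if the list is later handed to a process spawner.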
const client = new Docling({
  cli: {
    outputDir: "./output",
  },
});
// The internal config supports:
// pythonPath: path to python3 binary (default: "python3")
// doclingPath: path to docling binary (default: "docling")

Pass environment variables to the CLI process via the CliExecutionOptions:

// Available through the internal CliConfig
{
  env: {
    DOCLING_ARTIFACTS_PATH: "/path/to/models",
    CUDA_VISIBLE_DEVICES: "0",
  },
}

const result = await client.safeConvert("./doc.pdf", "doc.pdf");
if (result.success) {
  console.log(result.data.document.md_content);
} else {
  console.error(result.error.message);
}

- Getting Started -- first CLI conversion
- Configuration -- CLI config options
- Error Handling -- error classes and retry logic
- API Reference -- full method reference