Move to Docling's Hybrid Chunking #503

@bbrowning

We're currently using a legacy Docling version, custom parsing of the Docling v1 JSON, and our own chunking. Since our initial Docling implementation, Docling has added a hybrid chunker that should remove the need for us to parse the Docling JSON ourselves or maintain our own chunking.

The RAG work in instructlab/instructlab already uses the HybridChunker, so this issue is scoped to moving SDG onto a current version of Docling and onto that same chunking strategy.
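
For reference, a minimal sketch of what the target flow could look like, assuming a recent Docling 2.x release where `HybridChunker` is importable from `docling.chunking`. The `chunk_document` helper is a hypothetical name used only for illustration and is not existing SDG code; the tokenizer and token-limit settings are left at Docling's defaults, which SDG may well need to override.

```python
from typing import List


def chunk_document(source: str) -> List[str]:
    """Convert one document with Docling and return the text of each chunk.

    Hypothetical helper for illustration; not part of SDG.
    """
    # Imported lazily so this module still loads where Docling is not installed.
    from docling.chunking import HybridChunker
    from docling.document_converter import DocumentConverter

    # Convert the source (local path or URL) into a DoclingDocument.
    document = DocumentConverter().convert(source).document

    # Assumption: default chunker settings. A real integration would likely
    # pass a tokenizer matching the model used downstream.
    chunker = HybridChunker()
    return [chunk.text for chunk in chunker.chunk(dl_doc=document)]
```

Calling `chunk_document("path/to/file.pdf")` would then replace both the custom JSON parsing and the custom chunking step with the chunker's own output.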
