Releases · docling-project/docling · GitHub

22 May 18:44

docling-ops

v2.34.0

Feature

ocr: Auto-detect rotated pages in Tesseract (#1167) (45265bf)
Establish confidence estimation for document and pages (#1313) (9087524)

Fix

Fix ZeroDivisionError for cell_bbox.area() (#1636) (c2f595d)
integration: Update the Apify Actor integration (#1619) (14d4f5b)
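
A zero-area bounding box is a common source of a ZeroDivisionError when an overlap ratio is computed against a cell's area. The sketch below illustrates that kind of guard; `safe_overlap_ratio` and its parameters are hypothetical names used for illustration only, not the code changed in #1636.

```python
def safe_overlap_ratio(intersection_area: float, cell_area: float) -> float:
    """Return intersection_area / cell_area, or 0.0 when the cell has no area.

    Hypothetical helper: a degenerate (zero-width or zero-height) box has
    area 0, so dividing by it would raise ZeroDivisionError.
    """
    if cell_area <= 0.0:
        return 0.0
    return intersection_area / cell_area
```

Returning 0.0 treats a degenerate cell as having no overlap. Skipping such cells entirely is an equally plausible choice.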

20 May 19:54

v2.33.0

Feature

Add textbox content extraction in msword_backend (#1538) (12a0e64)

Fix

Fix issue with detecting docx files, and files with upper case extensions (#1609) (f4d9d41)
Load_from_doctags static usage (#1617) (0e00a26)
Incorrect force_backend_text behaviour for VLM DocTag pipelines (#1371) (f2e9c07)
pypdfium: Resolve overlapping text when merging bounding boxes (#1549) (98b5eeb)
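
Format detection that compares file extensions literally will miss names such as `REPORT.DOCX`. A minimal sketch of case-insensitive extension handling follows; `normalized_extension` is a hypothetical helper, not part of Docling's API.

```python
from pathlib import Path


def normalized_extension(filename: str) -> str:
    """Return the file extension in lower case, without the leading dot.

    Hypothetical helper: lower-casing the suffix lets "a.DOCX" and "a.docx"
    map to the same format key.
    """
    return Path(filename).suffix.lower().lstrip(".")
```

For example, `normalized_extension("REPORT.DOCX")` and `normalized_extension("report.docx")` both yield `"docx"`, and a name with no extension yields an empty string.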

14 May 14:28

v2.32.0

Feature

Improve parallelization for remote services API calls (#1548) (3a04f2a)
Support image/webp file type (#1415) (12dab0a)

Fix

ocr: Orig field in TesseractOcrCliModel as str (#1553) (9f8b479)
settings: Fix nested settings load via environment variables (#1551) (2efb7a7)

Documentation

Add advanced chunking & serialization example (#1589) (9f28abf)

13 May 10:09

v2.31.2

Fix

AsciiDoc header identification (#1562) (#1563) (4046d0b)
Restrict click version and update lock file (#1582) (8baa85a)

12 May 09:44

v2.31.1

Fix

Add smoldocling in download utils (#1577) (127e386)
HTML: Handle row spans in header rows (#1536) (776e7ec)
Mime error in document streams (#1523) (f1658ed)
Usage of hashlib for FIPS (#1512) (7c70573)
Guard against attribute errors in TesseractOcrModel del (#1494) (4ab7e9d)
Enable cuda_use_flash_attention2 for PictureDescriptionVlmModel (#1496) (cc45396)
Updated the time-recorder label for reading order (#1490) (976e92e)
Incorrect scaling of TableModel bboxes when do_cell_matching is False (#1459) (94d66a0)
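
On FIPS-enabled systems, some `hashlib` algorithms can be blocked unless the caller declares that the digest is not used for security. The sketch below shows one way to compute a content fingerprint under that constraint, assuming Python 3.9 or later for the `usedforsecurity` keyword; `content_fingerprint` is a hypothetical name and this is not necessarily the approach taken in #1512.

```python
import hashlib


def content_fingerprint(data: bytes) -> str:
    """Return a hex digest used only to identify content, not to secure it.

    Hypothetical helper: usedforsecurity=False (Python 3.9+) tells the
    underlying crypto library that this hash is not a security primitive.
    """
    return hashlib.sha256(data, usedforsecurity=False).hexdigest()
```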

Documentation

Update links in data_prep_kit (#1559) (844babb)
Add serialization docs, update chunking docs (#1556) (3220a59)
Update supported formats guide (#1463) (3afbe6c)

25 Apr 08:28

v2.31.0

Feature

Add tutorial using Milvus and Docling for RAG pipeline (#1449) (a2fbbba)

Fix

html: Handle address, details, and summary tags (#1436) (ed20124)
Treat overflowing -v flags as DEBUG (#1419) (8012a3e)
codecov: Fix codecov argument and yaml file (#1399) (fa7fc9e)
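
The idea of treating extra `-v` flags as DEBUG can be sketched as a clamped mapping from a verbosity count to a logging level. The function name and the specific levels chosen for counts 0 and 1 are assumptions for illustration, not the CLI's actual code.

```python
import logging


def level_for_verbosity(count: int) -> int:
    """Map a repeated -v count to a logging level.

    Assumed mapping: no flag -> WARNING, one flag -> INFO, and two or more
    flags -> DEBUG, so "-vvvv" behaves like "-vv" instead of failing.
    """
    if count <= 0:
        return logging.WARNING
    if count == 1:
        return logging.INFO
    return logging.DEBUG
```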

Documentation

Fix wrong output format in example code (#1427) (c2470ed)
Add OpenSSF Best Practices badge (#1430) (64918a8)
Typo fixes in docling_document.md (#1400) (995b3b0)
Updated the [Usage] link in architecture.md (#1416) (88948b0)
ocr: Add docs entry for OnnxTR OCR plugin (#1382) (a7dd59c)
security: More statements about secure development (#1381) (293c28c)
Add testing in the docs (#1379) (01fbfd5)
Add Notes for Installing in Intel macOS (#1377) (a026b4e)

14 Apr 08:20

v2.30.0

Feature

cli: Add option for html with split-page mode (#1355) (c0ba88e)
xlsx: Create a page for each worksheet in XLSX backend (#1332) (eef2bde)
OllamaVlmModel for Granite Vision 3.2 (#1337) (c605edd)

Fix

deps: Widen typer upper bound (#1375) (7e40ad3)
Auto-recognize .xlsx, .docx and .pptx files (#1340) (0de70e7)
docx: Declare image_data variable when handling pictures (#1359) (415b877)
Implement PictureDescriptionApiOptions.bitmap_area_threshold (#1248) (2503999)
Properly address page in pipeline _assemble_document when page_range is provided (#1334) (6b696b5)

10 Apr 12:24

v2.29.0

Feature

Handle <code> tags as code blocks (#1320) (0499cd1)
docx: Add text formatting and hyperlink support (#630) (bfcab3d)

Fix

docx: Adding new latex symbols, simplifying how equations are added to text (#1295) (14e9c0c)
pptx: Check if picture shape has an image attached (#1316) (dc3bf9c)
docx: Improve text parsing (#1268) (d2d6874)
Tesseract OCR CLI can't process images composed with numbers only (#1201) (b3d111a)

Documentation

Add plugins docs (#1319) (2e99e5a)
Add visual grounding example (#1270) (71148eb)

29 Mar 11:56

v2.28.4

Fix

Fixes tables when using OCR (#1261) (7afad7e)

28 Mar 18:30

v2.28.3

Fix

Word-level pdf cells for tables (#1238) (8bd71e8)
