Releases: docling-project/docling
v2.34.0
v2.33.0
Feature
Fix
- Fix issue with detecting docx files, and files with upper case extensions (#1609) (f4d9d41)
- Load_from_doctags static usage (#1617) (0e00a26)
- Incorrect force_backend_text behaviour for VLM DocTag pipelines (#1371) (f2e9c07)
- pypdfium: Resolve overlapping text when merging bounding boxes (#1549) (98b5eeb)
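To illustrate the kind of problem behind the upper-case extension fix (#1609), here is a minimal sketch of case-insensitive extension matching. It is not Docling's actual detection code: `KNOWN_EXTENSIONS` and `guess_format` are hypothetical names introduced only for this example.

```python
from pathlib import Path
from typing import Optional

# Hypothetical extension table for illustration; not Docling's own mapping.
KNOWN_EXTENSIONS = {
    ".docx": "docx",
    ".pptx": "pptx",
    ".xlsx": "xlsx",
    ".pdf": "pdf",
}


def guess_format(filename: str) -> Optional[str]:
    """Return a format label for filename, ignoring the case of its extension."""
    # Lower-casing the suffix makes "REPORT.DOCX" and "report.docx" equivalent.
    suffix = Path(filename).suffix.lower()
    return KNOWN_EXTENSIONS.get(suffix)
```

Normalizing the suffix once, before the lookup, keeps the table itself small and avoids listing every case variant.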
v2.32.0
v2.31.2
v2.31.1
Fix
- Add smoldocling in download utils (#1577) (127e386)
- HTML: Handle row spans in header rows (#1536) (776e7ec)
- Mime error in document streams (#1523) (f1658ed)
- Usage of hashlib for FIPS (#1512) (7c70573)
- Guard against attribute errors in TesseractOcrModel del (#1494) (4ab7e9d)
- Enable cuda_use_flash_attention2 for PictureDescriptionVlmModel (#1496) (cc45396)
- Updated the time-recorder label for reading order (#1490) (976e92e)
- Incorrect scaling of TableModel bboxes when do_cell_matching is False (#1459) (94d66a0)
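As background for the FIPS-related hashlib fix (#1512), the sketch below shows one common way to hash data for non-security purposes on Python 3.9+. The choice of MD5 and the helper name `stream_fingerprint` are assumptions made for this example; the release notes do not say which algorithm or call site Docling changed.

```python
import hashlib


def stream_fingerprint(data: bytes) -> str:
    """Return a hex digest used only as a content identifier, not for security."""
    # usedforsecurity=False (Python 3.9+) declares non-cryptographic use, which
    # FIPS-enabled builds may require for algorithms they otherwise block.
    return hashlib.md5(data, usedforsecurity=False).hexdigest()
```

The flag does not change the digest; it only tells the underlying crypto library how the hash is being used.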
Documentation
v2.31.0
Feature
Fix
- html: Handle address, details, and summary tags (#1436) (ed20124)
- Treat overflowing -v flags as DEBUG (#1419) (8012a3e)
- codecov: Fix codecov argument and yaml file (#1399) (fa7fc9e)
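The "-v" overflow fix (#1419) can be pictured with a small sketch that clamps a verbosity count to the most verbose level available. This is an assumed illustration, not the Docling CLI code: `LEVELS` and `level_for_verbosity` are hypothetical names, and the actual default level may differ.

```python
import logging

# Hypothetical mapping from the number of -v flags to a log level.
LEVELS = [logging.WARNING, logging.INFO, logging.DEBUG]


def level_for_verbosity(count: int) -> int:
    """Map a -v flag count to a log level, treating any overflow as DEBUG."""
    # Clamp the index so counts past the end of the table select the last entry.
    index = max(0, min(count, len(LEVELS) - 1))
    return LEVELS[index]
```

With this clamping, passing more "-v" flags than there are levels selects DEBUG instead of raising an index error.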
Documentation
- Fix wrong output format in example code (#1427) (c2470ed)
- Add OpenSSF Best Practices badge (#1430) (64918a8)
- Typo fixes in docling_document.md (#1400) (995b3b0)
- Updated the [Usage] link in architecture.md (#1416) (88948b0)
- ocr: Add docs entry for OnnxTR OCR plugin (#1382) (a7dd59c)
- security: More statements about secure development (#1381) (293c28c)
- Add testing in the docs (#1379) (01fbfd5)
- Add Notes for Installing in Intel macOS (#1377) (a026b4e)
v2.30.0
Feature
- cli: Add option for html with split-page mode (#1355) (c0ba88e)
- xlsx: Create a page for each worksheet in XLSX backend (#1332) (eef2bde)
- OllamaVlmModel for Granite Vision 3.2 (#1337) (c605edd)
Fix
- deps: Widen typer upper bound (#1375) (7e40ad3)
- Auto-recognize .xlsx, .docx and .pptx files (#1340) (0de70e7)
- docx: Declare image_data variable when handling pictures (#1359) (415b877)
- Implement PictureDescriptionApiOptions.bitmap_area_threshold (#1248) (2503999)
- Properly address page in pipeline _assemble_document when page_range is provided (#1334) (6b696b5)
v2.29.0
Feature
- Handle <code> tags as code blocks (#1320) (0499cd1)
- docx: Add text formatting and hyperlink support (#630) (bfcab3d)
Fix
- docx: Adding new latex symbols, simplifying how equations are added to text (#1295) (14e9c0c)
- pptx: Check if picture shape has an image attached (#1316) (dc3bf9c)
- docx: Improve text parsing (#1268) (d2d6874)
- Tesseract OCR CLI can't process images composed with numbers only (#1201) (b3d111a)