Linked to issue #399 (which still seems to be happening for me).
When selecting pipeline type = Vlm in the docling-serve UI, inference appears to run indefinitely (100% local GPU usage). The auto-selected VLM model is Granite-docling.
I've reproduced this behavior using the following docling-serve API invocation:
curl -X 'POST' \
  'http://localhost:5001/v1/convert/source' \
  -H 'accept: application/json' \
  -H 'Content-Type: application/json' \
  -d '{
    "sources": [{"kind": "http", "url": "https://arxiv.org/pdf/2501.17887"}],
    "options": {
      "ocr_engine": "easyocr",
      "pdf_backend": "dlparse_v4",
      "from_formats": ["pdf", "docx"],
      "force_ocr": false,
      "image_export_mode": "placeholder",
      "do_ocr": true,
      "ocr_lang": ["en", "it"],
      "table_mode": "accurate",
      "to_formats": ["md", "json", "html", "text", "doctags"],
      "abort_on_error": false,
      "pipeline": "vlm",
      "vlm_pipeline_model": "granite_docling"
    }
  }'
However, I've recently found that the problem does not reproduce if you invoke the docling-serve API the following way (HuggingFace parameters documented at ):
curl -X 'POST' \
  'http://localhost:5001/v1/convert/source' \
  -H 'accept: application/json' \
  -H 'Content-Type: application/json' \
  -d '{
    "sources": [{"kind": "http", "url": "https://arxiv.org/pdf/2501.17887"}],
    "options": {
      "ocr_engine": "easyocr",
      "pdf_backend": "dlparse_v4",
      "from_formats": ["pdf", "docx"],
      "force_ocr": false,
      "image_export_mode": "placeholder",
      "do_ocr": true,
      "ocr_lang": ["en", "it"],
      "table_mode": "accurate",
      "to_formats": ["md", "json", "html", "text", "doctags"],
      "abort_on_error": false,
      "pipeline": "vlm",
      "vlm_pipeline_model_local": {
        "repo_id": "ibm-granite/granite-docling-258M",
        "inference_framework": "transformers",
        "transformers_model_type": "automodel-imagetexttotext",
        "temperature": 0.0,
        "prompt": "Convert this page to docling.",
        "response_format": "doctags"
      }
    }
  }'
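For anyone scripting this, here is a minimal Python sketch (stdlib only, no docling dependency) of the second, non-hanging invocation above. The endpoint path and all option names are copied verbatim from the curl example; the `base_url` default assumes the same local deployment on port 5001, so adjust it for your setup.

```python
import json
import urllib.request


def build_payload(source_url: str) -> dict:
    """Build the /v1/convert/source body with an explicit local VLM model spec
    (the variant that did NOT hang in my tests)."""
    return {
        "sources": [{"kind": "http", "url": source_url}],
        "options": {
            "ocr_engine": "easyocr",
            "pdf_backend": "dlparse_v4",
            "from_formats": ["pdf", "docx"],
            "force_ocr": False,
            "image_export_mode": "placeholder",
            "do_ocr": True,
            "ocr_lang": ["en", "it"],
            "table_mode": "accurate",
            "to_formats": ["md", "json", "html", "text", "doctags"],
            "abort_on_error": False,
            "pipeline": "vlm",
            # Explicit HuggingFace model spec instead of "vlm_pipeline_model".
            "vlm_pipeline_model_local": {
                "repo_id": "ibm-granite/granite-docling-258M",
                "inference_framework": "transformers",
                "transformers_model_type": "automodel-imagetexttotext",
                "temperature": 0.0,
                "prompt": "Convert this page to docling.",
                "response_format": "doctags",
            },
        },
    }


def convert(source_url: str, base_url: str = "http://localhost:5001") -> dict:
    """POST the payload to docling-serve and return the parsed JSON response."""
    req = urllib.request.Request(
        f"{base_url}/v1/convert/source",
        data=json.dumps(build_payload(source_url)).encode("utf-8"),
        headers={"Content-Type": "application/json", "accept": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())
```

Swapping `vlm_pipeline_model_local` back to `"vlm_pipeline_model": "granite_docling"` in `build_payload` reproduces the hang for me.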
PLEASE NOTE: The hanging problem also does not reproduce when running the underlying docling library directly, as follows:
docling --pipeline vlm --vlm-model granite_docling "https://arxiv.org/pdf/2501.17887"
So it seems that the docling CLI invokes the underlying library in a manner closer to the second docling-serve API example, while the docling-serve UI behaves like the first example above.
docling-serve version = 1.14.3
docling version = 2.80.0
GPU: NVIDIA RTX 4000 SFF Ada Generation