Docling-serve UI hangs indefinitely when selecting pipeline Vlm #542

@pierdomenicobatzu

Linked to issue #399, which still seems to be happening for me.

When I select pipeline type = Vlm in the docling-serve UI, inference appears to run indefinitely, with the local GPU at 100% usage. The auto-selected VLM model is Granite-docling.

I've reproduced this behavior using the following docling-serve API invocation:

curl -X 'POST' \
  'http://localhost:5001/v1/convert/source' \
  -H 'accept: application/json' \
  -H 'Content-Type: application/json' \
  -d '{
    "sources": [{"kind": "http", "url": "https://arxiv.org/pdf/2501.17887"}],
    "options": {
      "ocr_engine": "easyocr",
      "pdf_backend": "dlparse_v4",
      "from_formats": ["pdf", "docx"],
      "force_ocr": false,
      "image_export_mode": "placeholder",
      "do_ocr": true,
      "ocr_lang": ["en", "it"],
      "table_mode": "accurate",
      "to_formats": ["md", "json", "html", "text", "doctags"],
      "abort_on_error": false,
      "pipeline": "vlm",
      "vlm_pipeline_model": "granite_docling"
    }
  }'

However, I've recently found out that this problem doesn't reproduce if you invoke the docling-serve API the following way, passing the model settings explicitly (HuggingFace parameters documented at ):

curl -X 'POST' \
  'http://localhost:5001/v1/convert/source' \
  -H 'accept: application/json' \
  -H 'Content-Type: application/json' \
  -d '{
    "sources": [{"kind": "http", "url": "https://arxiv.org/pdf/2501.17887"}],
    "options": {
      "ocr_engine": "easyocr",
      "pdf_backend": "dlparse_v4",
      "from_formats": ["pdf", "docx"],
      "force_ocr": false,
      "image_export_mode": "placeholder",
      "do_ocr": true,
      "ocr_lang": ["en", "it"],
      "table_mode": "accurate",
      "to_formats": ["md", "json", "html", "text", "doctags"],
      "abort_on_error": false,
      "pipeline": "vlm",
      "vlm_pipeline_model_local": {
        "repo_id": "ibm-granite/granite-docling-258M",
        "inference_framework": "transformers",
        "transformers_model_type": "automodel-imagetexttotext",
        "temperature": 0.0,
        "prompt": "Convert this page to docling.",
        "response_format": "doctags"
      }
    }
  }'
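
To make the difference between the two requests above explicit, here is a minimal Python sketch that rebuilds both "options" objects from the curl commands and lists the keys that differ. The names COMMON_OPTIONS, HANGING_OPTIONS, WORKING_OPTIONS and diff_keys are made up for this illustration and are not part of docling-serve.

```python
# Options shared by both curl requests in this report.
COMMON_OPTIONS = {
    "ocr_engine": "easyocr",
    "pdf_backend": "dlparse_v4",
    "from_formats": ["pdf", "docx"],
    "force_ocr": False,
    "image_export_mode": "placeholder",
    "do_ocr": True,
    "ocr_lang": ["en", "it"],
    "table_mode": "accurate",
    "to_formats": ["md", "json", "html", "text", "doctags"],
    "abort_on_error": False,
    "pipeline": "vlm",
}

# First request (hangs): the model is selected by preset name.
HANGING_OPTIONS = {**COMMON_OPTIONS, "vlm_pipeline_model": "granite_docling"}

# Second request (completes): the model is described explicitly.
WORKING_OPTIONS = {
    **COMMON_OPTIONS,
    "vlm_pipeline_model_local": {
        "repo_id": "ibm-granite/granite-docling-258M",
        "inference_framework": "transformers",
        "transformers_model_type": "automodel-imagetexttotext",
        "temperature": 0.0,
        "prompt": "Convert this page to docling.",
        "response_format": "doctags",
    },
}


def diff_keys(a, b):
    """Return the sorted keys that are missing from one dict or differ in value."""
    return sorted(k for k in a.keys() | b.keys() if a.get(k) != b.get(k))


if __name__ == "__main__":
    print(diff_keys(HANGING_OPTIONS, WORKING_OPTIONS))
    # → ['vlm_pipeline_model', 'vlm_pipeline_model_local']
```

The only difference is how the VLM model is specified, which suggests the hang is tied to how the "granite_docling" preset is resolved rather than to any of the OCR or export options.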

PLEASE NOTE: The hanging problem doesn't reproduce by running the underlying docling library the following way:

docling --pipeline vlm --vlm-model granite_docling "https://arxiv.org/pdf/2501.17887"

So it seems that the docling CLI runs the model in a way closer to the second docling-serve API invocation above, while the docling-serve UI behaves more like the first one.

docling-serve version = 1.14.3
docling version = 2.80.0
GPU: NVIDIA RTX 4000 SFF Ada Generation
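
For anyone trying to reproduce this without blocking a terminal forever, here is a rough sketch that sends the same kind of request with a client-side timeout, using only the Python standard library. It assumes Python 3.10+ and the endpoint from the curl commands above; build_request and convert_with_timeout are hypothetical helper names, and the 600 second default is an arbitrary choice. A client-side timeout only stops the client from waiting; it presumably does not stop the inference that is already running on the server.

```python
import json
import urllib.error
import urllib.request

# Endpoint taken from the curl commands in this report; adjust as needed.
CONVERT_URL = "http://localhost:5001/v1/convert/source"


def build_request(options, url=CONVERT_URL):
    """Build the POST request for the given "options" object."""
    payload = {
        "sources": [{"kind": "http", "url": "https://arxiv.org/pdf/2501.17887"}],
        "options": options,
    }
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"accept": "application/json", "Content-Type": "application/json"},
        method="POST",
    )


def convert_with_timeout(options, timeout_s=600.0):
    """Return (status, body), or (None, b"") if the client-side timeout expires."""
    req = build_request(options)
    try:
        with urllib.request.urlopen(req, timeout=timeout_s) as resp:
            return resp.status, resp.read()
    except TimeoutError:
        # Raised when reading the response exceeds the timeout.
        return None, b""
    except urllib.error.URLError as exc:
        # urllib wraps timeouts that occur while connecting or sending.
        if isinstance(exc.reason, TimeoutError):
            return None, b""
        raise


if __name__ == "__main__":
    status, _ = convert_with_timeout(
        {"pipeline": "vlm", "vlm_pipeline_model": "granite_docling"}
    )
    print("timed out" if status is None else f"HTTP {status}")
```

Running it once with the preset name and once with the explicit "vlm_pipeline_model_local" object should show whether only the first variant times out.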
