perf: move CPU-bound ingest work off the event loop#449
Merged
Conversation
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## main #449 +/- ##
=======================================
Coverage 95.96% 95.96%
=======================================
Files 124 124
Lines 7132 7140 +8
=======================================
+ Hits 6844 6852 +8
Misses 288 288 ☔ View full report in Codecov by Harness. 🚀 New features to boost your workflow:
|
Owner
|
Moved stored-document preparation off the event loop. The async client paths now use a single helper |
Owner
|
Moved fetched-body temp-file writes off the event loop for non-FS type sources. |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
perf: move CPU-bound ingest work off the event loop
Summary
The ingester runs
worker_countasync workers that share a single event loop.Several steps in the convert → chunk → embed → store pipeline performed
CPU-bound work synchronously on the event-loop thread. Whenever one worker
hit such a call, every other worker's coroutine was frozen for its entire
duration — turning what should be a few-second store step into blocks of
hundreds to a thousand-plus seconds under a batch run, because workers pile up
behind each other's synchronous bursts.
This PR dispatches those calls through
asyncio.to_thread, so the work runs inthe thread pool and the event loop stays free to drive the other workers. It
follows the same pattern already used for PDF slicing and the embedded-PDF
attachment scan (the related fix earlier on this branch).
Why these calls in particular
This was scoped against a production batch configuration: filesystem source,
docling-servefor both conversion and chunking,split_pages: 20,generate_page_imageson (default) atimages_scale: 1.0. That config inlinesfull-resolution page rasters as base64 into the
DoclingDocument, which makesthe serialization and parse steps below genuinely heavy — proportional to total
document size, not just structure.
Changes
chunkers/docling_serve.pydocument.model_dump_json()of an image-laden document before the chunk requestawait asyncio.to_thread(...)converters/docling_serve.py_parse_zip_to_docling— zip decompress, per-image base64 re-encode,DoclingDocument.model_validateawait asyncio.to_thread(...)converters/pdf_split.pyDoclingDocument.concatenate(slices)merging inlined-image slice docsawait asyncio.to_thread(...)ingester/sources/fs.pypath.read_bytes()+hashlib.md5over the whole file_read_body(stat + size-check + read + hash), called viaawait asyncio.to_thread(...)The docling-serve HTTP submit/poll was already async (
httpx.AsyncClient), andopenai embeddings are network-bound, so those paths were left unchanged.
Tests
Each fix has a thread-identity test that captures
threading.current_thread()where the work runs and asserts it is not the main thread — so a regression
back to a synchronous call fails with a clear message, independent of timing or
document size:
tests/test_chunker.py::TestDoclingServeChunker::test_chunk_serializes_document_off_event_loop_threadtests/test_converters.py::test_parse_zip_runs_off_event_loop_threadtests/test_pdf_split.py::test_concatenate_runs_off_event_loop_threadtests/ingester/test_fs_source.py::test_fs_source_fetch_reads_off_event_loop_threadThe first three pass in CI. The FS test passes wherever
FSSource._resolve_within_rootresolves
file://URIs correctly (Linux/CI); on Windows it is blocked by apre-existing path-resolution issue that fails every fetch-path FS test on
main, unrelated to this change. The FS code change was verified directly:_read_bodyruns on a worker thread with correct body and md5.ruff checkandruff format --checkpass on all changed files.Effect
For the target config, the dominant blocker — serializing full-resolution
base64 page rasters for the whole document on every document — and the per-slice
zip parse now run off the loop. A single large PDF no longer freezes the other
workers during convert/chunk. Combined with the earlier attachment-scan fix, the
convert → chunk path no longer blocks the event loop.
Out of scope
The store step's
_write_lock(a singleasyncio.Lockheld across thesequential table writes in
client/documents.pyandclient/__init__.py)still serializes all workers at commit time. That is a separate
lock-granularity issue, not addressed here.