fix(client): Harden upload validation and conversion flow#989
fix(client): Harden upload validation and conversion flow#989Gujiassh wants to merge 6 commits intobytedance:mainfrom
Conversation
There was a problem hiding this comment.
Pull request overview
This PR hardens the embedded DeerFlowClient upload flow to better match Gateway expectations by validating inputs up-front and improving the document-conversion execution strategy when invoked under an active asyncio event loop, with corresponding documentation updates.
Changes:
- Reject non-file inputs (e.g., directory paths) before copying any uploads to keep uploads all-or-nothing.
- Reuse a single conversion worker (thread) per request when conversions must run from within an active event loop.
- Update backend docs (README + CLAUDE guide) and extend client tests for the new upload validation behavior.
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| backend/tests/test_client.py | Adds coverage for directory-path rejection in upload_files and reformats several test blocks. |
| backend/src/client.py | Adds upfront is_file() validation, reworks conversion execution to reuse a single executor under a running event loop, and switches list_uploads to os.scandir. |
| backend/README.md | Documents that uploads reject directory paths. |
| backend/CLAUDE.md | Documents new upload validation + conversion worker reuse behavior. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| if not p.exists(): | ||
| raise FileNotFoundError(f"File not found: {f}") | ||
| if not p.is_file(): | ||
| raise ValueError(f"Path is not a file: {f}") |
There was a problem hiding this comment.
upload_files() now raises ValueError when an input path exists but is not a regular file (e.g., a directory). The docstring’s Raises: section only documents FileNotFoundError; please document the ValueError case as well (and any other newly introduced exceptions) so callers know what to expect.
| conversion_pool = None | ||
| if has_convertible_file: | ||
| try: | ||
| asyncio.get_running_loop() | ||
| except RuntimeError: | ||
| conversion_pool = None | ||
| else: | ||
| import concurrent.futures | ||
|
|
||
| dest = uploads_dir / src_path.name | ||
| shutil.copy2(src_path, dest) | ||
| # Reuse one worker when already inside an event loop to avoid | ||
| # creating a new ThreadPoolExecutor per converted file. | ||
| conversion_pool = concurrent.futures.ThreadPoolExecutor(max_workers=1) |
There was a problem hiding this comment.
The new branch that creates/reuses a single ThreadPoolExecutor when upload_files() is called from inside an active event loop is not covered by tests. Please add an async test (e.g., pytest.mark.asyncio) that calls upload_files() with multiple convertible files while an event loop is running and asserts only one executor is created (and conversion still succeeds/fails gracefully).
Summary
Testing
uv run pytest tests/test_client.py -q