make num writer threads configurable #6343
base: develop
@@ -2859,14 +2859,19 @@ def recommend_thread_pool_workers(num_workers=None):
     If a ``fo.config.max_thread_pool_workers`` is set, this limit is applied.

     Args:
-        num_workers (None): a suggested number of workers
+        num_workers (None): a suggested number of workers. If ``num_workers <= 0``, this
+            function returns 1.

     Returns:
         a number of workers
     """
+
     if num_workers is None:
         num_workers = multiprocessing.cpu_count()

+    if num_workers <= 0:
+        num_workers = 1
+
     if fo.config.max_thread_pool_workers is not None:
         num_workers = min(num_workers, fo.config.max_thread_pool_workers)

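The floor-then-clamp order in this hunk can be tried outside of FiftyOne. The following is a minimal sketch, not the library's code: `recommend_workers` and its `max_pool_workers` parameter are hypothetical stand-ins for the patched helper and for `fo.config.max_thread_pool_workers`.

```python
import multiprocessing


def recommend_workers(num_workers=None, max_pool_workers=None):
    """Hypothetical stand-in for the patched helper.

    ``max_pool_workers`` plays the role of the configured maximum
    (an assumption made so the sketch runs without FiftyOne).
    """
    if num_workers is None:
        num_workers = multiprocessing.cpu_count()

    # New in this change: non-positive suggestions fall back to one worker
    if num_workers <= 0:
        num_workers = 1

    # Existing behavior: apply the configured maximum, if any
    if max_pool_workers is not None:
        num_workers = min(num_workers, max_pool_workers)

    return num_workers


print(recommend_workers(0))       # 1
print(recommend_workers(-3))      # 1
print(recommend_workers(16, 8))   # 8
print(recommend_workers(4, 8))    # 4
```

Because the clamp runs after the floor, a configured maximum of 0 would still yield 0 in this sketch; whether that case needs its own guard is a separate question.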
@@ -3147,7 +3152,7 @@ def validate_hex_color(value): | |||||||||||||||||||||||||
|
||||||||||||||||||||||||||
@contextmanager | ||||||||||||||||||||||||||
def async_executor( | ||||||||||||||||||||||||||
*, max_workers, skip_failures=False, warning="Async failure" | ||||||||||||||||||||||||||
*, max_workers=None, skip_failures=False, warning="Async failure" | ||||||||||||||||||||||||||
): | ||||||||||||||||||||||||||
""" | ||||||||||||||||||||||||||
Context manager that provides a function for submitting tasks to a thread | ||||||||||||||||||||||||||
|
@@ -3160,14 +3165,21 @@ def async_executor(
                 submit(process_item, item)

     Args:
-        max_workers: the maximum number of workers to use
+        max_workers (None): the maximum number of workers to use. By default,
+            this is determined by :func:`fiftyone.core.utils.recommend_thread_pool_workers`.
         skip_failures (False): whether to skip exceptions raised by tasks
         warning ("Async failure"): the warning message to log if a task
             raises an exception and ``skip_failures == True``

     Raises:
         Exception: if a task raises an exception and ``skip_failures == False``
     """
+    if max_workers is None:
+        max_workers = (
+            fo.config.default_thread_pool_workers
Reviewer: From the data, it looks like there is no benefit beyond 8 threads (most of the benefit is seen with 4), so it seems like we should cap it rather than use the default (`cpu_count()`). On high-CPU machines, this could have deleterious effects.

Reply: The data attached is for computing dense embeddings with a (relatively) large batch size. It's of interest because we will use this workload in det RERs. It's also likely reflective of other very heavy mongo I/O workloads. Conversely, something like a regular
Given that the user can already configure the number of threads via any of:
I would rather not put a hard cap here. What concerns do you have about very many threads being used? Maybe they can be addressed separately.
+            or recommend_thread_pool_workers(max_workers)
+        )
Comment on lines +3177 to +3181
🛠️ Refactor suggestion

Clamp defaults to config max and guard against zero workers

When
-    if max_workers is None:
-        max_workers = (
-            fo.config.default_thread_pool_workers
-            or recommend_thread_pool_workers(max_workers)
-        )
+    if max_workers is None:
+        cfg_default = getattr(fo.config, "default_thread_pool_workers", None)
+        # Clamp to fo.config.max_thread_pool_workers inside helper
+        max_workers = recommend_thread_pool_workers(cfg_default)
+    if max_workers is None or max_workers < 1:
+        # ThreadPoolExecutor rejects 0; ensure a sensible minimum
+        max_workers = 1
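To make the difference between the two variants concrete, here is a hedged sketch that compares them on a few inputs. The names `recommend`, `resolve_as_patched`, and `resolve_as_suggested` are hypothetical, and the cap of 8 and CPU count of 4 are arbitrary values chosen for illustration; none of this is FiftyOne code.

```python
def recommend(num_workers=None, cap=8, cpu_count=4):
    """Hypothetical helper: floor at 1, then clamp to ``cap``."""
    if num_workers is None:
        num_workers = cpu_count
    if num_workers <= 0:
        num_workers = 1
    return min(num_workers, cap)


def resolve_as_patched(max_workers, default_workers):
    # Mirrors the `default or recommend(...)` expression from the diff
    if max_workers is None:
        max_workers = default_workers or recommend(max_workers)
    return max_workers


def resolve_as_suggested(max_workers, default_workers):
    # Mirrors the suggestion: always route the default through the helper
    if max_workers is None:
        max_workers = recommend(default_workers)
    if max_workers is None or max_workers < 1:
        max_workers = 1
    return max_workers


for default in (None, 0, 32, -1):
    print(
        default,
        resolve_as_patched(None, default),
        resolve_as_suggested(None, default),
    )
# None 4 4
# 0 4 1
# 32 32 8
# -1 -1 1
```

Under these assumptions, a configured default above the cap passes through the `or` expression unclamped, and a negative default is truthy, so it would reach `ThreadPoolExecutor`, which raises `ValueError` for `max_workers <= 0`.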
+
     with ThreadPoolExecutor(max_workers=max_workers) as executor:
         _futures = []

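For readers unfamiliar with the pattern, the sketch below shows one way a `submit`-yielding context manager with an optional `max_workers` could look. It is an illustration based only on the docstring in the diff, not the FiftyOne implementation: `async_executor_sketch` is a hypothetical name, and falling back to `os.cpu_count()` is an assumption standing in for the config-based default.

```python
import logging
import os
from concurrent.futures import ThreadPoolExecutor
from contextlib import contextmanager

logger = logging.getLogger(__name__)


@contextmanager
def async_executor_sketch(
    *, max_workers=None, skip_failures=False, warning="Async failure"
):
    if max_workers is None:
        # Assumption: stand-in for the config/recommendation lookup
        max_workers = os.cpu_count() or 1

    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        futures = []

        def submit(fn, *args, **kwargs):
            future = executor.submit(fn, *args, **kwargs)
            futures.append(future)
            return future

        yield submit

        # Surface task errors once the caller's block has finished
        for future in futures:
            try:
                future.result()
            except Exception as e:
                if not skip_failures:
                    raise
                logger.warning("%s: %s", warning, e)


results = []
with async_executor_sketch(max_workers=4) as submit:
    for item in range(5):
        submit(results.append, item * item)

print(sorted(results))  # [0, 1, 4, 9, 16]
```

With `max_workers` omitted, the same call works and the worker count is resolved inside the context manager, which is the behavior this change makes possible.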
❓ Verification inconclusive

Keyword-only args may break existing positional calls; keep first arg positional or verify all call sites

Switching to keyword-only can break user code calling with `async_executor(4)`. Either keep the first parameter positional, or confirm all call sites are keyworded.

Option A (backward compatible):
Option B (keep as-is): verify no positional usages in the repo:

async_executor signature change is a breaking change

No internal calls rely on a positional first argument, but because this is a public API, making `max_workers` keyword-only will break external callers. Either revert to a backward-compatible signature:
or keep the keyword-only signature but document the breaking change and bump the major version.
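The positional-versus-keyword concern can be reproduced with two toy signatures. This is a minimal sketch using hypothetical functions (`keyword_only` and `positional_ok`), not the real `async_executor`.

```python
def keyword_only(*, max_workers=None, skip_failures=False):
    """Everything after the bare ``*`` must be passed by name."""
    return max_workers


def positional_ok(max_workers=None, *, skip_failures=False):
    """The first parameter may be passed positionally or by name."""
    return max_workers


try:
    keyword_only(4)
except TypeError as e:
    print("TypeError:", e)

print(keyword_only(max_workers=4))  # 4
print(positional_ok(4))             # 4
```

Note that the removed signature line in the diff shown earlier on this page already begins with `*`, so as displayed, the change adds a default for `max_workers` rather than introducing the keyword-only marker.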